n+e fc251ab0b8
bump to v0.4.3 (#432)
* add makefile
* bump version
* add isort and yapf
* update contributing.md
* update PR template
* spelling check
2021-09-03 05:05:04 +08:00

Bipedal-Hardcore-SAC

  • Our default choice: remove the done-flag penalty. Training then converges to ~280 reward within 100 epochs (10M env steps, 3~4 hours; see the image below).
  • If the done penalty is not removed, training converges much more slowly: it takes about 200 epochs (20M env steps) to reach ~200 reward.
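
To illustrate what "removing the done flag penalty" can look like, here is a minimal sketch, not necessarily the code used for these results. It assumes the older Gym-style `step` API that returns `(obs, reward, done, info)`, and it assumes the penalty is the large negative reward given on the terminal step (BipedalWalker gives -100 when the walker falls). `RemoveDonePenalty`, `FakeEnv`, and `episode_return` are hypothetical names introduced only for this example.

```python
class RemoveDonePenalty:
    """Hypothetical wrapper: zeroes the reward of the terminal step."""

    def __init__(self, env, rm_done=True):
        self.env = env
        self.rm_done = rm_done  # set to False to keep the original penalty

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Assumption: the terminal-step reward is the fall penalty, so drop it.
        if done and self.rm_done:
            reward = 0.0
        return obs, reward, done, info


class FakeEnv:
    """Stand-in environment: two normal steps, then a fall with a -100 penalty."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        reward = -100.0 if done else 1.0
        return self.t, reward, done, {}


def episode_return(env):
    """Run one episode with a dummy action and sum the rewards."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(0)
        total += reward
    return total


if __name__ == "__main__":
    print(episode_return(RemoveDonePenalty(FakeEnv())))                 # penalty removed
    print(episode_return(RemoveDonePenalty(FakeEnv(), rm_done=False)))  # penalty kept
```

Note that this sketch zeroes the reward on every terminal step, including episodes that end for reasons other than a fall. A real wrapper may need to tell those cases apart.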