fix critical bugs in MAPolicy and docs update (#207 )

- fix a bug in MAPolicy: `buffer.rew = Batch()` doesn't change `buffer.rew` (thanks mypy)
- polish examples/box2d/bipedal_hardcore_sac.py
- several docs update
- format setup.py and bump version to 0.2.7

2020-09-08 21:10:48 +08:00

results/sac

fix critical bugs in MAPolicy and docs update (#207 )

2020-09-08 21:10:48 +08:00

acrobot_dualdqn.py

optimize training procedure and improve code coverage (#189 )

2020-08-27 12:15:18 +08:00

bipedal_hardcore_sac.py

fix critical bugs in MAPolicy and docs update (#207 )

2020-09-08 21:10:48 +08:00

lunarlander_dqn.py

optimize training procedure and improve code coverage (#189 )

2020-08-27 12:15:18 +08:00

mcc_sac.py

optimize training procedure and improve code coverage (#189 )

2020-08-27 12:15:18 +08:00

README.md

fix critical bugs in MAPolicy and docs update (#207 )

2020-09-08 21:10:48 +08:00

README.md

Bipedal-Hardcore-SAC

- Our default choice is to remove the done-flag penalty. With it removed, training converges to ~250 reward within 100 epochs (10M env steps, 3~4 hours; see the image below).
- If the done penalty is not removed, training converges much more slowly: it takes about 200 epochs (20M env steps) to reach the same performance (~200 reward).
- Action noise is only necessary at the beginning of training; it hurts performance toward the end. Removing it can reach ~255 reward (our best result under the original env, i.e. with the done penalty kept).
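
The two knobs discussed above (dropping the done penalty and switching off action noise) can be illustrated with a small environment wrapper. This is only a minimal sketch under stated assumptions, not the code in bipedal_hardcore_sac.py: the names `PenaltyFreeNoisyEnv`, `ToyEnv`, and `rollout` are hypothetical, the environment is assumed to follow the classic 4-tuple `step` interface, and `ToyEnv` is a stand-in with a made-up -100 terminal penalty so the example runs without gym installed. Note that discarding the terminal-step reward also discards it when an episode ends for other reasons.

```python
import random


class PenaltyFreeNoisyEnv:
    """Hypothetical wrapper: drops the terminal-step reward and adds action noise."""

    def __init__(self, env, act_noise=0.3, remove_done_penalty=True, seed=None):
        self.env = env
        self.act_noise = act_noise  # set to 0.0 to disable noise late in training
        self.remove_done_penalty = remove_done_penalty
        self._rng = random.Random(seed)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Perturb each action dimension with uniform noise in [-act_noise, act_noise].
        noisy = [a + self.act_noise * self._rng.uniform(-1.0, 1.0) for a in action]
        obs, reward, done, info = self.env.step(noisy)
        if done and self.remove_done_penalty:
            # Discard the reward of the terminal step, which carries the done penalty.
            reward = 0.0
        return obs, reward, done, info


class ToyEnv:
    """Stand-in environment (assumption): +1 per step, a penalty on the last step."""

    def __init__(self, horizon=3, done_penalty=-100.0):
        self.horizon = horizon
        self.done_penalty = done_penalty
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        reward = self.done_penalty if done else 1.0
        # Echo the received action so the effect of the noise can be inspected.
        return [float(self.t)], reward, done, {"action": list(action)}


def rollout(env):
    """Run one episode with a fixed zero action and return the summed reward."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step([0.0, 0.0, 0.0, 0.0])
        total += reward
    return total


if __name__ == "__main__":
    print("raw return:", rollout(ToyEnv()))
    print("wrapped return:", rollout(PenaltyFreeNoisyEnv(ToyEnv(), seed=0)))
```

Setting `act_noise=0.0` on the wrapper corresponds to the "remove action noise" variant, and `remove_done_penalty=False` corresponds to training under the original reward.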