Yi Su 40289b8b0e
Add atari ppo example (#523)
I needed a policy gradient baseline myself, and one has been requested several times (#497, #374, #440). I used https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py as a reference for the hyperparameters.

Note that using lr=2.5e-4 results in an "Invalid Value" error for 2 games. Reducing the learning rate fixes this, which is why the default lr is set to 1e-4. See the discussion in https://github.com/DLR-RM/rl-baselines3-zoo/issues/156.
2022-02-11 06:45:06 +08:00