hongshaorou/Tianshou

Tongzheng Ren 4e4a7b74c1 update the optimizer README

2017-11-06 14:01:29 +08:00

Optimizer for policy gradient methods

TODO: vanilla policy gradient, introduce a baseline, REINFORCE, TRPO, PPO, GAE, NAF, DPG, ACKTR
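
As a starting point for the first two items (vanilla policy gradient and introducing a baseline), here is a minimal sketch of the per-step weights that multiply the log-probability gradient. It is not Tianshou code: the function names `discounted_returns` and `policy_gradient_weights`, the default `gamma`, and the choice of the mean episode return as the baseline are all assumptions made for illustration.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = sum_k gamma**k * r_{t+k} for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def policy_gradient_weights(rewards, gamma=0.99, use_baseline=True):
    """Per-step weights for grad log pi(a_t | s_t).

    Hypothetical helper: without a baseline the weights are the raw
    discounted returns; with one, a constant baseline (assumed here to be
    the mean return of the episode) is subtracted to reduce variance.
    """
    returns = discounted_returns(rewards, gamma)
    if use_baseline:
        return returns - returns.mean()
    return returns
```

For example, `policy_gradient_weights([1.0, 1.0, 1.0], gamma=0.5, use_baseline=False)` gives the raw returns for a three-step episode, while the default `use_baseline=True` shifts them so they sum to zero. A learned state-value baseline would replace `returns.mean()` in a fuller implementation.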