Optimizer for policy gradient methods
TODO: vanilla, introduce a baseline, REINFORCE, TRPO, PPO, GAE, NAF, DPG, ACKTR