Optimizer for policy gradient methods
TODO: vanilla, introduce a baseline, REINFORCE, TRPO, PPO, GAE, NAF, DPG, ACKTR