# Optimizer for policy gradient methods

TODO: vanilla, baseline, REINFORCE, TRPO, PPO, GAE, NAF, DPG, ACKTR