# Optimizers for policy gradient methods
TODO:

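Vanilla policy gradient

Baseline (variance reduction)

REINFORCE

TRPO (Trust Region Policy Optimization)

PPO (Proximal Policy Optimization)

GAE (Generalized Advantage Estimation)

NAF (Normalized Advantage Functions)

DPG (Deterministic Policy Gradient)

ACKTR (Actor Critic using Kronecker-Factored Trust Region)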