# Optimizer for policy gradient methods

TODO:

- vanilla
- introduce a baseline
- REINFORCE
- TRPO
- PPO
- GAE
- NAF
- DPG
- ACKTR