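# Optimizer for policy gradient methods

TODO:

- vanilla policy gradient
- introduce a baseline
- REINFORCE
- TRPO (Trust Region Policy Optimization)
- PPO (Proximal Policy Optimization)
- GAE (Generalized Advantage Estimation)
- NAF (Normalized Advantage Functions)
- DPG (Deterministic Policy Gradient)
- ACKTR (Actor Critic using Kronecker-Factored Trust Region)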
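The sketch below shows how the first few TODO items relate. It is illustrative only and does not describe code in this repository; the function names `discounted_returns` and `gae_advantages` are hypothetical.

- Vanilla policy gradient and REINFORCE weight each log-probability by the reward-to-go $G_t$.
- Introducing a baseline $V(s_t)$ replaces $G_t$ with $G_t - V(s_t)$.
- GAE replaces it with $\sum_l (\gamma\lambda)^l \delta_{t+l}$, where $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Reward-to-go G_t = sum_k gamma^k * r_{t+k} (vanilla PG / REINFORCE weights)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards so each step reuses the discounted sum of later rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def gae_advantages(
    rewards: List[float], values: List[float], gamma: float, lam: float
) -> List[float]:
    """Generalized Advantage Estimation for a single trajectory.

    `values` holds V(s_0)..V(s_T), i.e. one more entry than `rewards`;
    the last entry is the bootstrap value (use 0.0 for a terminal state).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly len(rewards) + 1 entries")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors with decay gamma * lam.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.5, 0.5, 0.5, 0.0]  # hypothetical critic outputs; episode terminates
    print(discounted_returns(rewards, gamma=0.5))
    print(gae_advantages(rewards, values, gamma=0.5, lam=1.0))
```

The parameter `lam` interpolates between the two ends of the baseline spectrum.

- With `lam=1.0` and a terminal bootstrap value of 0, the sum telescopes to $G_t - V(s_t)$, which is REINFORCE with a baseline. In the example, the returns `[1.75, 1.5, 1.0]` minus the baseline `0.5` give advantages `[1.25, 1.0, 0.5]`.
- With `lam=0.0`, each advantage reduces to the single TD error $\delta_t$. This has lower variance but more bias.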