# Batch
YouQiaoben

Fix as stated in `ppo_example.py`.

# Replay
ShihongSong

A `Replay.py` file, which must provide `collect()` and `next_batch()` methods for training.

Integrate the previous `ReplayBuffer` code.
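A minimal, hypothetical sketch of a buffer exposing `collect()` and `next_batch()` might look like the following. The class name `ReplayBuffer`, the `capacity` and `seed` arguments, the `(obs, action, reward, next_obs, done)` transition layout and the environment interface (`env.reset()` returning an observation, `env.step(action)` returning `(obs, reward, done)`) are all assumptions for illustration, not the planned `Replay.py` API or the previous ReplayBuffer code.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical sketch of the Replay.py interface: collect() + next_batch()."""

    def __init__(self, capacity, seed=None):
        # Assumption: a fixed-size FIFO; oldest transitions drop out at `capacity`.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)
        self._obs = None  # last observation, carried across collect() calls

    def __len__(self):
        return len(self._storage)

    def collect(self, env, policy, num_steps):
        """Run `policy` in `env` for `num_steps` steps and store each transition.

        Assumed env interface: env.reset() -> obs, env.step(a) -> (obs, reward, done).
        """
        if self._obs is None:
            self._obs = env.reset()
        for _ in range(num_steps):
            action = policy(self._obs)
            next_obs, reward, done = env.step(action)
            self._storage.append((self._obs, action, reward, next_obs, done))
            # Start a new episode as soon as the current one terminates.
            self._obs = env.reset() if done else next_obs

    def next_batch(self, batch_size):
        """Sample `batch_size` stored transitions uniformly, without replacement."""
        batch = self._rng.sample(list(self._storage), batch_size)
        obs, actions, rewards, next_obs, dones = zip(*batch)
        return {
            "obs": list(obs),
            "action": list(actions),
            "reward": list(rewards),
            "next_obs": list(next_obs),
            "done": list(dones),
        }
```

Keeping `collect()` (interaction) separate from `next_batch()` (sampling) lets a training loop alternate the two freely; uniform sampling is only one choice, and prioritized sampling would change `next_batch()` alone.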
# adv_estimate
YouQiaoben (`gae_lambda`), ShihongSong (`dqn`, after `policy.DQN`)

These seem to be plain Python functions; they could also be written in a functional form.
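As a rough illustration of the "plain Python functions" idea, both estimators can be written as pure functions over per-step lists. This is a sketch under assumptions: the signatures, argument names and default `gamma`/`lam` values are hypothetical, and `dqn_target` is an illustrative name for the `dqn` item. `gae_lambda` uses the standard generalized advantage estimation recursion, delta_t = r_t + gamma * (1 - d_t) * V(s_{t+1}) - V(s_t) and A_t = delta_t + gamma * lam * (1 - d_t) * A_{t+1}; `dqn_target` computes the standard one-step target y = r + gamma * (1 - d) * max_a Q(s', a).

```python
from typing import List, Sequence, Tuple


def gae_lambda(rewards: Sequence[float], values: Sequence[float],
               dones: Sequence[bool], last_value: float,
               gamma: float = 0.99, lam: float = 0.95) -> Tuple[List[float], List[float]]:
    """Generalized advantage estimation over one rollout (hypothetical signature).

    values[t] is V(s_t); last_value is V(s_T) for the state after the last step.
    Returns (advantages, returns), where returns = advantages + values.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    next_adv = 0.0
    # Walk backwards so each A_t can reuse A_{t+1}.
    for t in reversed(range(len(rewards))):
        mask = 0.0 if dones[t] else 1.0  # no bootstrapping across episode ends
        delta = rewards[t] + gamma * next_value * mask - values[t]
        next_adv = delta + gamma * lam * mask * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def dqn_target(rewards: Sequence[float], next_q_values: Sequence[Sequence[float]],
               dones: Sequence[bool], gamma: float = 0.99) -> List[float]:
    """One-step DQN targets (hypothetical name/signature).

    next_q_values[i] lists Q_target(s'_i, a) for every action a.
    """
    return [r + (0.0 if d else gamma * max(q))
            for r, q, d in zip(rewards, next_q_values, dones)]
```

Because neither function keeps state, both are easy to unit-test in isolation and can be called from any training loop; a vectorized NumPy version would follow the same structure.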