hongshaorou/Tianshou

haosheng a00b930c2c fix naming and comments of coding style, delete .json

2017-12-10 17:23:13 +08:00

..

replay buffer initial commit

2017-12-10 14:56:04 +08:00

__init__.py

model-free rl first commit, with ppo_example.py in examples/ and task delegations in ppo_example.py and READMEs

2017-12-08 21:09:23 +08:00

.gitignore

architecture design patch two

2017-11-06 15:24:34 +08:00

advantage_estimation.py

fix naming and comments of coding style, delete .json

2017-12-10 17:23:13 +08:00

batch.py

fix naming and comments of coding style, delete .json

2017-12-10 17:23:13 +08:00

README.md

model-free rl first commit, with ppo_example.py in examples/ and task delegations in ppo_example.py and READMEs

2017-12-08 21:09:23 +08:00

README.md

Batch

YouQiaoben

Fix as stated in ppo_example.py.

Replay

ShihongSong

A Replay.py file. It must provide collect() and next_batch() methods for training.

Integrate the previous ReplayBuffer code.
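
To make the collect() / next_batch() requirement concrete, here is a minimal sketch of what such a class could look like. Only the two method names come from the note above; the constructor arguments, the method signatures, the transition layout and the old-Gym-style `env.reset()` / `env.step(action)` interface are assumptions for illustration, not the actual Tianshou code.

```python
import random
from collections import deque


class Replay:
    """Hypothetical minimal replay buffer (illustration only)."""

    def __init__(self, capacity=10000, seed=None):
        # Bounded FIFO storage: the oldest transitions are evicted first.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._storage)

    def collect(self, env, policy, num_steps):
        """Run `policy` in `env` for `num_steps` steps and store transitions.

        Assumes `env.reset()` returns an observation and `env.step(action)`
        returns (observation, reward, done, info).
        """
        obs = env.reset()
        for _ in range(num_steps):
            action = policy(obs)
            next_obs, reward, done, _info = env.step(action)
            self._storage.append((obs, action, reward, next_obs, done))
            # Start a new episode when the current one has ended.
            obs = env.reset() if done else next_obs

    def next_batch(self, batch_size):
        """Return a uniformly sampled batch as a dict of tuples."""
        if not self._storage:
            raise ValueError("cannot sample from an empty replay buffer")
        batch_size = min(batch_size, len(self._storage))
        batch = self._rng.sample(list(self._storage), batch_size)
        obs, actions, rewards, next_obs, dones = zip(*batch)
        return {
            "observations": obs,
            "actions": actions,
            "rewards": rewards,
            "next_observations": next_obs,
            "dones": dones,
        }
```

A training loop would then alternate `replay.collect(env, policy, num_steps)` with one or more `replay.next_batch(batch_size)` calls. Uniform sampling is just one possible choice here.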

adv_estimate

YouQiaoben (gae_lambda), ShihongSong (dqn, after policy.DQN)

These seem to be plain Python functions. They could also be written in a functional form.
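
As a rough illustration of the "plain Python function" idea, the sketch below computes generalized advantage estimates for a single episode. Only the name gae_lambda comes from the note above; the argument names, the default values and the single-trajectory assumption (no episode boundaries inside the input) are hypothetical and not taken from the repository.

```python
def gae_lambda(rewards, values, last_value=0.0, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one trajectory (sketch).

    rewards[t] and values[t] are the reward and value estimate at step t;
    last_value is the value estimate of the state after the final step
    (0.0 if the episode terminated there).
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Walk backwards so each step reuses the accumulated later residuals.
    for t in reversed(range(len(rewards))):
        # One-step TD residual: r_t + gamma * V(s_{t+1}) - V(s_t).
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of residuals with weight gamma * lam.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

The function has no side effects and keeps no state, so it can be composed with other estimators or mapped over a list of trajectories. With `lam=0` it reduces to the one-step TD residual, and with `lam=1` to the discounted return minus the value baseline.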