# TODO:
Notice that we will separate the actor and the critic: the batch will collect data for optimizing the policy, while the replay will collect data for optimizing the critic.
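Below is a minimal, runnable sketch of that data flow. The `Actor`/`Critic` classes and the plain lists standing in for the batch and the replay are placeholders for illustration only, not names fixed by the codebase.

```python
import random

class Actor:
    """Placeholder policy; the real actor will be a neural network."""
    def act(self, obs):
        return random.choice([0, 1])

    def optimize(self, on_policy_batch):
        pass  # a policy-gradient / PPO step on the fresh batch would go here

class Critic:
    """Placeholder value function."""
    def optimize(self, replayed_data):
        pass  # a value-regression / TD step on replayed data would go here

actor, critic = Actor(), Critic()
batch, replay = [], []                       # stand-ins for Batch / Replay

for t in range(100):
    transition = {"obs": t, "act": actor.act(t), "rew": 1.0}
    batch.append(transition)                 # fresh data  -> policy update
    replay.append(transition)                # stored data -> critic update

actor.optimize(batch)                        # batch optimizes the policy
critic.optimize(random.sample(replay, 32))   # replay optimizes the critic
```

The important point is that the two optimizers never share a data path: the batch is consumed by the actor, the replay by the critic.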
# Batch
YouQiaoben

Fix as stated in `ppo_example.py`.
# Replay
ShihongSong

Write a `Replay.py` file; it must have `collect()` and `next_batch()` methods for training.

Integrate the previous `ReplayBuffer` code.
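A possible sketch of `Replay.py`, assuming a fixed-capacity ring buffer in the spirit of the previous `ReplayBuffer` code; the capacity, storage format, and sampling scheme below are illustrative, not decided.

```python
# Replay.py -- sketch only; the real integration with the old ReplayBuffer
# code may end up looking different.
import random

class Replay:
    def __init__(self, capacity=10000):
        self._storage = []
        self._capacity = capacity
        self._index = 0          # next slot to overwrite once the buffer is full

    def collect(self, transition):
        """Store one transition, e.g. an (obs, act, rew, done, obs_next) dict,
        overwriting the oldest entry once the buffer is full."""
        if len(self._storage) < self._capacity:
            self._storage.append(transition)
        else:
            self._storage[self._index] = transition
        self._index = (self._index + 1) % self._capacity

    def next_batch(self, batch_size):
        """Return a uniformly sampled minibatch for one training step."""
        return random.sample(self._storage, min(batch_size, len(self._storage)))
```

With this interface the critic's training loop only needs `replay.collect(transition)` while stepping the environment and `replay.next_batch(batch_size)` at each gradient step.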
# adv_estimate
YouQiaoben (`gae_lambda`), ShihongSong (`dqn`, after `policy.DQN`)

These seem to be direct Python functions; they could also be written in a functional form.
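For `gae_lambda`, a direct functional version might look like the sketch below. It implements the standard GAE(lambda) recursion; the signature and argument names are illustrative only.

```python
import numpy as np

def gae_lambda(rewards, values, next_values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation as a plain function.

    All arguments are 1-D arrays of length T (one rollout).  Implements
        delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        A_t     = delta_t + gamma * lam * (1 - done_t) * A_{t+1}
    and returns the advantages A_0 .. A_{T-1}.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    next_values = np.asarray(next_values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)

    deltas = rewards + gamma * next_values * (1.0 - dones) - values
    advantages = np.zeros_like(deltas)
    gae = 0.0
    for t in reversed(range(len(deltas))):   # backward recursion over the rollout
        gae = deltas[t] + gamma * lam * (1.0 - dones[t]) * gae
        advantages[t] = gae
    return advantages
```

Keeping the estimator stateless like this makes the functional form straightforward; a DQN-style estimator could expose a similar signature and be swapped in.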