Tianshou/tianshou/core/README.md

# TODO

Separate the actor and the critic. (Important: this should be our focus in the near term.)

# policy

YongRen

### base, stochastic

Follow `OnehotCategorical` to write a Gaussian policy. It can go in the same file, stochastic.py.
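
As a starting point for discussion, here is a minimal sketch of what a diagonal Gaussian policy could look like. It uses only the Python standard library so the sampling and log-density logic is easy to check. The class name `GaussianPolicy`, the `mean_fn` callable, the `log_std` parameter, and the `log_prob()` method are all assumptions made for illustration; they are not taken from `OnehotCategorical` or from any existing file in this package.

```python
import math
import random


class GaussianPolicy:
    """Hypothetical diagonal Gaussian policy (illustrative only, not the package API)."""

    def __init__(self, mean_fn, log_std, seed=None):
        # mean_fn: assumed callable mapping an observation to a list of action means.
        # log_std: one log standard deviation per action dimension.
        self.mean_fn = mean_fn
        self.log_std = list(log_std)
        self.rng = random.Random(seed)

    def act(self, observation):
        """Sample one action, drawing each dimension from N(mean, std^2)."""
        mean = self.mean_fn(observation)
        return [self.rng.gauss(m, math.exp(s)) for m, s in zip(mean, self.log_std)]

    def log_prob(self, action, observation):
        """Log density of `action` under the diagonal Gaussian."""
        mean = self.mean_fn(observation)
        total = 0.0
        for a, m, s in zip(action, mean, self.log_std):
            z = (a - m) / math.exp(s)
            total += -0.5 * z * z - s - 0.5 * math.log(2.0 * math.pi)
        return total
```

In a real implementation the mean would come from a network and the log-density would be a graph op, but the arithmetic should match what is shown here.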

### deterministic

Not sure yet how to write this, but it should at least have an `act()` method to interact with the environment.

Referencing `QValuePolicy` in base.py, it should have at least the methods listed there.
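
One possible shape for the `act()` requirement is sketched below. Only `act()` comes from the notes above; the class name `DeterministicPolicy`, the `actor_fn` callable, and the optional `low`/`high` clipping bounds are hypothetical, and the methods of `QValuePolicy` are deliberately not reproduced here.

```python
class DeterministicPolicy:
    """Hypothetical deterministic policy (illustrative only, not the package API)."""

    def __init__(self, actor_fn, low=None, high=None):
        # actor_fn: assumed callable mapping an observation to a list of action values.
        # low/high: optional scalar bounds applied to every action dimension.
        self.actor_fn = actor_fn
        self.low = low
        self.high = high

    def act(self, observation):
        """Return the actor's output for `observation`, clipped to the bounds if set."""
        action = list(self.actor_fn(observation))
        if self.low is not None:
            action = [max(self.low, a) for a in action]
        if self.high is not None:
            action = [min(self.high, a) for a in action]
        return action
```

The same observation always yields the same action, so any exploration noise would have to be added by the caller or by a wrapper.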


# losses

TongzhengRen

These seem to be plain Python functions, though the management of placeholders may require some discussion. They could also be written in a functional form.
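
To make the "functional form" idea concrete, here is a rough sketch of a loss written as a stateless function that receives all of its inputs explicitly instead of creating or looking up placeholders itself. The function name `td_squared_loss`, its argument names, and the choice of a one-step squared TD error are assumptions for illustration, written with plain Python lists rather than tensors.

```python
def td_squared_loss(q_values, rewards, next_q_max, dones, gamma=0.99):
    """Mean squared one-step TD error over a batch (illustrative sketch).

    q_values:   Q(s, a) for the actions actually taken.
    rewards:    immediate rewards.
    next_q_max: max over a' of Q(s', a'), supplied by the caller.
    dones:      True where the episode ended, so no bootstrapping is applied.
    """
    total = 0.0
    for q, r, nq, done in zip(q_values, rewards, next_q_max, dones):
        # Bootstrap from the next state only if the episode continues.
        target = r + (0.0 if done else gamma * nq)
        total += (q - target) ** 2
    return total / len(q_values)
```

Because the function owns no state, whoever builds the graph decides where the inputs come from, which keeps the placeholder question outside the loss itself.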