11 Commits

Author SHA1 Message Date
haoshengzou  fed3bf2a12  auto target network; ppo_cartpole.py runs OK, but results differ from the previous version even with the same random seed, still needs debugging  2018-01-14 20:58:28 +08:00
haoshengzou  dfcea74fcf  fix memory growth and slowness caused by sess.run(tf.multinomial()); ppo examples now work OK with slight memory growth (1M/min), which still needs investigation  2018-01-03 20:32:05 +08:00
haoshengzou  4333ee5d39  ppo_cartpole.py seems to be working with params bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper  2018-01-02 19:40:37 +08:00
宋世虹  d220f7f2a8  add comments and todos  2017-12-17 13:28:21 +08:00
宋世虹  3624cc9036  finished a very naive DQN: changed the replay buffer interface by adding collect and next_batch (still needs refactoring); added an implementation of dqn.py (its interface still needs work to make it more extensible); slightly refactored the code style of the codebase; more comments and TODOs will come in the next commit  2017-12-17 12:52:00 +08:00
rtz19970824  e5bf7a9270  implement dqn loss and dpg loss, add TODO for separate actor and critic  2017-12-15 14:24:08 +08:00
haosheng  a00b930c2c  fix naming and comments per coding style; delete .json  2017-12-10 17:23:13 +08:00
songshshshsh  f1a7fd9ee1  replay buffer initial commit  2017-12-10 14:56:04 +08:00
rtz19970824  18b3b0b850  add some TODOs  2017-12-10 13:31:43 +08:00
haosheng  ff4306ddb9  model-free RL first commit, with ppo_example.py in examples/ and task delegations in ppo_example.py and READMEs  2017-12-08 21:09:23 +08:00
Tongzheng Ren  6d9c369a65  architecture design patch two  2017-11-06 15:24:34 +08:00