Tianshou

Author	SHA1	Message	Date
haoshengzou	f32e1d9c12	finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet.	2018-01-18 17:38:52 +08:00
haoshengzou	8fbde8283f	finish dqn example. advantage estimation module is not complete yet.	2018-01-18 12:19:48 +08:00
haoshengzou	9f96cc2461	finish design and running of ppo and actor-critic. advantage estimation module is not complete yet.	2018-01-17 14:21:50 +08:00
haoshengzou	ed25bf7586	fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.	2018-01-17 11:55:51 +08:00
haoshengzou	d599506dc9	fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.	2018-01-15 16:32:30 +08:00
haoshengzou	983cd36074	finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem comes behind design. I'll first write actor-critic.	2018-01-15 00:03:06 +08:00
haoshengzou	fed3bf2a12	auto target network. ppo_cartpole.py runs ok, but results are different from the previous version even with the same random seed; still needs debugging.	2018-01-14 20:58:28 +08:00
haoshengzou	dfcea74fcf	fix memory growth and slowness caused by sess.run(tf.multinomial()), now ppo examples are working OK with slight memory growth (1M/min), which still needs research	2018-01-03 20:32:05 +08:00
haoshengzou	4333ee5d39	ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper	2018-01-02 19:40:37 +08:00
haoshengzou	b33a141373	towards policy/value refactor	2017-12-23 17:25:16 +08:00
haoshengzou	2addef41d2	fix imports to support both python2 and python3. move contents from __init__.py to leave for work after major development.	2017-12-23 15:36:10 +08:00
宋世虹	d220f7f2a8	add comments and todos	2017-12-17 13:28:21 +08:00
宋世虹	3624cc9036	finished very naive dqn: changed the interface of replay buffer by adding collect and next_batch, but it still needs refactoring; added implementation of dqn.py, but still need to consider the interface to make it more extensive; slightly refactored the code style of the codebase; more comments and todos will be in the next commit	2017-12-17 12:52:00 +08:00
Haosheng Zou	92deae9f8d	minor fix	2017-12-14 19:46:38 +08:00
Haosheng Zou	7ab211b63c	preliminary design of dqn_example, dqn interface. identify the assign of networks	2017-12-13 20:47:45 +08:00
haosheng	972044c39d	minor fix	2017-12-10 17:33:10 +08:00
haosheng	a00b930c2c	fix naming and comments of coding style, delete .json	2017-12-10 17:23:13 +08:00
haosheng	ff4306ddb9	model-free rl first commit, with ppo_example.py in examples/ and task delegations in ppo_example.py and READMEs	2017-12-08 21:09:23 +08:00
Tongzheng Ren	6d9c369a65	architecture design patch two	2017-11-06 15:24:34 +08:00

19 Commits