351 Commits

Author SHA1 Message Date
haoshengzou
2a2274aeea initial data_collector; working on getting examples/dqn_replay.py to run 2018-03-04 21:29:58 +08:00
haoshengzou
54a7b1343d design exploration and evaluators for off-policy algos 2018-03-04 13:53:29 +08:00
Dong Yan
2eb056a721 Merge branch 'master' of github.com:sproblvem/tianshou 2018-03-03 21:30:15 +08:00
Dong Yan
0cf2fd6c53 an initial, untested version of replay-memory Q-return 2018-03-03 21:25:29 +08:00
haoshengzou
e302fd87fb vanilla replay buffer finished and tested. working on data_collector. 2018-03-03 20:42:34 +08:00
Dong Yan
528c4be93c add render option for ddpg 2018-02-28 18:44:06 +08:00
haoshengzou
5ab2fa3b65 minor fixes 2018-02-27 14:46:02 +08:00
haoshengzou
675057c6b9 interfaces for advantage_estimation. full_return finished and tested. 2018-02-27 14:11:52 +08:00
songshshshsh
25b25ce7d8 Merge branch 'master' of https://github.com/sproblvem/tianshou 2018-02-27 13:15:36 +08:00
songshshshsh
67d0e78ab9 first modification of the replay buffer: make all three replay buffers work; awaiting refactoring and testing 2018-02-27 13:13:38 +08:00
haoshengzou
40190a282e Merge remote-tracking branch 'origin/master' (conflicts: README.md) 2018-02-26 11:48:46 +08:00
haoshengzou
87889d766c minor fixes. proceeding to refactor replay to use lists, as in batch. 2018-02-26 11:47:02 +08:00
Dong Yan
0bc1b63e38 add epsilon-greedy for dqn 2018-02-25 16:31:35 +08:00
rtz19970824
a40e5aec54 modified README 2018-02-24 16:26:19 +08:00
Dong Yan
f3aee448e0 add option to show the running result of cartpole 2018-02-24 10:53:39 +08:00
Dong Yan
764f7fb5f1 minor fix of play.py 2018-02-23 23:15:04 +08:00
sproblvem
a0849fa213
Merge pull request #5 from sproblvem/union_set
add union set for do_move and is_valid
The modification to play.py should be removed; I will fix it in a later commit
2018-02-23 15:01:17 +08:00
sproblvem
7711686bc6
Update README.md
add the dependency
2018-02-12 15:28:25 +08:00
Dong Yan
2163d18728 fix the env -> self._env bug 2018-02-10 03:42:00 +08:00
Dong Yan
50b2d98d0a support ctrl-c to terminate play.py 2018-02-08 21:17:56 +08:00
haoshengzou
e6d477f9a3 modified top-level .gitignore to include tianshou/data 2018-01-25 16:08:04 +08:00
haoshengzou
b8568c6af4 added data/utils.py; it was ignored by .gitignore before... 2018-01-25 10:15:38 +08:00
haoshengzou
5910e08672 data/utils.py added but not pushed... 2018-01-25 10:11:36 +08:00
haoshengzou
f32e1d9c12 finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet. 2018-01-18 17:38:52 +08:00
haoshengzou
8fbde8283f finish dqn example. advantage estimation module is not complete yet. 2018-01-18 12:19:48 +08:00
Wenbo
0131bcdc85 minor fix 2018-01-17 15:57:41 +08:00
Wenbo
0e4aa44ebb add deepcopy for hash, add some testing 2018-01-17 15:54:46 +08:00
haoshengzou
9f96cc2461 finish design and running of ppo and actor-critic. advantage estimation module is not complete yet. 2018-01-17 14:21:50 +08:00
haoshengzou
ed25bf7586 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was mistaken. policy will soon need refactoring. 2018-01-17 11:55:51 +08:00
Wenbo Hu
e76ccaee80 add union set for do_move and is_valid 2018-01-16 14:10:56 +08:00
haoshengzou
d599506dc9 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was mistaken. policy will soon need refactoring. 2018-01-15 16:32:30 +08:00
haoshengzou
983cd36074 finished all ppo examples. Training is markedly slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem is secondary to design; I'll write actor-critic first. 2018-01-15 00:03:06 +08:00
haoshengzou
fed3bf2a12 auto target network. ppo_cartpole.py runs OK, but results differ from the previous version even with the same random seed; still needs debugging. 2018-01-14 20:58:28 +08:00
rtz19970824
3b222f5edb add an arg to trigger training 2018-01-13 15:59:57 +08:00
rtz19970824
2e8662889f add multi-threading for end-to-end training 2018-01-13 15:57:41 +08:00
rtz19970824
fcaa571b42 add the interface in engine.py 2018-01-12 21:48:01 +08:00
Dong Yan
68cc63144f fix the hash conflict bug 2018-01-12 21:08:07 +08:00
rtz19970824
90ffdcbb1f check for the latest checkpoint during self-play 2018-01-12 19:16:44 +08:00
rtz19970824
c217aa165d add some error messages for better debugging 2018-01-12 17:17:03 +08:00
Dong Yan
e58df65301 fix the async bug between think and do_move checking, which was introduced by bobo 2018-01-11 21:00:32 +08:00
Dong Yan
afc55ed9c2 refactor code to avoid memory leak 2018-01-11 17:02:36 +08:00
sproblvem
284cc64c18
Merge pull request #3 from sproblvem/double-network
Double network
2018-01-11 10:55:12 +08:00
Dong Yan
5482815de6 replace two isolated player processes with two different sets of variables in the tf graph 2018-01-10 23:27:17 +08:00
Dong Yan
f425085e0a fix the tf assign error when copying the trained variable from black to white 2018-01-09 21:16:35 +08:00
rtz19970824
c2775df8e6 modify game.py for multi-player 2018-01-09 20:09:48 +08:00
rtz19970824
eb0ce95919 modify model.py for multi-player 2018-01-09 19:50:37 +08:00
Tongzheng Ren
891c5b1e47 Merge branch 'master' of https://github.com/sproblvem/tianshou 2018-01-08 21:21:08 +08:00
Tongzheng Ren
f2edc4896e modify play.py to avoid a potential bug 2018-01-08 21:19:17 +08:00
rtz19970824
32b7b33ed5 debug: we should estimate our own win rate 2018-01-08 16:19:59 +08:00
JialianLee
8b7b4b6c6b Add dirichlet noise to root prior and add uniform noise to initial Q value 2018-01-05 17:02:19 +08:00