332 Commits

Author SHA1 Message Date
haoshengzou
87889d766c minor fixes. proceed to refactor replay to use lists as in batch. 2018-02-26 11:47:02 +08:00
sproblvem
7711686bc6 Update README.md: add the dependency 2018-02-12 15:28:25 +08:00
Dong Yan
2163d18728 fix the env -> self._env bug 2018-02-10 03:42:00 +08:00
Dong Yan
50b2d98d0a support ctrl-c to terminate play.py 2018-02-08 21:17:56 +08:00
haoshengzou
e6d477f9a3 modified top-level .gitignore to include tianshou/data 2018-01-25 16:08:04 +08:00
haoshengzou
b8568c6af4 added data/utils.py. was ignored by .gitignore before... 2018-01-25 10:15:38 +08:00
haoshengzou
5910e08672 data/utils.py added but not pushed... 2018-01-25 10:11:36 +08:00
haoshengzou
f32e1d9c12 finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet. 2018-01-18 17:38:52 +08:00
haoshengzou
8fbde8283f finish dqn example. advantage estimation module is not complete yet. 2018-01-18 12:19:48 +08:00
haoshengzou
9f96cc2461 finish design and running of ppo and actor-critic. advantage estimation module is not complete yet. 2018-01-17 14:21:50 +08:00
haoshengzou
ed25bf7586 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was misunderstood. policy will soon need refactoring. 2018-01-17 11:55:51 +08:00
haoshengzou
d599506dc9 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was misunderstood. policy will soon need refactoring. 2018-01-15 16:32:30 +08:00
haoshengzou
983cd36074 finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem takes lower priority than design. I'll write actor-critic first. 2018-01-15 00:03:06 +08:00
haoshengzou
fed3bf2a12 auto target network. ppo_cartpole.py runs OK, but results are different from the previous version even with the same random seed; still needs debugging. 2018-01-14 20:58:28 +08:00
rtz19970824
3b222f5edb add an argument to trigger training 2018-01-13 15:59:57 +08:00
rtz19970824
2e8662889f add multi-threading for end-to-end training 2018-01-13 15:57:41 +08:00
rtz19970824
fcaa571b42 add the interface in engine.py 2018-01-12 21:48:01 +08:00
Dong Yan
68cc63144f fix the hash conflict bug 2018-01-12 21:08:07 +08:00
rtz19970824
90ffdcbb1f check the latest checkpoint during self-play 2018-01-12 19:16:44 +08:00
rtz19970824
c217aa165d add some error messages for better debugging 2018-01-12 17:17:03 +08:00
Dong Yan
e58df65301 fix the async bug between think and do-move checking, which was introduced by bobo 2018-01-11 21:00:32 +08:00
Dong Yan
afc55ed9c2 refactor code to avoid memory leak 2018-01-11 17:02:36 +08:00
sproblvem
284cc64c18 Merge pull request #3 from sproblvem/double-network: Double network 2018-01-11 10:55:12 +08:00
Dong Yan
5482815de6 replace two isolated player processes with two different sets of variables in the tf graph 2018-01-10 23:27:17 +08:00
Dong Yan
f425085e0a fix the tf assign error when copying the trained variable from black to white 2018-01-09 21:16:35 +08:00
rtz19970824
c2775df8e6 modify game.py for multi-player 2018-01-09 20:09:48 +08:00
rtz19970824
eb0ce95919 modify model.py for multi-player 2018-01-09 19:50:37 +08:00
Tongzheng Ren
891c5b1e47 Merge branch 'master' of https://github.com/sproblvem/tianshou 2018-01-08 21:21:08 +08:00
Tongzheng Ren
f2edc4896e modify play.py to avoid a potential bug 2018-01-08 21:19:17 +08:00
rtz19970824
32b7b33ed5 debug: we should estimate our own win rate 2018-01-08 16:19:59 +08:00
JialianLee
8b7b4b6c6b Add Dirichlet noise to the root prior and uniform noise to the initial Q value 2018-01-05 17:02:19 +08:00
haoshengzou
dfcea74fcf fix memory growth and slowness caused by sess.run(tf.multinomial()); ppo examples now work OK with slight memory growth (1M/min), which still needs investigation 2018-01-03 20:32:05 +08:00
haoshengzou
4333ee5d39 ppo_cartpole.py seems to be working with params bs=128, num_ep=20, max_time=500; manually merged Normal from branch policy_wrapper 2018-01-02 19:40:37 +08:00
haoshengzou
88648f0c4b Merge branch 'master' of https://github.com/sproblvem/tianshou 2017-12-31 15:56:19 +08:00
JialianLee
5849776c9a Modification and doc for unit test 2017-12-29 13:45:53 +08:00
rtz19970824
01f39f40d3 debug for unit test 2017-12-28 19:38:25 +08:00
Wenbo Hu
50e8ea36e8 merge 2017-12-29 03:31:57 +08:00
Wenbo Hu
63a0d32b34 use a hash table for check_global_isomorphous 2017-12-29 03:30:09 +08:00
Wenbo Hu
da156ed88e Merge branch 'master' of github.com:sproblvem/tianshou 2017-12-29 03:19:46 +08:00
Wenbo Hu
76ac579056 Merge branch 'master' of github.com:sproblvem/tianshou 2017-12-29 01:05:14 +08:00
rtz19970824
2dfab68efe debug for unit test 2017-12-28 19:28:21 +08:00
JialianLee
4140d8c9d2 Modification on unit test 2017-12-28 17:10:25 +08:00
JialianLee
0352866b1a Modification for game engine 2017-12-28 16:27:28 +08:00
JialianLee
5457e5134e add a unit test 2017-12-28 16:20:44 +08:00
rtz19970824
b699258e76 debug for reversi 2017-12-28 15:55:07 +08:00
Dong Yan
08b6649fea test next_action.next_state in MCTS 2017-12-28 15:52:31 +08:00
Dong Yan
47676993fd solve the performance bottleneck by only hashing the last board 2017-12-28 01:16:24 +08:00
Dong Yan
affd0319e2 rewrite the selection function of UCTNode to return the action node instead of the state node and next action 2017-12-27 21:11:40 +08:00
Dong Yan
d48982d59e move evaluator from action node to mcts 2017-12-27 20:49:54 +08:00
rtz19970824
0a160065aa Merge branch 'master' of https://github.com/sproblvem/tianshou 2017-12-27 19:54:52 +08:00