Tianshou

Author	SHA1	Message	Date
haoshengzou	ed25bf7586	fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.	2018-01-17 11:55:51 +08:00
haoshengzou	d599506dc9	fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.	2018-01-15 16:32:30 +08:00
haoshengzou	983cd36074	finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem comes behind design. I'll first write actor-critic.	2018-01-15 00:03:06 +08:00
haoshengzou	fed3bf2a12	auto target network. ppo_cartpole.py run ok. but results is different from previous version even with the same random seed, still needs debugging.	2018-01-14 20:58:28 +08:00
rtz19970824	3b222f5edb	add an args to intrigue training	2018-01-13 15:59:57 +08:00
rtz19970824	2e8662889f	add multi-thread for end-to-end training	2018-01-13 15:57:41 +08:00
rtz19970824	fcaa571b42	add the interface in engine.py	2018-01-12 21:48:01 +08:00
Dong Yan	68cc63144f	fix the hash conflict bug	2018-01-12 21:08:07 +08:00
rtz19970824	90ffdcbb1f	check the latest checkpoint while self play	2018-01-12 19:16:44 +08:00
rtz19970824	c217aa165d	add some error message for better debugging	2018-01-12 17:17:03 +08:00
Dong Yan	e58df65301	fix the async bug between think and do move checking, which introduced by bobo	2018-01-11 21:00:32 +08:00
Dong Yan	afc55ed9c2	refactor code to avoid memory leak	2018-01-11 17:02:36 +08:00
sproblvem	284cc64c18	Merge pull request #3 from sproblvem/double-network Double network	2018-01-11 10:55:12 +08:00
Dong Yan	5482815de6	replace two isolated player process by two different set of variables in the tf graph	2018-01-10 23:27:17 +08:00
Dong Yan	f425085e0a	fix the tf assign error of copy the trained variable from black to white	2018-01-09 21:16:35 +08:00
rtz19970824	c2775df8e6	modify game.py for multi-player	2018-01-09 20:09:48 +08:00
rtz19970824	eb0ce95919	modify model.py for multi-player	2018-01-09 19:50:37 +08:00
Tongzheng Ren	891c5b1e47	Merge branch 'master' of https://github.com/sproblvem/tianshou	2018-01-08 21:21:08 +08:00
Tongzheng Ren	f2edc4896e	modify play.py for avoiding potential bug	2018-01-08 21:19:17 +08:00
rtz19970824	32b7b33ed5	debug: we should estimate our own win rate	2018-01-08 16:19:59 +08:00
JialianLee	8b7b4b6c6b	Add dirichlet noise to root prior and add uniform noise to initial Q value	2018-01-05 17:02:19 +08:00
haoshengzou	dfcea74fcf	fix memory growth and slowness caused by sess.run(tf.multinomial()), now ppo examples are working OK with slight memory growth (1M/min), which still needs research	2018-01-03 20:32:05 +08:00
haoshengzou	4333ee5d39	ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper	2018-01-02 19:40:37 +08:00
haoshengzou	88648f0c4b	Merge branch 'master' of https://github.com/sproblvem/tianshou	2017-12-31 15:56:19 +08:00
JialianLee	5849776c9a	Modification and doc for unit test	2017-12-29 13:45:53 +08:00
rtz19970824	01f39f40d3	debug for unit test	2017-12-28 19:38:25 +08:00
Wenbo Hu	50e8ea36e8	merge	2017-12-29 03:31:57 +08:00
Wenbo Hu	63a0d32b34	use hash table for check_global_isomorphous	2017-12-29 03:30:09 +08:00
Wenbo Hu	da156ed88e	Merge branch 'master' of github.com:sproblvem/tianshou	2017-12-29 03:19:46 +08:00
Wenbo Hu	76ac579056	Merge branch 'master' of github.com:sproblvem/tianshou	2017-12-29 01:05:14 +08:00
rtz19970824	2dfab68efe	debug for unit test	2017-12-28 19:28:21 +08:00
JialianLee	4140d8c9d2	Modification on unit test	2017-12-28 17:10:25 +08:00
JialianLee	0352866b1a	Modification for game engine	2017-12-28 16:27:28 +08:00
JialianLee	5457e5134e	add a unit test	2017-12-28 16:20:44 +08:00
rtz19970824	b699258e76	debug for reversi	2017-12-28 15:55:07 +08:00
Dong Yan	08b6649fea	test next_action.next_state in MCTS	2017-12-28 15:52:31 +08:00
Dong Yan	47676993fd	solve the performance bottleneck by only hashing the last board	2017-12-28 01:16:24 +08:00
Dong Yan	affd0319e2	rewrite the selection fuction of UCTNode to return the action node instead of return the state node and next action	2017-12-27 21:11:40 +08:00
Dong Yan	d48982d59e	move evaluator from action node to mcts	2017-12-27 20:49:54 +08:00
rtz19970824	0a160065aa	Merge branch 'master' of https://github.com/sproblvem/tianshou	2017-12-27 19:54:52 +08:00
rtz19970824	f2291efc72	check exists when save data	2017-12-27 19:54:36 +08:00
JialianLee	8d102d249f	Modification for backpropagation process	2017-12-27 18:55:00 +08:00
Dong Yan	9f60984973	remove type_conversion function	2017-12-27 14:08:34 +08:00
Dong Yan	a1f6044cba	rewrite selection function of ActionNode for clarity, add and delete some notes	2017-12-27 11:43:04 +08:00
Dong Yan	c788b253fb	show the stdout of player.py for debugging	2017-12-27 01:04:09 +08:00
Dong Yan	7f0565a5f6	variable rename and delete redundant code	2017-12-26 22:19:10 +08:00
Dong Yan	0c3ff3bf37	delete unused code	2017-12-26 19:29:35 +08:00
Dong Yan	029ab199f4	add softmax for mcts root node	2017-12-26 16:47:24 +08:00
Dong Yan	8f508c790b	add role for mcts debug	2017-12-26 15:07:15 +08:00
Dong Yan	aa6b5434c6	add debuf info for mcts and add softmax for the prior	2017-12-26 14:46:14 +08:00

322 Commits