78 Commits

Author SHA1 Message Date
haoshengzou
983cd36074 finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem takes lower priority than the design work. I'll first write actor-critic. 2018-01-15 00:03:06 +08:00
haoshengzou
fed3bf2a12 auto target network. ppo_cartpole.py runs OK, but results differ from the previous version even with the same random seed; still needs debugging. 2018-01-14 20:58:28 +08:00
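The "auto target network" in the commit above is presumably the standard DQN-style trick of keeping a lagged copy of the online network and syncing it periodically. A minimal numpy sketch, with function and parameter names assumed rather than taken from this repo:

```python
import numpy as np

def sync_target(online, target, tau=1.0):
    """Blend online weights into target weights.

    tau=1.0 gives a hard copy (periodic sync); tau < 1 gives a
    Polyak/soft update: target <- tau*online + (1-tau)*target.
    """
    return [tau * w + (1.0 - tau) * t for w, t in zip(online, target)]

# hard update: target becomes an exact copy of the online weights
hard = sync_target([np.array([2.0])], [np.array([0.0])], tau=1.0)
# soft update with tau=0.5 moves target halfway toward the online weights
soft = sync_target([np.array([2.0])], [np.array([0.0])], tau=0.5)
```

Automating this step removes a common source of silent bugs where the target network is never refreshed.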
JialianLee
8b7b4b6c6b Add dirichlet noise to root prior and add uniform noise to initial Q value 2018-01-05 17:02:19 +08:00
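The commit above follows the AlphaGo-Zero-style recipe of mixing Dirichlet noise into the root prior to force exploration at the root. A hedged numpy sketch; the epsilon and alpha values are illustrative defaults, not the ones used in this repo:

```python
import numpy as np

def noisy_root_prior(prior, epsilon=0.25, alpha=0.03, rng=None):
    """Return (1-eps)*prior + eps*Dirichlet(alpha); still a distribution."""
    rng = rng or np.random.default_rng(0)
    prior = np.asarray(prior, dtype=float)
    noise = rng.dirichlet([alpha] * len(prior))  # one sample over the actions
    return (1.0 - epsilon) * prior + epsilon * noise

p = noisy_root_prior([0.5, 0.3, 0.2])
```

Because the mix is convex, the result still sums to one, and every action keeps nonzero prior mass.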
haoshengzou
dfcea74fcf fix memory growth and slowness caused by sess.run(tf.multinomial()), now ppo examples are working OK with slight memory growth (1M/min), which still needs research 2018-01-03 20:32:05 +08:00
haoshengzou
4333ee5d39 ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper 2018-01-02 19:40:37 +08:00
JialianLee
5849776c9a Modification and doc for unit test 2017-12-29 13:45:53 +08:00
rtz19970824
01f39f40d3 debug for unit test 2017-12-28 19:38:25 +08:00
JialianLee
4140d8c9d2 Modification on unit test 2017-12-28 17:10:25 +08:00
JialianLee
0352866b1a Modification for game engine 2017-12-28 16:27:28 +08:00
JialianLee
5457e5134e add a unit test 2017-12-28 16:20:44 +08:00
Dong Yan
08b6649fea test next_action.next_state in MCTS 2017-12-28 15:52:31 +08:00
Dong Yan
47676993fd solve the performance bottleneck by only hashing the last board 2017-12-28 01:16:24 +08:00
Dong Yan
affd0319e2 rewrite the selection function of UCTNode to return the action node instead of the state node and next action 2017-12-27 21:11:40 +08:00
Dong Yan
d48982d59e move evaluator from action node to mcts 2017-12-27 20:49:54 +08:00
JialianLee
8d102d249f Modification for backpropagation process 2017-12-27 18:55:00 +08:00
Dong Yan
9f60984973 remove type_conversion function 2017-12-27 14:08:34 +08:00
Dong Yan
a1f6044cba rewrite selection function of ActionNode for clarity, add and delete some notes 2017-12-27 11:43:04 +08:00
Dong Yan
7f0565a5f6 variable rename and delete redundant code 2017-12-26 22:19:10 +08:00
sproblvem
2b24f0760e Merge branch 'master' into mcts_virtual_loss 2017-12-24 21:27:54 +08:00
Dong Yan
89226b449a replace try/except with an isinstance(..., collections.Hashable) check 2017-12-24 20:57:53 +08:00
Dong Yan
f0074aa7ca fix a game config bug and add profiling functions to mcts 2017-12-24 17:43:45 +08:00
mcgrady00h
5aa5dcd191 add comments for mcts with virtual loss 2017-12-24 16:47:43 +08:00
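For readers following the mcts_virtual_loss commits: virtual loss keeps parallel search threads from piling onto the same path by temporarily counting a pending, unevaluated visit as a loss. A minimal sketch of the bookkeeping; class and method names are assumed, not this repo's actual API:

```python
class ActionNode:
    """Toy MCTS edge statistics with virtual loss (illustrative only)."""

    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.total_value = 0.0
        self.virtual_loss = 0  # pending descents through this edge

    def apply_virtual_loss(self, n=1):
        # Called on the way down: inflate visit count, assume a loss.
        self.virtual_loss += n

    def revert_virtual_loss(self, value, n=1):
        # Called on backup: remove the pessimism, record the real value.
        self.virtual_loss -= n
        self.visits += 1
        self.total_value += value

    def q(self):
        visits = self.visits + self.virtual_loss
        if visits == 0:
            return 0.0
        # each pending visit counts as a loss (value -1)
        return (self.total_value - self.virtual_loss) / visits

node = ActionNode(prior=0.5)
node.apply_virtual_loss()
q_pending = node.q()          # pessimistic while the rollout is in flight
node.revert_virtual_loss(1.0)
q_after = node.q()            # true value once the evaluation returns
```

The temporary pessimism lowers this edge's Q during selection, so sibling threads are steered toward other branches until the backup arrives.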
mcgrady00h
8c6f44a015 Merge remote-tracking branch 'origin' into mcts_virtual_loss 2017-12-24 15:49:45 +08:00
mcgrady00h
941284e7b1 Merge remote-tracking branch 'origin' into mcts_virtual_loss 2017-12-24 15:44:30 +08:00
rtz19970824
74504ceb1d debug for go and reversi 2017-12-24 14:40:50 +08:00
Dong Yan
426251e158 add some code for debug and profiling 2017-12-24 01:07:46 +08:00
haoshengzou
b2b2d01d9c Merge remote-tracking branch 'origin/master' 2017-12-23 17:25:37 +08:00
haoshengzou
b21a55dc88 towards policy/value refactor 2017-12-23 17:25:16 +08:00
rtz19970824
3f238864fb minor fixes for mcts, check finish for go 2017-12-23 15:58:06 +08:00
haoshengzou
8c13d8ebe6 Merge remote-tracking branch 'origin/master' 2017-12-23 15:36:44 +08:00
haoshengzou
04048b7873 fix imports to support both python2 and python3. move contents out of __init__.py, leaving that work for after major development. 2017-12-23 15:36:10 +08:00
Dong Yan
b2ef770415 connect reversi with game 2017-12-23 13:05:25 +08:00
mcgrady00h
3b534064bd fix virtual loss bug 2017-12-23 02:48:53 +08:00
Haosheng Zou
8ba16a8808 Merge remote-tracking branch 'origin/master' 2017-12-22 00:24:06 +08:00
Haosheng Zou
1cc5063007 add value_function (critic). value_function and policy not finished yet. 2017-12-22 00:22:23 +08:00
Wenbo Hu
ced63af18f fix bug in parameter passing 2017-12-21 19:31:51 +08:00
Wenbo Hu
f0d59dab6c forbid the pass move if other choices exist 2017-12-20 22:10:47 +08:00
Wenbo Hu
e2c6b96e57 minor revision. 2017-12-20 21:52:30 +08:00
Wenbo Hu
48e95a21ea simulator processes a valid set instead of a single action 2017-12-20 21:35:35 +08:00
rtz19970824
7fca90c61b modify the mcts, refactor the network 2017-12-20 16:43:42 +08:00
Dong Yan
232204d797 fix the copy bug in check_global_isomorphous; refactor code to eliminate side effects 2017-12-19 22:57:38 +08:00
mcgrady00h
1f011a44ef add mcts virtual loss version (may have bugs) 2017-12-19 17:04:55 +08:00
Dong Yan
fc8114fe35 merge flatten and deflatten, rename variable for clarity 2017-12-19 16:51:50 +08:00
宋世虹
7693c38f44 add comments and todos 2017-12-17 13:28:21 +08:00
宋世虹
62e2c6582d finished very naive dqn: changed the interface of replay buffer by adding collect and next_batch, but it still needs refactoring; added implementation of dqn.py, but the interface still needs consideration to make it more extensible; slightly refactored the code style of the codebase; more comments and todos will be in the next commit 2017-12-17 12:52:00 +08:00
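The collect/next_batch replay-buffer interface described above can be sketched as a bounded buffer with uniform sampling. This is a guess at the intended semantics; the capacity default and transition layout are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Toy replay buffer with a `collect` / `next_batch` interface."""

    def __init__(self, capacity=10000):
        # deque with maxlen drops the oldest transition when full
        self.storage = deque(maxlen=capacity)

    def collect(self, transition):
        """Store one (state, action, reward, next_state, done) tuple."""
        self.storage.append(transition)

    def next_batch(self, batch_size):
        """Uniformly sample up to batch_size stored transitions."""
        return random.sample(list(self.storage),
                             min(batch_size, len(self.storage)))

buf = ReplayBuffer(capacity=2)
buf.collect((0, 0, 0.0, 1, False))
buf.collect((1, 1, 1.0, 2, False))
buf.collect((2, 0, 1.0, 3, True))   # evicts the oldest transition
batch = buf.next_batch(5)
```

Splitting collection from sampling like this is what lets the DQN training loop decouple environment interaction from gradient steps.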
Dong Yan
e10acf5130 0. code refactor, try to merge Go and GoEnv 2017-12-16 23:29:11 +08:00
Dong Yan
6cb4b02fca merge class strategy with class game. Next, merge Go with GoEnv 2017-12-15 22:19:44 +08:00
rtz19970824
0874d5342f implement dqn loss and dpg loss, add TODO for separate actor and critic 2017-12-15 14:24:08 +08:00
Haosheng Zou
f496725437 add dqn.py to write 2017-12-13 22:43:45 +08:00
Haosheng Zou
72ae304ab3 preliminary design of dqn_example and the dqn interface. identify the assignment of networks 2017-12-13 20:47:45 +08:00