Tianshou

Author	SHA1	Message	Date
haoshengzou	dfcea74fcf	fix memory growth and slowness caused by sess.run(tf.multinomial()), now ppo examples are working OK with slight memory growth (1M/min), which still needs research	2018-01-03 20:32:05 +08:00
haoshengzou	4333ee5d39	ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper	2018-01-02 19:40:37 +08:00
JialianLee	5849776c9a	Modification and doc for unit test	2017-12-29 13:45:53 +08:00
rtz19970824	01f39f40d3	debug for unit test	2017-12-28 19:38:25 +08:00
JialianLee	4140d8c9d2	Modification on unit test	2017-12-28 17:10:25 +08:00
JialianLee	0352866b1a	Modification for game engine	2017-12-28 16:27:28 +08:00
JialianLee	5457e5134e	add a unit test	2017-12-28 16:20:44 +08:00
Dong Yan	08b6649fea	test next_action.next_state in MCTS	2017-12-28 15:52:31 +08:00
Dong Yan	47676993fd	solve the performance bottleneck by only hashing the last board	2017-12-28 01:16:24 +08:00
Dong Yan	affd0319e2	rewrite the selection function of UCTNode to return the action node instead of returning the state node and next action	2017-12-27 21:11:40 +08:00
Dong Yan	d48982d59e	move evaluator from action node to mcts	2017-12-27 20:49:54 +08:00
JialianLee	8d102d249f	Modification for backpropagation process	2017-12-27 18:55:00 +08:00
Dong Yan	9f60984973	remove type_conversion function	2017-12-27 14:08:34 +08:00
Dong Yan	a1f6044cba	rewrite selection function of ActionNode for clarity, add and delete some notes	2017-12-27 11:43:04 +08:00
Dong Yan	7f0565a5f6	variable rename and delete redundant code	2017-12-26 22:19:10 +08:00
sproblvem	2b24f0760e	Merge branch 'master' into mcts_virtual_loss	2017-12-24 21:27:54 +08:00
Dong Yan	89226b449a	replace try catch by isinstance collections.Hashable	2017-12-24 20:57:53 +08:00
Dong Yan	f0074aa7ca	fix bug of game config and add profiling functions to mcts	2017-12-24 17:43:45 +08:00
mcgrady00h	5aa5dcd191	add comments for mcts with virtual loss	2017-12-24 16:47:43 +08:00
mcgrady00h	8c6f44a015	Merge remote-tracking branch 'origin' into mcts_virtual_loss	2017-12-24 15:49:45 +08:00
mcgrady00h	941284e7b1	Merge remote-tracking branch 'origin' into mcts_virtual_loss	2017-12-24 15:44:30 +08:00
rtz19970824	74504ceb1d	debug for go and reversi	2017-12-24 14:40:50 +08:00
Dong Yan	426251e158	add some code for debug and profiling	2017-12-24 01:07:46 +08:00
haoshengzou	b2b2d01d9c	Merge remote-tracking branch 'origin/master'	2017-12-23 17:25:37 +08:00
haoshengzou	b21a55dc88	towards policy/value refactor	2017-12-23 17:25:16 +08:00
rtz19970824	3f238864fb	minor fixes for mcts, check finish for go	2017-12-23 15:58:06 +08:00
haoshengzou	8c13d8ebe6	Merge remote-tracking branch 'origin/master'	2017-12-23 15:36:44 +08:00
haoshengzou	04048b7873	fix imports to support both python2 and python3. move contents from __init__.py to leave for work after major development.	2017-12-23 15:36:10 +08:00
Dong Yan	b2ef770415	connect reversi with game	2017-12-23 13:05:25 +08:00
mcgrady00h	3b534064bd	fix virtual loss bug	2017-12-23 02:48:53 +08:00
Haosheng Zou	8ba16a8808	Merge remote-tracking branch 'origin/master'	2017-12-22 00:24:06 +08:00
Haosheng Zou	1cc5063007	add value_function (critic). value_function and policy not finished yet.	2017-12-22 00:22:23 +08:00
Wenbo Hu	ced63af18f	fixing bug pass parameter	2017-12-21 19:31:51 +08:00
Wenbo Hu	f0d59dab6c	forbid pass, if we have other choices	2017-12-20 22:10:47 +08:00
Wenbo Hu	e2c6b96e57	minor revision.	2017-12-20 21:52:30 +08:00
Wenbo Hu	48e95a21ea	simulator process a valid set, instead of a single action	2017-12-20 21:35:35 +08:00
rtz19970824	7fca90c61b	modify the mcts, refactor the network	2017-12-20 16:43:42 +08:00
Dong Yan	232204d797	fix the copy bug in check_global_isomorphous; refactor code to eliminate side effect	2017-12-19 22:57:38 +08:00
mcgrady00h	1f011a44ef	add mcts virtual loss version (may have bugs)	2017-12-19 17:04:55 +08:00
Dong Yan	fc8114fe35	merge flatten and deflatten, rename variable for clarity	2017-12-19 16:51:50 +08:00
宋世虹	7693c38f44	add comments and todos	2017-12-17 13:28:21 +08:00
宋世虹	62e2c6582d	finished very naive dqn: changed the interface of replay buffer by adding collect and next_batch, but still need refactoring; added implementation of dqn.py, but still need to consider the interface to make it more extensive; slightly refactored the code style of the codebase; more comments and todos will be in the next commit	2017-12-17 12:52:00 +08:00
Dong Yan	e10acf5130	0. code refactor, try to merge Go and GoEnv	2017-12-16 23:29:11 +08:00
Dong Yan	6cb4b02fca	merge class strategy with class game. Next, merge Go with GoEnv	2017-12-15 22:19:44 +08:00
rtz19970824	0874d5342f	implement dqn loss and dpg loss, add TODO for separate actor and critic	2017-12-15 14:24:08 +08:00
Haosheng Zou	f496725437	add dqn.py to write	2017-12-13 22:43:45 +08:00
Haosheng Zou	72ae304ab3	preliminary design of dqn_example, dqn interface. identify the assign of networks	2017-12-13 20:47:45 +08:00
rtz19970824	0c4a83f3eb	vanilla policy gradient	2017-12-11 13:37:27 +08:00
haosheng	a00b930c2c	fix naming and comments of coding style, delete .json	2017-12-10 17:23:13 +08:00
rtz19970824	a8a12f1083	coding style	2017-12-10 14:23:40 +08:00

69 Commits