Tianshou

Author	SHA1	Message	Date
haoshengzou	8c108174b6	some more API docs	2018-04-15 11:46:46 +08:00
haoshengzou	9186dae6a3	more API docs	2018-04-15 09:35:31 +08:00
haoshengzou	2a3bc3ef35	part of API doc	2018-04-12 21:10:50 +08:00
haoshengzou	03246f7ded	functional code freeze. all examples working. prepare to release.	2018-04-11 14:23:40 +08:00
haoshengzou	75e7f14051	towards ddpg	2018-03-28 18:47:41 +08:00
haoshengzou	52e6b09768	finish ddpg. now ppo, actor-critic, dqn works. ddpg is not working, check!	2018-03-11 17:47:42 +08:00
haoshengzou	e68dcd3c64	working on off-policy test. other parts of dqn_replay is runnable, but performance not tested.	2018-03-08 16:51:12 +08:00
haoshengzou	2a2274aeea	initial data_collector. working on examples/dqn_replay.py to run	2018-03-04 21:29:58 +08:00
Dong Yan	0bc1b63e38	add epsilon-greedy for dqn	2018-02-25 16:31:35 +08:00
haoshengzou	5910e08672	data/utils.py added but not pushed...	2018-01-25 10:11:36 +08:00
haoshengzou	f32e1d9c12	finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet.	2018-01-18 17:38:52 +08:00
haoshengzou	8fbde8283f	finish dqn example. advantage estimation module is not complete yet.	2018-01-18 12:19:48 +08:00
haoshengzou	ed25bf7586	fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.	2018-01-17 11:55:51 +08:00
haoshengzou	d599506dc9	fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.	2018-01-15 16:32:30 +08:00
haoshengzou	983cd36074	finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem comes behind design. I'll first write actor-critic.	2018-01-15 00:03:06 +08:00
haoshengzou	fed3bf2a12	auto target network. ppo_cartpole.py run ok. but results is different from previous version even with the same random seed, still needs debugging.	2018-01-14 20:58:28 +08:00
JialianLee	8b7b4b6c6b	Add dirichlet noise to root prior and add uniform noise to initial Q value	2018-01-05 17:02:19 +08:00
haoshengzou	dfcea74fcf	fix memory growth and slowness caused by sess.run(tf.multinomial()), now ppo examples are working OK with slight memory growth (1M/min), which still needs research	2018-01-03 20:32:05 +08:00
haoshengzou	4333ee5d39	ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper	2018-01-02 19:40:37 +08:00
JialianLee	5849776c9a	Modification and doc for unit test	2017-12-29 13:45:53 +08:00
rtz19970824	01f39f40d3	debug for unit test	2017-12-28 19:38:25 +08:00
JialianLee	4140d8c9d2	Modification on unit test	2017-12-28 17:10:25 +08:00
JialianLee	0352866b1a	Modification for game engine	2017-12-28 16:27:28 +08:00
JialianLee	5457e5134e	add a unit test	2017-12-28 16:20:44 +08:00
Dong Yan	08b6649fea	test next_action.next_state in MCTS	2017-12-28 15:52:31 +08:00
Dong Yan	47676993fd	solve the performance bottleneck by only hashing the last board	2017-12-28 01:16:24 +08:00
Dong Yan	affd0319e2	rewrite the selection function of UCTNode to return the action node instead of returning the state node and next action	2017-12-27 21:11:40 +08:00
Dong Yan	d48982d59e	move evaluator from action node to mcts	2017-12-27 20:49:54 +08:00
JialianLee	8d102d249f	Modification for backpropagation process	2017-12-27 18:55:00 +08:00
Dong Yan	9f60984973	remove type_conversion function	2017-12-27 14:08:34 +08:00
Dong Yan	a1f6044cba	rewrite selection function of ActionNode for clarity, add and delete some notes	2017-12-27 11:43:04 +08:00
Dong Yan	7f0565a5f6	variable rename and delete redundant code	2017-12-26 22:19:10 +08:00
sproblvem	2b24f0760e	Merge branch 'master' into mcts_virtual_loss	2017-12-24 21:27:54 +08:00
Dong Yan	89226b449a	replace try catch by isinstance collections.Hashable	2017-12-24 20:57:53 +08:00
Dong Yan	f0074aa7ca	fix bug of game config and add profiling functions to mcts	2017-12-24 17:43:45 +08:00
mcgrady00h	5aa5dcd191	add comments for mcts with virtual loss	2017-12-24 16:47:43 +08:00
mcgrady00h	8c6f44a015	Merge remote-tracking branch 'origin' into mcts_virtual_loss	2017-12-24 15:49:45 +08:00
mcgrady00h	941284e7b1	Merge remote-tracking branch 'origin' into mcts_virtual_loss	2017-12-24 15:44:30 +08:00
rtz19970824	74504ceb1d	debug for go and reversi	2017-12-24 14:40:50 +08:00
Dong Yan	426251e158	add some code for debug and profiling	2017-12-24 01:07:46 +08:00
haoshengzou	b2b2d01d9c	Merge remote-tracking branch 'origin/master'	2017-12-23 17:25:37 +08:00
haoshengzou	b21a55dc88	towards policy/value refactor	2017-12-23 17:25:16 +08:00
rtz19970824	3f238864fb	minor fixed for mcts, check finish for go	2017-12-23 15:58:06 +08:00
haoshengzou	8c13d8ebe6	Merge remote-tracking branch 'origin/master'	2017-12-23 15:36:44 +08:00
haoshengzou	04048b7873	fix imports to support both python2 and python3. move contents from __init__.py to leave for work after major development.	2017-12-23 15:36:10 +08:00
Dong Yan	b2ef770415	connect reversi with game	2017-12-23 13:05:25 +08:00
mcgrady00h	3b534064bd	fix virtual loss bug	2017-12-23 02:48:53 +08:00
Haosheng Zou	8ba16a8808	Merge remote-tracking branch 'origin/master'	2017-12-22 00:24:06 +08:00
Haosheng Zou	1cc5063007	add value_function (critic). value_function and policy not finished yet.	2017-12-22 00:22:23 +08:00
Wenbo Hu	ced63af18f	fixing bug pass parameter	2017-12-21 19:31:51 +08:00

86 Commits