78 Commits

Author SHA1 Message Date
haoshengzou
983cd36074 finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem takes lower priority than the design work. I'll first write actor-critic. 2018-01-15 00:03:06 +08:00
haoshengzou
fed3bf2a12 auto target network. ppo_cartpole.py runs OK, but results differ from the previous version even with the same random seed; still needs debugging. 2018-01-14 20:58:28 +08:00
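The "auto target network" in the commit above is presumably the standard DQN-style trick of keeping a lagged copy of the online network and syncing it periodically. A minimal numpy sketch, with function and parameter names assumed rather than taken from this repo:

```python
import numpy as np

def sync_target(online, target, tau=1.0):
    """Blend online weights into target weights.

    tau=1.0 gives a hard copy (periodic sync); tau < 1 gives a
    Polyak/soft update: target <- tau*online + (1-tau)*target.
    """
    return [tau * w + (1.0 - tau) * t for w, t in zip(online, target)]

# hard update: target becomes an exact copy of the online weights
hard = sync_target([np.array([2.0])], [np.array([0.0])], tau=1.0)
# soft update with tau=0.5 moves target halfway toward the online weights
soft = sync_target([np.array([2.0])], [np.array([0.0])], tau=0.5)
```

Automating this step removes a common source of silent bugs where the target network is never refreshed.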
JialianLee
8b7b4b6c6b Add dirichlet noise to root prior and add uniform noise to initial Q value 2018-01-05 17:02:19 +08:00
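The commit above follows the AlphaGo-Zero-style recipe of mixing Dirichlet noise into the root prior to force exploration at the root. A hedged numpy sketch; the epsilon and alpha values are illustrative defaults, not the ones used in this repo:

```python
import numpy as np

def noisy_root_prior(prior, epsilon=0.25, alpha=0.03, rng=None):
    """Return (1-eps)*prior + eps*Dirichlet(alpha); still a distribution."""
    rng = rng or np.random.default_rng(0)
    prior = np.asarray(prior, dtype=float)
    noise = rng.dirichlet([alpha] * len(prior))  # one sample over the actions
    return (1.0 - epsilon) * prior + epsilon * noise

p = noisy_root_prior([0.5, 0.3, 0.2])
```

Because the mix is convex, the result still sums to one, and every action keeps nonzero prior mass.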
haoshengzou
dfcea74fcf fix memory growth and slowness caused by sess.run(tf.multinomial()), now ppo examples are working OK with slight memory growth (1M/min), which still needs research 2018-01-03 20:32:05 +08:00
haoshengzou
4333ee5d39 ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper 2018-01-02 19:40:37 +08:00
JialianLee
5849776c9a Modification and doc for unit test 2017-12-29 13:45:53 +08:00
rtz19970824
01f39f40d3 debug for unit test 2017-12-28 19:38:25 +08:00
JialianLee
4140d8c9d2 Modification on unit test 2017-12-28 17:10:25 +08:00
JialianLee
0352866b1a Modification for game engine 2017-12-28 16:27:28 +08:00
JialianLee
5457e5134e add a unit test 2017-12-28 16:20:44 +08:00
Dong Yan
08b6649fea test next_action.next_state in MCTS 2017-12-28 15:52:31 +08:00
Dong Yan
47676993fd solve the performance bottleneck by only hashing the last board 2017-12-28 01:16:24 +08:00
Dong Yan
affd0319e2 rewrite the selection function of UCTNode to return the action node instead of the state node and next action 2017-12-27 21:11:40 +08:00
Dong Yan
d48982d59e move evaluator from action node to mcts 2017-12-27 20:49:54 +08:00
JialianLee
8d102d249f Modification for backpropagation process 2017-12-27 18:55:00 +08:00
Dong Yan
9f60984973 remove type_conversion function 2017-12-27 14:08:34 +08:00
Dong Yan
a1f6044cba rewrite selection function of ActionNode for clarity, add and delete some notes 2017-12-27 11:43:04 +08:00
Dong Yan
7f0565a5f6 variable rename and delete redundant code 2017-12-26 22:19:10 +08:00
sproblvem
2b24f0760e Merge branch 'master' into mcts_virtual_loss 2017-12-24 21:27:54 +08:00
Dong Yan
89226b449a replace try/except with an isinstance(..., collections.Hashable) check 2017-12-24 20:57:53 +08:00
Dong Yan
f0074aa7ca fix a game config bug and add profiling functions to mcts 2017-12-24 17:43:45 +08:00
mcgrady00h
5aa5dcd191 add comments for mcts with virtual loss 2017-12-24 16:47:43 +08:00
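For readers following the mcts_virtual_loss commits: virtual loss keeps parallel search threads from piling onto the same path by temporarily counting a pending, unevaluated visit as a loss. A minimal sketch of the bookkeeping; class and method names are assumed, not this repo's actual API:

```python
class ActionNode:
    """Toy MCTS edge statistics with virtual loss (illustrative only)."""

    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.total_value = 0.0
        self.virtual_loss = 0  # pending descents through this edge

    def apply_virtual_loss(self, n=1):
        # Called on the way down: inflate visit count, assume a loss.
        self.virtual_loss += n

    def revert_virtual_loss(self, value, n=1):
        # Called on backup: remove the pessimism, record the real value.
        self.virtual_loss -= n
        self.visits += 1
        self.total_value += value

    def q(self):
        visits = self.visits + self.virtual_loss
        if visits == 0:
            return 0.0
        # each pending visit counts as a loss (value -1)
        return (self.total_value - self.virtual_loss) / visits

node = ActionNode(prior=0.5)
node.apply_virtual_loss()
q_pending = node.q()          # pessimistic while the rollout is in flight
node.revert_virtual_loss(1.0)
q_after = node.q()            # true value once the evaluation returns
```

The temporary pessimism lowers this edge's Q during selection, so sibling threads are steered toward other branches until the backup arrives.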
mcgrady00h
8c6f44a015 Merge remote-tracking branch 'origin' into mcts_virtual_loss 2017-12-24 15:49:45 +08:00
mcgrady00h
941284e7b1 Merge remote-tracking branch 'origin' into mcts_virtual_loss 2017-12-24 15:44:30 +08:00
rtz19970824
74504ceb1d debug for go and reversi 2017-12-24 14:40:50 +08:00
Dong Yan
426251e158 add some code for debug and profiling 2017-12-24 01:07:46 +08:00
haoshengzou
b2b2d01d9c Merge remote-tracking branch 'origin/master' 2017-12-23 17:25:37 +08:00
haoshengzou
b21a55dc88 towards policy/value refactor 2017-12-23 17:25:16 +08:00
rtz19970824
3f238864fb minor fixes for mcts, check finish for go 2017-12-23 15:58:06 +08:00
haoshengzou
8c13d8ebe6 Merge remote-tracking branch 'origin/master' 2017-12-23 15:36:44 +08:00
haoshengzou
04048b7873 fix imports to support both python2 and python3. move contents out of __init__.py, leaving that work for after major development. 2017-12-23 15:36:10 +08:00
Dong Yan
b2ef770415 connect reversi with game 2017-12-23 13:05:25 +08:00
mcgrady00h
3b534064bd fix virtual loss bug 2017-12-23 02:48:53 +08:00
Haosheng Zou
8ba16a8808 Merge remote-tracking branch 'origin/master' 2017-12-22 00:24:06 +08:00
Haosheng Zou
1cc5063007 add value_function (critic). value_function and policy not finished yet. 2017-12-22 00:22:23 +08:00
Wenbo Hu
ced63af18f fix bug in parameter passing 2017-12-21 19:31:51 +08:00
Wenbo Hu
f0d59dab6c forbid the pass move if other choices exist 2017-12-20 22:10:47 +08:00
Wenbo Hu
e2c6b96e57 minor revision. 2017-12-20 21:52:30 +08:00
Wenbo Hu
48e95a21ea simulator processes a valid set instead of a single action 2017-12-20 21:35:35 +08:00
rtz19970824
7fca90c61b modify the mcts, refactor the network 2017-12-20 16:43:42 +08:00
Dong Yan
232204d797 fix the copy bug in check_global_isomorphous; refactor code to eliminate side effects 2017-12-19 22:57:38 +08:00
mcgrady00h
1f011a44ef add mcts virtual loss version (may have bugs) 2017-12-19 17:04:55 +08:00
Dong Yan
fc8114fe35 merge flatten and deflatten, rename variable for clarity 2017-12-19 16:51:50 +08:00
宋世虹
7693c38f44 add comments and todos 2017-12-17 13:28:21 +08:00
宋世虹
62e2c6582d finished very naive dqn: changed the interface of replay buffer by adding collect and next_batch, but it still needs refactoring; added implementation of dqn.py, but the interface still needs consideration to make it more extensible; slightly refactored the code style of the codebase; more comments and todos will be in the next commit 2017-12-17 12:52:00 +08:00
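The collect/next_batch replay-buffer interface described above can be sketched as a bounded buffer with uniform sampling. This is a guess at the intended semantics; the capacity default and transition layout are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Toy replay buffer with a `collect` / `next_batch` interface."""

    def __init__(self, capacity=10000):
        # deque with maxlen drops the oldest transition when full
        self.storage = deque(maxlen=capacity)

    def collect(self, transition):
        """Store one (state, action, reward, next_state, done) tuple."""
        self.storage.append(transition)

    def next_batch(self, batch_size):
        """Uniformly sample up to batch_size stored transitions."""
        return random.sample(list(self.storage),
                             min(batch_size, len(self.storage)))

buf = ReplayBuffer(capacity=2)
buf.collect((0, 0, 0.0, 1, False))
buf.collect((1, 1, 1.0, 2, False))
buf.collect((2, 0, 1.0, 3, True))   # evicts the oldest transition
batch = buf.next_batch(5)
```

Splitting collection from sampling like this is what lets the DQN training loop decouple environment interaction from gradient steps.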
Dong Yan
e10acf5130 0. code refactor, try to merge Go and GoEnv 2017-12-16 23:29:11 +08:00
Dong Yan
6cb4b02fca merge class strategy with class game. Next, merge Go with GoEnv 2017-12-15 22:19:44 +08:00
rtz19970824
0874d5342f implement dqn loss and dpg loss, add TODO for separate actor and critic 2017-12-15 14:24:08 +08:00
Haosheng Zou
f496725437 add dqn.py to write 2017-12-13 22:43:45 +08:00
Haosheng Zou
72ae304ab3 preliminary design of dqn_example and the dqn interface. identify the assignment of networks 2017-12-13 20:47:45 +08:00