323 Commits

Author SHA1 Message Date
haoshengzou
9f96cc2461 finish design and running of ppo and actor-critic. advantage estimation module is not complete yet. 2018-01-17 14:21:50 +08:00
haoshengzou
ed25bf7586 fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring. 2018-01-17 11:55:51 +08:00
haoshengzou
d599506dc9 fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring. 2018-01-15 16:32:30 +08:00
haoshengzou
983cd36074 finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem comes behind design. I'll first write actor-critic. 2018-01-15 00:03:06 +08:00
haoshengzou
fed3bf2a12 auto target network. ppo_cartpole.py runs OK, but results are different from the previous version even with the same random seed; still needs debugging. 2018-01-14 20:58:28 +08:00
rtz19970824
3b222f5edb add an args to intrigue training 2018-01-13 15:59:57 +08:00
rtz19970824
2e8662889f add multi-thread for end-to-end training 2018-01-13 15:57:41 +08:00
rtz19970824
fcaa571b42 add the interface in engine.py 2018-01-12 21:48:01 +08:00
Dong Yan
68cc63144f fix the hash conflict bug 2018-01-12 21:08:07 +08:00
rtz19970824
90ffdcbb1f check the latest checkpoint while self play 2018-01-12 19:16:44 +08:00
rtz19970824
c217aa165d add some error message for better debugging 2018-01-12 17:17:03 +08:00
Dong Yan
e58df65301 fix the async bug between think and do move checking, which was introduced by bobo 2018-01-11 21:00:32 +08:00
Dong Yan
afc55ed9c2 refactor code to avoid memory leak 2018-01-11 17:02:36 +08:00
sproblvem
284cc64c18 Merge pull request #3 from sproblvem/double-network: Double network 2018-01-11 10:55:12 +08:00
Dong Yan
5482815de6 replace two isolated player processes with two different sets of variables in the tf graph 2018-01-10 23:27:17 +08:00
Dong Yan
f425085e0a fix the tf assign error when copying the trained variable from black to white 2018-01-09 21:16:35 +08:00
rtz19970824
c2775df8e6 modify game.py for multi-player 2018-01-09 20:09:48 +08:00
rtz19970824
eb0ce95919 modify model.py for multi-player 2018-01-09 19:50:37 +08:00
Tongzheng Ren
891c5b1e47 Merge branch 'master' of https://github.com/sproblvem/tianshou 2018-01-08 21:21:08 +08:00
Tongzheng Ren
f2edc4896e modify play.py for avoiding potential bug 2018-01-08 21:19:17 +08:00
rtz19970824
32b7b33ed5 debug: we should estimate our own win rate 2018-01-08 16:19:59 +08:00
JialianLee
8b7b4b6c6b Add dirichlet noise to root prior and add uniform noise to initial Q value 2018-01-05 17:02:19 +08:00
haoshengzou
dfcea74fcf fix memory growth and slowness caused by sess.run(tf.multinomial()), now ppo examples are working OK with slight memory growth (1M/min), which still needs research 2018-01-03 20:32:05 +08:00
haoshengzou
4333ee5d39 ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper 2018-01-02 19:40:37 +08:00
haoshengzou
88648f0c4b Merge branch 'master' of https://github.com/sproblvem/tianshou 2017-12-31 15:56:19 +08:00
JialianLee
5849776c9a Modification and doc for unit test 2017-12-29 13:45:53 +08:00
rtz19970824
01f39f40d3 debug for unit test 2017-12-28 19:38:25 +08:00
Wenbo Hu
50e8ea36e8 merge 2017-12-29 03:31:57 +08:00
Wenbo Hu
63a0d32b34 use hash table for check_global_isomorphous 2017-12-29 03:30:09 +08:00
Wenbo Hu
da156ed88e Merge branch 'master' of github.com:sproblvem/tianshou 2017-12-29 03:19:46 +08:00
Wenbo Hu
76ac579056 Merge branch 'master' of github.com:sproblvem/tianshou 2017-12-29 01:05:14 +08:00
rtz19970824
2dfab68efe debug for unit test 2017-12-28 19:28:21 +08:00
JialianLee
4140d8c9d2 Modification on unit test 2017-12-28 17:10:25 +08:00
JialianLee
0352866b1a Modification for game engine 2017-12-28 16:27:28 +08:00
JialianLee
5457e5134e add a unit test 2017-12-28 16:20:44 +08:00
rtz19970824
b699258e76 debug for reversi 2017-12-28 15:55:07 +08:00
Dong Yan
08b6649fea test next_action.next_state in MCTS 2017-12-28 15:52:31 +08:00
Dong Yan
47676993fd solve the performance bottleneck by only hashing the last board 2017-12-28 01:16:24 +08:00
Dong Yan
affd0319e2 rewrite the selection function of UCTNode to return the action node instead of returning the state node and next action 2017-12-27 21:11:40 +08:00
Dong Yan
d48982d59e move evaluator from action node to mcts 2017-12-27 20:49:54 +08:00
rtz19970824
0a160065aa Merge branch 'master' of https://github.com/sproblvem/tianshou 2017-12-27 19:54:52 +08:00
rtz19970824
f2291efc72 check existence when saving data 2017-12-27 19:54:36 +08:00
JialianLee
8d102d249f Modification for backpropagation process 2017-12-27 18:55:00 +08:00
Dong Yan
9f60984973 remove type_conversion function 2017-12-27 14:08:34 +08:00
Dong Yan
a1f6044cba rewrite selection function of ActionNode for clarity, add and delete some notes 2017-12-27 11:43:04 +08:00
Dong Yan
c788b253fb show the stdout of player.py for debugging 2017-12-27 01:04:09 +08:00
Dong Yan
7f0565a5f6 variable rename and delete redundant code 2017-12-26 22:19:10 +08:00
Dong Yan
0c3ff3bf37 delete unused code 2017-12-26 19:29:35 +08:00
Dong Yan
029ab199f4 add softmax for mcts root node 2017-12-26 16:47:24 +08:00
Dong Yan
8f508c790b add role for mcts debug 2017-12-26 15:07:15 +08:00