332 Commits

Author SHA1 Message Date
haoshengzou
87889d766c minor fixes. proceed to refactor replay to use lists as in batch. 2018-02-26 11:47:02 +08:00
sproblvem
7711686bc6 Update README.md: add the dependency 2018-02-12 15:28:25 +08:00
Dong Yan
2163d18728 fix the env -> self._env bug 2018-02-10 03:42:00 +08:00
Dong Yan
50b2d98d0a support ctrl-c to terminate play.py 2018-02-08 21:17:56 +08:00
haoshengzou
e6d477f9a3 modified top-level .gitignore to include tianshou/data 2018-01-25 16:08:04 +08:00
haoshengzou
b8568c6af4 added data/utils.py. was ignored by .gitignore before... 2018-01-25 10:15:38 +08:00
haoshengzou
5910e08672 data/utils.py added but not pushed... 2018-01-25 10:11:36 +08:00
haoshengzou
f32e1d9c12 finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet. 2018-01-18 17:38:52 +08:00
haoshengzou
8fbde8283f finish dqn example. advantage estimation module is not complete yet. 2018-01-18 12:19:48 +08:00
haoshengzou
9f96cc2461 finish design and running of ppo and actor-critic. advantage estimation module is not complete yet. 2018-01-17 14:21:50 +08:00
haoshengzou
ed25bf7586 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was misunderstood. policy will soon need refactoring. 2018-01-17 11:55:51 +08:00
haoshengzou
d599506dc9 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was misunderstood. policy will soon need refactoring. 2018-01-15 16:32:30 +08:00
haoshengzou
983cd36074 finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem takes lower priority than design. I'll write actor-critic first. 2018-01-15 00:03:06 +08:00
haoshengzou
fed3bf2a12 auto target network. ppo_cartpole.py runs OK, but results are different from the previous version even with the same random seed; still needs debugging. 2018-01-14 20:58:28 +08:00
rtz19970824
3b222f5edb add an argument to trigger training 2018-01-13 15:59:57 +08:00
rtz19970824
2e8662889f add multi-threading for end-to-end training 2018-01-13 15:57:41 +08:00
rtz19970824
fcaa571b42 add the interface in engine.py 2018-01-12 21:48:01 +08:00
Dong Yan
68cc63144f fix the hash conflict bug 2018-01-12 21:08:07 +08:00
rtz19970824
90ffdcbb1f check the latest checkpoint during self-play 2018-01-12 19:16:44 +08:00
rtz19970824
c217aa165d add some error messages for better debugging 2018-01-12 17:17:03 +08:00
Dong Yan
e58df65301 fix the async bug between think and do-move checking, which was introduced by bobo 2018-01-11 21:00:32 +08:00
Dong Yan
afc55ed9c2 refactor code to avoid memory leak 2018-01-11 17:02:36 +08:00
sproblvem
284cc64c18 Merge pull request #3 from sproblvem/double-network: Double network 2018-01-11 10:55:12 +08:00
Dong Yan
5482815de6 replace two isolated player processes with two different sets of variables in the tf graph 2018-01-10 23:27:17 +08:00
Dong Yan
f425085e0a fix the tf assign error when copying the trained variable from black to white 2018-01-09 21:16:35 +08:00
rtz19970824
c2775df8e6 modify game.py for multi-player 2018-01-09 20:09:48 +08:00
rtz19970824
eb0ce95919 modify model.py for multi-player 2018-01-09 19:50:37 +08:00
Tongzheng Ren
891c5b1e47 Merge branch 'master' of https://github.com/sproblvem/tianshou 2018-01-08 21:21:08 +08:00
Tongzheng Ren
f2edc4896e modify play.py to avoid a potential bug 2018-01-08 21:19:17 +08:00
rtz19970824
32b7b33ed5 debug: we should estimate our own win rate 2018-01-08 16:19:59 +08:00
JialianLee
8b7b4b6c6b Add Dirichlet noise to the root prior and uniform noise to the initial Q value 2018-01-05 17:02:19 +08:00
haoshengzou
dfcea74fcf fix memory growth and slowness caused by sess.run(tf.multinomial()); ppo examples now work OK with slight memory growth (1M/min), which still needs investigation 2018-01-03 20:32:05 +08:00
haoshengzou
4333ee5d39 ppo_cartpole.py seems to be working with params bs=128, num_ep=20, max_time=500; manually merged Normal from branch policy_wrapper 2018-01-02 19:40:37 +08:00
haoshengzou
88648f0c4b Merge branch 'master' of https://github.com/sproblvem/tianshou 2017-12-31 15:56:19 +08:00
JialianLee
5849776c9a Modification and doc for unit test 2017-12-29 13:45:53 +08:00
rtz19970824
01f39f40d3 debug for unit test 2017-12-28 19:38:25 +08:00
Wenbo Hu
50e8ea36e8 merge 2017-12-29 03:31:57 +08:00
Wenbo Hu
63a0d32b34 use a hash table for check_global_isomorphous 2017-12-29 03:30:09 +08:00
Wenbo Hu
da156ed88e Merge branch 'master' of github.com:sproblvem/tianshou 2017-12-29 03:19:46 +08:00
Wenbo Hu
76ac579056 Merge branch 'master' of github.com:sproblvem/tianshou 2017-12-29 01:05:14 +08:00
rtz19970824
2dfab68efe debug for unit test 2017-12-28 19:28:21 +08:00
JialianLee
4140d8c9d2 Modification on unit test 2017-12-28 17:10:25 +08:00
JialianLee
0352866b1a Modification for game engine 2017-12-28 16:27:28 +08:00
JialianLee
5457e5134e add a unit test 2017-12-28 16:20:44 +08:00
rtz19970824
b699258e76 debug for reversi 2017-12-28 15:55:07 +08:00
Dong Yan
08b6649fea test next_action.next_state in MCTS 2017-12-28 15:52:31 +08:00
Dong Yan
47676993fd solve the performance bottleneck by only hashing the last board 2017-12-28 01:16:24 +08:00
Dong Yan
affd0319e2 rewrite the selection function of UCTNode to return the action node instead of the state node and next action 2017-12-27 21:11:40 +08:00
Dong Yan
d48982d59e move evaluator from action node to mcts 2017-12-27 20:49:54 +08:00
rtz19970824
0a160065aa Merge branch 'master' of https://github.com/sproblvem/tianshou 2017-12-27 19:54:52 +08:00