2163d18728fix the env -> self._env bug
Dong Yan
2018-02-10 03:42:00 +08:00
50b2d98d0asupport ctrl-c to terminate play.py
Dong Yan
2018-02-08 21:17:56 +08:00
e6d477f9a3modified top-level .gitignore to include tianshou/data
haoshengzou
2018-01-25 16:08:04 +08:00
b8568c6af4added data/utils.py, which was ignored by .gitignore before
haoshengzou
2018-01-25 10:15:38 +08:00
5910e08672data/utils.py added but not pushed...
haoshengzou
2018-01-25 10:11:36 +08:00
f32e1d9c12finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet.
haoshengzou
2018-01-18 17:38:52 +08:00
8fbde8283ffinish dqn example. advantage estimation module is not complete yet.
haoshengzou
2018-01-18 12:19:48 +08:00
0131bcdc85fix minor
Wenbo
2018-01-17 15:57:41 +08:00
0e4aa44ebbadd deepcopy for hash, add some testing
Wenbo
2018-01-17 15:54:46 +08:00
9f96cc2461finish design and running of ppo and actor-critic. advantage estimation module is not complete yet.
haoshengzou
2018-01-17 14:21:50 +08:00
ed25bf7586fixed the Jan 14 bugs, which gave inferior or even no improvement (group_ndims was misunderstood). policy will soon need refactoring.
haoshengzou
2018-01-17 11:55:51 +08:00
e76ccaee80add union set for do_move and is_valid
Wenbo Hu
2018-01-16 14:10:56 +08:00
d599506dc9fixed the Jan 14 bugs, which gave inferior or even no improvement (group_ndims was misunderstood). policy will soon need refactoring.
haoshengzou
2018-01-15 16:32:30 +08:00
983cd36074finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but fixing this comes after design. I'll first write actor-critic.
haoshengzou
2018-01-15 00:03:06 +08:00
fed3bf2a12auto target network. ppo_cartpole.py runs ok, but results are different from the previous version even with the same random seed; still needs debugging.
haoshengzou
2018-01-14 20:58:28 +08:00
3b222f5edbadd an arg to trigger training
rtz19970824
2018-01-13 15:59:57 +08:00
2e8662889fadd multi-thread for end-to-end training
rtz19970824
2018-01-13 15:57:41 +08:00
fcaa571b42add the interface in engine.py
rtz19970824
2018-01-12 21:48:01 +08:00
68cc63144ffix the hash conflict bug
Dong Yan
2018-01-12 21:08:07 +08:00
90ffdcbb1fcheck the latest checkpoint during self-play
rtz19970824
2018-01-12 19:16:44 +08:00
c217aa165dadd some error messages for better debugging
rtz19970824
2018-01-12 17:17:03 +08:00
e58df65301fix the async bug between think and do move checking, which was introduced by bobo
Dong Yan
2018-01-11 21:00:32 +08:00
afc55ed9c2refactor code to avoid memory leak
Dong Yan
2018-01-11 17:02:36 +08:00
32b7b33ed5debug: we should estimate our own win rate
rtz19970824
2018-01-08 16:19:59 +08:00
8b7b4b6c6bAdd dirichlet noise to root prior and add uniform noise to initial Q value
JialianLee
2018-01-05 17:02:19 +08:00
dfcea74fcffix memory growth and slowness caused by sess.run(tf.multinomial()), now ppo examples are working OK with slight memory growth (1M/min), which still needs research
haoshengzou
2018-01-03 20:32:05 +08:00
4333ee5d39ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper
haoshengzou
2018-01-02 19:40:37 +08:00
5849776c9aModification and doc for unit test
JialianLee
2017-12-29 13:45:53 +08:00
01f39f40d3debug for unit test
rtz19970824
2017-12-28 19:38:25 +08:00
50e8ea36e8merge
Wenbo Hu
2017-12-29 03:31:57 +08:00
63a0d32b34use hash table for check_global_isomorphous
Wenbo Hu
2017-12-29 03:30:09 +08:00
da156ed88eMerge branch 'master' of github.com:sproblvem/tianshou
Wenbo Hu
2017-12-29 03:19:46 +08:00
76ac579056Merge branch 'master' of github.com:sproblvem/tianshou
Wenbo Hu
2017-12-29 01:05:14 +08:00
2dfab68efedebug for unit test
rtz19970824
2017-12-28 19:28:21 +08:00
4140d8c9d2Modification on unit test
JialianLee
2017-12-28 17:10:25 +08:00
0352866b1aModification for game engine
JialianLee
2017-12-28 16:27:28 +08:00
5457e5134eadd a unit test
JialianLee
2017-12-28 16:20:44 +08:00
b699258e76debug for reversi
rtz19970824
2017-12-28 15:55:07 +08:00
08b6649featest next_action.next_state in MCTS
Dong Yan
2017-12-28 15:52:31 +08:00
47676993fdsolve the performance bottleneck by only hashing the last board
Dong Yan
2017-12-28 01:16:24 +08:00
affd0319e2rewrite the selection function of UCTNode to return the action node instead of returning the state node and next action
Dong Yan
2017-12-27 21:11:40 +08:00
d48982d59emove evaluator from action node to mcts
Dong Yan
2017-12-27 20:49:54 +08:00