Tianshou

hongshaorou/Tianshou

Commit Graph

Select branches

feature/algo-eval

master

priv

v0.2.1

v0.2.2

v0.2.3

v0.2.4

v0.2.4.post1

v0.2.5

v0.2.6

v0.2.7

v0.3.0

v0.3.0.post1

v0.3.0rc0

v0.3.1

v0.3.2

v0.4.0

v0.4.1

v0.4.10

v0.4.11

v0.4.2

v0.4.3

v0.4.4

v0.4.5

v0.4.6

v0.4.6.post1

v0.4.7

v0.4.8

v0.4.9

v0.5.0

v1.0.0

40190a282e Merge remote-tracking branch 'origin/master' haoshengzou 2018-02-26 11:48:46 +08:00
87889d766c minor fixes. proceed to refactor replay to use lists as in batch. haoshengzou 2018-02-26 11:47:02 +08:00
0bc1b63e38 add epsilon-greedy for dqn Dong Yan 2018-02-25 16:31:35 +08:00
a40e5aec54 modified README rtz19970824 2018-02-24 16:26:19 +08:00
f3aee448e0 add option to show the running result of cartpole Dong Yan 2018-02-24 10:53:39 +08:00
764f7fb5f1 minor fix of play.py Dong Yan 2018-02-23 23:15:04 +08:00
a0849fa213 Merge pull request #5 from sproblvem/union_set sproblvem 2018-02-23 15:01:17 +08:00
7711686bc6 Update README.md sproblvem 2018-02-12 15:28:25 +08:00
2163d18728 fix the env -> self._env bug Dong Yan 2018-02-10 03:42:00 +08:00
50b2d98d0a support ctrl-c to terminate play.py Dong Yan 2018-02-08 21:17:56 +08:00
e6d477f9a3 modified top-level .gitignore to include tianshou/data haoshengzou 2018-01-25 16:08:04 +08:00
b8568c6af4 added data/utils.py. was ignored by .gitignore before... haoshengzou 2018-01-25 10:15:38 +08:00
5910e08672 data/utils.py added but not pushed... haoshengzou 2018-01-25 10:11:36 +08:00
f32e1d9c12 finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet. haoshengzou 2018-01-18 17:38:52 +08:00
8fbde8283f finish dqn example. advantage estimation module is not complete yet. haoshengzou 2018-01-18 12:19:48 +08:00
0131bcdc85 fix minor Wenbo 2018-01-17 15:57:41 +08:00
0e4aa44ebb add deepcopy for hash, add some testing Wenbo 2018-01-17 15:54:46 +08:00
9f96cc2461 finish design and running of ppo and actor-critic. advantage estimation module is not complete yet. haoshengzou 2018-01-17 14:21:50 +08:00
ed25bf7586 fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring. haoshengzou 2018-01-17 11:55:51 +08:00
e76ccaee80 add union set for do_move and is_valid Wenbo Hu 2018-01-16 14:10:56 +08:00
d599506dc9 fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring. haoshengzou 2018-01-15 16:32:30 +08:00
983cd36074 finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem comes behind design. I'll first write actor-critic. haoshengzou 2018-01-15 00:03:06 +08:00
fed3bf2a12 auto target network. ppo_cartpole.py run ok. but results is different from previous version even with the same random seed, still needs debugging. haoshengzou 2018-01-14 20:58:28 +08:00
3b222f5edb add an args to intrigue training rtz19970824 2018-01-13 15:59:57 +08:00
2e8662889f add multi-thread for end-to-end training rtz19970824 2018-01-13 15:57:41 +08:00
fcaa571b42 add the interface in engine.py rtz19970824 2018-01-12 21:48:01 +08:00
68cc63144f fix the hash conflict bug Dong Yan 2018-01-12 21:08:07 +08:00
90ffdcbb1f check the latest checkpoint while self play rtz19970824 2018-01-12 19:16:44 +08:00
c217aa165d add some error message for better debugging rtz19970824 2018-01-12 17:17:03 +08:00
e58df65301 fix the async bug between think and do move checking, which introduced by bobo Dong Yan 2018-01-11 21:00:32 +08:00
afc55ed9c2 refactor code to avoid memory leak Dong Yan 2018-01-11 17:02:36 +08:00
284cc64c18 Merge pull request #3 from sproblvem/double-network sproblvem 2018-01-11 10:55:12 +08:00
5482815de6 replace two isolated player process by two different set of variables in the tf graph Dong Yan 2018-01-10 23:27:17 +08:00
f425085e0a fix the tf assign error of copy the trained variable from black to white Dong Yan 2018-01-09 21:16:35 +08:00
c2775df8e6 modify game.py for multi-player rtz19970824 2018-01-09 20:09:48 +08:00
eb0ce95919 modify model.py for multi-player rtz19970824 2018-01-09 19:50:37 +08:00
891c5b1e47 Merge branch 'master' of https://github.com/sproblvem/tianshou Tongzheng Ren 2018-01-08 21:21:08 +08:00
f2edc4896e modify play.py for avoiding potential bug Tongzheng Ren 2018-01-08 21:19:17 +08:00
32b7b33ed5 debug: we should estimate our own win rate rtz19970824 2018-01-08 16:19:59 +08:00
8b7b4b6c6b Add dirichlet noise to root prior and add uniform noise to initial Q value JialianLee 2018-01-05 17:02:19 +08:00
dfcea74fcf fix memory growth and slowness caused by sess.run(tf.multinomial()), now ppo examples are working OK with slight memory growth (1M/min), which still needs research haoshengzou 2018-01-03 20:32:05 +08:00
4333ee5d39 ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper haoshengzou 2018-01-02 19:40:37 +08:00
88648f0c4b Merge branch 'master' of https://github.com/sproblvem/tianshou haoshengzou 2017-12-31 15:56:19 +08:00
5849776c9a Modification and doc for unit test JialianLee 2017-12-29 13:45:53 +08:00
01f39f40d3 debug for unit test rtz19970824 2017-12-28 19:38:25 +08:00
50e8ea36e8 merge Wenbo Hu 2017-12-29 03:31:57 +08:00
63a0d32b34 use hash table for check_global_isomorphous Wenbo Hu 2017-12-29 03:30:09 +08:00
da156ed88e Merge branch 'master' of github.com:sproblvem/tianshou Wenbo Hu 2017-12-29 03:19:46 +08:00
76ac579056 Merge branch 'master' of github.com:sproblvem/tianshou Wenbo Hu 2017-12-29 01:05:14 +08:00
2dfab68efe debug for unit test rtz19970824 2017-12-28 19:28:21 +08:00
4140d8c9d2 Modification on unit test JialianLee 2017-12-28 17:10:25 +08:00
0352866b1a Modification for game engine JialianLee 2017-12-28 16:27:28 +08:00
5457e5134e add a unit test JialianLee 2017-12-28 16:20:44 +08:00
b699258e76 debug for reversi rtz19970824 2017-12-28 15:55:07 +08:00
08b6649fea test next_action.next_state in MCTS Dong Yan 2017-12-28 15:52:31 +08:00
47676993fd solve the performance bottleneck by only hashing the last board Dong Yan 2017-12-28 01:16:24 +08:00
affd0319e2 rewrite the selection function of UCTNode to return the action node instead of return the state node and next action Dong Yan 2017-12-27 21:11:40 +08:00
d48982d59e move evaluator from action node to mcts Dong Yan 2017-12-27 20:49:54 +08:00
0a160065aa Merge branch 'master' of https://github.com/sproblvem/tianshou rtz19970824 2017-12-27 19:54:52 +08:00
f2291efc72 check exists when save data rtz19970824 2017-12-27 19:54:36 +08:00
8d102d249f Modification for backpropagation process JialianLee 2017-12-27 18:55:00 +08:00
9f60984973 remove type_conversion function Dong Yan 2017-12-27 14:08:34 +08:00
a1f6044cba rewrite selection function of ActionNode for clarity, add and delete some notes Dong Yan 2017-12-27 11:43:04 +08:00
c788b253fb show the stdout of player.py for debugging Dong Yan 2017-12-27 01:04:09 +08:00
7f0565a5f6 variable rename and delete redundant code Dong Yan 2017-12-26 22:19:10 +08:00
0c3ff3bf37 delete unused code Dong Yan 2017-12-26 19:29:35 +08:00
029ab199f4 add softmax for mcts root node Dong Yan 2017-12-26 16:47:24 +08:00
8f508c790b add role for mcts debug Dong Yan 2017-12-26 15:07:15 +08:00
aa6b5434c6 add debug info for mcts and add softmax for the prior Dong Yan 2017-12-26 14:46:14 +08:00
725fc2c04e pass the checkpoint path to the model rtz19970824 2017-12-26 13:17:46 +08:00
76f641a0f1 minor fixed rtz19970824 2017-12-25 16:51:44 +08:00
76f6a0c470 merge conflict rtz19970824 2017-12-25 16:42:08 +08:00
4379f4c0fd modify play.py for better experience rtz19970824 2017-12-25 16:40:38 +08:00
fcb160dff6 fix python 2,3 print format error Dong Yan 2017-12-25 16:35:43 +08:00
64da200e5d move , from inside of () to outside of () Dong Yan 2017-12-25 16:26:51 +08:00
4362d76432 Merge branch 'master' of https://github.com/sproblvem/tianshou mcgrady00h 2017-12-25 15:33:48 +08:00
0fdbaef1a1 add '()' to support python3 mcgrady00h 2017-12-25 15:33:17 +08:00
70824a3612 remove historical file data.py rtz19970824 2017-12-25 15:09:26 +08:00
9583a14856 Merge pull request #2 from sproblvem/mcts_virtual_loss sproblvem 2017-12-24 21:29:13 +08:00
e8ac38c79e Merge branch 'master' into mcts_virtual_loss sproblvem 2017-12-24 21:28:50 +08:00
2b24f0760e Merge branch 'master' into mcts_virtual_loss sproblvem 2017-12-24 21:27:54 +08:00
89226b449a replace try catch by isinstance collections.Hashable Dong Yan 2017-12-24 20:57:53 +08:00
f0074aa7ca fix bug of game config and add profiling functions to mcts Dong Yan 2017-12-24 17:43:45 +08:00
5aa5dcd191 add comments for mcts with virtual loss mcgrady00h 2017-12-24 16:47:43 +08:00
8c6f44a015 Merge remote-tracking branch 'origin' into mcts_virtual_loss mcgrady00h 2017-12-24 15:49:45 +08:00
cf57144ce9 merge master mcgrady00h 2017-12-24 15:47:11 +08:00
941284e7b1 Merge remote-tracking branch 'origin' into mcts_virtual_loss mcgrady00h 2017-12-24 15:44:30 +08:00
2d9aa32758 change all copy to deepcopy rtz19970824 2017-12-24 14:41:40 +08:00
77e8aa3c28 Merge branch 'master' of https://github.com/sproblvem/tianshou rtz19970824 2017-12-24 14:40:57 +08:00
74504ceb1d debug for go and reversi rtz19970824 2017-12-24 14:40:50 +08:00
001263a683 use a simplified version of get_score Wenbo Hu 2017-12-24 12:07:56 +08:00
426251e158 add some code for debug and profiling Dong Yan 2017-12-24 01:07:46 +08:00
162aa313b6 A new version of reversi JialianLee 2017-12-24 00:42:59 +08:00
dcf293d637 count the winning rate for each player Dong Yan 2017-12-23 22:05:34 +08:00
8780417378 Merge branch 'master' of github.com:sproblvem/tianshou Dong Yan 2017-12-23 17:43:47 +08:00
919784e88b bug fix of model.py Dong Yan 2017-12-23 17:43:33 +08:00
238039b854 Merge remote-tracking branch 'origin/master' haoshengzou 2017-12-23 17:25:37 +08:00
b2b2d01d9c Merge remote-tracking branch 'origin/master' haoshengzou 2017-12-23 17:25:37 +08:00
b33a141373 towards policy/value refactor haoshengzou 2017-12-23 17:25:16 +08:00
b21a55dc88 towards policy/value refactor haoshengzou 2017-12-23 17:25:16 +08:00