351 Commits

Author SHA1 Message Date
haoshengzou
2a2274aeea initial data_collector; working on getting examples/dqn_replay.py to run 2018-03-04 21:29:58 +08:00
haoshengzou
54a7b1343d design exploration and evaluators for off-policy algos 2018-03-04 13:53:29 +08:00
Dong Yan
2eb056a721 Merge branch 'master' of github.com:sproblvem/tianshou 2018-03-03 21:30:15 +08:00
Dong Yan
0cf2fd6c53 an initial, untested version of replay-memory Q-return 2018-03-03 21:25:29 +08:00
haoshengzou
e302fd87fb vanilla replay buffer finished and tested. working on data_collector. 2018-03-03 20:42:34 +08:00
Dong Yan
528c4be93c add render option for ddpg 2018-02-28 18:44:06 +08:00
haoshengzou
5ab2fa3b65 minor fixes 2018-02-27 14:46:02 +08:00
haoshengzou
675057c6b9 interfaces for advantage_estimation. full_return finished and tested. 2018-02-27 14:11:52 +08:00
songshshshsh
25b25ce7d8 Merge branch 'master' of https://github.com/sproblvem/tianshou 2018-02-27 13:15:36 +08:00
songshshshsh
67d0e78ab9 first modification of the replay buffer: make all three replay buffers work; awaiting refactoring and testing 2018-02-27 13:13:38 +08:00
haoshengzou
40190a282e Merge remote-tracking branch 'origin/master' (conflicts: README.md) 2018-02-26 11:48:46 +08:00
haoshengzou
87889d766c minor fixes. proceeding to refactor replay to use lists, as in batch. 2018-02-26 11:47:02 +08:00
Dong Yan
0bc1b63e38 add epsilon-greedy for dqn 2018-02-25 16:31:35 +08:00
rtz19970824
a40e5aec54 modified README 2018-02-24 16:26:19 +08:00
Dong Yan
f3aee448e0 add option to show the running result of cartpole 2018-02-24 10:53:39 +08:00
Dong Yan
764f7fb5f1 minor fix of play.py 2018-02-23 23:15:04 +08:00
sproblvem
a0849fa213
Merge pull request #5 from sproblvem/union_set
add union set for do_move and is_valid
The modification to play.py should be removed; I will fix it in a later commit
2018-02-23 15:01:17 +08:00
sproblvem
7711686bc6
Update README.md
add the dependency
2018-02-12 15:28:25 +08:00
Dong Yan
2163d18728 fix the env -> self._env bug 2018-02-10 03:42:00 +08:00
Dong Yan
50b2d98d0a support ctrl-c to terminate play.py 2018-02-08 21:17:56 +08:00
haoshengzou
e6d477f9a3 modified top-level .gitignore to include tianshou/data 2018-01-25 16:08:04 +08:00
haoshengzou
b8568c6af4 added data/utils.py; it was ignored by .gitignore before... 2018-01-25 10:15:38 +08:00
haoshengzou
5910e08672 data/utils.py added but not pushed... 2018-01-25 10:11:36 +08:00
haoshengzou
f32e1d9c12 finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet. 2018-01-18 17:38:52 +08:00
haoshengzou
8fbde8283f finish dqn example. advantage estimation module is not complete yet. 2018-01-18 12:19:48 +08:00
Wenbo
0131bcdc85 minor fix 2018-01-17 15:57:41 +08:00
Wenbo
0e4aa44ebb add deepcopy for hash, add some testing 2018-01-17 15:54:46 +08:00
haoshengzou
9f96cc2461 finish design and running of ppo and actor-critic. advantage estimation module is not complete yet. 2018-01-17 14:21:50 +08:00
haoshengzou
ed25bf7586 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was mistaken. policy will soon need refactoring. 2018-01-17 11:55:51 +08:00
Wenbo Hu
e76ccaee80 add union set for do_move and is_valid 2018-01-16 14:10:56 +08:00
haoshengzou
d599506dc9 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was mistaken. policy will soon need refactoring. 2018-01-15 16:32:30 +08:00
haoshengzou
983cd36074 finished all ppo examples. Training is markedly slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem is secondary to design; I'll write actor-critic first. 2018-01-15 00:03:06 +08:00
haoshengzou
fed3bf2a12 auto target network. ppo_cartpole.py runs OK, but results differ from the previous version even with the same random seed; still needs debugging. 2018-01-14 20:58:28 +08:00
rtz19970824
3b222f5edb add an arg to trigger training 2018-01-13 15:59:57 +08:00
rtz19970824
2e8662889f add multi-threading for end-to-end training 2018-01-13 15:57:41 +08:00
rtz19970824
fcaa571b42 add the interface in engine.py 2018-01-12 21:48:01 +08:00
Dong Yan
68cc63144f fix the hash conflict bug 2018-01-12 21:08:07 +08:00
rtz19970824
90ffdcbb1f check for the latest checkpoint during self-play 2018-01-12 19:16:44 +08:00
rtz19970824
c217aa165d add some error messages for better debugging 2018-01-12 17:17:03 +08:00
Dong Yan
e58df65301 fix the async bug between think and do_move checking, which was introduced by bobo 2018-01-11 21:00:32 +08:00
Dong Yan
afc55ed9c2 refactor code to avoid memory leak 2018-01-11 17:02:36 +08:00
sproblvem
284cc64c18
Merge pull request #3 from sproblvem/double-network
Double network
2018-01-11 10:55:12 +08:00
Dong Yan
5482815de6 replace two isolated player processes with two different sets of variables in the tf graph 2018-01-10 23:27:17 +08:00
Dong Yan
f425085e0a fix the tf assign error when copying the trained variable from black to white 2018-01-09 21:16:35 +08:00
rtz19970824
c2775df8e6 modify game.py for multi-player 2018-01-09 20:09:48 +08:00
rtz19970824
eb0ce95919 modify model.py for multi-player 2018-01-09 19:50:37 +08:00
Tongzheng Ren
891c5b1e47 Merge branch 'master' of https://github.com/sproblvem/tianshou 2018-01-08 21:21:08 +08:00
Tongzheng Ren
f2edc4896e modify play.py to avoid a potential bug 2018-01-08 21:19:17 +08:00
rtz19970824
32b7b33ed5 debug: we should estimate our own win rate 2018-01-08 16:19:59 +08:00
JialianLee
8b7b4b6c6b Add dirichlet noise to root prior and add uniform noise to initial Q value 2018-01-05 17:02:19 +08:00