songshshshsh
|
25b25ce7d8
|
Merge branch 'master' of https://github.com/sproblvem/tianshou
|
2018-02-27 13:15:36 +08:00 |
|
songshshshsh
|
67d0e78ab9
|
first modification of replay buffer; all three replay buffers now work, pending refactoring and testing
|
2018-02-27 13:13:38 +08:00 |
|
haoshengzou
|
40190a282e
|
Merge remote-tracking branch 'origin/master'
# Conflicts:
# README.md
|
2018-02-26 11:48:46 +08:00 |
|
haoshengzou
|
87889d766c
|
minor fixes. proceed to refactor replay to use lists as in batch.
|
2018-02-26 11:47:02 +08:00 |
|
Dong Yan
|
0bc1b63e38
|
add epsilon-greedy for dqn
|
2018-02-25 16:31:35 +08:00 |
|
rtz19970824
|
a40e5aec54
|
modified README
|
2018-02-24 16:26:19 +08:00 |
|
Dong Yan
|
f3aee448e0
|
add option to show the running result of cartpole
|
2018-02-24 10:53:39 +08:00 |
|
Dong Yan
|
764f7fb5f1
|
minor fix of play.py
|
2018-02-23 23:15:04 +08:00 |
|
sproblvem
|
a0849fa213
|
Merge pull request #5 from sproblvem/union_set
add union set for do_move and is_valid
The modification to play.py should be removed; I will fix it in a later commit
|
2018-02-23 15:01:17 +08:00 |
|
sproblvem
|
7711686bc6
|
Update README.md
add the dependency
|
2018-02-12 15:28:25 +08:00 |
|
Dong Yan
|
2163d18728
|
fix the env -> self._env bug
|
2018-02-10 03:42:00 +08:00 |
|
Dong Yan
|
50b2d98d0a
|
support ctrl-c to terminate play.py
|
2018-02-08 21:17:56 +08:00 |
|
haoshengzou
|
e6d477f9a3
|
modified top-level .gitignore to include tianshou/data
|
2018-01-25 16:08:04 +08:00 |
|
haoshengzou
|
b8568c6af4
|
added data/utils.py. was ignored by .gitignore before...
|
2018-01-25 10:15:38 +08:00 |
|
haoshengzou
|
5910e08672
|
data/utils.py added but not pushed...
|
2018-01-25 10:11:36 +08:00 |
|
haoshengzou
|
f32e1d9c12
|
finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet.
|
2018-01-18 17:38:52 +08:00 |
|
haoshengzou
|
8fbde8283f
|
finish dqn example. advantage estimation module is not complete yet.
|
2018-01-18 12:19:48 +08:00 |
|
Wenbo
|
0131bcdc85
|
fix minor
|
2018-01-17 15:57:41 +08:00 |
|
Wenbo
|
0e4aa44ebb
|
add deepcopy for hash, add some testing
|
2018-01-17 15:54:46 +08:00 |
|
haoshengzou
|
9f96cc2461
|
finish design and running of ppo and actor-critic. advantage estimation module is not complete yet.
|
2018-01-17 14:21:50 +08:00 |
|
haoshengzou
|
ed25bf7586
|
fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.
|
2018-01-17 11:55:51 +08:00 |
|
Wenbo Hu
|
e76ccaee80
|
add union set for do_move and is_valid
|
2018-01-16 14:10:56 +08:00 |
|
haoshengzou
|
d599506dc9
|
fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.
|
2018-01-15 16:32:30 +08:00 |
|
haoshengzou
|
983cd36074
|
finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem comes behind design. I'll first write actor-critic.
|
2018-01-15 00:03:06 +08:00 |
|
haoshengzou
|
fed3bf2a12
|
auto target network. ppo_cartpole.py runs ok, but results are different from the previous version even with the same random seed; still needs debugging.
|
2018-01-14 20:58:28 +08:00 |
|
rtz19970824
|
3b222f5edb
|
add an argument to trigger training
|
2018-01-13 15:59:57 +08:00 |
|
rtz19970824
|
2e8662889f
|
add multi-thread for end-to-end training
|
2018-01-13 15:57:41 +08:00 |
|
rtz19970824
|
fcaa571b42
|
add the interface in engine.py
|
2018-01-12 21:48:01 +08:00 |
|
Dong Yan
|
68cc63144f
|
fix the hash conflict bug
|
2018-01-12 21:08:07 +08:00 |
|
rtz19970824
|
90ffdcbb1f
|
check the latest checkpoint while self play
|
2018-01-12 19:16:44 +08:00 |
|
rtz19970824
|
c217aa165d
|
add some error message for better debugging
|
2018-01-12 17:17:03 +08:00 |
|
Dong Yan
|
e58df65301
|
fix the async bug between think and do move checking, which was introduced by bobo
|
2018-01-11 21:00:32 +08:00 |
|
Dong Yan
|
afc55ed9c2
|
refactor code to avoid memory leak
|
2018-01-11 17:02:36 +08:00 |
|
sproblvem
|
284cc64c18
|
Merge pull request #3 from sproblvem/double-network
Double network
|
2018-01-11 10:55:12 +08:00 |
|
Dong Yan
|
5482815de6
|
replace two isolated player processes with two different sets of variables in the tf graph
|
2018-01-10 23:27:17 +08:00 |
|
Dong Yan
|
f425085e0a
|
fix the tf assign error when copying the trained variables from black to white
|
2018-01-09 21:16:35 +08:00 |
|
rtz19970824
|
c2775df8e6
|
modify game.py for multi-player
|
2018-01-09 20:09:48 +08:00 |
|
rtz19970824
|
eb0ce95919
|
modify model.py for multi-player
|
2018-01-09 19:50:37 +08:00 |
|
Tongzheng Ren
|
891c5b1e47
|
Merge branch 'master' of https://github.com/sproblvem/tianshou
|
2018-01-08 21:21:08 +08:00 |
|
Tongzheng Ren
|
f2edc4896e
|
modify play.py to avoid a potential bug
|
2018-01-08 21:19:17 +08:00 |
|
rtz19970824
|
32b7b33ed5
|
debug: we should estimate our own win rate
|
2018-01-08 16:19:59 +08:00 |
|
JialianLee
|
8b7b4b6c6b
|
Add dirichlet noise to root prior and add uniform noise to initial Q value
|
2018-01-05 17:02:19 +08:00 |
|
haoshengzou
|
dfcea74fcf
|
fix memory growth and slowness caused by sess.run(tf.multinomial()), now ppo examples are working OK with slight memory growth (1M/min), which still needs research
|
2018-01-03 20:32:05 +08:00 |
|
haoshengzou
|
4333ee5d39
|
ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper
|
2018-01-02 19:40:37 +08:00 |
|
haoshengzou
|
88648f0c4b
|
Merge branch 'master' of https://github.com/sproblvem/tianshou
|
2017-12-31 15:56:19 +08:00 |
|
JialianLee
|
5849776c9a
|
Modification and doc for unit test
|
2017-12-29 13:45:53 +08:00 |
|
rtz19970824
|
01f39f40d3
|
debug for unit test
|
2017-12-28 19:38:25 +08:00 |
|
Wenbo Hu
|
50e8ea36e8
|
merge
|
2017-12-29 03:31:57 +08:00 |
|
Wenbo Hu
|
63a0d32b34
|
use hash table for check_global_isomorphous
|
2017-12-29 03:30:09 +08:00 |
|
Wenbo Hu
|
da156ed88e
|
Merge branch 'master' of github.com:sproblvem/tianshou
|
2017-12-29 03:19:46 +08:00 |
|