haoshengzou | ed25bf7586 | fixed the bugs introduced on Jan 14, which caused inferior results or even no improvement: group_ndims had been misused. The policy will soon need refactoring. | 2018-01-17 11:55:51 +08:00
haoshengzou | d599506dc9 | fixed the bugs introduced on Jan 14, which caused inferior results or even no improvement: group_ndims had been misused. The policy will soon need refactoring. | 2018-01-15 16:32:30 +08:00
haoshengzou | 983cd36074 | finished all ppo examples. Training is remarkably slower than in the version before Jan 13. Stranger still, the gym example shows almost no improvement... but this problem has lower priority than the design work, so I'll write actor-critic first. | 2018-01-15 00:03:06 +08:00
haoshengzou | fed3bf2a12 | auto target network. ppo_cartpole.py runs OK, but the results differ from the previous version even with the same random seed; still needs debugging. | 2018-01-14 20:58:28 +08:00
rtz19970824 | 3b222f5edb | add an argument to trigger training | 2018-01-13 15:59:57 +08:00
rtz19970824 | 2e8662889f | add multi-threading for end-to-end training | 2018-01-13 15:57:41 +08:00
rtz19970824 | fcaa571b42 | add the interface in engine.py | 2018-01-12 21:48:01 +08:00
Dong Yan | 68cc63144f | fix the hash conflict bug | 2018-01-12 21:08:07 +08:00
rtz19970824 | 90ffdcbb1f | check the latest checkpoint during self-play | 2018-01-12 19:16:44 +08:00
rtz19970824 | c217aa165d | add some error messages for better debugging | 2018-01-12 17:17:03 +08:00
Dong Yan | e58df65301 | fix the async bug between think and do move checking, which was introduced by bobo | 2018-01-11 21:00:32 +08:00
Dong Yan | afc55ed9c2 | refactor code to avoid memory leak | 2018-01-11 17:02:36 +08:00
sproblvem | 284cc64c18 | Merge pull request #3 from sproblvem/double-network (Double network) | 2018-01-11 10:55:12 +08:00
Dong Yan | 5482815de6 | replace the two isolated player processes with two different sets of variables in the tf graph | 2018-01-10 23:27:17 +08:00
Dong Yan | f425085e0a | fix the tf assign error when copying the trained variables from black to white | 2018-01-09 21:16:35 +08:00
rtz19970824 | c2775df8e6 | modify game.py for multi-player | 2018-01-09 20:09:48 +08:00
rtz19970824 | eb0ce95919 | modify model.py for multi-player | 2018-01-09 19:50:37 +08:00
Tongzheng Ren | 891c5b1e47 | Merge branch 'master' of https://github.com/sproblvem/tianshou | 2018-01-08 21:21:08 +08:00
Tongzheng Ren | f2edc4896e | modify play.py to avoid a potential bug | 2018-01-08 21:19:17 +08:00
rtz19970824 | 32b7b33ed5 | debug: we should estimate our own win rate | 2018-01-08 16:19:59 +08:00
JialianLee | 8b7b4b6c6b | Add Dirichlet noise to the root prior and uniform noise to the initial Q value | 2018-01-05 17:02:19 +08:00
haoshengzou | dfcea74fcf | fix memory growth and slowness caused by sess.run(tf.multinomial()); ppo examples now work OK with slight memory growth (1M/min), which still needs investigation | 2018-01-03 20:32:05 +08:00
haoshengzou | 4333ee5d39 | ppo_cartpole.py seems to work with params bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper | 2018-01-02 19:40:37 +08:00
haoshengzou | 88648f0c4b | Merge branch 'master' of https://github.com/sproblvem/tianshou | 2017-12-31 15:56:19 +08:00
JialianLee | 5849776c9a | Modification and doc for unit test | 2017-12-29 13:45:53 +08:00
rtz19970824 | 01f39f40d3 | debug for unit test | 2017-12-28 19:38:25 +08:00
Wenbo Hu | 50e8ea36e8 | merge | 2017-12-29 03:31:57 +08:00
Wenbo Hu | 63a0d32b34 | use hash table for check_global_isomorphous | 2017-12-29 03:30:09 +08:00
Wenbo Hu | da156ed88e | Merge branch 'master' of github.com:sproblvem/tianshou | 2017-12-29 03:19:46 +08:00
Wenbo Hu | 76ac579056 | Merge branch 'master' of github.com:sproblvem/tianshou | 2017-12-29 01:05:14 +08:00
rtz19970824 | 2dfab68efe | debug for unit test | 2017-12-28 19:28:21 +08:00
JialianLee | 4140d8c9d2 | Modification on unit test | 2017-12-28 17:10:25 +08:00
JialianLee | 0352866b1a | Modification for game engine | 2017-12-28 16:27:28 +08:00
JialianLee | 5457e5134e | add a unit test | 2017-12-28 16:20:44 +08:00
rtz19970824 | b699258e76 | debug for reversi | 2017-12-28 15:55:07 +08:00
Dong Yan | 08b6649fea | test next_action.next_state in MCTS | 2017-12-28 15:52:31 +08:00
Dong Yan | 47676993fd | solve the performance bottleneck by only hashing the last board | 2017-12-28 01:16:24 +08:00
Dong Yan | affd0319e2 | rewrite the selection function of UCTNode to return the action node instead of returning the state node and the next action | 2017-12-27 21:11:40 +08:00
Dong Yan | d48982d59e | move evaluator from action node to mcts | 2017-12-27 20:49:54 +08:00
rtz19970824 | 0a160065aa | Merge branch 'master' of https://github.com/sproblvem/tianshou | 2017-12-27 19:54:52 +08:00
rtz19970824 | f2291efc72 | check existence when saving data | 2017-12-27 19:54:36 +08:00
JialianLee | 8d102d249f | Modification for backpropagation process | 2017-12-27 18:55:00 +08:00
Dong Yan | 9f60984973 | remove type_conversion function | 2017-12-27 14:08:34 +08:00
Dong Yan | a1f6044cba | rewrite selection function of ActionNode for clarity, add and delete some notes | 2017-12-27 11:43:04 +08:00
Dong Yan | c788b253fb | show the stdout of player.py for debugging | 2017-12-27 01:04:09 +08:00
Dong Yan | 7f0565a5f6 | variable rename and delete redundant code | 2017-12-26 22:19:10 +08:00
Dong Yan | 0c3ff3bf37 | delete unused code | 2017-12-26 19:29:35 +08:00
Dong Yan | 029ab199f4 | add softmax for mcts root node | 2017-12-26 16:47:24 +08:00
Dong Yan | 8f508c790b | add role for mcts debug | 2017-12-26 15:07:15 +08:00
Dong Yan | aa6b5434c6 | add debug info for mcts and add softmax for the prior | 2017-12-26 14:46:14 +08:00