Dong Yan
|
2163d18728
|
fix the env -> self._env bug
|
2018-02-10 03:42:00 +08:00 |
|
haoshengzou
|
b8568c6af4
|
added data/utils.py. was ignored by .gitignore before...
|
2018-01-25 10:15:38 +08:00 |
|
haoshengzou
|
5910e08672
|
data/utils.py added but not pushed...
|
2018-01-25 10:11:36 +08:00 |
|
haoshengzou
|
f32e1d9c12
|
finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet.
|
2018-01-18 17:38:52 +08:00 |
|
haoshengzou
|
8fbde8283f
|
finish dqn example. advantage estimation module is not complete yet.
|
2018-01-18 12:19:48 +08:00 |
|
haoshengzou
|
ed25bf7586
|
fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.
|
2018-01-17 11:55:51 +08:00 |
|
haoshengzou
|
d599506dc9
|
fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.
|
2018-01-15 16:32:30 +08:00 |
|
haoshengzou
|
983cd36074
|
finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem comes behind design. I'll first write actor-critic.
|
2018-01-15 00:03:06 +08:00 |
|
haoshengzou
|
fed3bf2a12
|
auto target network. ppo_cartpole.py runs ok, but results are different from the previous version even with the same random seed; still needs debugging.
|
2018-01-14 20:58:28 +08:00 |
|
JialianLee
|
8b7b4b6c6b
|
Add dirichlet noise to root prior and add uniform noise to initial Q value
|
2018-01-05 17:02:19 +08:00 |
|
haoshengzou
|
dfcea74fcf
|
fix memory growth and slowness caused by sess.run(tf.multinomial()), now ppo examples are working OK with slight memory growth (1M/min), which still needs research
|
2018-01-03 20:32:05 +08:00 |
|
haoshengzou
|
4333ee5d39
|
ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper
|
2018-01-02 19:40:37 +08:00 |
|
JialianLee
|
5849776c9a
|
Modification and doc for unit test
|
2017-12-29 13:45:53 +08:00 |
|
rtz19970824
|
01f39f40d3
|
debug for unit test
|
2017-12-28 19:38:25 +08:00 |
|
JialianLee
|
4140d8c9d2
|
Modification on unit test
|
2017-12-28 17:10:25 +08:00 |
|
JialianLee
|
0352866b1a
|
Modification for game engine
|
2017-12-28 16:27:28 +08:00 |
|
JialianLee
|
5457e5134e
|
add a unit test
|
2017-12-28 16:20:44 +08:00 |
|
Dong Yan
|
08b6649fea
|
test next_action.next_state in MCTS
|
2017-12-28 15:52:31 +08:00 |
|
Dong Yan
|
47676993fd
|
solve the performance bottleneck by only hashing the last board
|
2017-12-28 01:16:24 +08:00 |
|
Dong Yan
|
affd0319e2
|
rewrite the selection function of UCTNode to return the action node instead of returning the state node and next action
|
2017-12-27 21:11:40 +08:00 |
|
Dong Yan
|
d48982d59e
|
move evaluator from action node to mcts
|
2017-12-27 20:49:54 +08:00 |
|
JialianLee
|
8d102d249f
|
Modification for backpropagation process
|
2017-12-27 18:55:00 +08:00 |
|
Dong Yan
|
9f60984973
|
remove type_conversion function
|
2017-12-27 14:08:34 +08:00 |
|
Dong Yan
|
a1f6044cba
|
rewrite selection function of ActionNode for clarity, add and delete some notes
|
2017-12-27 11:43:04 +08:00 |
|
Dong Yan
|
7f0565a5f6
|
variable rename and delete redundant code
|
2017-12-26 22:19:10 +08:00 |
|
sproblvem
|
2b24f0760e
|
Merge branch 'master' into mcts_virtual_loss
|
2017-12-24 21:27:54 +08:00 |
|
Dong Yan
|
89226b449a
|
replace try catch by isinstance collections.Hashable
|
2017-12-24 20:57:53 +08:00 |
|
Dong Yan
|
f0074aa7ca
|
fix bug of game config and add profiling functions to mcts
|
2017-12-24 17:43:45 +08:00 |
|
mcgrady00h
|
5aa5dcd191
|
add comments for mcts with virtual loss
|
2017-12-24 16:47:43 +08:00 |
|
mcgrady00h
|
8c6f44a015
|
Merge remote-tracking branch 'origin' into mcts_virtual_loss
|
2017-12-24 15:49:45 +08:00 |
|
mcgrady00h
|
941284e7b1
|
Merge remote-tracking branch 'origin' into mcts_virtual_loss
|
2017-12-24 15:44:30 +08:00 |
|
rtz19970824
|
74504ceb1d
|
debug for go and reversi
|
2017-12-24 14:40:50 +08:00 |
|
Dong Yan
|
426251e158
|
add some code for debug and profiling
|
2017-12-24 01:07:46 +08:00 |
|
haoshengzou
|
b2b2d01d9c
|
Merge remote-tracking branch 'origin/master'
|
2017-12-23 17:25:37 +08:00 |
|
haoshengzou
|
b21a55dc88
|
towards policy/value refactor
|
2017-12-23 17:25:16 +08:00 |
|
rtz19970824
|
3f238864fb
|
minor fixes for mcts, check finish for go
|
2017-12-23 15:58:06 +08:00 |
|
haoshengzou
|
8c13d8ebe6
|
Merge remote-tracking branch 'origin/master'
|
2017-12-23 15:36:44 +08:00 |
|
haoshengzou
|
04048b7873
|
fix imports to support both python2 and python3. move contents from __init__.py to leave for work after major development.
|
2017-12-23 15:36:10 +08:00 |
|
Dong Yan
|
b2ef770415
|
connect reversi with game
|
2017-12-23 13:05:25 +08:00 |
|
mcgrady00h
|
3b534064bd
|
fix virtual loss bug
|
2017-12-23 02:48:53 +08:00 |
|
Haosheng Zou
|
8ba16a8808
|
Merge remote-tracking branch 'origin/master'
|
2017-12-22 00:24:06 +08:00 |
|
Haosheng Zou
|
1cc5063007
|
add value_function (critic). value_function and policy not finished yet.
|
2017-12-22 00:22:23 +08:00 |
|
Wenbo Hu
|
ced63af18f
|
fixing bug pass parameter
|
2017-12-21 19:31:51 +08:00 |
|
Wenbo Hu
|
f0d59dab6c
|
forbid pass, if we have other choices
|
2017-12-20 22:10:47 +08:00 |
|
Wenbo Hu
|
e2c6b96e57
|
minor revision.
|
2017-12-20 21:52:30 +08:00 |
|
Wenbo Hu
|
48e95a21ea
|
simulator process a valid set, instead of a single action
|
2017-12-20 21:35:35 +08:00 |
|
rtz19970824
|
7fca90c61b
|
modify the mcts, refactor the network
|
2017-12-20 16:43:42 +08:00 |
|
Dong Yan
|
232204d797
|
fix the copy bug in check_global_isomorphous; refactor code to eliminate side effects
|
2017-12-19 22:57:38 +08:00 |
|
mcgrady00h
|
1f011a44ef
|
add mcts virtual loss version (may have bugs)
|
2017-12-19 17:04:55 +08:00 |
|
Dong Yan
|
fc8114fe35
|
merge flatten and deflatten, rename variable for clarity
|
2017-12-19 16:51:50 +08:00 |
|