haoshengzou
|
92894d3853
|
working on off-policy test. other parts of dqn_replay are runnable, but performance not tested.
|
2018-03-09 15:07:14 +08:00 |
|
haoshengzou
|
e68dcd3c64
|
working on off-policy test. other parts of dqn_replay are runnable, but performance not tested.
|
2018-03-08 16:51:12 +08:00 |
|
Dong Yan
|
24d75fd1aa
|
call nstep_q_return from dqn_replay.py, still need test
|
2018-03-06 20:48:07 +08:00 |
|
haoshengzou
|
2a2274aeea
|
initial data_collector. working on examples/dqn_replay.py to run
|
2018-03-04 21:29:58 +08:00 |
|
haoshengzou
|
54a7b1343d
|
design exploration and evaluators for off-policy algos
|
2018-03-04 13:53:29 +08:00 |
|
Dong Yan
|
2eb056a721
|
Merge branch 'master' of github.com:sproblvem/tianshou
|
2018-03-03 21:30:15 +08:00 |
|
Dong Yan
|
0cf2fd6c53
|
an initial version of untested replaymemory qreturn
|
2018-03-03 21:25:29 +08:00 |
|
haoshengzou
|
e302fd87fb
|
vanilla replay buffer finished and tested. working on data_collector.
|
2018-03-03 20:42:34 +08:00 |
|
haoshengzou
|
5ab2fa3b65
|
minor fixes
|
2018-02-27 14:46:02 +08:00 |
|
haoshengzou
|
675057c6b9
|
interfaces for advantage_estimation. full_return finished and tested.
|
2018-02-27 14:11:52 +08:00 |
|
songshshshsh
|
25b25ce7d8
|
Merge branch 'master' of https://github.com/sproblvem/tianshou
|
2018-02-27 13:15:36 +08:00 |
|
songshshshsh
|
67d0e78ab9
|
first modification of replay buffer, make all three replay buffers work, waiting for refactoring and testing
|
2018-02-27 13:13:38 +08:00 |
|
haoshengzou
|
40190a282e
|
Merge remote-tracking branch 'origin/master'
# Conflicts:
# README.md
|
2018-02-26 11:48:46 +08:00 |
|
haoshengzou
|
87889d766c
|
minor fixes. proceed to refactor replay to use lists as in batch.
|
2018-02-26 11:47:02 +08:00 |
|
Dong Yan
|
0bc1b63e38
|
add epsilon-greedy for dqn
|
2018-02-25 16:31:35 +08:00 |
|
Dong Yan
|
f3aee448e0
|
add option to show the running result of cartpole
|
2018-02-24 10:53:39 +08:00 |
|
Dong Yan
|
2163d18728
|
fix the env -> self._env bug
|
2018-02-10 03:42:00 +08:00 |
|
haoshengzou
|
b8568c6af4
|
added data/utils.py. was ignored by .gitignore before...
|
2018-01-25 10:15:38 +08:00 |
|
haoshengzou
|
5910e08672
|
data/utils.py added but not pushed...
|
2018-01-25 10:11:36 +08:00 |
|
haoshengzou
|
f32e1d9c12
|
finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet.
|
2018-01-18 17:38:52 +08:00 |
|
haoshengzou
|
8fbde8283f
|
finish dqn example. advantage estimation module is not complete yet.
|
2018-01-18 12:19:48 +08:00 |
|
haoshengzou
|
ed25bf7586
|
fixed the bugs on Jan 14, which gave inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.
|
2018-01-17 11:55:51 +08:00 |
|
haoshengzou
|
d599506dc9
|
fixed the bugs on Jan 14, which gave inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.
|
2018-01-15 16:32:30 +08:00 |
|
haoshengzou
|
983cd36074
|
finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem takes lower priority than design. I'll first write actor-critic.
|
2018-01-15 00:03:06 +08:00 |
|
haoshengzou
|
fed3bf2a12
|
auto target network. ppo_cartpole.py runs ok, but results are different from the previous version even with the same random seed; still needs debugging.
|
2018-01-14 20:58:28 +08:00 |
|
JialianLee
|
8b7b4b6c6b
|
Add dirichlet noise to root prior and add uniform noise to initial Q value
|
2018-01-05 17:02:19 +08:00 |
|
haoshengzou
|
dfcea74fcf
|
fix memory growth and slowness caused by sess.run(tf.multinomial()), now ppo examples are working OK with slight memory growth (1M/min), which still needs research
|
2018-01-03 20:32:05 +08:00 |
|
haoshengzou
|
4333ee5d39
|
ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper
|
2018-01-02 19:40:37 +08:00 |
|
JialianLee
|
5849776c9a
|
Modification and doc for unit test
|
2017-12-29 13:45:53 +08:00 |
|
rtz19970824
|
01f39f40d3
|
debug for unit test
|
2017-12-28 19:38:25 +08:00 |
|
JialianLee
|
4140d8c9d2
|
Modification on unit test
|
2017-12-28 17:10:25 +08:00 |
|
JialianLee
|
0352866b1a
|
Modification for game engine
|
2017-12-28 16:27:28 +08:00 |
|
JialianLee
|
5457e5134e
|
add a unit test
|
2017-12-28 16:20:44 +08:00 |
|
Dong Yan
|
08b6649fea
|
test next_action.next_state in MCTS
|
2017-12-28 15:52:31 +08:00 |
|
Dong Yan
|
47676993fd
|
solve the performance bottleneck by only hashing the last board
|
2017-12-28 01:16:24 +08:00 |
|
Dong Yan
|
affd0319e2
|
rewrite the selection function of UCTNode to return the action node instead of returning the state node and next action
|
2017-12-27 21:11:40 +08:00 |
|
Dong Yan
|
d48982d59e
|
move evaluator from action node to mcts
|
2017-12-27 20:49:54 +08:00 |
|
JialianLee
|
8d102d249f
|
Modification for backpropagation process
|
2017-12-27 18:55:00 +08:00 |
|
Dong Yan
|
9f60984973
|
remove type_conversion function
|
2017-12-27 14:08:34 +08:00 |
|
Dong Yan
|
a1f6044cba
|
rewrite selection function of ActionNode for clarity, add and delete some notes
|
2017-12-27 11:43:04 +08:00 |
|
Dong Yan
|
7f0565a5f6
|
variable rename and delete redundant code
|
2017-12-26 22:19:10 +08:00 |
|
sproblvem
|
2b24f0760e
|
Merge branch 'master' into mcts_virtual_loss
|
2017-12-24 21:27:54 +08:00 |
|
Dong Yan
|
89226b449a
|
replace try catch by isinstance collections.Hashable
|
2017-12-24 20:57:53 +08:00 |
|
Dong Yan
|
f0074aa7ca
|
fix bug of game config and add profiling functions to mcts
|
2017-12-24 17:43:45 +08:00 |
|
mcgrady00h
|
5aa5dcd191
|
add comments for mcts with virtual loss
|
2017-12-24 16:47:43 +08:00 |
|
mcgrady00h
|
8c6f44a015
|
Merge remote-tracking branch 'origin' into mcts_virtual_loss
|
2017-12-24 15:49:45 +08:00 |
|
mcgrady00h
|
941284e7b1
|
Merge remote-tracking branch 'origin' into mcts_virtual_loss
|
2017-12-24 15:44:30 +08:00 |
|
rtz19970824
|
74504ceb1d
|
debug for go and reversi
|
2017-12-24 14:40:50 +08:00 |
|
Dong Yan
|
426251e158
|
add some code for debug and profiling
|
2017-12-24 01:07:46 +08:00 |
|
haoshengzou
|
b2b2d01d9c
|
Merge remote-tracking branch 'origin/master'
|
2017-12-23 17:25:37 +08:00 |
|