117 Commits

Author SHA1 Message Date
haoshengzou
c937630bd3 add some my_feed_dict in advantage_estimation and data_collector 2018-08-16 16:20:14 +08:00
haoshengzou
a791916fc4 add clear() for replay_buffer 2018-08-15 09:53:46 +08:00
haoshengzou
6f206759ab add __all__ 2018-05-20 22:36:04 +08:00
haoshengzou
eb8c82636e setup.py, now "pip install"-able 2018-04-17 06:34:38 +08:00
haoshengzou
2527030838 fix the unnamed_dict.update() bug; clean up imports in examples/*.py 2018-04-16 20:17:41 +08:00
haoshengzou
d84c9d121c first master version 2018-04-16 18:02:00 +08:00
haoshengzou
5f979caf58 finish all API docs, first version. 2018-04-15 17:41:43 +08:00
haoshengzou
8c108174b6 some more API docs 2018-04-15 11:46:46 +08:00
haoshengzou
9186dae6a3 more API docs 2018-04-15 09:35:31 +08:00
haoshengzou
2a3bc3ef35 part of API doc 2018-04-12 21:10:50 +08:00
haoshengzou
03246f7ded functional code freeze. all examples working. prepare to release. 2018-04-11 14:23:40 +08:00
haoshengzou
739d360d9d fix episode_cutoff 2018-03-31 19:26:48 +08:00
haoshengzou
75e7f14051 towards ddpg 2018-03-28 18:47:41 +08:00
haoshengzou
52e6b09768 finish ddpg implementation. ppo, actor-critic, and dqn now work; ddpg is not working yet, needs checking! 2018-03-11 17:47:42 +08:00
haoshengzou
a86354834c actor-critic also works. fix some bugs in nstep_q_return. dqn still trains slowly. 2018-03-11 15:07:41 +08:00
haoshengzou
498b55c051 ppo with batch also works! ppo now improves steadily; dqn is not as stable. 2018-03-10 17:30:11 +08:00
haoshengzou
92894d3853 working on off-policy test. other parts of dqn_replay are runnable, but performance is not tested. 2018-03-09 15:07:14 +08:00
haoshengzou
e68dcd3c64 working on off-policy test. other parts of dqn_replay are runnable, but performance is not tested. 2018-03-08 16:51:12 +08:00
Dong Yan
24d75fd1aa call nstep_q_return from dqn_replay.py, still need test 2018-03-06 20:48:07 +08:00
haoshengzou
2a2274aeea initial data_collector. working on getting examples/dqn_replay.py to run 2018-03-04 21:29:58 +08:00
haoshengzou
54a7b1343d design exploration and evaluators for off-policy algos 2018-03-04 13:53:29 +08:00
Dong Yan
2eb056a721 Merge branch 'master' of github.com:sproblvem/tianshou 2018-03-03 21:30:15 +08:00
Dong Yan
0cf2fd6c53 an initial, untested version of replay memory q-return 2018-03-03 21:25:29 +08:00
haoshengzou
e302fd87fb vanilla replay buffer finished and tested. working on data_collector. 2018-03-03 20:42:34 +08:00
haoshengzou
5ab2fa3b65 minor fixes 2018-02-27 14:46:02 +08:00
haoshengzou
675057c6b9 interfaces for advantage_estimation. full_return finished and tested. 2018-02-27 14:11:52 +08:00
songshshshsh
25b25ce7d8 Merge branch 'master' of https://github.com/sproblvem/tianshou 2018-02-27 13:15:36 +08:00
songshshshsh
67d0e78ab9 first modification of the replay buffer; all three replay buffers now work, awaiting refactoring and testing 2018-02-27 13:13:38 +08:00
haoshengzou
40190a282e Merge remote-tracking branch 'origin/master' (conflicts: README.md) 2018-02-26 11:48:46 +08:00
haoshengzou
87889d766c minor fixes. proceed to refactor replay to use lists as in batch. 2018-02-26 11:47:02 +08:00
Dong Yan
0bc1b63e38 add epsilon-greedy for dqn 2018-02-25 16:31:35 +08:00
Dong Yan
f3aee448e0 add option to show the running result of cartpole 2018-02-24 10:53:39 +08:00
Dong Yan
2163d18728 fix the env -> self._env bug 2018-02-10 03:42:00 +08:00
haoshengzou
b8568c6af4 added data/utils.py. was ignored by .gitignore before... 2018-01-25 10:15:38 +08:00
haoshengzou
5910e08672 data/utils.py added but not pushed... 2018-01-25 10:11:36 +08:00
haoshengzou
f32e1d9c12 finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet. 2018-01-18 17:38:52 +08:00
haoshengzou
8fbde8283f finish dqn example. advantage estimation module is not complete yet. 2018-01-18 12:19:48 +08:00
haoshengzou
ed25bf7586 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was mistaken. policy will soon need refactoring. 2018-01-17 11:55:51 +08:00
haoshengzou
d599506dc9 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was mistaken. policy will soon need refactoring. 2018-01-15 16:32:30 +08:00
haoshengzou
983cd36074 finished all ppo examples. Training is noticeably slower than the version before Jan 13. Stranger still, the gym example shows almost no improvement... but this problem is secondary to design; I'll write actor-critic first. 2018-01-15 00:03:06 +08:00
haoshengzou
fed3bf2a12 auto target network. ppo_cartpole.py runs OK, but results differ from the previous version even with the same random seed; still needs debugging. 2018-01-14 20:58:28 +08:00
JialianLee
8b7b4b6c6b Add Dirichlet noise to root prior and uniform noise to initial Q value 2018-01-05 17:02:19 +08:00
haoshengzou
dfcea74fcf fix memory growth and slowness caused by sess.run(tf.multinomial()); ppo examples now work OK with slight memory growth (1M/min), which still needs investigation 2018-01-03 20:32:05 +08:00
haoshengzou
4333ee5d39 ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper 2018-01-02 19:40:37 +08:00
JialianLee
5849776c9a Modification and doc for unit test 2017-12-29 13:45:53 +08:00
rtz19970824
01f39f40d3 debug for unit test 2017-12-28 19:38:25 +08:00
JialianLee
4140d8c9d2 Modification on unit test 2017-12-28 17:10:25 +08:00
JialianLee
0352866b1a Modification for game engine 2017-12-28 16:27:28 +08:00
JialianLee
5457e5134e add a unit test 2017-12-28 16:20:44 +08:00
Dong Yan
08b6649fea test next_action.next_state in MCTS 2017-12-28 15:52:31 +08:00