29 Commits

Author SHA1 Message Date
haoshengzou
e302fd87fb vanilla replay buffer finished and tested. working on data_collector. 2018-03-03 20:42:34 +08:00
Dong Yan
528c4be93c add render option for ddpg 2018-02-28 18:44:06 +08:00
haoshengzou
675057c6b9 interfaces for advantage_estimation. full_return finished and tested. 2018-02-27 14:11:52 +08:00
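For reference, a minimal sketch of a full_return-style estimator, assuming it assigns each step of an episode its discounted return from that step onward; the function below is an illustration, not the repo's code.

```python
import numpy as np

def full_return(rewards, gamma=0.99):
    """Discounted return G_t = sum_k gamma**k * r[t+k], computed for every step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):  # sweep backwards to reuse G_{t+1}
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# e.g. full_return([1.0, 1.0, 1.0], gamma=0.5) -> [1.75, 1.5, 1.0]
```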
songshshshsh
25b25ce7d8 Merge branch 'master' of https://github.com/sproblvem/tianshou 2018-02-27 13:15:36 +08:00
songshshshsh
67d0e78ab9 first modification of the replay buffer: all three replay buffers now work, pending refactoring and testing 2018-02-27 13:13:38 +08:00
haoshengzou
40190a282e Merge remote-tracking branch 'origin/master' (conflicts: README.md) 2018-02-26 11:48:46 +08:00
haoshengzou
87889d766c minor fixes. proceed to refactor replay to use lists as in batch. 2018-02-26 11:47:02 +08:00
Dong Yan
0bc1b63e38 add epsilon-greedy for dqn 2018-02-25 16:31:35 +08:00
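A minimal sketch of epsilon-greedy action selection as used for DQN; the function name and signature are illustrative, not the repo's API.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore uniformly, otherwise act greedily on Q."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))  # explore: uniform random action
    return int(np.argmax(q_values))              # exploit: argmax of Q-values
```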
Dong Yan
f3aee448e0 add option to show the running result of cartpole 2018-02-24 10:53:39 +08:00
Dong Yan
2163d18728 fix the env -> self._env bug 2018-02-10 03:42:00 +08:00
haoshengzou
f32e1d9c12 finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet. 2018-01-18 17:38:52 +08:00
haoshengzou
8fbde8283f finish dqn example. advantage estimation module is not complete yet. 2018-01-18 12:19:48 +08:00
haoshengzou
9f96cc2461 finish design and running of ppo and actor-critic. advantage estimation module is not complete yet. 2018-01-17 14:21:50 +08:00
haoshengzou
ed25bf7586 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was mistaken. policy will soon need refactoring. 2018-01-17 11:55:51 +08:00
haoshengzou
d599506dc9 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was mistaken. policy will soon need refactoring. 2018-01-15 16:32:30 +08:00
haoshengzou
983cd36074 finished all ppo examples. Training is markedly slower than the version before Jan 13. More strangely, the gym example shows almost no improvement... but this problem takes lower priority than design. I'll write actor-critic first. 2018-01-15 00:03:06 +08:00
haoshengzou
fed3bf2a12 auto target network. ppo_cartpole.py runs OK, but results differ from the previous version even with the same random seed; still needs debugging. 2018-01-14 20:58:28 +08:00
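A sketch of one common way to automate target-network syncing in TF1, assuming the online and target networks live in variable scopes named 'online' and 'target' (scope names and the helper are illustrative): build the assign ops once, then run them every N updates.

```python
import tensorflow as tf

def build_target_sync(online_scope='online', target_scope='target'):
    online_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, online_scope)
    target_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, target_scope)
    # One assign per variable pair (assumes matching creation order),
    # grouped so a single sess.run call syncs the whole target network.
    return tf.group(*[tf.assign(t, o) for o, t in zip(online_vars, target_vars)])

# usage: sync_op = build_target_sync(); ...; sess.run(sync_op) every N updates
```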
haoshengzou
dfcea74fcf fix memory growth and slowness caused by sess.run(tf.multinomial()); the ppo examples now work OK with slight memory growth (1M/min), which still needs investigation 2018-01-03 20:32:05 +08:00
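The slowdown described here matches a classic TF1 pitfall, sketched below on the assumption that it was the cause: constructing tf.multinomial inside the training loop adds a new op to the graph on every iteration, so the graph (and sess.run dispatch time) grows without bound. The fix is to build the sampling op once and feed it.

```python
import tensorflow as tf

logits = tf.placeholder(tf.float32, shape=[None, 4])  # 4 actions, illustrative
sample_op = tf.multinomial(logits, num_samples=1)     # built once, outside the loop

with tf.Session() as sess:
    for _ in range(1000):
        # BAD: sess.run(tf.multinomial(logits, 1)) here would add 1000 ops.
        action = sess.run(sample_op, feed_dict={logits: [[0.1, 0.2, 0.3, 0.4]]})
```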
haoshengzou
4333ee5d39 ppo_cartpole.py seems to work with params bs128, num_ep20, max_time500; manually merged Normal from the policy_wrapper branch 2018-01-02 19:40:37 +08:00
haoshengzou
b33a141373 towards policy/value refactor 2017-12-23 17:25:16 +08:00
haoshengzou
2addef41d2 fix imports to support both python2 and python3; move contents out of __init__.py, leaving that for after major development. 2017-12-23 15:36:10 +08:00
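The usual idiom for code that must run under both python2 and python3 is a __future__ header in each module; whether the repo used exactly this mechanism is an assumption.

```python
# Make python2 parse these modules with python3 semantics.
from __future__ import absolute_import, division, print_function
```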
宋世虹
d220f7f2a8 add comments and todos 2017-12-17 13:28:21 +08:00
宋世虹
3624cc9036 finished a very naive dqn: changed the replay buffer interface by adding collect and next_batch (still needs refactoring); added an implementation of dqn.py (its interface still needs thought to make it more extensible); slightly refactored the code style of the codebase; more comments and todos will come in the next commit 2017-12-17 12:52:00 +08:00
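A naive replay buffer sketch with the collect/next_batch interface the commit names; the internals (capacity handling, uniform sampling) are assumptions for illustration.

```python
import random

class ReplayBuffer(object):
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.storage = []

    def collect(self, transition):
        """Store one (state, action, reward, next_state, done) tuple, FIFO-evicting."""
        self.storage.append(transition)
        if len(self.storage) > self.capacity:
            self.storage.pop(0)  # drop the oldest transition

    def next_batch(self, batch_size):
        """Sample a batch uniformly at random from stored transitions."""
        return random.sample(self.storage, min(batch_size, len(self.storage)))
```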
Haosheng Zou
92deae9f8d minor fix 2017-12-14 19:46:38 +08:00
Haosheng Zou
7ab211b63c preliminary design of dqn_example and the dqn interface; identify the assign ops between networks 2017-12-13 20:47:45 +08:00
haosheng
972044c39d minor fix 2017-12-10 17:33:10 +08:00
haosheng
a00b930c2c fix naming and comments to match the coding style; delete the .json file 2017-12-10 17:23:13 +08:00
haosheng
ff4306ddb9 first commit of model-free rl, with ppo_example.py in examples/ and task delegations in ppo_example.py and the READMEs 2017-12-08 21:09:23 +08:00
Tongzheng Ren
6d9c369a65 architecture design patch two 2017-11-06 15:24:34 +08:00