haoshengzou
|
54a7b1343d
|
design exploration and evaluators for off-policy algos
|
2018-03-04 13:53:29 +08:00 |
|
haoshengzou
|
e302fd87fb
|
vanilla replay buffer finished and tested. working on data_collector.
|
2018-03-03 20:42:34 +08:00 |
|
Dong Yan
|
528c4be93c
|
add render option for ddpg
|
2018-02-28 18:44:06 +08:00 |
|
haoshengzou
|
675057c6b9
|
interfaces for advantage_estimation. full_return finished and tested.
|
2018-02-27 14:11:52 +08:00 |
|
songshshshsh
|
25b25ce7d8
|
Merge branch 'master' of https://github.com/sproblvem/tianshou
|
2018-02-27 13:15:36 +08:00 |
|
songshshshsh
|
67d0e78ab9
|
first modification of replay buffer: all three replay buffers now work, pending refactoring and testing
|
2018-02-27 13:13:38 +08:00 |
|
haoshengzou
|
40190a282e
|
Merge remote-tracking branch 'origin/master'
# Conflicts:
# README.md
|
2018-02-26 11:48:46 +08:00 |
|
haoshengzou
|
87889d766c
|
minor fixes. proceed to refactor replay to use lists as in batch.
|
2018-02-26 11:47:02 +08:00 |
|
Dong Yan
|
0bc1b63e38
|
add epsilon-greedy for dqn
|
2018-02-25 16:31:35 +08:00 |
|
Dong Yan
|
f3aee448e0
|
add option to show the running result of cartpole
|
2018-02-24 10:53:39 +08:00 |
|
Dong Yan
|
2163d18728
|
fix the env -> self._env bug
|
2018-02-10 03:42:00 +08:00 |
|
haoshengzou
|
f32e1d9c12
|
finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet.
|
2018-01-18 17:38:52 +08:00 |
|
haoshengzou
|
8fbde8283f
|
finish dqn example. advantage estimation module is not complete yet.
|
2018-01-18 12:19:48 +08:00 |
|
haoshengzou
|
9f96cc2461
|
finish design and running of ppo and actor-critic. advantage estimation module is not complete yet.
|
2018-01-17 14:21:50 +08:00 |
|
haoshengzou
|
ed25bf7586
|
fixed the bugs from Jan 14, which gave inferior or even no improvement. group_ndims was misunderstood. policy will soon need refactoring.
|
2018-01-17 11:55:51 +08:00 |
|
haoshengzou
|
d599506dc9
|
fixed the bugs from Jan 14, which gave inferior or even no improvement. group_ndims was misunderstood. policy will soon need refactoring.
|
2018-01-15 16:32:30 +08:00 |
|
haoshengzou
|
983cd36074
|
finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem comes after design. I'll first write actor-critic.
|
2018-01-15 00:03:06 +08:00 |
|
haoshengzou
|
fed3bf2a12
|
auto target network. ppo_cartpole.py runs ok, but results are different from the previous version even with the same random seed; still needs debugging.
|
2018-01-14 20:58:28 +08:00 |
|
haoshengzou
|
dfcea74fcf
|
fix memory growth and slowness caused by sess.run(tf.multinomial()). ppo examples are now working OK with slight memory growth (1M/min), which still needs investigation
|
2018-01-03 20:32:05 +08:00 |
|
haoshengzou
|
4333ee5d39
|
ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper
|
2018-01-02 19:40:37 +08:00 |
|
haoshengzou
|
b33a141373
|
towards policy/value refactor
|
2017-12-23 17:25:16 +08:00 |
|
haoshengzou
|
2addef41d2
|
fix imports to support both python2 and python3. move contents from __init__.py to leave for work after major development.
|
2017-12-23 15:36:10 +08:00 |
|
宋世虹
|
d220f7f2a8
|
add comments and todos
|
2017-12-17 13:28:21 +08:00 |
|
宋世虹
|
3624cc9036
|
finished very naive dqn: changed the interface of replay buffer by adding collect and next_batch, but it still needs refactoring; added implementation of dqn.py, but still need to consider the interface to make it more extensible; slightly refactored the code style of the codebase; more comments and todos will be in the next commit
|
2017-12-17 12:52:00 +08:00 |
|
Haosheng Zou
|
92deae9f8d
|
minor fix
|
2017-12-14 19:46:38 +08:00 |
|
Haosheng Zou
|
7ab211b63c
|
preliminary design of dqn_example, dqn interface. identify the assignment of networks
|
2017-12-13 20:47:45 +08:00 |
|
haosheng
|
972044c39d
|
minor fix
|
2017-12-10 17:33:10 +08:00 |
|
haosheng
|
a00b930c2c
|
fix naming and comments of coding style, delete .json
|
2017-12-10 17:23:13 +08:00 |
|
haosheng
|
ff4306ddb9
|
model-free rl first commit, with ppo_example.py in examples/ and task delegations in ppo_example.py and READMEs
|
2017-12-08 21:09:23 +08:00 |
|
Tongzheng Ren
|
6d9c369a65
|
architecture design patch two
|
2017-11-06 15:24:34 +08:00 |
|