haoshengzou
|
03246f7ded
|
functional code freeze. all examples working. prepare to release.
|
2018-04-11 14:23:40 +08:00 |
|
haoshengzou
|
e68dcd3c64
|
working on off-policy test. other parts of dqn_replay is runnable, but performance not tested.
|
2018-03-08 16:51:12 +08:00 |
|
haoshengzou
|
f32e1d9c12
|
finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet.
|
2018-01-18 17:38:52 +08:00 |
|
haoshengzou
|
8fbde8283f
|
finish dqn example. advantage estimation module is not complete yet.
|
2018-01-18 12:19:48 +08:00 |
|
haoshengzou
|
ed25bf7586
|
fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.
|
2018-01-17 11:55:51 +08:00 |
|
haoshengzou
|
983cd36074
|
finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem comes behind design. I'll first write actor-critic.
|
2018-01-15 00:03:06 +08:00 |
|
haoshengzou
|
fed3bf2a12
|
auto target network. ppo_cartpole.py run ok. but results is different from previous version even with the same random seed, still needs debugging.
|
2018-01-14 20:58:28 +08:00 |
|
haoshengzou
|
b33a141373
|
towards policy/value refactor
|
2017-12-23 17:25:16 +08:00 |
|
宋世虹
|
3624cc9036
|
finished very naive dqn: changed the interface of replay buffer by adding collect and next_batch, but still need refactoring; added implementation of dqn.py, but still need to consider the interface to make it more extensive; slightly refactored the code style of the codebase; more comments and todos will be in the next commit
|
2017-12-17 12:52:00 +08:00 |
|
rtz19970824
|
e5bf7a9270
|
implement dqn loss and dpg loss, add TODO for separate actor and critic
|
2017-12-15 14:24:08 +08:00 |
|
rtz19970824
|
0c4a83f3eb
|
vanilla policy gradient
|
2017-12-11 13:37:27 +08:00 |
|
haosheng
|
a00b930c2c
|
fix naming and comments of coding style, delete .json
|
2017-12-10 17:23:13 +08:00 |
|
rtz19970824
|
a8a12f1083
|
coding style
|
2017-12-10 14:23:40 +08:00 |
|
haosheng
|
ff4306ddb9
|
model-free rl first commit, with ppo_example.py in examples/ and task delegations in ppo_example.py and READMEs
|
2017-12-08 21:09:23 +08:00 |
|