Haosheng Zou | 6611d948dd | add value_function (critic). value_function and policy not finished yet. | 2017-12-22 00:22:23 +08:00
宋世虹 | 3624cc9036 | finished a very naive dqn: changed the replay buffer interface by adding collect and next_batch, which still needs refactoring; added the implementation in dqn.py, whose interface still needs work to make it more extensible; slightly refactored the code style of the codebase; more comments and TODOs will follow in the next commit | 2017-12-17 12:52:00 +08:00
rtz19970824 | e5bf7a9270 | implement dqn loss and dpg loss, add TODO for separate actor and critic | 2017-12-15 14:24:08 +08:00
Haosheng Zou | 7ab211b63c | preliminary design of dqn_example and the dqn interface; identify the assignment of networks | 2017-12-13 20:47:45 +08:00
rtz19970824 | 0c4a83f3eb | vanilla policy gradient | 2017-12-11 13:37:27 +08:00
haosheng | a00b930c2c | fix naming and comments to follow the coding style, delete .json | 2017-12-10 17:23:13 +08:00
haosheng | ff4306ddb9 | first commit of model-free rl, with ppo_example.py in examples/ and task delegations in ppo_example.py and the READMEs | 2017-12-08 21:09:23 +08:00