haoshengzou | e68dcd3c64 | working on off-policy test. other parts of dqn_replay are runnable, but performance not tested. | 2018-03-08 16:51:12 +08:00
haoshengzou | 2a2274aeea | initial data_collector. working on getting examples/dqn_replay.py to run. | 2018-03-04 21:29:58 +08:00
Dong Yan | 0bc1b63e38 | add epsilon-greedy for dqn | 2018-02-25 16:31:35 +08:00
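
The epsilon-greedy exploration added in this commit is the standard scheme for DQN: with probability epsilon take a uniformly random action, otherwise take the greedy one. A minimal sketch of the idea, assuming hypothetical names (q_values, epsilon); this is illustrative, not the commit's actual code:

    import numpy as np

    def epsilon_greedy(q_values, epsilon):
        """Pick a random action with probability epsilon, else the greedy one.

        q_values: 1-D array of estimated Q-values, one entry per action.
        """
        if np.random.rand() < epsilon:
            return np.random.randint(len(q_values))  # explore
        return int(np.argmax(q_values))              # exploit
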
haoshengzou | 8fbde8283f | finish dqn example. advantage estimation module is not complete yet. | 2018-01-18 12:19:48 +08:00
haoshengzou | dfcea74fcf | fix memory growth and slowness caused by sess.run(tf.multinomial()); ppo examples now run OK with slight memory growth (~1 MB/min), which still needs investigation | 2018-01-03 20:32:05 +08:00
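
The bug named in this commit message is most likely the classic TF1 pitfall: constructing an op such as tf.multinomial() inside the training loop adds a new node to the graph on every iteration, so the graph, memory use, and sess.run latency all grow without bound. A minimal sketch of the broken versus fixed pattern, assuming TF1-era APIs and illustrative names; the commit's exact change is not shown here:

    import numpy as np
    import tensorflow as tf  # TF1-style graph/session API assumed

    num_actions = 4  # illustrative

    # Broken pattern: a fresh tf.multinomial op is created on every call,
    # so the graph (and memory) keeps growing:
    #   action = sess.run(tf.multinomial(logits_op, 1))

    # Fixed pattern: build the sampling op once, then only run it in the loop.
    logits_ph = tf.placeholder(tf.float32, shape=[None, num_actions])
    sample_op = tf.multinomial(logits_ph, num_samples=1)

    with tf.Session() as sess:
        for _ in range(3):  # stand-in for the training loop
            logits = np.random.randn(1, num_actions).astype(np.float32)
            action = sess.run(sample_op, feed_dict={logits_ph: logits})
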
haoshengzou | b33a141373 | towards policy/value refactor | 2017-12-23 17:25:16 +08:00
haoshengzou | 2addef41d2 | fix imports to support both python2 and python3. move contents out of __init__.py, deferring that work until after major development. | 2017-12-23 15:36:10 +08:00
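
Making intra-package imports behave identically under both interpreters is usually done with a __future__ import, since Python 2 defaults to implicit relative imports while Python 3 resolves them absolutely. A sketch of the common pattern (the commit's exact change is not shown):

    # At the top of each module: make Python 2 resolve imports the same way
    # Python 3 does (absolute by default), so a package-internal module name
    # no longer shadows a top-level one.
    from __future__ import absolute_import, division, print_function

    # Explicit relative imports then work identically under both, e.g.:
    # from .replay_buffer import ReplayBuffer
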
Haosheng Zou | 6611d948dd | add value_function (critic). value_function and policy not finished yet. | 2017-12-22 00:22:23 +08:00
宋世虹 | d220f7f2a8 | add comments and todos | 2017-12-17 13:28:21 +08:00
宋世虹 | 3624cc9036 | finished very naive dqn: changed the replay buffer interface by adding collect and next_batch, but it still needs refactoring; added an implementation of dqn.py, but its interface still needs thought to make it more extensible; slightly refactored the code style of the codebase; more comments and todos will come in the next commit | 2017-12-17 12:52:00 +08:00
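
The collect/next_batch interface described in this commit maps naturally onto a fixed-capacity ring buffer: collect stores one transition (overwriting the oldest once full), and next_batch samples a training batch uniformly at random. A minimal sketch with hypothetical signatures, not the commit's actual code:

    import random

    class ReplayBuffer:
        """Fixed-capacity buffer; oldest transitions are overwritten first."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.data = []
            self.index = 0  # next slot to overwrite once the buffer is full

        def collect(self, transition):
            """Store one (obs, action, reward, next_obs, done) tuple."""
            if len(self.data) < self.capacity:
                self.data.append(transition)
            else:
                self.data[self.index] = transition
            self.index = (self.index + 1) % self.capacity

        def next_batch(self, batch_size):
            """Sample a batch uniformly at random for one training step."""
            return random.sample(self.data, min(batch_size, len(self.data)))
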
Haosheng Zou | 039c8140e2 | add dqn.py as a stub to be written | 2017-12-13 22:43:45 +08:00