15 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| haoshengzou | c937630bd3 | add some my_feed_dict in advantage_estimation and data_collector | 2018-08-16 16:20:14 +08:00 |
| haoshengzou | 6f206759ab | add __all__ | 2018-05-20 22:36:04 +08:00 |
| haoshengzou | 5f979caf58 | finish all API docs, first version. | 2018-04-15 17:41:43 +08:00 |
| haoshengzou | 8c108174b6 | some more API docs | 2018-04-15 11:46:46 +08:00 |
| haoshengzou | 03246f7ded | functional code freeze. all examples working. prepare to release. | 2018-04-11 14:23:40 +08:00 |
| haoshengzou | 739d360d9d | fix episode_cutoff | 2018-03-31 19:26:48 +08:00 |
| haoshengzou | 75e7f14051 | towards ddpg | 2018-03-28 18:47:41 +08:00 |
| haoshengzou | 52e6b09768 | finish ddpg. now ppo, actor-critic, dqn works. ddpg is not working, check! | 2018-03-11 17:47:42 +08:00 |
| haoshengzou | 498b55c051 | ppo with batch also works! now ppo improves steadily, dqn not so stable. | 2018-03-10 17:30:11 +08:00 |
| haoshengzou | 92894d3853 | working on off-policy test. other parts of dqn_replay is runnable, but performance not tested. | 2018-03-09 15:07:14 +08:00 |
| haoshengzou | e68dcd3c64 | working on off-policy test. other parts of dqn_replay is runnable, but performance not tested. | 2018-03-08 16:51:12 +08:00 |
| Dong Yan | 24d75fd1aa | call nstep_q_return from dqn_replay.py, still need test | 2018-03-06 20:48:07 +08:00 |
| haoshengzou | 2a2274aeea | initial data_collector. working on examples/dqn_replay.py to run | 2018-03-04 21:29:58 +08:00 |
| haoshengzou | 54a7b1343d | design exploration and evaluators for off-policy algos | 2018-03-04 13:53:29 +08:00 |
| haoshengzou | e302fd87fb | vanilla replay buffer finished and tested. working on data_collector. | 2018-03-03 20:42:34 +08:00 |