11 Commits

Author SHA1 Message Date
haoshengzou
fed3bf2a12 auto target network. ppo_cartpole.py run ok. but results is different from previous version even with the same random seed, still needs debugging. 2018-01-14 20:58:28 +08:00
haoshengzou
4333ee5d39 ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper 2018-01-02 19:40:37 +08:00
haoshengzou
b33a141373 towards policy/value refactor 2017-12-23 17:25:16 +08:00
haoshengzou
2addef41d2 fix imports to support both python2 and python3. move contents from __init__.py to leave for work after major development. 2017-12-23 15:36:10 +08:00
Haosheng Zou
6611d948dd add value_function (critic). value_function and policy not finished yet. 2017-12-22 00:22:23 +08:00
宋世虹
3624cc9036 finished very naive dqn: changed the interface of replay buffer by adding collect and next_batch, but still need refactoring; added implementation of dqn.py, but still need to consider the interface to make it more extensive; slightly refactored the code style of the codebase; more comments and todos will be in the next commit 2017-12-17 12:52:00 +08:00
rtz19970824
e5bf7a9270 implement dqn loss and dpg loss, add TODO for separate actor and critic 2017-12-15 14:24:08 +08:00
Haosheng Zou
7ab211b63c preliminary design of dqn_example, dqn interface. identify the assign of networks 2017-12-13 20:47:45 +08:00
rtz19970824
0c4a83f3eb vanilla policy gradient 2017-12-11 13:37:27 +08:00
haosheng
a00b930c2c fix naming and comments of coding style, delete .json 2017-12-10 17:23:13 +08:00
haosheng
ff4306ddb9 model-free rl first commit, with ppo_example.py in examples/ and task delegations in ppo_example.py and READMEs 2017-12-08 21:09:23 +08:00