Commit Graph

  • 610390c132 add docs of collector and trainer (#20) Trinkle23897 2020-04-05 18:34:45 +08:00
  • 4d4d0daf9e Performance improve (#18) Oblivion 2020-04-05 09:10:21 +08:00
  • b6c9db6b0b docs for env Trinkle23897 2020-04-04 21:02:06 +08:00
  • 9380368ca3 add an example of bullet env (experiment from jiqizhixin) (#15) Oblivion 2020-04-04 11:46:18 +08:00
  • 974ade8019 add some docs Trinkle23897 2020-04-03 21:28:12 +08:00
  • 6cfa876591 hot fix Trinkle23897 2020-04-03 15:17:58 +08:00
  • 7cb5146611 add docs of trick Trinkle23897 2020-04-02 21:57:26 +08:00
  • 0e86d44860 finish concepts Trinkle23897 2020-04-02 12:31:22 +08:00
  • 0acd0d164c test api doc Trinkle23897 2020-04-02 09:07:04 +08:00
  • 0b08a41610 move mujoco to examples (#12) Minghao Zhang 2020-04-02 08:49:19 +08:00
  • 4f843d3f51 update readme Trinkle23897 2020-04-01 10:21:58 +08:00
  • 4da857d86e Fix windows env setup bugs and other typo. (#11) ShenDezhou 2020-03-31 17:22:32 +08:00
  • 98feb79057 fix bug in discrete_net.py (#10) Doxie 2020-03-31 16:13:53 +08:00
  • 04208e6cce update some tutorial Trinkle23897 2020-03-30 22:52:25 +08:00
  • 2169dd2201 update high-res logo Trinkle23897 2020-03-29 15:52:47 +08:00
  • 4e7df7616a update dqn tutorial Trinkle23897 2020-03-29 15:18:33 +08:00
  • d9e4b9d16f upd doc Trinkle23897 2020-03-29 10:22:03 +08:00
  • a326d30739 shorten quick start Trinkle23897 2020-03-28 22:40:47 +08:00
  • 57735ce1b5 add logo and sphinx setup Trinkle23897 2020-03-28 22:01:23 +08:00
  • f23b0dfac9 add ListReplayBuffer Trinkle23897 2020-03-28 15:14:41 +08:00
  • eb7fb37806 fix PointMaze (#8) Minghao Zhang 2020-03-28 14:36:12 +08:00
  • f68f23292e update readme and force flake8 Trinkle23897 2020-03-28 13:27:01 +08:00
  • 068c4068ec fix atari/mujoco env (#7) Minghao Zhang 2020-03-28 12:03:49 +08:00
  • c42990c725 add rllib result and fix pep8 Trinkle23897 2020-03-28 09:43:35 +08:00
  • 77068af526 add examples, fix some bugs (#5) Minghao Zhang 2020-03-28 07:27:18 +08:00
  • acb93502cf Update README.md sproblvem 2020-03-27 16:52:07 +08:00
  • 044aae4355 add baseline and rlpyt result Trinkle23897 2020-03-27 16:24:07 +08:00
  • 44f911bc31 add pytorch drl result Trinkle23897 2020-03-27 09:04:29 +08:00
  • 519f9f20d0 update readme Trinkle23897 2020-03-26 17:32:51 +08:00
  • c505cd8205 update readme Trinkle23897 2020-03-26 11:42:34 +08:00
  • 3c0a09fefd minor reformat (#2) Minghao Zhang 2020-03-26 09:01:20 +08:00
  • fdc969b830 fix collector Trinkle23897 2020-03-25 14:08:28 +08:00
  • e95218e295 sac Trinkle23897 2020-03-23 17:17:41 +08:00
  • 30a0fc079c td3 Trinkle23897 2020-03-23 11:34:52 +08:00
  • a87563b8e6 add demo of ppo continuous action task Trinkle23897 2020-03-21 17:04:42 +08:00
  • c173f7bfbc fix ddpg Trinkle23897 2020-03-21 15:31:31 +08:00
  • 8bd8246b16 refract test code Trinkle23897 2020-03-21 10:58:01 +08:00
  • d64d78d769 seed??? Trinkle23897 2020-03-20 21:51:09 +08:00
  • 75364cd986 ppo and early stop Trinkle23897 2020-03-20 19:52:29 +08:00
  • c87fe3c18c add trainer Trinkle23897 2020-03-19 17:23:46 +08:00
  • 9c5417dd51 change env to vecenv for higher code coverage rate Trinkle23897 2020-03-18 21:56:03 +08:00
  • 64bab0b6a0 ddpg Trinkle23897 2020-03-18 21:45:41 +08:00
  • 6e563fe61a a2c Trinkle23897 2020-03-17 20:22:37 +08:00
  • fd621971e5 fix bug in test Trinkle23897 2020-03-17 15:16:30 +08:00
  • 39de63592f finish pg Trinkle23897 2020-03-17 11:37:31 +08:00
  • 8b0b970c9b add speed stat Trinkle23897 2020-03-16 15:04:58 +08:00
  • cef5de8b83 fix some bugs Trinkle23897 2020-03-16 11:11:29 +08:00
  • 5983c6b33d finish dqn Trinkle23897 2020-03-15 17:41:00 +08:00
  • c804662457 add cache buf in collector Trinkle23897 2020-03-14 21:48:31 +08:00
  • 543e57cdbd clear Trinkle23897 2020-03-13 21:47:17 +08:00
  • f16e05c0e7 maybe finished collector? Trinkle23897 2020-03-13 17:49:22 +08:00
  • f58c1397c6 half of collector Trinkle23897 2020-03-12 22:20:33 +08:00
  • 4a1a7dd670 fix a bug Trinkle23897 2020-03-11 18:02:19 +08:00
  • 6632e47b9d add test_buffer Trinkle23897 2020-03-11 17:28:51 +08:00
  • 04557fdb82 env test \ ray Trinkle23897 2020-03-11 16:14:53 +08:00
  • 7533e5b0ac add first test Trinkle23897 2020-03-11 10:56:38 +08:00
  • 5550aed0a1 flake8 fix Trinkle23897 2020-03-11 09:38:14 +08:00
  • 776acd9f13 github ci Trinkle23897 2020-03-11 09:18:28 +08:00
  • 0dfb900e29 env and data Trinkle23897 2020-03-11 09:09:56 +08:00
  • 0c944eab68 init Trinkle23897 2020-03-09 11:38:04 +08:00
  • bdd85f8a27 stop gradient in policy/distributional priv haoshengzou 2018-12-24 09:06:59 +08:00
  • 909dc786d1 advantage estimation function all take my_feed_dict (all examples runnable); such requirement should be made a signature haoshengzou 2018-11-22 08:03:03 +08:00
  • c937630bd3 add some my_feed_dict in advantage_estimation and data_collector haoshengzou 2018-08-16 16:20:14 +08:00
  • a791916fc4 add clear() for replay_buffer haoshengzou 2018-08-15 09:53:46 +08:00
  • 00d4cb0fca merging haoshengzou 2018-06-15 18:46:45 +08:00
  • f8c359b094 add dqn and ppo examples, bit clean-up haoshengzou 2018-06-14 11:18:39 +08:00
  • 99da5619e5 fix code example in tutorial. leave render to be future work haoshengzou 2018-05-29 11:04:32 +08:00
  • 6f206759ab add __all__ haoshengzou 2018-05-20 22:36:04 +08:00
  • eb8c82636e setup.py, now "pip install"-able haoshengzou 2018-04-17 06:34:38 +08:00
  • 2527030838 fix the bug of unnamed_dict.update(). import cleaning in examples/*.py haoshengzou 2018-04-16 20:17:41 +08:00
  • d84c9d121c first master version haoshengzou 2018-04-16 18:02:00 +08:00
  • 5f979caf58 finish all API docs, first version. haoshengzou 2018-04-15 17:41:43 +08:00
  • 8c108174b6 some more API docs haoshengzou 2018-04-15 11:46:46 +08:00
  • 9186dae6a3 more API docs haoshengzou 2018-04-15 09:35:31 +08:00
  • 2a3bc3ef35 part of API doc haoshengzou 2018-04-12 21:10:50 +08:00
  • 03246f7ded functional code freeze. all examples working. prepare to release. haoshengzou 2018-04-11 14:23:40 +08:00
  • 739d360d9d fix episode_cutoff haoshengzou 2018-03-31 19:26:48 +08:00
  • ace59787ed Merge remote-tracking branch 'origin/master' haoshengzou 2018-03-28 18:47:54 +08:00
  • 75e7f14051 towards ddpg haoshengzou 2018-03-28 18:47:41 +08:00
  • 07099654bd a bash file for training rtz19970824 2018-03-21 16:11:17 +08:00
  • f70dfb0559 clean code rtz19970824 2018-03-14 19:17:28 +08:00
  • 52e6b09768 finish ddpg. now ppo, actor-critic, dqn works. ddpg is not working, check! haoshengzou 2018-03-11 17:47:42 +08:00
  • a86354834c actor critic also works. fix some bugs in nstep_q_return. dqn still trains slow. haoshengzou 2018-03-11 15:07:41 +08:00
  • 498b55c051 ppo with batch also works! now ppo improves steadily, dqn not so stable. haoshengzou 2018-03-10 17:30:11 +08:00
  • 6eb69c7867 Merge remote-tracking branch 'origin/master' haoshengzou 2018-03-09 15:10:10 +08:00
  • 33094eab1d delete contrib dqn example. tested dqn example, works to some extent! though learning speed and performance needs to be compared to other benchmarks. haoshengzou 2018-03-09 15:09:14 +08:00
  • 92894d3853 working on off-policy test. other parts of dqn_replay is runnable, but performance not tested. haoshengzou 2018-03-09 15:07:14 +08:00
  • 905d12bfa2 working on tester haoshengzou 2018-03-09 09:25:19 +08:00
  • e68dcd3c64 working on off-policy test. other parts of dqn_replay is runnable, but performance not tested. haoshengzou 2018-03-08 16:51:12 +08:00
  • 24d75fd1aa call nstep_q_return from dqn_replay.py, still need test Dong Yan 2018-03-06 20:48:07 +08:00
  • 2a2274aeea initial data_collector. working on examples/dqn_replay.py to run haoshengzou 2018-03-04 21:29:58 +08:00
  • 54a7b1343d design exploration and evaluators for off-policy algos haoshengzou 2018-03-04 13:53:29 +08:00
  • 2eb056a721 Merge branch 'master' of github.com:sproblvem/tianshou Dong Yan 2018-03-03 21:30:15 +08:00
  • 0cf2fd6c53 an initial version of untested replaymemory qreturn Dong Yan 2018-03-03 21:25:29 +08:00
  • e302fd87fb vanilla replay buffer finished and tested. working on data_collector. haoshengzou 2018-03-03 20:42:34 +08:00
  • 528c4be93c add render option for ddpg Dong Yan 2018-02-28 18:44:06 +08:00
  • 5ab2fa3b65 minor fixes haoshengzou 2018-02-27 14:46:02 +08:00
  • 675057c6b9 interfaces for advantage_estimation. full_return finished and tested. haoshengzou 2018-02-27 14:11:52 +08:00
  • 25b25ce7d8 Merge branch 'master' of https://github.com/sproblvem/tianshou songshshshsh 2018-02-27 13:15:36 +08:00
  • 67d0e78ab9 first modify of replay buffer, make all three replay buffers work, wait for refactoring and testing songshshshsh 2018-02-27 13:10:47 +08:00