Tianshou

hongshaorou/Tianshou

Commit Graph

feature/algo-eval

master

priv

v0.2.1

v0.2.2

v0.2.3

v0.2.4

v0.2.4.post1

v0.2.5

v0.2.6

v0.2.7

v0.3.0

v0.3.0.post1

v0.3.0rc0

v0.3.1

v0.3.2

v0.4.0

v0.4.1

v0.4.10

v0.4.11

v0.4.2

v0.4.3

v0.4.4

v0.4.5

v0.4.6

v0.4.6.post1

v0.4.7

v0.4.8

v0.4.9

v0.5.0

v1.0.0

610390c132 add docs of collector and trainer (#20) Trinkle23897 2020-04-05 18:34:45 +08:00
4d4d0daf9e Performance improve (#18) Oblivion 2020-04-05 09:10:21 +08:00
b6c9db6b0b docs for env Trinkle23897 2020-04-04 21:02:06 +08:00
9380368ca3 add an example of bullet env (experiment from jiqizhixin) (#15) Oblivion 2020-04-04 11:46:18 +08:00
974ade8019 add some docs Trinkle23897 2020-04-03 21:28:12 +08:00
6cfa876591 hot fix Trinkle23897 2020-04-03 15:17:58 +08:00
7cb5146611 add docs of trick Trinkle23897 2020-04-02 21:57:26 +08:00
0e86d44860 finish concepts Trinkle23897 2020-04-02 12:31:22 +08:00
0acd0d164c test api doc Trinkle23897 2020-04-02 09:07:04 +08:00
0b08a41610 move mujoco to examples (#12) Minghao Zhang 2020-04-02 08:49:19 +08:00
4f843d3f51 update readme Trinkle23897 2020-04-01 10:21:58 +08:00
4da857d86e Fix windows env setup bugs and other typo. (#11) ShenDezhou 2020-03-31 17:22:32 +08:00
98feb79057 fix bug in discrete_net.py (#10) Doxie 2020-03-31 16:13:53 +08:00
04208e6cce update some tutorial Trinkle23897 2020-03-30 22:52:25 +08:00
2169dd2201 update high-res logo Trinkle23897 2020-03-29 15:52:47 +08:00
4e7df7616a update dqn tutorial Trinkle23897 2020-03-29 15:18:33 +08:00
d9e4b9d16f upd doc Trinkle23897 2020-03-29 10:22:03 +08:00
a326d30739 shorten quick start Trinkle23897 2020-03-28 22:40:47 +08:00
57735ce1b5 add logo and sphinx setup Trinkle23897 2020-03-28 22:01:23 +08:00
f23b0dfac9 add ListReplayBuffer Trinkle23897 2020-03-28 15:14:41 +08:00
eb7fb37806 fix PointMaze (#8) Minghao Zhang 2020-03-28 14:36:12 +08:00
f68f23292e update readme and force flake8 Trinkle23897 2020-03-28 13:27:01 +08:00
068c4068ec fix atari/mujoco env (#7) Minghao Zhang 2020-03-28 12:03:49 +08:00
c42990c725 add rllib result and fix pep8 Trinkle23897 2020-03-28 09:43:35 +08:00
77068af526 add examples, fix some bugs (#5) Minghao Zhang 2020-03-28 07:27:18 +08:00
acb93502cf Update README.md sproblvem 2020-03-27 16:52:07 +08:00
044aae4355 add baseline and rlpyt result Trinkle23897 2020-03-27 16:24:07 +08:00
44f911bc31 add pytorch drl result Trinkle23897 2020-03-27 09:04:29 +08:00
519f9f20d0 update readme Trinkle23897 2020-03-26 17:32:51 +08:00
c505cd8205 update readme Trinkle23897 2020-03-26 11:42:34 +08:00
3c0a09fefd minor reformat (#2) Minghao Zhang 2020-03-26 09:01:20 +08:00
fdc969b830 fix collector Trinkle23897 2020-03-25 14:08:28 +08:00
e95218e295 sac Trinkle23897 2020-03-23 17:17:41 +08:00
30a0fc079c td3 Trinkle23897 2020-03-23 11:34:52 +08:00
a87563b8e6 add demo of ppo continuous action task Trinkle23897 2020-03-21 17:04:42 +08:00
c173f7bfbc fix ddpg Trinkle23897 2020-03-21 15:31:31 +08:00
8bd8246b16 refract test code Trinkle23897 2020-03-21 10:58:01 +08:00
d64d78d769 seed??? Trinkle23897 2020-03-20 21:51:09 +08:00
75364cd986 ppo and early stop Trinkle23897 2020-03-20 19:52:29 +08:00
c87fe3c18c add trainer Trinkle23897 2020-03-19 17:23:46 +08:00
9c5417dd51 change env to vecenv for higher code coverage rate Trinkle23897 2020-03-18 21:56:03 +08:00
64bab0b6a0 ddpg Trinkle23897 2020-03-18 21:45:41 +08:00
6e563fe61a a2c Trinkle23897 2020-03-17 20:22:37 +08:00
fd621971e5 fix bug in test Trinkle23897 2020-03-17 15:16:30 +08:00
39de63592f finish pg Trinkle23897 2020-03-17 11:37:31 +08:00
8b0b970c9b add speed stat Trinkle23897 2020-03-16 15:04:58 +08:00
cef5de8b83 fix some bugs Trinkle23897 2020-03-16 11:11:29 +08:00
5983c6b33d finish dqn Trinkle23897 2020-03-15 17:41:00 +08:00
c804662457 add cache buf in collector Trinkle23897 2020-03-14 21:48:31 +08:00
543e57cdbd clear Trinkle23897 2020-03-13 21:47:17 +08:00
f16e05c0e7 maybe finished collector? Trinkle23897 2020-03-13 17:49:22 +08:00
f58c1397c6 half of collector Trinkle23897 2020-03-12 22:20:33 +08:00
4a1a7dd670 fix a bug Trinkle23897 2020-03-11 18:02:19 +08:00
6632e47b9d add test_buffer Trinkle23897 2020-03-11 17:28:51 +08:00
04557fdb82 env test \ ray Trinkle23897 2020-03-11 16:14:53 +08:00
7533e5b0ac add first test Trinkle23897 2020-03-11 10:56:38 +08:00
5550aed0a1 flake8 fix Trinkle23897 2020-03-11 09:38:14 +08:00
776acd9f13 github ci Trinkle23897 2020-03-11 09:18:28 +08:00
0dfb900e29 env and data Trinkle23897 2020-03-11 09:09:56 +08:00
0c944eab68 init Trinkle23897 2020-03-09 11:38:04 +08:00
bdd85f8a27 stop gradient in policy/distributional priv haoshengzou 2018-12-24 09:06:59 +08:00
909dc786d1 advantage estimation function all take my_feed_dict (all examples runnable); such requirement should be made a signature haoshengzou 2018-11-22 08:03:03 +08:00
c937630bd3 add some my_feed_dict in advantage_estimation and data_collector haoshengzou 2018-08-16 16:20:14 +08:00
a791916fc4 add clear() for replay_buffer haoshengzou 2018-08-15 09:53:46 +08:00
00d4cb0fca merging haoshengzou 2018-06-15 18:46:45 +08:00
f8c359b094 add dqn and ppo examples, bit clean-up haoshengzou 2018-06-14 11:18:39 +08:00
99da5619e5 fix code example in tutorial. leave render to be future work haoshengzou 2018-05-29 11:04:32 +08:00
6f206759ab add __all__ haoshengzou 2018-05-20 22:36:04 +08:00
eb8c82636e setup.py, now "pip install"-able haoshengzou 2018-04-17 06:34:38 +08:00
2527030838 fix the bug of unnamed_dict.update(). import cleaning in examples/*.py haoshengzou 2018-04-16 20:17:41 +08:00
d84c9d121c first master version haoshengzou 2018-04-16 18:02:00 +08:00
5f979caf58 finish all API docs, first version. haoshengzou 2018-04-15 17:41:43 +08:00
8c108174b6 some more API docs haoshengzou 2018-04-15 11:46:46 +08:00
9186dae6a3 more API docs haoshengzou 2018-04-15 09:35:31 +08:00
2a3bc3ef35 part of API doc haoshengzou 2018-04-12 21:10:50 +08:00
03246f7ded functional code freeze. all examples working. prepare to release. haoshengzou 2018-04-11 14:23:40 +08:00
739d360d9d fix episode_cutoff haoshengzou 2018-03-31 19:26:48 +08:00
ace59787ed Merge remote-tracking branch 'origin/master' haoshengzou 2018-03-28 18:47:54 +08:00
75e7f14051 towards ddpg haoshengzou 2018-03-28 18:47:41 +08:00
07099654bd a bash file for training rtz19970824 2018-03-21 16:11:17 +08:00
f70dfb0559 clean code rtz19970824 2018-03-14 19:17:28 +08:00
52e6b09768 finish ddpg. now ppo, actor-critic, dqn works. ddpg is not working, check! haoshengzou 2018-03-11 17:47:42 +08:00
a86354834c actor critic also works. fix some bugs in nstep_q_return. dqn still trains slow. haoshengzou 2018-03-11 15:07:41 +08:00
498b55c051 ppo with batch also works! now ppo improves steadily, dqn not so stable. haoshengzou 2018-03-10 17:30:11 +08:00
6eb69c7867 Merge remote-tracking branch 'origin/master' haoshengzou 2018-03-09 15:10:10 +08:00
33094eab1d delete contrib dqn example. tested dqn example, works to some extent! though learning speed and performance needs to be compared to other benchmarks. haoshengzou 2018-03-09 15:09:14 +08:00
92894d3853 working on off-policy test. other parts of dqn_replay is runnable, but performance not tested. haoshengzou 2018-03-09 15:07:14 +08:00
905d12bfa2 working on tester haoshengzou 2018-03-09 09:25:19 +08:00
e68dcd3c64 working on off-policy test. other parts of dqn_replay is runnable, but performance not tested. haoshengzou 2018-03-08 16:51:12 +08:00
24d75fd1aa call nstep_q_return from dqn_replay.py, still need test Dong Yan 2018-03-06 20:48:07 +08:00
2a2274aeea initial data_collector. working on examples/dqn_replay.py to run haoshengzou 2018-03-04 21:29:58 +08:00
54a7b1343d design exploration and evaluators for off-policy algos haoshengzou 2018-03-04 13:53:29 +08:00
2eb056a721 Merge branch 'master' of github.com:sproblvem/tianshou Dong Yan 2018-03-03 21:30:15 +08:00
0cf2fd6c53 an initial version of untested replaymemory qreturn Dong Yan 2018-03-03 21:25:29 +08:00
e302fd87fb vanilla replay buffer finished and tested. working on data_collector. haoshengzou 2018-03-03 20:42:34 +08:00
528c4be93c add render option for ddpg Dong Yan 2018-02-28 18:44:06 +08:00
5ab2fa3b65 minor fixes haoshengzou 2018-02-27 14:46:02 +08:00
675057c6b9 interfaces for advantage_estimation. full_return finished and tested. haoshengzou 2018-02-27 14:11:52 +08:00
25b25ce7d8 Merge branch 'master' of https://github.com/sproblvem/tianshou songshshshsh 2018-02-27 13:15:36 +08:00
67d0e78ab9 first modify of replay buffer, make all three replay buffers work, wait for refactoring and testing songshshshsh 2018-02-27 13:10:47 +08:00