610390c132
add docs of collector and trainer (#20)
Trinkle23897
2020-04-05 18:34:45 +08:00
4d4d0daf9e
Performance improve (#18)
Oblivion
2020-04-05 09:10:21 +08:00
b6c9db6b0b
docs for env
Trinkle23897
2020-04-04 21:02:06 +08:00
9380368ca3
add an example of bullet env (experiment from jiqizhixin) (#15)
Oblivion
2020-04-04 11:46:18 +08:00
974ade8019
add some docs
Trinkle23897
2020-04-03 21:28:12 +08:00
6cfa876591
hot fix
Trinkle23897
2020-04-03 15:17:58 +08:00
7cb5146611
add docs of trick
Trinkle23897
2020-04-02 21:57:26 +08:00
0e86d44860
finish concepts
Trinkle23897
2020-04-02 12:31:22 +08:00
0acd0d164c
test api doc
Trinkle23897
2020-04-02 09:07:04 +08:00
0b08a41610
move mujoco to examples (#12)
Minghao Zhang
2020-04-02 08:49:19 +08:00
4f843d3f51
update readme
Trinkle23897
2020-04-01 10:21:58 +08:00
4da857d86e
Fix windows env setup bugs and other typo. (#11)
ShenDezhou
2020-03-31 17:22:32 +08:00
98feb79057
fix bug in discrete_net.py (#10)
Doxie
2020-03-31 16:13:53 +08:00
04208e6cce
update some tutorial
Trinkle23897
2020-03-30 22:52:25 +08:00
2169dd2201
update high-res logo
Trinkle23897
2020-03-29 15:52:47 +08:00
4e7df7616a
update dqn tutorial
Trinkle23897
2020-03-29 15:18:33 +08:00
d9e4b9d16f
upd doc
Trinkle23897
2020-03-29 10:22:03 +08:00
a326d30739
shorten quick start
Trinkle23897
2020-03-28 22:40:47 +08:00
57735ce1b5
add logo and sphinx setup
Trinkle23897
2020-03-28 22:01:23 +08:00
f23b0dfac9
add ListReplayBuffer
Trinkle23897
2020-03-28 15:14:41 +08:00
eb7fb37806
fix PointMaze (#8)
Minghao Zhang
2020-03-28 14:36:12 +08:00
f68f23292e
update readme and force flake8
Trinkle23897
2020-03-28 13:27:01 +08:00
068c4068ec
fix atari/mujoco env (#7)
Minghao Zhang
2020-03-28 12:03:49 +08:00
c42990c725
add rllib result and fix pep8
Trinkle23897
2020-03-28 09:43:35 +08:00
77068af526
add examples, fix some bugs (#5)
Minghao Zhang
2020-03-28 07:27:18 +08:00
acb93502cf
Update README.md
sproblvem
2020-03-27 16:52:07 +08:00
044aae4355
add baseline and rlpyt result
Trinkle23897
2020-03-27 16:24:07 +08:00
44f911bc31
add pytorch drl result
Trinkle23897
2020-03-27 09:04:29 +08:00
519f9f20d0
update readme
Trinkle23897
2020-03-26 17:32:51 +08:00
c505cd8205
update readme
Trinkle23897
2020-03-26 11:42:34 +08:00
3c0a09fefd
minor reformat (#2)
Minghao Zhang
2020-03-26 09:01:20 +08:00
fdc969b830
fix collector
Trinkle23897
2020-03-25 14:08:28 +08:00
e95218e295
sac
Trinkle23897
2020-03-23 17:17:41 +08:00
30a0fc079c
td3
Trinkle23897
2020-03-23 11:34:52 +08:00
a87563b8e6
add demo of ppo continuous action task
Trinkle23897
2020-03-21 17:04:42 +08:00
c173f7bfbc
fix ddpg
Trinkle23897
2020-03-21 15:31:31 +08:00
8bd8246b16
refract test code
Trinkle23897
2020-03-21 10:58:01 +08:00
d64d78d769
seed???
Trinkle23897
2020-03-20 21:51:09 +08:00
75364cd986
ppo and early stop
Trinkle23897
2020-03-20 19:52:29 +08:00
c87fe3c18c
add trainer
Trinkle23897
2020-03-19 17:23:46 +08:00
9c5417dd51
change env to vecenv for higher code coverage rate
Trinkle23897
2020-03-18 21:56:03 +08:00
64bab0b6a0
ddpg
Trinkle23897
2020-03-18 21:45:41 +08:00
6e563fe61a
a2c
Trinkle23897
2020-03-17 20:22:37 +08:00
fd621971e5
fix bug in test
Trinkle23897
2020-03-17 15:16:30 +08:00
39de63592f
finish pg
Trinkle23897
2020-03-17 11:37:31 +08:00
8b0b970c9b
add speed stat
Trinkle23897
2020-03-16 15:04:58 +08:00
cef5de8b83
fix some bugs
Trinkle23897
2020-03-16 11:11:29 +08:00
5983c6b33d
finish dqn
Trinkle23897
2020-03-15 17:41:00 +08:00
c804662457
add cache buf in collector
Trinkle23897
2020-03-14 21:48:31 +08:00
543e57cdbd
clear
Trinkle23897
2020-03-13 21:47:17 +08:00
f16e05c0e7
maybe finished collector?
Trinkle23897
2020-03-13 17:49:22 +08:00
f58c1397c6
half of collector
Trinkle23897
2020-03-12 22:20:33 +08:00
4a1a7dd670
fix a bug
Trinkle23897
2020-03-11 18:02:19 +08:00
6632e47b9d
add test_buffer
Trinkle23897
2020-03-11 17:28:51 +08:00
04557fdb82
env test \ ray
Trinkle23897
2020-03-11 16:14:53 +08:00
7533e5b0ac
add first test
Trinkle23897
2020-03-11 10:56:38 +08:00
5550aed0a1
flake8 fix
Trinkle23897
2020-03-11 09:38:14 +08:00
776acd9f13
github ci
Trinkle23897
2020-03-11 09:18:28 +08:00
0dfb900e29
env and data
Trinkle23897
2020-03-11 09:09:56 +08:00
0c944eab68
init
Trinkle23897
2020-03-09 11:38:04 +08:00
bdd85f8a27
stop gradient in policy/distributional
priv
haoshengzou
2018-12-24 09:06:59 +08:00
909dc786d1
advantage estimation function all take my_feed_dict (all examples runnable); such requirement should be made a signature
haoshengzou
2018-11-22 08:03:03 +08:00
c937630bd3
add some my_feed_dict in advantage_estimation and data_collector
haoshengzou
2018-08-16 16:20:14 +08:00
a791916fc4
add clear() for replay_buffer
haoshengzou
2018-08-15 09:53:46 +08:00
00d4cb0fca
merging
haoshengzou
2018-06-15 18:46:45 +08:00
f8c359b094
add dqn and ppo examples, bit clean-up
haoshengzou
2018-06-14 11:18:39 +08:00
99da5619e5
fix code example in tutorial. leave render to be future work
haoshengzou
2018-05-29 11:04:32 +08:00
6f206759ab
add __all__
haoshengzou
2018-05-20 22:36:04 +08:00
eb8c82636e
setup.py, now "pip install"-able
haoshengzou
2018-04-17 06:34:38 +08:00
2527030838
fix the bug of unnamed_dict.update(). import cleaning in examples/*.py
haoshengzou
2018-04-16 20:17:41 +08:00
d84c9d121c
first master version
haoshengzou
2018-04-16 18:02:00 +08:00
5f979caf58
finish all API docs, first version.
haoshengzou
2018-04-15 17:41:43 +08:00
8c108174b6
some more API docs
haoshengzou
2018-04-15 11:46:46 +08:00
9186dae6a3
more API docs
haoshengzou
2018-04-15 09:35:31 +08:00
2a3bc3ef35
part of API doc
haoshengzou
2018-04-12 21:10:50 +08:00
03246f7ded
functional code freeze. all examples working. prepare to release.
haoshengzou
2018-04-11 14:23:40 +08:00
739d360d9d
fix episode_cutoff
haoshengzou
2018-03-31 19:26:48 +08:00
ace59787ed
Merge remote-tracking branch 'origin/master'
haoshengzou
2018-03-28 18:47:54 +08:00
75e7f14051
towards ddpg
haoshengzou
2018-03-28 18:47:41 +08:00
07099654bd
a bash file for training
rtz19970824
2018-03-21 16:11:17 +08:00
f70dfb0559
clean code
rtz19970824
2018-03-14 19:17:28 +08:00
52e6b09768
finish ddpg. now ppo, actor-critic, dqn works. ddpg is not working, check!
haoshengzou
2018-03-11 17:47:42 +08:00
a86354834c
actor critic also works. fix some bugs in nstep_q_return. dqn still trains slow.
haoshengzou
2018-03-11 15:07:41 +08:00
498b55c051
ppo with batch also works! now ppo improves steadily, dqn not so stable.
haoshengzou
2018-03-10 17:30:11 +08:00
6eb69c7867
Merge remote-tracking branch 'origin/master'
haoshengzou
2018-03-09 15:10:10 +08:00
33094eab1d
delete contrib dqn example. tested dqn example, works to some extent! though learning speed and performance needs to be compared to other benchmarks.
haoshengzou
2018-03-09 15:09:14 +08:00
92894d3853
working on off-policy test. other parts of dqn_replay is runnable, but performance not tested.
haoshengzou
2018-03-09 15:07:14 +08:00
905d12bfa2
working on tester
haoshengzou
2018-03-09 09:25:19 +08:00
e68dcd3c64
working on off-policy test. other parts of dqn_replay is runnable, but performance not tested.
haoshengzou
2018-03-08 16:51:12 +08:00
24d75fd1aa
call nstep_q_return from dqn_replay.py, still need test
Dong Yan
2018-03-06 20:48:07 +08:00
2a2274aeea
initial data_collector. working on examples/dqn_replay.py to run
haoshengzou
2018-03-04 21:29:58 +08:00
54a7b1343d
design exploration and evaluators for off-policy algos
haoshengzou
2018-03-04 13:53:29 +08:00
2eb056a721
Merge branch 'master' of github.com:sproblvem/tianshou
Dong Yan
2018-03-03 21:30:15 +08:00
0cf2fd6c53
an initial version of untested replaymemory qreturn
Dong Yan
2018-03-03 21:25:29 +08:00
e302fd87fb
vanilla replay buffer finished and tested. working on data_collector.
haoshengzou
2018-03-03 20:42:34 +08:00
528c4be93c
add render option for ddpg
Dong Yan
2018-02-28 18:44:06 +08:00
5ab2fa3b65
minor fixes
haoshengzou
2018-02-27 14:46:02 +08:00
675057c6b9
interfaces for advantage_estimation. full_return finished and tested.
haoshengzou
2018-02-27 14:11:52 +08:00
25b25ce7d8
Merge branch 'master' of https://github.com/sproblvem/tianshou
songshshshsh
2018-02-27 13:15:36 +08:00
67d0e78ab9
first modify of replay buffer, make all three replay buffers work, wait for refactoring and testing
songshshshsh
2018-02-27 13:10:47 +08:00