haoshengzou
|
5f979caf58
|
finish all API docs, first version.
|
2018-04-15 17:41:43 +08:00 |
|
haoshengzou
|
8c108174b6
|
some more API docs
|
2018-04-15 11:46:46 +08:00 |
|
haoshengzou
|
9186dae6a3
|
more API docs
|
2018-04-15 09:35:31 +08:00 |
|
haoshengzou
|
2a3bc3ef35
|
part of API doc
|
2018-04-12 21:10:50 +08:00 |
|
haoshengzou
|
03246f7ded
|
functional code freeze. all examples working. prepare to release.
|
2018-04-11 14:23:40 +08:00 |
|
haoshengzou
|
739d360d9d
|
fix episode_cutoff
|
2018-03-31 19:26:48 +08:00 |
|
haoshengzou
|
ace59787ed
|
Merge remote-tracking branch 'origin/master'
|
2018-03-28 18:47:54 +08:00 |
|
haoshengzou
|
75e7f14051
|
towards ddpg
|
2018-03-28 18:47:41 +08:00 |
|
rtz19970824
|
07099654bd
|
a bash file for training
|
2018-03-21 16:11:17 +08:00 |
|
rtz19970824
|
f70dfb0559
|
clean code
|
2018-03-14 19:17:28 +08:00 |
|
haoshengzou
|
52e6b09768
|
finish ddpg. now ppo, actor-critic, and dqn work. ddpg is not working, check!
|
2018-03-11 17:47:42 +08:00 |
|
haoshengzou
|
a86354834c
|
actor-critic also works. fix some bugs in nstep_q_return. dqn still trains slowly.
|
2018-03-11 15:07:41 +08:00 |
|
haoshengzou
|
498b55c051
|
ppo with batch also works! now ppo improves steadily, dqn not so stable.
|
2018-03-10 17:30:11 +08:00 |
|
haoshengzou
|
6eb69c7867
|
Merge remote-tracking branch 'origin/master'
Conflicts:
tianshou/data/tester.py
|
2018-03-09 15:10:10 +08:00 |
|
haoshengzou
|
33094eab1d
|
delete contrib dqn example. tested dqn example, works to some extent, though learning speed and performance need to be compared to other benchmarks.
|
2018-03-09 15:09:14 +08:00 |
|
haoshengzou
|
92894d3853
|
working on off-policy test. other parts of dqn_replay are runnable, but performance not tested.
|
2018-03-09 15:07:14 +08:00 |
|
haoshengzou
|
905d12bfa2
|
working on tester
|
2018-03-09 09:25:19 +08:00 |
|
haoshengzou
|
e68dcd3c64
|
working on off-policy test. other parts of dqn_replay are runnable, but performance not tested.
|
2018-03-08 16:51:12 +08:00 |
|
Dong Yan
|
24d75fd1aa
|
call nstep_q_return from dqn_replay.py, still need test
|
2018-03-06 20:48:07 +08:00 |
|
haoshengzou
|
2a2274aeea
|
initial data_collector. working on examples/dqn_replay.py to run
|
2018-03-04 21:29:58 +08:00 |
|
haoshengzou
|
54a7b1343d
|
design exploration and evaluators for off-policy algos
|
2018-03-04 13:53:29 +08:00 |
|
Dong Yan
|
2eb056a721
|
Merge branch 'master' of github.com:sproblvem/tianshou
|
2018-03-03 21:30:15 +08:00 |
|
Dong Yan
|
0cf2fd6c53
|
an initial version of untested replaymemory qreturn
|
2018-03-03 21:25:29 +08:00 |
|
haoshengzou
|
e302fd87fb
|
vanilla replay buffer finished and tested. working on data_collector.
|
2018-03-03 20:42:34 +08:00 |
|
Dong Yan
|
528c4be93c
|
add render option for ddpg
|
2018-02-28 18:44:06 +08:00 |
|
haoshengzou
|
5ab2fa3b65
|
minor fixes
|
2018-02-27 14:46:02 +08:00 |
|
haoshengzou
|
675057c6b9
|
interfaces for advantage_estimation. full_return finished and tested.
|
2018-02-27 14:11:52 +08:00 |
|
songshshshsh
|
25b25ce7d8
|
Merge branch 'master' of https://github.com/sproblvem/tianshou
|
2018-02-27 13:15:36 +08:00 |
|
songshshshsh
|
67d0e78ab9
|
first modification of replay buffer. all three replay buffers work, pending refactoring and testing
|
2018-02-27 13:13:38 +08:00 |
|
haoshengzou
|
40190a282e
|
Merge remote-tracking branch 'origin/master'
# Conflicts:
# README.md
|
2018-02-26 11:48:46 +08:00 |
|
haoshengzou
|
87889d766c
|
minor fixes. proceed to refactor replay to use lists as in batch.
|
2018-02-26 11:47:02 +08:00 |
|
Dong Yan
|
0bc1b63e38
|
add epsilon-greedy for dqn
|
2018-02-25 16:31:35 +08:00 |
|
rtz19970824
|
a40e5aec54
|
modified README
|
2018-02-24 16:26:19 +08:00 |
|
Dong Yan
|
f3aee448e0
|
add option to show the running result of cartpole
|
2018-02-24 10:53:39 +08:00 |
|
Dong Yan
|
764f7fb5f1
|
minor fix of play.py
|
2018-02-23 23:15:04 +08:00 |
|
sproblvem
|
a0849fa213
|
Merge pull request #5 from sproblvem/union_set
add union set for do_move and is_valid
The modification to play.py should be removed; I will fix it in a later commit
|
2018-02-23 15:01:17 +08:00 |
|
sproblvem
|
7711686bc6
|
Update README.md
add the dependency
|
2018-02-12 15:28:25 +08:00 |
|
Dong Yan
|
2163d18728
|
fix the env -> self._env bug
|
2018-02-10 03:42:00 +08:00 |
|
Dong Yan
|
50b2d98d0a
|
support ctrl-c to terminate play.py
|
2018-02-08 21:17:56 +08:00 |
|
haoshengzou
|
e6d477f9a3
|
modified top-level .gitignore to include tianshou/data
|
2018-01-25 16:08:04 +08:00 |
|
haoshengzou
|
b8568c6af4
|
added data/utils.py. was ignored by .gitignore before...
|
2018-01-25 10:15:38 +08:00 |
|
haoshengzou
|
5910e08672
|
data/utils.py added but not pushed...
|
2018-01-25 10:11:36 +08:00 |
|
haoshengzou
|
f32e1d9c12
|
finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet.
|
2018-01-18 17:38:52 +08:00 |
|
haoshengzou
|
8fbde8283f
|
finish dqn example. advantage estimation module is not complete yet.
|
2018-01-18 12:19:48 +08:00 |
|
Wenbo
|
0131bcdc85
|
fix minor
|
2018-01-17 15:57:41 +08:00 |
|
Wenbo
|
0e4aa44ebb
|
add deepcopy for hash, add some testing
|
2018-01-17 15:54:46 +08:00 |
|
haoshengzou
|
9f96cc2461
|
finish design and running of ppo and actor-critic. advantage estimation module is not complete yet.
|
2018-01-17 14:21:50 +08:00 |
|
haoshengzou
|
ed25bf7586
|
fixed the bugs from Jan 14, which gave inferior or even no improvement (group_ndims was mistaken). policy will soon need refactoring.
|
2018-01-17 11:55:51 +08:00 |
|
Wenbo Hu
|
e76ccaee80
|
add union set for do_move and is_valid
|
2018-01-16 14:10:56 +08:00 |
|
haoshengzou
|
d599506dc9
|
fixed the bugs from Jan 14, which gave inferior or even no improvement (group_ndims was mistaken). policy will soon need refactoring.
|
2018-01-15 16:32:30 +08:00 |
|