haoshengzou
|
9186dae6a3
|
more API docs
|
2018-04-15 09:35:31 +08:00 |
|
haoshengzou
|
2a3bc3ef35
|
part of API doc
|
2018-04-12 21:10:50 +08:00 |
|
haoshengzou
|
03246f7ded
|
functional code freeze. all examples working. prepare to release.
|
2018-04-11 14:23:40 +08:00 |
|
haoshengzou
|
739d360d9d
|
fix episode_cutoff
|
2018-03-31 19:26:48 +08:00 |
|
haoshengzou
|
ace59787ed
|
Merge remote-tracking branch 'origin/master'
|
2018-03-28 18:47:54 +08:00 |
|
haoshengzou
|
75e7f14051
|
towards ddpg
|
2018-03-28 18:47:41 +08:00 |
|
rtz19970824
|
07099654bd
|
a bash file for training
|
2018-03-21 16:11:17 +08:00 |
|
rtz19970824
|
f70dfb0559
|
clean code
|
2018-03-14 19:17:28 +08:00 |
|
haoshengzou
|
52e6b09768
|
finish ddpg. now ppo, actor-critic, dqn work. ddpg is not working, check!
|
2018-03-11 17:47:42 +08:00 |
|
haoshengzou
|
a86354834c
|
actor-critic also works. fix some bugs in nstep_q_return. dqn still trains slowly.
|
2018-03-11 15:07:41 +08:00 |
|
haoshengzou
|
498b55c051
|
ppo with batch also works! now ppo improves steadily, dqn not so stable.
|
2018-03-10 17:30:11 +08:00 |
|
haoshengzou
|
6eb69c7867
|
Merge remote-tracking branch 'origin/master'
Conflicts:
tianshou/data/tester.py
|
2018-03-09 15:10:10 +08:00 |
|
haoshengzou
|
33094eab1d
|
delete contrib dqn example. tested dqn example, works to some extent! though learning speed and performance need to be compared to other benchmarks.
|
2018-03-09 15:09:14 +08:00 |
|
haoshengzou
|
92894d3853
|
working on off-policy test. other parts of dqn_replay are runnable, but performance not tested.
|
2018-03-09 15:07:14 +08:00 |
|
haoshengzou
|
905d12bfa2
|
working on tester
|
2018-03-09 09:25:19 +08:00 |
|
haoshengzou
|
e68dcd3c64
|
working on off-policy test. other parts of dqn_replay are runnable, but performance not tested.
|
2018-03-08 16:51:12 +08:00 |
|
Dong Yan
|
24d75fd1aa
|
call nstep_q_return from dqn_replay.py, still needs testing
|
2018-03-06 20:48:07 +08:00 |
|
haoshengzou
|
2a2274aeea
|
initial data_collector. working on examples/dqn_replay.py to run
|
2018-03-04 21:29:58 +08:00 |
|
haoshengzou
|
54a7b1343d
|
design exploration and evaluators for off-policy algos
|
2018-03-04 13:53:29 +08:00 |
|
Dong Yan
|
2eb056a721
|
Merge branch 'master' of github.com:sproblvem/tianshou
|
2018-03-03 21:30:15 +08:00 |
|
Dong Yan
|
0cf2fd6c53
|
an initial version of untested replaymemory qreturn
|
2018-03-03 21:25:29 +08:00 |
|
haoshengzou
|
e302fd87fb
|
vanilla replay buffer finished and tested. working on data_collector.
|
2018-03-03 20:42:34 +08:00 |
|
Dong Yan
|
528c4be93c
|
add render option for ddpg
|
2018-02-28 18:44:06 +08:00 |
|
haoshengzou
|
5ab2fa3b65
|
minor fixes
|
2018-02-27 14:46:02 +08:00 |
|
haoshengzou
|
675057c6b9
|
interfaces for advantage_estimation. full_return finished and tested.
|
2018-02-27 14:11:52 +08:00 |
|
songshshshsh
|
25b25ce7d8
|
Merge branch 'master' of https://github.com/sproblvem/tianshou
|
2018-02-27 13:15:36 +08:00 |
|
songshshshsh
|
67d0e78ab9
|
first modification of replay buffer. all three replay buffers work, pending refactoring and testing
|
2018-02-27 13:13:38 +08:00 |
|
haoshengzou
|
40190a282e
|
Merge remote-tracking branch 'origin/master'
# Conflicts:
# README.md
|
2018-02-26 11:48:46 +08:00 |
|
haoshengzou
|
87889d766c
|
minor fixes. proceed to refactor replay to use lists as in batch.
|
2018-02-26 11:47:02 +08:00 |
|
Dong Yan
|
0bc1b63e38
|
add epsilon-greedy for dqn
|
2018-02-25 16:31:35 +08:00 |
|
rtz19970824
|
a40e5aec54
|
modified README
|
2018-02-24 16:26:19 +08:00 |
|
Dong Yan
|
f3aee448e0
|
add option to show the running result of cartpole
|
2018-02-24 10:53:39 +08:00 |
|
Dong Yan
|
764f7fb5f1
|
minor fix of play.py
|
2018-02-23 23:15:04 +08:00 |
|
sproblvem
|
a0849fa213
|
Merge pull request #5 from sproblvem/union_set
add union set for do_move and is_valid
The modification to play.py should be removed; I will fix it in a later commit
|
2018-02-23 15:01:17 +08:00 |
|
sproblvem
|
7711686bc6
|
Update README.md
add the dependency
|
2018-02-12 15:28:25 +08:00 |
|
Dong Yan
|
2163d18728
|
fix the env -> self._env bug
|
2018-02-10 03:42:00 +08:00 |
|
Dong Yan
|
50b2d98d0a
|
support ctrl-c to terminate play.py
|
2018-02-08 21:17:56 +08:00 |
|
haoshengzou
|
e6d477f9a3
|
modified top-level .gitignore to include tianshou/data
|
2018-01-25 16:08:04 +08:00 |
|
haoshengzou
|
b8568c6af4
|
added data/utils.py. was ignored by .gitignore before...
|
2018-01-25 10:15:38 +08:00 |
|
haoshengzou
|
5910e08672
|
data/utils.py added but not pushed...
|
2018-01-25 10:11:36 +08:00 |
|
haoshengzou
|
f32e1d9c12
|
finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet.
|
2018-01-18 17:38:52 +08:00 |
|
haoshengzou
|
8fbde8283f
|
finish dqn example. advantage estimation module is not complete yet.
|
2018-01-18 12:19:48 +08:00 |
|
Wenbo
|
0131bcdc85
|
fix minor
|
2018-01-17 15:57:41 +08:00 |
|
Wenbo
|
0e4aa44ebb
|
add deepcopy for hash, add some testing
|
2018-01-17 15:54:46 +08:00 |
|
haoshengzou
|
9f96cc2461
|
finish design and running of ppo and actor-critic. advantage estimation module is not complete yet.
|
2018-01-17 14:21:50 +08:00 |
|
haoshengzou
|
ed25bf7586
|
fixed the bugs from Jan 14, which gave inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.
|
2018-01-17 11:55:51 +08:00 |
|
Wenbo Hu
|
e76ccaee80
|
add union set for do_move and is_valid
|
2018-01-16 14:10:56 +08:00 |
|
haoshengzou
|
d599506dc9
|
fixed the bugs from Jan 14, which gave inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.
|
2018-01-15 16:32:30 +08:00 |
|
haoshengzou
|
983cd36074
|
finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem takes lower priority than design. I'll first write actor-critic.
|
2018-01-15 00:03:06 +08:00 |
|
haoshengzou
|
fed3bf2a12
|
auto target network. ppo_cartpole.py runs ok, but results are different from the previous version even with the same random seed. still needs debugging.
|
2018-01-14 20:58:28 +08:00 |
|