381 Commits

Author SHA1 Message Date
haoshengzou
bdd85f8a27 stop gradient in policy/distributional 2018-12-24 09:06:59 +08:00
haoshengzou
909dc786d1 advantage estimation function all take my_feed_dict (all examples runnable); such requirement should be made a signature 2018-11-22 08:03:03 +08:00
haoshengzou
c937630bd3 add some my_feed_dict in advantage_estimation and data_collector 2018-08-16 16:20:14 +08:00
haoshengzou
a791916fc4 add clear() for replay_buffer 2018-08-15 09:53:46 +08:00
haoshengzou
00d4cb0fca merging 2018-06-15 18:46:45 +08:00
haoshengzou
f8c359b094 add dqn and ppo examples, bit clean-up 2018-06-14 11:18:39 +08:00
haoshengzou
99da5619e5 fix code example in tutorial. leave render to be future work 2018-05-29 11:04:32 +08:00
haoshengzou
6f206759ab add __all__ 2018-05-20 22:36:04 +08:00
haoshengzou
eb8c82636e setup.py, now "pip install"-able 2018-04-17 06:34:38 +08:00
haoshengzou
2527030838 fix the bug of unnamed_dict.update(). import cleaning in examples/*.py 2018-04-16 20:17:41 +08:00
haoshengzou
d84c9d121c first master version 2018-04-16 18:02:00 +08:00
haoshengzou
5f979caf58 finish all API docs, first version. 2018-04-15 17:41:43 +08:00
haoshengzou
8c108174b6 some more API docs 2018-04-15 11:46:46 +08:00
haoshengzou
9186dae6a3 more API docs 2018-04-15 09:35:31 +08:00
haoshengzou
2a3bc3ef35 part of API doc 2018-04-12 21:10:50 +08:00
haoshengzou
03246f7ded functional code freeze. all examples working. prepare to release. 2018-04-11 14:23:40 +08:00
haoshengzou
739d360d9d fix episode_cutoff 2018-03-31 19:26:48 +08:00
haoshengzou
ace59787ed Merge remote-tracking branch 'origin/master' 2018-03-28 18:47:54 +08:00
haoshengzou
75e7f14051 towards ddpg 2018-03-28 18:47:41 +08:00
rtz19970824
07099654bd a bash file for training 2018-03-21 16:11:17 +08:00
rtz19970824
f70dfb0559 clean code 2018-03-14 19:17:28 +08:00
haoshengzou
52e6b09768 finish ddpg. now ppo, actor-critic, dqn works. ddpg is not working, check! 2018-03-11 17:47:42 +08:00
haoshengzou
a86354834c actor critic also works. fix some bugs in nstep_q_return. dqn still trains slow. 2018-03-11 15:07:41 +08:00
haoshengzou
498b55c051 ppo with batch also works! now ppo improves steadily, dqn not so stable. 2018-03-10 17:30:11 +08:00
haoshengzou
6eb69c7867 Merge remote-tracking branch 'origin/master'
Conflicts:
	tianshou/data/tester.py
2018-03-09 15:10:10 +08:00
haoshengzou
33094eab1d delete contrib dqn example. tested dqn example, works to some extent! though learning speed and performance needs to be compared to other benchmarks. 2018-03-09 15:09:14 +08:00
haoshengzou
92894d3853 working on off-policy test. other parts of dqn_replay is runnable, but performance not tested. 2018-03-09 15:07:14 +08:00
haoshengzou
905d12bfa2 working on tester 2018-03-09 09:25:19 +08:00
haoshengzou
e68dcd3c64 working on off-policy test. other parts of dqn_replay is runnable, but performance not tested. 2018-03-08 16:51:12 +08:00
Dong Yan
24d75fd1aa call nstep_q_return from dqn_replay.py, still need test 2018-03-06 20:48:07 +08:00
haoshengzou
2a2274aeea initial data_collector. working on examples/dqn_replay.py to run 2018-03-04 21:29:58 +08:00
haoshengzou
54a7b1343d design exploration and evaluators for off-policy algos 2018-03-04 13:53:29 +08:00
Dong Yan
2eb056a721 Merge branch 'master' of github.com:sproblvem/tianshou 2018-03-03 21:30:15 +08:00
Dong Yan
0cf2fd6c53 an initial version of untested replaymemory qreturn 2018-03-03 21:25:29 +08:00
haoshengzou
e302fd87fb vanilla replay buffer finished and tested. working on data_collector. 2018-03-03 20:42:34 +08:00
Dong Yan
528c4be93c add render option for ddpg 2018-02-28 18:44:06 +08:00
haoshengzou
5ab2fa3b65 minor fixes 2018-02-27 14:46:02 +08:00
haoshengzou
675057c6b9 interfaces for advantage_estimation. full_return finished and tested. 2018-02-27 14:11:52 +08:00
songshshshsh
25b25ce7d8 Merge branch 'master' of https://github.com/sproblvem/tianshou 2018-02-27 13:15:36 +08:00
songshshshsh
67d0e78ab9 first modify of replay buffer, make all three replay buffers work, wait for refactoring and testing 2018-02-27 13:13:38 +08:00
haoshengzou
40190a282e Merge remote-tracking branch 'origin/master'
# Conflicts:
#	README.md
2018-02-26 11:48:46 +08:00
haoshengzou
87889d766c minor fixes. proceed to refactor replay to use lists as in batch. 2018-02-26 11:47:02 +08:00
Dong Yan
0bc1b63e38 add epsilon-greedy for dqn 2018-02-25 16:31:35 +08:00
rtz19970824
a40e5aec54 modified README 2018-02-24 16:26:19 +08:00
Dong Yan
f3aee448e0 add option to show the running result of cartpole 2018-02-24 10:53:39 +08:00
Dong Yan
764f7fb5f1 minor fix of play.py 2018-02-23 23:15:04 +08:00
sproblvem
a0849fa213 Merge pull request #5 from sproblvem/union_set
add union set for do_move and is_valid
The modify on play.py should be removed, I will fix it on latter commit
2018-02-23 15:01:17 +08:00
sproblvem
7711686bc6 Update README.md
add the dependency
2018-02-12 15:28:25 +08:00
Dong Yan
2163d18728 fix the env -> self._env bug 2018-02-10 03:42:00 +08:00
Dong Yan
50b2d98d0a support ctrl-c to terminate play.py 2018-02-08 21:17:56 +08:00