Tianshou

Author	SHA1	Message	Date
haoshengzou	9186dae6a3	more API docs	2018-04-15 09:35:31 +08:00
haoshengzou	2a3bc3ef35	part of API doc	2018-04-12 21:10:50 +08:00
haoshengzou	03246f7ded	functional code freeze. all examples working. prepare to release.	2018-04-11 14:23:40 +08:00
haoshengzou	739d360d9d	fix episode_cutoff	2018-03-31 19:26:48 +08:00
haoshengzou	ace59787ed	Merge remote-tracking branch 'origin/master'	2018-03-28 18:47:54 +08:00
haoshengzou	75e7f14051	towards ddpg	2018-03-28 18:47:41 +08:00
rtz19970824	07099654bd	a bash file for training	2018-03-21 16:11:17 +08:00
rtz19970824	f70dfb0559	clean code	2018-03-14 19:17:28 +08:00
haoshengzou	52e6b09768	finish ddpg. now ppo, actor-critic, dqn works. ddpg is not working, check!	2018-03-11 17:47:42 +08:00
haoshengzou	a86354834c	actor critic also works. fix some bugs in nstep_q_return. dqn still trains slow.	2018-03-11 15:07:41 +08:00
haoshengzou	498b55c051	ppo with batch also works! now ppo improves steadily, dqn not so stable.	2018-03-10 17:30:11 +08:00
haoshengzou	6eb69c7867	Merge remote-tracking branch 'origin/master' Conflicts: tianshou/data/tester.py	2018-03-09 15:10:10 +08:00
haoshengzou	33094eab1d	delete contrib dqn example. tested dqn example, works to some extent! though learning speed and performance needs to be compared to other benchmarks.	2018-03-09 15:09:14 +08:00
haoshengzou	92894d3853	working on off-policy test. other parts of dqn_replay is runnable, but performance not tested.	2018-03-09 15:07:14 +08:00
haoshengzou	905d12bfa2	working on tester	2018-03-09 09:25:19 +08:00
haoshengzou	e68dcd3c64	working on off-policy test. other parts of dqn_replay is runnable, but performance not tested.	2018-03-08 16:51:12 +08:00
Dong Yan	24d75fd1aa	call nstep_q_return from dqn_replay.py, still need test	2018-03-06 20:48:07 +08:00
haoshengzou	2a2274aeea	initial data_collector. working on examples/dqn_replay.py to run	2018-03-04 21:29:58 +08:00
haoshengzou	54a7b1343d	design exploration and evaluators for off-policy algos	2018-03-04 13:53:29 +08:00
Dong Yan	2eb056a721	Merge branch 'master' of github.com:sproblvem/tianshou	2018-03-03 21:30:15 +08:00
Dong Yan	0cf2fd6c53	an initial version of untested replaymemory qreturn	2018-03-03 21:25:29 +08:00
haoshengzou	e302fd87fb	vanilla replay buffer finished and tested. working on data_collector.	2018-03-03 20:42:34 +08:00
Dong Yan	528c4be93c	add render option for ddpg	2018-02-28 18:44:06 +08:00
haoshengzou	5ab2fa3b65	minor fixes	2018-02-27 14:46:02 +08:00
haoshengzou	675057c6b9	interfaces for advantage_estimation. full_return finished and tested.	2018-02-27 14:11:52 +08:00
songshshshsh	25b25ce7d8	Merge branch 'master' of https://github.com/sproblvem/tianshou	2018-02-27 13:15:36 +08:00
songshshshsh	67d0e78ab9	first modify of replay buffer, make all three replay buffers work, wait for refactoring and testing	2018-02-27 13:13:38 +08:00
haoshengzou	40190a282e	Merge remote-tracking branch 'origin/master' # Conflicts: # README.md	2018-02-26 11:48:46 +08:00
haoshengzou	87889d766c	minor fixes. proceed to refactor replay to use lists as in batch.	2018-02-26 11:47:02 +08:00
Dong Yan	0bc1b63e38	add epsilon-greedy for dqn	2018-02-25 16:31:35 +08:00
rtz19970824	a40e5aec54	modified README	2018-02-24 16:26:19 +08:00
Dong Yan	f3aee448e0	add option to show the running result of cartpole	2018-02-24 10:53:39 +08:00
Dong Yan	764f7fb5f1	minor fix of play.py	2018-02-23 23:15:04 +08:00
sproblvem	a0849fa213	Merge pull request #5 from sproblvem/union_set add union set for do_move and is_valid The modify on play.py should be removed, I will fix it on latter commit	2018-02-23 15:01:17 +08:00
sproblvem	7711686bc6	Update README.md add the dependency	2018-02-12 15:28:25 +08:00
Dong Yan	2163d18728	fix the env -> self._env bug	2018-02-10 03:42:00 +08:00
Dong Yan	50b2d98d0a	support ctrl-c to terminate play.py	2018-02-08 21:17:56 +08:00
haoshengzou	e6d477f9a3	modified top-level .gitignore to include tianshou/data	2018-01-25 16:08:04 +08:00
haoshengzou	b8568c6af4	added data/utils.py. was ignored by .gitignore before...	2018-01-25 10:15:38 +08:00
haoshengzou	5910e08672	data/utils.py added but not pushed...	2018-01-25 10:11:36 +08:00
haoshengzou	f32e1d9c12	finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet.	2018-01-18 17:38:52 +08:00
haoshengzou	8fbde8283f	finish dqn example. advantage estimation module is not complete yet.	2018-01-18 12:19:48 +08:00
Wenbo	0131bcdc85	fix minor	2018-01-17 15:57:41 +08:00
Wenbo	0e4aa44ebb	add deepcopy for hash, add some testing	2018-01-17 15:54:46 +08:00
haoshengzou	9f96cc2461	finish design and running of ppo and actor-critic. advantage estimation module is not complete yet.	2018-01-17 14:21:50 +08:00
haoshengzou	ed25bf7586	fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.	2018-01-17 11:55:51 +08:00
Wenbo Hu	e76ccaee80	add union set for do_move and is_valid	2018-01-16 14:10:56 +08:00
haoshengzou	d599506dc9	fixed the bugs on Jan 14, which gives inferior or even no improvement. mistook group_ndims. policy will soon need refactoring.	2018-01-15 16:32:30 +08:00
haoshengzou	983cd36074	finished all ppo examples. Training is remarkably slower than the version before Jan 13. More strangely, in the gym example there's almost no improvement... but this problem comes behind design. I'll first write actor-critic.	2018-01-15 00:03:06 +08:00
haoshengzou	fed3bf2a12	auto target network. ppo_cartpole.py run ok. but results is different from previous version even with the same random seed, still needs debugging.	2018-01-14 20:58:28 +08:00

368 Commits