Tianshou

Author	SHA1	Message	Date
haoshengzou	bdd85f8a27	stop gradient in policy/distributional	2018-12-24 09:06:59 +08:00
haoshengzou	909dc786d1	advantage estimation function all take my_feed_dict (all examples runnable); such requirement should be made a signature	2018-11-22 08:03:03 +08:00
haoshengzou	c937630bd3	add some my_feed_dict in advantage_estimation and data_collector	2018-08-16 16:20:14 +08:00
haoshengzou	a791916fc4	add clear() for replay_buffer	2018-08-15 09:53:46 +08:00
haoshengzou	00d4cb0fca	merging	2018-06-15 18:46:45 +08:00
haoshengzou	f8c359b094	add dqn and ppo examples, bit clean-up	2018-06-14 11:18:39 +08:00
haoshengzou	99da5619e5	fix code example in tutorial. leave render to be future work	2018-05-29 11:04:32 +08:00
haoshengzou	6f206759ab	add __all__	2018-05-20 22:36:04 +08:00
haoshengzou	eb8c82636e	setup.py, now "pip install"-able	2018-04-17 06:34:38 +08:00
haoshengzou	2527030838	fix the bug of `unnamed_dict.update()`. import cleaning in examples/*.py	2018-04-16 20:17:41 +08:00
haoshengzou	d84c9d121c	first master version	2018-04-16 18:02:00 +08:00
haoshengzou	5f979caf58	finish all API docs, first version.	2018-04-15 17:41:43 +08:00
haoshengzou	8c108174b6	some more API docs	2018-04-15 11:46:46 +08:00
haoshengzou	9186dae6a3	more API docs	2018-04-15 09:35:31 +08:00
haoshengzou	2a3bc3ef35	part of API doc	2018-04-12 21:10:50 +08:00
haoshengzou	03246f7ded	functional code freeze. all examples working. prepare to release.	2018-04-11 14:23:40 +08:00
haoshengzou	739d360d9d	fix episode_cutoff	2018-03-31 19:26:48 +08:00
haoshengzou	ace59787ed	Merge remote-tracking branch 'origin/master'	2018-03-28 18:47:54 +08:00
haoshengzou	75e7f14051	towards ddpg	2018-03-28 18:47:41 +08:00
rtz19970824	07099654bd	a bash file for training	2018-03-21 16:11:17 +08:00
rtz19970824	f70dfb0559	clean code	2018-03-14 19:17:28 +08:00
haoshengzou	52e6b09768	finish ddpg. now ppo, actor-critic, dqn works. ddpg is not working, check!	2018-03-11 17:47:42 +08:00
haoshengzou	a86354834c	actor critic also works. fix some bugs in nstep_q_return. dqn still trains slow.	2018-03-11 15:07:41 +08:00
haoshengzou	498b55c051	ppo with batch also works! now ppo improves steadily, dqn not so stable.	2018-03-10 17:30:11 +08:00
haoshengzou	6eb69c7867	Merge remote-tracking branch 'origin/master'; Conflicts: tianshou/data/tester.py	2018-03-09 15:10:10 +08:00
haoshengzou	33094eab1d	delete contrib dqn example. tested dqn example, works to some extent! though learning speed and performance needs to be compared to other benchmarks.	2018-03-09 15:09:14 +08:00
haoshengzou	92894d3853	working on off-policy test. other parts of dqn_replay is runnable, but performance not tested.	2018-03-09 15:07:14 +08:00
haoshengzou	905d12bfa2	working on tester	2018-03-09 09:25:19 +08:00
haoshengzou	e68dcd3c64	working on off-policy test. other parts of dqn_replay is runnable, but performance not tested.	2018-03-08 16:51:12 +08:00
Dong Yan	24d75fd1aa	call nstep_q_return from dqn_replay.py, still need test	2018-03-06 20:48:07 +08:00
haoshengzou	2a2274aeea	initial data_collector. working on examples/dqn_replay.py to run	2018-03-04 21:29:58 +08:00
haoshengzou	54a7b1343d	design exploration and evaluators for off-policy algos	2018-03-04 13:53:29 +08:00
Dong Yan	2eb056a721	Merge branch 'master' of github.com:sproblvem/tianshou	2018-03-03 21:30:15 +08:00
Dong Yan	0cf2fd6c53	an initial version of untested replaymemory qreturn	2018-03-03 21:25:29 +08:00
haoshengzou	e302fd87fb	vanilla replay buffer finished and tested. working on data_collector.	2018-03-03 20:42:34 +08:00
Dong Yan	528c4be93c	add render option for ddpg	2018-02-28 18:44:06 +08:00
haoshengzou	5ab2fa3b65	minor fixes	2018-02-27 14:46:02 +08:00
haoshengzou	675057c6b9	interfaces for advantage_estimation. full_return finished and tested.	2018-02-27 14:11:52 +08:00
songshshshsh	25b25ce7d8	Merge branch 'master' of https://github.com/sproblvem/tianshou	2018-02-27 13:15:36 +08:00
songshshshsh	67d0e78ab9	first modify of replay buffer, make all three replay buffers work, wait for refactoring and testing	2018-02-27 13:13:38 +08:00
haoshengzou	40190a282e	Merge remote-tracking branch 'origin/master'; Conflicts: README.md	2018-02-26 11:48:46 +08:00
haoshengzou	87889d766c	minor fixes. proceed to refactor replay to use lists as in batch.	2018-02-26 11:47:02 +08:00
Dong Yan	0bc1b63e38	add epsilon-greedy for dqn	2018-02-25 16:31:35 +08:00
rtz19970824	a40e5aec54	modified README	2018-02-24 16:26:19 +08:00
Dong Yan	f3aee448e0	add option to show the running result of cartpole	2018-02-24 10:53:39 +08:00
Dong Yan	764f7fb5f1	minor fix of play.py	2018-02-23 23:15:04 +08:00
sproblvem	a0849fa213	Merge pull request #5 from sproblvem/union_set: add union set for do_move and is_valid. The modify on play.py should be removed, I will fix it on latter commit	2018-02-23 15:01:17 +08:00
sproblvem	7711686bc6	Update README.md add the dependency	2018-02-12 15:28:25 +08:00
Dong Yan	2163d18728	fix the env -> self._env bug	2018-02-10 03:42:00 +08:00
Dong Yan	50b2d98d0a	support ctrl-c to terminate play.py	2018-02-08 21:17:56 +08:00

381 Commits