117 Commits

Author SHA1 Message Date
haoshengzou
c937630bd3 add some my_feed_dict in advantage_estimation and data_collector 2018-08-16 16:20:14 +08:00
haoshengzou
a791916fc4 add clear() for replay_buffer 2018-08-15 09:53:46 +08:00
haoshengzou
6f206759ab add __all__ 2018-05-20 22:36:04 +08:00
haoshengzou
eb8c82636e setup.py, now "pip install"-able 2018-04-17 06:34:38 +08:00
haoshengzou
2527030838 fix the unnamed_dict.update() bug; clean up imports in examples/*.py 2018-04-16 20:17:41 +08:00
haoshengzou
d84c9d121c first master version 2018-04-16 18:02:00 +08:00
haoshengzou
5f979caf58 finish all API docs, first version. 2018-04-15 17:41:43 +08:00
haoshengzou
8c108174b6 some more API docs 2018-04-15 11:46:46 +08:00
haoshengzou
9186dae6a3 more API docs 2018-04-15 09:35:31 +08:00
haoshengzou
2a3bc3ef35 part of API doc 2018-04-12 21:10:50 +08:00
haoshengzou
03246f7ded functional code freeze. all examples working. prepare to release. 2018-04-11 14:23:40 +08:00
haoshengzou
739d360d9d fix episode_cutoff 2018-03-31 19:26:48 +08:00
haoshengzou
75e7f14051 towards ddpg 2018-03-28 18:47:41 +08:00
haoshengzou
52e6b09768 finish ddpg implementation. ppo, actor-critic, and dqn now work; ddpg is not working yet, needs checking! 2018-03-11 17:47:42 +08:00
haoshengzou
a86354834c actor-critic also works. fix some bugs in nstep_q_return. dqn still trains slowly. 2018-03-11 15:07:41 +08:00
haoshengzou
498b55c051 ppo with batch also works! ppo now improves steadily; dqn is not as stable. 2018-03-10 17:30:11 +08:00
haoshengzou
92894d3853 working on off-policy test. other parts of dqn_replay are runnable, but performance is not tested. 2018-03-09 15:07:14 +08:00
haoshengzou
e68dcd3c64 working on off-policy test. other parts of dqn_replay are runnable, but performance is not tested. 2018-03-08 16:51:12 +08:00
Dong Yan
24d75fd1aa call nstep_q_return from dqn_replay.py, still need test 2018-03-06 20:48:07 +08:00
haoshengzou
2a2274aeea initial data_collector. working on getting examples/dqn_replay.py to run 2018-03-04 21:29:58 +08:00
haoshengzou
54a7b1343d design exploration and evaluators for off-policy algos 2018-03-04 13:53:29 +08:00
Dong Yan
2eb056a721 Merge branch 'master' of github.com:sproblvem/tianshou 2018-03-03 21:30:15 +08:00
Dong Yan
0cf2fd6c53 an initial, untested version of replay memory q-return 2018-03-03 21:25:29 +08:00
haoshengzou
e302fd87fb vanilla replay buffer finished and tested. working on data_collector. 2018-03-03 20:42:34 +08:00
haoshengzou
5ab2fa3b65 minor fixes 2018-02-27 14:46:02 +08:00
haoshengzou
675057c6b9 interfaces for advantage_estimation. full_return finished and tested. 2018-02-27 14:11:52 +08:00
songshshshsh
25b25ce7d8 Merge branch 'master' of https://github.com/sproblvem/tianshou 2018-02-27 13:15:36 +08:00
songshshshsh
67d0e78ab9 first modification of the replay buffer; all three replay buffers now work, awaiting refactoring and testing 2018-02-27 13:13:38 +08:00
haoshengzou
40190a282e Merge remote-tracking branch 'origin/master' (conflicts: README.md) 2018-02-26 11:48:46 +08:00
haoshengzou
87889d766c minor fixes. proceed to refactor replay to use lists as in batch. 2018-02-26 11:47:02 +08:00
Dong Yan
0bc1b63e38 add epsilon-greedy for dqn 2018-02-25 16:31:35 +08:00
Dong Yan
f3aee448e0 add option to show the running result of cartpole 2018-02-24 10:53:39 +08:00
Dong Yan
2163d18728 fix the env -> self._env bug 2018-02-10 03:42:00 +08:00
haoshengzou
b8568c6af4 added data/utils.py. was ignored by .gitignore before... 2018-01-25 10:15:38 +08:00
haoshengzou
5910e08672 data/utils.py added but not pushed... 2018-01-25 10:11:36 +08:00
haoshengzou
f32e1d9c12 finish ddpg example. all examples under examples/ (except those containing 'contrib' and 'fail') can run! advantage estimation module is not complete yet. 2018-01-18 17:38:52 +08:00
haoshengzou
8fbde8283f finish dqn example. advantage estimation module is not complete yet. 2018-01-18 12:19:48 +08:00
haoshengzou
ed25bf7586 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was mistaken. policy will soon need refactoring. 2018-01-17 11:55:51 +08:00
haoshengzou
d599506dc9 fixed the bugs from Jan 14, which gave inferior or even no improvement; group_ndims was mistaken. policy will soon need refactoring. 2018-01-15 16:32:30 +08:00
haoshengzou
983cd36074 finished all ppo examples. Training is noticeably slower than the version before Jan 13. Stranger still, the gym example shows almost no improvement... but this problem is secondary to design; I'll write actor-critic first. 2018-01-15 00:03:06 +08:00
haoshengzou
fed3bf2a12 auto target network. ppo_cartpole.py runs OK, but results differ from the previous version even with the same random seed; still needs debugging. 2018-01-14 20:58:28 +08:00
JialianLee
8b7b4b6c6b Add Dirichlet noise to root prior and uniform noise to initial Q value 2018-01-05 17:02:19 +08:00
haoshengzou
dfcea74fcf fix memory growth and slowness caused by sess.run(tf.multinomial()); ppo examples now work OK with slight memory growth (1M/min), which still needs investigation 2018-01-03 20:32:05 +08:00
haoshengzou
4333ee5d39 ppo_cartpole.py seems to be working with param: bs128, num_ep20, max_time500; manually merged Normal from branch policy_wrapper 2018-01-02 19:40:37 +08:00
JialianLee
5849776c9a Modification and doc for unit test 2017-12-29 13:45:53 +08:00
rtz19970824
01f39f40d3 debug for unit test 2017-12-28 19:38:25 +08:00
JialianLee
4140d8c9d2 Modification on unit test 2017-12-28 17:10:25 +08:00
JialianLee
0352866b1a Modification for game engine 2017-12-28 16:27:28 +08:00
JialianLee
5457e5134e add a unit test 2017-12-28 16:20:44 +08:00
Dong Yan
08b6649fea test next_action.next_state in MCTS 2017-12-28 15:52:31 +08:00