Tianshou

Author	SHA1	Message	Date
youkaichao	e767de044b	Remove dummy net code (#123) * remove dummy net; delete two files * split code to have backbone and head * rename class * change torch.float to torch.float32 * use flatten(1) instead of view(batch, -1) * remove dummy net in docs * bugfix for rnn * fix cuda error * minor fix of docs * do not change the example code in dqn tutorial, since it is for demonstration Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-07-09 22:57:01 +08:00
danagi	13828f6309	added noise param to collector for test phase, fixed examples to adapt modification (#86) * Add auto alpha tuning and exploration noise for sac. Add class BaseNoise and GaussianNoise for the concept of exploration noise. Add new test for sac tested in MountainCarContinuous-v0, which should benefits from the two above new feature. * add exploration noise to collector, fix example to adapt modification	2020-06-23 07:20:51 +08:00
Trinkle23897	e8b44bbaf4	move sac_mcc to examples (runtime too long)	2020-06-22 21:39:00 +08:00
danagi	c59ad40aef	Add auto alpha tuning and exploration noise for sac. (#80) Add class BaseNoise and GaussianNoise for the concept of exploration noise. Add new test for sac tested in MountainCarContinuous-v0, which should benefits from the two above new feature.	2020-06-16 22:17:28 +08:00
Trinkle23897	dc451dfe88	nstep all (fix #51)	2020-06-03 13:59:47 +08:00
Trinkle23897	ff81a18f42	compute_nstep_returns (item 2 of #51)	2020-06-02 22:29:50 +08:00
Trinkle23897	de556fd22d	item3 of #51	2020-05-27 11:02:23 +08:00
Imone	57bca16f94	Fix log_prob and PPO dual_clip (#49) * Added DiagGaussian to fix log_probg * Disable PPO dual_clip	2020-05-18 16:23:35 +08:00
Trinkle23897	70122dc03d	oinit with 0 bias	2020-05-17 17:06:20 +08:00
Trinkle23897	3271c92609	orthogonal init for ppo in test script	2020-05-16 20:27:01 +08:00
Trinkle23897	c2a7caf806	add recurrent actor and critic	2020-04-30 16:31:40 +08:00
Trinkle23897	815f3522bb	imitation with discrete action space	2020-04-20 11:25:20 +08:00
Trinkle23897	6bf1ea644d	fix ppo	2020-04-19 14:30:42 +08:00
Trinkle23897	680fc0ffbe	gae	2020-04-14 21:11:06 +08:00
Trinkle23897	7b65d43394	vanilla imitation learning	2020-04-13 19:37:27 +08:00
Trinkle23897	6a244d1fbb	save_fn	2020-04-11 16:54:27 +08:00
Oblivion	4d4d0daf9e	Performance improve (#18) * improve performance set one thread for NN replace detach() op with torch.no_grad() * fix pep 8 errors	2020-04-05 09:10:21 +08:00
Trinkle23897	974ade8019	add some docs	2020-04-03 21:28:12 +08:00
Trinkle23897	c42990c725	add rllib result and fix pep8	2020-03-28 09:43:35 +08:00
Minghao Zhang	77068af526	add examples, fix some bugs (#5) * update atari.py * fix setup.py pass the pytest * fix setup.py pass the pytest * add args "render" * change the tensorboard writter * change the tensorboard writter * change device, render, tensorboard log location * change device, render, tensorboard log location * remove some wrong local files * fix some tab mistakes and the envs name in continuous/test_xx.py * add examples and point robot maze environment * fix some bugs during testing examples * add dqn network and fix some args * change back the tensorboard writter's frequency to ensure ppo and a2c can write things normally * add a warning to collector * rm some unrelated files * reformat * fix a bug in test_dqn due to the model wrong selection	2020-03-28 07:27:18 +08:00
Trinkle23897	44f911bc31	add pytorch drl result	2020-03-27 09:04:29 +08:00
Trinkle23897	519f9f20d0	update readme	2020-03-26 17:32:51 +08:00
Trinkle23897	c505cd8205	update readme	2020-03-26 11:42:34 +08:00
Trinkle23897	fdc969b830	fix collector	2020-03-25 14:08:28 +08:00
Trinkle23897	e95218e295	sac	2020-03-23 17:17:41 +08:00
Trinkle23897	30a0fc079c	td3	2020-03-23 11:34:52 +08:00
Trinkle23897	a87563b8e6	add demo of ppo continuous action task	2020-03-21 17:04:42 +08:00
Trinkle23897	c173f7bfbc	fix ddpg	2020-03-21 15:31:31 +08:00
Trinkle23897	8bd8246b16	refract test code	2020-03-21 10:58:01 +08:00

29 Commits