Tianshou

Author	SHA1	Message	Date
danagi	c59ad40aef	Add auto alpha tuning and exploration noise for SAC (#80). Add classes BaseNoise and GaussianNoise for the concept of exploration noise. Add a new SAC test on MountainCarContinuous-v0, which should benefit from the two new features above.	2020-06-16 22:17:28 +08:00
Trinkle23897	5f2f05a570	fix #40	2020-06-13 17:06:08 +08:00
Trinkle23897	397e92b0fc	fix #77	2020-06-10 12:06:56 +08:00
Trinkle23897	dc451dfe88	nstep all (fix #51)	2020-06-03 13:59:47 +08:00
Trinkle23897	ff81a18f42	compute_nstep_returns (item 2 of #51)	2020-06-02 22:29:50 +08:00
Alexis DUBURCQ	8af7196a9a	Robust conversion from/to numpy/pytorch (#63) * Enable to convert Batch data back to torch. * Add torch converter to collector. * Fix * Move to_numpy/to_torch convert in dedicated utils.py. * Use to_numpy/to_torch to convert arrays. * fix lint * fix * Add unit test to check Batch from/to numpy. * Fix Batch over Batch. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-05-29 20:45:21 +08:00
Trinkle23897	de556fd22d	item3 of #51	2020-05-27 11:02:23 +08:00
magicly	6237cc0d52	fix dqn zero eps (#52) Co-authored-by: liyan <liyan1@digisky.com>	2020-05-21 11:35:41 +08:00
Imone	57bca16f94	Fix log_prob and PPO dual_clip (#49) * Added DiagGaussian to fix log_prob * Disable PPO dual_clip	2020-05-18 16:23:35 +08:00
Trinkle23897	0eef0ca198	fix optional type syntax	2020-05-16 20:08:32 +08:00
Trinkle23897	9b26137cd2	add type annotation	2020-05-12 11:31:47 +08:00
Trinkle23897	04b091d975	fix max-grad-norm err in a2c (#46)	2020-05-04 12:33:04 +08:00
Trinkle23897	134f787e24	reserve 'policy' keyword in replay buffer	2020-04-29 17:48:48 +08:00
nicoguertler	8f718d9b13	Fix log_prob in SAC (#41)	2020-04-28 23:44:15 +08:00
Trinkle23897	80d661907e	Multimodal obs (#38, #27, #25)	2020-04-28 20:56:02 +08:00
Trinkle23897	959955fa2a	fix historical issues	2020-04-26 16:13:51 +08:00
Trinkle23897	6b96f124ae	fix pdqn	2020-04-26 15:11:20 +08:00
rocknamx	b23749463e	Prioritized DQN (#30) * add sum_tree.py * add prioritized replay buffer * del sum_tree.py * fix some format issues * fix weight_update bug * simply replace replaybuffer in test_dqn without weight update * weight default set to 1 * fix sampling bug when buffer is not full * rename parameter * fix formula error, add accuracy check * add PrioritizedDQN test * add test_pdqn.py * add update_weight() doc * add ref of prio dqn in readme.md and index.rst * restore test_dqn.py, fix args of test_pdqn.py	2020-04-26 12:05:58 +08:00
Trinkle23897	70290346ea	compatible with torch==1.5.0 (fix #37)	2020-04-26 11:04:45 +08:00
Trinkle23897	6bf1ea644d	fix ppo	2020-04-19 14:30:42 +08:00
Trinkle23897	680fc0ffbe	gae	2020-04-14 21:11:06 +08:00
Trinkle23897	3cc22b7c0c	__call__ -> forward	2020-04-10 10:47:16 +08:00
Trinkle23897	13086b7f64	add ignore_obs_next in buffer	2020-04-10 09:01:17 +08:00
Trinkle23897	19f2cce294	seealso and change policy dir structure	2020-04-09 21:36:53 +08:00

24 Commits