Tianshou

Author	SHA1	Message	Date
danagi	16d8e9b051	SAC implementation update (#212) - replace DiagGaussian with Independent(Normal) (PyTorch already supports this) - detach alpha from autograd - add value/alpha to result (more informative) - revert #204 to fix #211 Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-09-12 08:44:50 +08:00
n+e	b86d78766b	fix docs and add docstring check (#210) - fix broken links and out-of-date content - add pydocstyle and doc8 check - remove collector.seed and collector.render	2020-09-11 07:55:37 +08:00
n+e	94bfb32cc1	optimize training procedure and improve code coverage (#189) 1. add policy.eval() in all test scripts' "watch performance" 2. remove dict return support for collector preprocess_fn 3. add `__contains__` and `pop` in batch: `key in batch`, `batch.pop(key, deft)` 4. exact n_episode for a list of n_episode limitation and save fake data in cache_buffer when self.buffer is None (#184) 5. fix tensorboard logging: the horizontal axis stands for env step instead of gradient step; add test results into tensorboard 6. add test_returns (both GAE and nstep) 7. change the type-checking order in batch.py and converter.py to check the most common case first 8. fix shape inconsistency for torch.Tensor in replay buffer 9. remove `**kwargs` in ReplayBuffer 10. remove default value in batch.split() and add merge_last argument (#185) 11. improve nstep efficiency 12. add max_batchsize in onpolicy algorithms 13. potential bugfix for subproc.wait 14. fix RecurrentActorProb 15. improve the code coverage (from 90% to 95%) and remove the dead code 16. fix some incorrect type annotations The above improvements also increase the training FPS: on my computer, the previous version reaches only ~1800 FPS, while this version reaches ~2050 (faster than v0.2.4.post1).	2020-08-27 12:15:18 +08:00
youkaichao	7f3b817b24	add policy.update to enable post process and remove collector.sample (#180) * add policy.update to enable post process and remove collector.sample * update doc in policy concept * remove collector.sample in doc * doc update of concepts * docs * polish * polish policy * remove collector.sample in docs * minor fix * Apply suggestions from code review just a test * doc fix Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-08-15 16:10:42 +08:00
n+e	140b1c2cab	Improve PER (#159) - use segment tree to rewrite the previous PrioReplayBuffer code, add the test - enable all Q-learning algorithms to use PER	2020-08-06 10:26:24 +08:00
n+e	352a518399	3 fixes (#158) - fix 2 warnings in doctest - change the minimum version of gym (to be aligned with openai baselines) - change squeeze and reshape to flatten (related to #155). I think flatten is better.	2020-07-23 15:12:02 +08:00
n+e	089b85b6a2	Fix shape inconsistency in A2CPolicy and PPOPolicy (#155) - The original `r - v`'s shape in A2C is wrong. - The shape of log_prob is different: [bsz] in Categorical and [bsz, 1] in Normal. The shape should be made consistent with other tensors manually.	2020-07-21 22:24:06 +08:00
youkaichao	5b1373924e	doc fix; policy train/eval signature fix (#109) * doc fix; policy train/eval signature fix * change train/eval behavior according to pytorch * change train/eval behavior according to pytorch	2020-07-06 10:44:34 +08:00
danagi	c59ad40aef	Add auto alpha tuning and exploration noise for SAC. (#80) Add class BaseNoise and GaussianNoise for the concept of exploration noise. Add a new test for SAC on MountainCarContinuous-v0, which should benefit from the two new features above.	2020-06-16 22:17:28 +08:00
Trinkle23897	5f2f05a570	fix #40	2020-06-13 17:06:08 +08:00
Trinkle23897	dc451dfe88	nstep all (fix #51)	2020-06-03 13:59:47 +08:00
Alexis DUBURCQ	8af7196a9a	Robust conversion from/to numpy/pytorch (#63) * Enable to convert Batch data back to torch. * Add torch converter to collector. * Fix * Move to_numpy/to_torch convert in dedicated utils.py. * Use to_numpy/to_torch to convert arrays. * fix lint * fix * Add unit test to check Batch from/to numpy. * Fix Batch over Batch. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-05-29 20:45:21 +08:00
Trinkle23897	de556fd22d	item3 of #51	2020-05-27 11:02:23 +08:00
Imone	57bca16f94	Fix log_prob and PPO dual_clip (#49) * Added DiagGaussian to fix log_prob * Disable PPO dual_clip	2020-05-18 16:23:35 +08:00
Trinkle23897	0eef0ca198	fix optional type syntax	2020-05-16 20:08:32 +08:00
Trinkle23897	9b26137cd2	add type annotation	2020-05-12 11:31:47 +08:00
nicoguertler	8f718d9b13	Fix log_prob in SAC (#41)	2020-04-28 23:44:15 +08:00
Trinkle23897	70290346ea	compatible with torch==1.5.0 (fix #37)	2020-04-26 11:04:45 +08:00
Trinkle23897	3cc22b7c0c	__call__ -> forward	2020-04-10 10:47:16 +08:00
Trinkle23897	19f2cce294	seealso and change policy dir structure	2020-04-09 21:36:53 +08:00

20 Commits
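
Commit 140b1c2cab above mentions rewriting PrioReplayBuffer with a segment tree. The log does not show that code, so the following is only a minimal, generic sketch of the technique, not Tianshou's implementation. The class name `SumSegmentTree` and its methods `update`, `total`, `find_prefix`, and `sample` are hypothetical names chosen for illustration. The idea is that a sum tree stores one priority per leaf, so both updating a priority and drawing an index with probability proportional to its priority take O(log n) steps.

```python
import random


class SumSegmentTree:
    """Hypothetical array-backed sum tree over a fixed number of priorities."""

    def __init__(self, size):
        # Round the capacity up to a power of two so the tree is complete.
        self.size = size
        self.capacity = 1
        while self.capacity < size:
            self.capacity *= 2
        # Node i has children 2i and 2i + 1; leaves start at index `capacity`.
        self.tree = [0.0] * (2 * self.capacity)

    def update(self, index, priority):
        """Set the priority of one leaf and refresh the sums above it."""
        pos = index + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def total(self):
        """Sum of all priorities, stored at the root."""
        return self.tree[1]

    def find_prefix(self, value):
        """Return the smallest index whose cumulative priority exceeds value."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if value < self.tree[left]:
                pos = left
            else:
                value -= self.tree[left]
                pos = left + 1
        # Clamp so rounding error can never select a padding leaf.
        return min(pos - self.capacity, self.size - 1)

    def sample(self, rng=random):
        """Draw an index with probability proportional to its priority."""
        return self.find_prefix(rng.random() * self.total())
```

In a prioritized replay setting, `update` would typically be called with a new priority after each learning step on a transition, and `sample` would pick the indices of the next batch. How Tianshou actually maps TD errors to priorities is not stated in this log.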