Tianshou

Author	SHA1	Message	Date
n+e	fc251ab0b8	bump to v0.4.3 (#432) * add makefile * bump version * add isort and yapf * update contributing.md * update PR template * spelling check	2021-09-03 05:05:04 +08:00
n+e	e4f4f0e144	fix docs build failure and a bug in a2c/ppo optimizer (#428) * fix rtfd build * list + list -> set.union * change seed of test_qrdqn * add py39 test	2021-08-30 02:07:03 +08:00
Andriy Drozdyuk	d161059c3d	Replaced indice by plural indices (#422)	2021-08-20 21:58:44 +08:00
Yuge Zhang	f4e05d585a	Support deterministic evaluation for onpolicy algorithms (#354)	2021-04-27 21:22:39 +08:00
n+e	09692c84fe	fix numpy>=1.20 typing check (#323) Change the behavior of to_numpy and to_torch: from now on, dict is automatically converted to Batch and list is automatically converted to np.ndarray (if an error occurs, raise the exception instead of converting each element in the list).	2021-03-30 16:06:03 +08:00
ChenDRAG	5d580c3662	refactor ppo (#329)	2021-03-28 18:28:36 +08:00
ChenDRAG	1730a9008a	A2C benchmark for mujoco (#325)	2021-03-28 13:12:43 +08:00
ChenDRAG	3ac67d9974	refactor A2C/PPO, change behavior of value normalization (#321)	2021-03-25 10:12:39 +08:00
ChenDRAG	e27b5a26f3	Refactor PG algorithm and change behavior of `compute_episodic_return` (#319) - simplify code - apply value normalization (global) and adv norm (per-batch) in on-policy algorithms	2021-03-23 22:05:48 +08:00
ChenDRAG	2c11b6e43b	Add lr_scheduler option for Onpolicy algorithm (#318) add lr_scheduler option in PGPolicy/A2CPolicy/PPOPolicy	2021-03-22 16:57:24 +08:00
ChenDRAG	4d92952a7b	Remap action to fit gym's action space (#313) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-03-21 16:45:50 +08:00
ChenDRAG	f22b539761	Remove reward_normalization option in offpolicy algorithm (#298) * remove rew_norm in nstep implementation * improve test * remove runnable/ * various doc fix Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-27 11:20:43 +08:00
ChenDRAG	150d0ec51b	Step collector implementation (#280) This is the third PR of 6 commits mentioned in #274, which features a refactor of Collector to fix #245. You can check #274 for more detail. Things changed in this PR: 1. refactor collector to be cleaner, split AsyncCollector to support asyncvenv; 2. change buffer.add api to add(batch, buffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.); 3. add policy.exploration_noise(act, batch) -> act; 4. small change in BasePolicy.compute_*_returns; 5. move reward_metric from collector to trainer; 6. fix np.asanyarray issue (different numpy versions produce different output); 7. flake8 maxlength=88; 8. polish docs and fix test. Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-19 10:33:49 +08:00
Jialu Zhu	a511cb4779	Add offline trainer and discrete BCQ algorithm (#263) The result needs to be tuned after `done` issue fixed. Co-authored-by: n+e <trinkle23897@gmail.com>	2021-01-20 18:13:04 +08:00
n+e	b284ace102	type check in unit test (#200) Fix #195: Add mypy test in .github/workflows/docs_and_lint.yml. Also remove the out-of-date API	2020-09-13 19:31:50 +08:00
n+e	c91def6cbc	code format and update function signatures (#213) Cherry-pick from #200 - update the function signature - format code-style - move _compile into separate functions - fix a bug in to_torch and to_numpy (Batch) - remove None in action_range In short, the code-format only contains function-signature style and `'` -> `"`. (pick up from [black](https://github.com/psf/black))	2020-09-12 15:39:01 +08:00
n+e	b86d78766b	fix docs and add docstring check (#210) - fix broken links and out-of-date content - add pydocstyle and doc8 check - remove collector.seed and collector.render	2020-09-11 07:55:37 +08:00
n+e	94bfb32cc1	optimize training procedure and improve code coverage (#189) 1. add policy.eval() in all test scripts' "watch performance" 2. remove dict return support for collector preprocess_fn 3. add `__contains__` and `pop` in batch: `key in batch`, `batch.pop(key, deft)` 4. collect the exact n_episode when n_episode is given as a list of limits, and save fake data in cache_buffer when self.buffer is None (#184) 5. fix tensorboard logging: the horizontal axis stands for env step instead of gradient step; add test results into tensorboard 6. add test_returns (both GAE and nstep) 7. change the type-checking order in batch.py and converter.py so that the most common case is checked first 8. fix shape inconsistency for torch.Tensor in replay buffer 9. remove `**kwargs` in ReplayBuffer 10. remove default value in batch.split() and add merge_last argument (#185) 11. improve nstep efficiency 12. add max_batchsize in onpolicy algorithms 13. potential bugfix for subproc.wait 14. fix RecurrentActorProb 15. improve the code coverage (from 90% to 95%) and remove dead code 16. fix some incorrect type annotations The above improvements also increase the training FPS: on my computer, the previous version reaches only ~1800 FPS, while this version reaches ~2050 (faster than v0.2.4.post1).	2020-08-27 12:15:18 +08:00
youkaichao	7f3b817b24	add policy.update to enable post process and remove collector.sample (#180) * add policy.update to enable post process and remove collector.sample * update doc in policy concept * remove collector.sample in doc * doc update of concepts * docs * polish * polish policy * remove collector.sample in docs * minor fix * Apply suggestions from code review just a test * doc fix Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-08-15 16:10:42 +08:00
n+e	38a95c19da	Yet another 3 fix (#160) 1. DQN learn should keep eps=0 2. Add a warning of env.seed in VecEnv 3. fix #162 of multi-dim action	2020-07-24 17:38:12 +08:00
n+e	352a518399	3 fix (#158) - fix 2 warnings in doctest - change the minimum version of gym (to be aligned with openai baselines) - change squeeze and reshape to flatten (related to #155). I think flatten is better.	2020-07-23 15:12:02 +08:00
n+e	089b85b6a2	Fix shape inconsistency in A2CPolicy and PPOPolicy (#155) - The original `r - v`'s shape in A2C is wrong. - The shape of log_prob is different: [bsz] in Categorical and [bsz, 1] in Normal. The shape should be manually made consistent with other tensors.	2020-07-21 22:24:06 +08:00
Trinkle23897	dc451dfe88	nstep all (fix #51)	2020-06-03 13:59:47 +08:00
Alexis DUBURCQ	8af7196a9a	Robust conversion from/to numpy/pytorch (#63) * Enable to convert Batch data back to torch. * Add torch converter to collector. * Fix * Move to_numpy/to_torch convert in dedicated utils.py. * Use to_numpy/to_torch to convert arrays. * fix lint * fix * Add unit test to check Batch from/to numpy. * Fix Batch over Batch. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-05-29 20:45:21 +08:00
Trinkle23897	de556fd22d	item3 of #51	2020-05-27 11:02:23 +08:00
Trinkle23897	0eef0ca198	fix optional type syntax	2020-05-16 20:08:32 +08:00
Trinkle23897	9b26137cd2	add type annotation	2020-05-12 11:31:47 +08:00
Trinkle23897	04b091d975	fix max-grad-norm err in a2c (#46)	2020-05-04 12:33:04 +08:00
Trinkle23897	134f787e24	reserve 'policy' keyword in replay buffer	2020-04-29 17:48:48 +08:00
Trinkle23897	80d661907e	Multimodal obs (#38, #27, #25)	2020-04-28 20:56:02 +08:00
Trinkle23897	959955fa2a	fix historical issues	2020-04-26 16:13:51 +08:00
Trinkle23897	6bf1ea644d	fix ppo	2020-04-19 14:30:42 +08:00
Trinkle23897	680fc0ffbe	gae	2020-04-14 21:11:06 +08:00
Trinkle23897	3cc22b7c0c	__call__ -> forward	2020-04-10 10:47:16 +08:00
Trinkle23897	19f2cce294	seealso and change policy dir structure	2020-04-09 21:36:53 +08:00

35 Commits
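
Note on commit 94bfb32cc1 (item 3): the message says `__contains__` and `pop` were added to the batch so that `key in batch` and `batch.pop(key, deft)` work. The sketch below only illustrates that dict-like behavior. `MiniBatch` is a hypothetical stand-in written for this note, not Tianshou's `Batch`, and the real implementation may differ.

```python
from typing import Any


class MiniBatch:
    """Hypothetical stand-in for a dict-like batch; NOT Tianshou's Batch."""

    _MISSING = object()  # sentinel so that None can be passed as a default

    def __init__(self, **kwargs: Any) -> None:
        self._data = dict(kwargs)

    def __contains__(self, key: str) -> bool:
        # Enables `key in batch`.
        return key in self._data

    def pop(self, key: str, default: Any = _MISSING) -> Any:
        # Like dict.pop: remove and return the value if the key exists,
        # otherwise return the default, or raise KeyError if none was given.
        if key in self._data:
            return self._data.pop(key)
        if default is MiniBatch._MISSING:
            raise KeyError(key)
        return default


batch = MiniBatch(obs=[1, 2], rew=0.5)
has_obs = "obs" in batch               # membership test via __contains__
rew = batch.pop("rew")                 # removes "rew" and returns its value
missing = batch.pop("info", None)      # absent key: the default is returned
```

A private sentinel is used for the default so that a caller can pass `None` as a legitimate fallback value, which is the same distinction `dict.pop` makes.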