Tianshou

Author	SHA1	Message	Date
Kenneth Schröder	cd7654bfd5	Fixing casts to int by to_torch_as(...) calls in policies when using discrete actions (#521)	2022-02-07 03:42:46 +08:00
ChenDRAG	c25926dd8f	Formalize variable names (#509) Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-01-30 00:53:56 +08:00
n+e	fc251ab0b8	bump to v0.4.3 (#432) * add makefile * bump version * add isort and yapf * update contributing.md * update PR template * spelling check	2021-09-03 05:05:04 +08:00
Andriy Drozdyuk	d161059c3d	Replaced indice by plural indices (#422)	2021-08-20 21:58:44 +08:00
Yuge Zhang	f4e05d585a	Support deterministic evaluation for onpolicy algorithms (#354)	2021-04-27 21:22:39 +08:00
ChenDRAG	3ac67d9974	refactor A2C/PPO, change behavior of value normalization (#321)	2021-03-25 10:12:39 +08:00
ChenDRAG	e27b5a26f3	Refactor PG algorithm and change behavior of `compute_episodic_return` (#319) - simplify code - apply value normalization (global) and adv norm (per-batch) in on-policy algorithms	2021-03-23 22:05:48 +08:00
ChenDRAG	2c11b6e43b	Add lr_scheduler option for Onpolicy algorithm (#318) add lr_scheduler option in PGPolicy/A2CPolicy/PPOPolicy	2021-03-22 16:57:24 +08:00
ChenDRAG	4d92952a7b	Remap action to fit gym's action space (#313) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-03-21 16:45:50 +08:00
ChenDRAG	f22b539761	Remove reward_normalization option in offpolicy algorithm (#298) * remove rew_norm in nstep implementation * improve test * remove runnable/ * various doc fix Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-27 11:20:43 +08:00
ChenDRAG	7036073649	Trainer refactor: some definition changes (#293) This PR focuses on definition changes in the trainer to make it friendlier to use and consistent with typical usage in research papers: it changes `collect-per-step` to `step-per-collect`, adds `update-per-step` / `episode-per-collect` accordingly, and modifies the documentation.	2021-02-21 13:06:02 +08:00
ChenDRAG	150d0ec51b	Step collector implementation (#280) This is the third PR of 6 commits mentioned in #274, which refactors Collector to fix #245. You can check #274 for more detail. Things changed in this PR: 1. refactor collector to be cleaner, split AsyncCollector to support asyncvenv; 2. change buffer.add api to add(batch, buffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.) 3. add policy.exploration_noise(act, batch) -> act 4. small change in BasePolicy.compute_*_returns 5. move reward_metric from collector to trainer 6. fix np.asanyarray issue (different numpy versions produce different output) 7. flake8 maxlength=88 8. polish docs and fix test Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-19 10:33:49 +08:00
n+e	b284ace102	type check in unit test (#200) Fix #195: Add mypy test in .github/workflows/docs_and_lint.yml. Also remove the out-of-date api	2020-09-13 19:31:50 +08:00
n+e	c91def6cbc	code format and update function signatures (#213) Cherry-pick from #200 - update the function signature - format code-style - move _compile into separate functions - fix a bug in to_torch and to_numpy (Batch) - remove None in action_range In short, the code-format only contains function-signature style and `'` -> `"`. (pick up from [black](https://github.com/psf/black))	2020-09-12 15:39:01 +08:00
n+e	b86d78766b	fix docs and add docstring check (#210) - fix broken links and out-of-date content - add pydocstyle and doc8 check - remove collector.seed and collector.render	2020-09-11 07:55:37 +08:00
n+e	94bfb32cc1	optimize training procedure and improve code coverage (#189) 1. add policy.eval() in all test scripts' "watch performance" 2. remove dict return support for collector preprocess_fn 3. add `__contains__` and `pop` in batch: `key in batch`, `batch.pop(key, deft)` 4. exact n_episode for a list of n_episode limitation and save fake data in cache_buffer when self.buffer is None (#184) 5. fix tensorboard logging: h-axis stands for env step instead of gradient step; add test results into tensorboard 6. add test_returns (both GAE and nstep) 7. change the type-checking order in batch.py and converter.py to handle the most common case first 8. fix shape inconsistency for torch.Tensor in replay buffer 9. remove `**kwargs` in ReplayBuffer 10. remove default value in batch.split() and add merge_last argument (#185) 11. improve nstep efficiency 12. add max_batchsize in onpolicy algorithms 13. potential bugfix for subproc.wait 14. fix RecurrentActorProb 15. improve the code-coverage (from 90% to 95%) and remove the dead code 16. fix some incorrect type annotations The above improvements also increase the training FPS: on my computer, the previous version reaches only ~1800 FPS, while this version can reach ~2050 (faster than v0.2.4.post1).	2020-08-27 12:15:18 +08:00
youkaichao	7f3b817b24	add policy.update to enable post process and remove collector.sample (#180) * add policy.update to enable post process and remove collector.sample * update doc in policy concept * remove collector.sample in doc * doc update of concepts * docs * polish * polish policy * remove collector.sample in docs * minor fix * Apply suggestions from code review just a test * doc fix Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-08-15 16:10:42 +08:00
n+e	38a95c19da	Yet another 3 fix (#160) 1. DQN learn should keep eps=0 2. Add a warning of env.seed in VecEnv 3. fix #162 of multi-dim action	2020-07-24 17:38:12 +08:00
n+e	352a518399	3 fix (#158) - fix 2 warnings in doctest - change the minimum version of gym (to be aligned with openai baselines) - change squeeze and reshape to flatten (related to #155). I think flatten is better.	2020-07-23 15:12:02 +08:00
Trinkle23897	397e92b0fc	fix #77	2020-06-10 12:06:56 +08:00
Alexis DUBURCQ	8af7196a9a	Robust conversion from/to numpy/pytorch (#63) * Enable to convert Batch data back to torch. * Add torch converter to collector. * Fix * Move to_numpy/to_torch convert in dedicated utils.py. * Use to_numpy/to_torch to convert arrays. * fix lint * fix * Add unit test to check Batch from/to numpy. * Fix Batch over Batch. Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>	2020-05-29 20:45:21 +08:00
Trinkle23897	de556fd22d	item3 of #51	2020-05-27 11:02:23 +08:00
Trinkle23897	0eef0ca198	fix optional type syntax	2020-05-16 20:08:32 +08:00
Trinkle23897	9b26137cd2	add type annotation	2020-05-12 11:31:47 +08:00
Trinkle23897	959955fa2a	fix historical issues	2020-04-26 16:13:51 +08:00
Trinkle23897	680fc0ffbe	gae	2020-04-14 21:11:06 +08:00
Trinkle23897	3cc22b7c0c	__call__ -> forward	2020-04-10 10:47:16 +08:00
Trinkle23897	13086b7f64	add ignore_obs_next in buffer	2020-04-10 09:01:17 +08:00
Trinkle23897	19f2cce294	seealso and change policy dir structure	2020-04-09 21:36:53 +08:00

29 Commits