Tianshou

Author	SHA1	Message	Date
Maximilian Huettenrauch	9c645ff4a0	pleased the mypy gods	2024-03-27 15:37:19 +01:00
Maximilian Huettenrauch	9055eb5924	removed attributes from pandas logger	2024-03-27 13:55:13 +01:00
Maximilian Huettenrauch	6d9b697efe	restructured and moved RLiableExperimentResult	2024-03-27 12:03:31 +01:00
Maximilian Huettenrauch	18d8ffa576	removed name shortener	2024-03-27 12:02:43 +01:00
Maximilian Huettenrauch	e95fa26a14	replace assert with exception in wandb logger	2024-03-27 11:38:55 +01:00
Maximilian Huettenrauch	5259d5f3fb	Merge branch 'thuml_master' into feature/algo-eval # Conflicts: # examples/mujoco/mujoco_env.py	2024-03-15 09:42:17 +01:00
maxhuettenrauch	e82379c47f	Allow explicit setting of multiprocessing context for SubprocEnvWorker (#1072 ) Running multiple training runs in parallel (with, for example, joblib) fails on macOS due to a change in the standard context for multiprocessing (see [here](https://stackoverflow.com/questions/65098398/why-using-fork-works-but-using-spawn-fails-in-python3-8-multiprocessing) or [here](https://www.reddit.com/r/learnpython/comments/g5372v/multiprocessing_with_fork_on_macos/)). This PR adds the ability to explicitly set a multiprocessing context for the SubProcEnvWorker (similar to gymnasium's [AsyncVecEnv](https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/vector/async_vector_env.py)). --------- Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de> Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>	2024-03-14 11:07:56 +01:00
Maximilian Huettenrauch	a7898b15b8	small fix	2024-03-12 15:17:33 +01:00
Maximilian Huettenrauch	d9a612a997	format, type check and small fixes	2024-03-12 15:01:50 +01:00
Maximilian Huettenrauch	f730782f29	Merge branch 'thuml_master' into feature/algo-eval	2024-03-12 11:46:08 +01:00
Maximilian Huettenrauch	5762d2c2e0	extend hl experiment builder	2024-03-12 11:43:52 +01:00
Maximilian Huettenrauch	734119ec00	logger updates	2024-03-12 11:31:41 +01:00
Maximilian Huettenrauch	32cd3b4357	logger updates - introduced logger manager - loggers can reload logged data from disk	2024-03-11 10:29:17 +01:00
Dominik Jain	1714c7f2c7	High-level API: Fix number of test episodes being incorrectly scaled by number of envs (#1071 )	2024-03-07 08:57:11 -08:00
Maximilian Huettenrauch	95cbfe6cdf	added explicit env seeding for train and test envs	2024-03-06 17:09:06 +01:00
Erni	1aee41fa9c	Using dist.mode instead of logits.argmax (#1066 ) changed all the occurrences where an action is selected deterministically - from: using the outputs of the actor network. - to: using the mode of the PyTorch distribution. --------- Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com>	2024-03-03 00:09:39 +01:00
maxhuettenrauch	7c970df53f	Fix/add watch env with obs rms (#1061 ) Supports deciding whether to watch the agent performing on the env using high-level interfaces	2024-02-29 15:59:11 +01:00
Dominik Jain	49781e715e	Fix high-level examples (#1060 ) The high-level examples were all broken by changes made to make mypy pass. This PR fixes them, making a type change in logging.run_cli instead to make mypy happy.	2024-02-23 23:17:14 +01:00
Dominik Jain	08728ad35e	Resolve platform-specific/installation-specific mypy issues by adding ignores and ignoring unused ignores locally	2024-02-15 11:26:54 +01:00
Dominik Jain	eeb2081ca6	Fix AutoAlphaFactoryDefault using hard-coded Adam optimizer instead of passed factory	2024-02-14 20:43:38 +01:00
Dominik Jain	76cbd7efc2	Make OptimizerFactory more flexible by adding a second method which allows the creation of an optimizer given arbitrary parameters (rather than a module)	2024-02-14 20:42:06 +01:00
Dominik Jain	bf391853dc	Allow to configure number of test episodes in high-level API	2024-02-14 19:14:28 +01:00
Carlo Cagnetta	5fc314bd4b	Docs/use nbqa on notebooks (#1041 ) - Added nbqa to pyproject.toml - Resolved mypy issues on notebooks and related files - Conducting ruff checks on notebooks - Add DataclassPPrintMixin for better stats representation - Improved Notebooks wording and explanations Resolve: #1004 Related to #974	2024-02-07 17:28:16 +01:00
Daniel Plop	eb0215cf76	Refactoring/mypy issues test (#1017 ) Improves typing in examples and tests, towards mypy passing there. Introduces the SpaceInfo utility	2024-02-06 14:24:30 +01:00
Michael Panchenko	6e1ffe58e5	Improvements in README and high-level API (#1022 ) This makes several largely unrelated improvements in the high-level API and in the README. Main improvements in high-level API: * Improve naming in trainer-related abstractions, moved some classes from examples to the library * Improve environment factory abstraction * Some bug-fixes Main changes in README: * Add high-level example and update procedural/low-level example * Improve language/wording	2024-01-16 15:24:41 +01:00
Dominik Jain	022cfb7f78	Cleaned up handling of output_dim retrieval, adding exceptions for erroneous cases	2024-01-16 14:52:31 +01:00
Dominik Jain	20074931d5	Improve docstrings	2024-01-16 14:52:31 +01:00
Dominik Jain	05a8cf4e74	Refactoring, improving class name EnvFactoryGymnasium -> EnvFactoryRegistered	2024-01-16 14:52:31 +01:00
Dominik Jain	c9cb41bf55	Make envpool usage configuration more explicit	2024-01-16 14:52:31 +01:00
Dominik Jain	1e5ebc2a2d	Improve naming of callback classes and related methods/attributes Add EpochStopCallbackRewardThreshold	2024-01-12 17:13:42 +01:00
Dominik Jain	24b7b82e56	Remove inappropriate warning (warns about supported case according to docstring)	2024-01-12 17:13:42 +01:00
Dominik Jain	ff398beed9	Move callbacks for setting DQN epsilon values to the library	2024-01-12 17:13:42 +01:00
Dominik Jain	eaab7b0a4b	Improve environment factory abstractions in high-level API: * EnvFactory now uses the creation of a single environment as the basic functionality which the more high-level functions build upon * Introduce enum EnvMode to indicate the purpose for which an env is created, allowing the factory creation process to change its behaviour accordingly * Add EnvFactoryGymnasium to provide direct support for envs that can be created via gymnasium.make - EnvPool is supported via an injectible EnvPoolFactory - Existing EnvFactory implementations are now derived from EnvFactoryGymnasium * Use a separate environment (which uses new EnvMode.WATCH) for watching agent performance after training (instead of using test environments, which the user may want to configure differently)	2024-01-12 17:13:42 +01:00
Dominik Jain	d4e4f4ff63	Experiment builders for DQN and IQN: * Fix: Disable softmax in default models * Add method with_model_factory_default (for DQN)	2024-01-10 15:42:18 +01:00
Michael Panchenko	789340f8d6	Minor simplification in train_step (#1019 )	2024-01-09 08:51:49 -08:00
Dominik Jain	f77d95da04	Fix: Missing type annotation of Experiment.watch_num_episodes	2024-01-08 18:00:37 +01:00
Dominik Jain	97a241a6fc	Fix: DiscreteEnvironments.from_factory used incorrect EnvType	2024-01-08 15:58:41 +01:00
maxhuettenrauch	522f7fbf98	Feature/dataclasses (#996 ) This PR adds strict typing to the output of `update` and `learn` in all policies. This will likely be the last large refactoring PR before the next release (0.6.0, not 1.0.0), so it requires some attention. Several difficulties were encountered on the path to that goal: 1. The policy hierarchy is actually "broken" in the sense that the keys of dicts that were output by `learn` did not follow the same enhancement (inheritance) pattern as the policies. This is a real problem and should be addressed in the near future. Generally, several aspects of the policy design and hierarchy might deserve a dedicated discussion. 2. Each policy needs to be generic in the stats return type, because one might want to extend it at some point and then also extend the stats. Even within the source code base this pattern is necessary in many places. 3. The interaction between learn and update is a bit quirky, we currently handle it by having update modify special field inside TrainingStats, whereas all other fields are handled by learn. 4. The IQM module is a policy wrapper and required a TrainingStatsWrapper. The latter relies on a bunch of black magic. They were addressed by: 1. Live with the broken hierarchy, which is now made visible by bounds in generics. We use type: ignore where appropriate. 2. Make all policies generic with bounds following the policy inheritance hierarchy (which is incorrect, see above). We experimented a bit with nested TrainingStats classes, but that seemed to add more complexity and be harder to understand. Unfortunately, mypy thinks that the code below is wrong, wherefore we have to add `type: ignore` to the return of each `learn` ```python T = TypeVar("T", bound=int) def f() -> T: return 3 ``` 3. See above 4. Write representative tests for the `TrainingStatsWrapper`. Still, the black magic might cause nasty surprises down the line (I am not proud of it)... Closes #933 --------- Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de> Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>	2023-12-30 11:09:03 +01:00
Dominik Jain	e8cc80f990	Environments: Add option to use a different factory for test envs to `from_factory` convenience construction mechanisms	2023-12-21 13:13:51 +01:00
Dominik Jain	45a1a3f259	SamplingConfig: Change default of repeat_per_collect to 1 (safest option)	2023-12-21 13:13:51 +01:00
Dominik Jain	408d51f9de	SamplingConfig: Improve/extend docstrings, clearly explaining the parameters	2023-12-21 13:13:51 +01:00
Dominik Jain	1903a72ecb	Improve logging	2023-12-14 19:31:30 +01:00
Dominik Jain	3caa3805f0	Fix: SamplingConfig.start_timesteps_random was not used	2023-12-14 11:47:32 +01:00
Michael Panchenko	0b67447541	Docs: fixing spelling, re-adding spellcheck to pipeline	2023-12-05 13:22:04 +01:00
Michael Panchenko	a846b52063	Typing: fixed multiple typing issues	2023-12-05 12:04:18 +01:00
Michael Panchenko	2e39a252e3	Docstring: minor changes to let ruff pass	2023-12-04 13:52:46 +01:00
Michael Panchenko	4cfefcf75d	Docs: removed conflicting sphinx stuff from a docstring	2023-12-04 11:48:09 +01:00
Michael Panchenko	a5685619ce	Docs: generate all api docs automatically Reinstate the -W option Several overall improvements in docs Fixed multiple links	2023-12-04 11:48:09 +01:00
Michael Panchenko	8d3d1f164b	Support batch_size=None and use it in various scripts (#993 ) Closes #986	2023-11-24 10:13:10 -08:00
Michael Panchenko	f134bc20b5	Bugfix/discrete bcq inf (#995 ) Fixes a small bug with using np.inf instead of torch-based infinity Closes #963 --------- Co-authored-by: ivan.rodriguez <ivan.rodriguez@unternehmertum.de>	2023-11-24 11:17:40 +01:00

445 Commits
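Commit e82379c47f (#1072) lets SubprocEnvWorker use an explicitly chosen multiprocessing context, because the default start method on macOS changed from "fork" to "spawn" in Python 3.8. The snippet below is a minimal, standard-library-only sketch of that general pattern. It is not Tianshou's implementation and does not reproduce its parameter names; `resolve_context`, `_echo_worker` and `start_worker` are hypothetical helpers. The worker is a toy stand-in for an environment worker loop.

```python
import multiprocessing as mp
from multiprocessing.connection import Connection
from multiprocessing.context import BaseContext
from multiprocessing.process import BaseProcess
from typing import Optional


def resolve_context(start_method: Optional[str] = None) -> BaseContext:
    """Hypothetical helper: return a context for the given start method.

    None keeps the platform default (e.g. "spawn" on macOS since Python 3.8).
    Unsupported names fail early instead of deep inside worker creation.
    """
    if start_method is not None and start_method not in mp.get_all_start_methods():
        raise ValueError(f"start method {start_method!r} is not available on this platform")
    return mp.get_context(start_method)


def _echo_worker(conn: Connection) -> None:
    # Hypothetical stand-in for an env worker loop: doubles each number until it receives None.
    while True:
        msg = conn.recv()
        if msg is None:
            break
        conn.send(msg * 2)
    conn.close()


def start_worker(start_method: Optional[str] = None) -> tuple[BaseProcess, Connection]:
    # Pipe and Process both come from the same explicitly chosen context.
    ctx = resolve_context(start_method)
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=_echo_worker, args=(child_conn,), daemon=True)
    proc.start()
    child_conn.close()  # the parent only talks through parent_conn
    return proc, parent_conn


if __name__ == "__main__":
    proc, conn = start_worker("spawn")
    conn.send(21)
    print(conn.recv())  # prints 42
    conn.send(None)
    proc.join()
    conn.close()
```

Choosing the start method explicitly, instead of inheriting the platform default, makes parallel runs behave the same on macOS and Linux. Under "spawn" the worker target must be importable at module level, which is why `_echo_worker` is a top-level function and the launch code sits behind the `__main__` guard.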