Tianshou

Author	SHA1	Message	Date
Dominik Jain	35779696ee	Clean up handling of an Experiment's name (and, by extension, a run's name)	2024-05-05 22:27:19 +02:00
Dominik Jain	ca69e79b4a	Change the way in which deterministic evaluation is controlled: * Remove flag `eval_mode` from Collector.collect * Replace flag `is_eval` in BasePolicy with `is_within_training_step` (negating usages) and set it appropriately in BaseTrainer	2024-05-03 15:18:39 +02:00
Michael Panchenko	4b619c51ba	Collector: extracted interface BaseCollector, minor simplifications Renamed is_eval kwarg	2024-04-26 17:39:31 +02:00
Maximilian Huettenrauch	8cb17de190	update examples	2024-04-24 17:06:54 +02:00
maxhuettenrauch	ade85ab32b	Feature/algo eval (#1074 ) # Changes ## Dependencies - New extra "eval" ## Api Extension - `Experiment` and `ExperimentConfig` now have a `name`, that can however be overridden when `Experiment.run()` is called - When building an `Experiment` from an `ExperimentConfig`, the user has the option to add info about seeds to the name. - New method in `ExperimentConfig` called `build_default_seeded_experiments` - `SamplingConfig` has an explicit training seed, `test_seed` is inferred. - New `evaluation` package for repeating the same experiment with multiple seeds and aggregating the results (important extension!). Currently in alpha state. - Loggers can now restore the logged data into python by using the new `restore_logged_data` ## Breaking Changes - `AtariEnvFactory` (in examples) now receives explicit train and test seeds - `EnvFactoryRegistered` now requires an explicit `test_seed` - `BaseLogger.prepare_dict_for_logging` is now abstract --------- Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de> Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de> Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>	2024-04-20 23:25:33 +00:00
maxhuettenrauch	60d1ba1c8f	Fix/reset before collect in procedural examples, tests and hl experiment (#1100 ) Needed due to a breaking change in the Collector which was overlooked in some of the examples	2024-04-16 10:30:21 +02:00
Daniel Plop	8a0629ded6	Fix mypy issues in tests and examples (#1077 ) Closes #952 - `SamplingConfig` supports `batch_size=None`. #1077 - tests and examples are covered by `mypy`. #1077 - `NetBase` is more used, stricter typing by making it generic. #1077 - `utils.net.common.Recurrent` now receives and returns a `RecurrentStateBatch` instead of a dict. #1077 --------- Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>	2024-04-03 18:07:51 +02:00
Erni	bf0d632108	Naming and typing improvements in Actor/Critic/Policy forwards (#1032 ) Closes #917 ### Internal Improvements - Better variable names related to model outputs (logits, dist input etc.). #1032 - Improved typing for actors and critics, using Tianshou classes like `Actor`, `ActorProb`, etc., instead of just `nn.Module`. #1032 - Added interfaces for most `Actor` and `Critic` classes to enforce the presence of `forward` methods. #1032 - Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see associated breaking change). #1032 - Use `.mode` of distribution instead of relying on knowledge of the distribution type. #1032 ### Breaking Changes - Changed interface of `dist_fn` in `PGPolicy` and all subclasses to take a single argument in both continuous and discrete cases. #1032 --------- Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com> Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>	2024-04-01 17:14:17 +02:00
maxhuettenrauch	e82379c47f	Allow explicit setting of multiprocessing context for SubprocEnvWorker (#1072 ) Running multiple training runs in parallel (with, for example, joblib) fails on macOS due to a change in the standard context for multiprocessing (see [here](https://stackoverflow.com/questions/65098398/why-using-fork-works-but-using-spawn-fails-in-python3-8-multiprocessing) or [here](https://www.reddit.com/r/learnpython/comments/g5372v/multiprocessing_with_fork_on_macos/)). This PR adds the ability to explicitly set a multiprocessing context for the SubProcEnvWorker (similar to gymnasium's [AsyncVecEnv](https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/vector/async_vector_env.py)). --------- Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de> Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>	2024-03-14 11:07:56 +01:00
maxhuettenrauch	7c970df53f	Fix/add watch env with obs rms (#1061 ) Supports deciding whether to watch the agent performing on the env using high-level interfaces	2024-02-29 15:59:11 +01:00
Dominik Jain	49781e715e	Fix high-level examples (#1060 ) The high-level examples were all broken by changes made to make mypy pass. This PR fixes them, making a type change in logging.run_cli instead to make mypy happy.	2024-02-23 23:17:14 +01:00
Michael Panchenko	33d241a29b	Docs/html doc issues (#1048 ) Closes #1005 ## Main changes 2. Load vega-embed things using jupyter-book config 3. Add vega-embed dependencies as part of local code for offline development 4. Reduced duplication in benchmark.js 5. Update sphinx, docutils, and jupyter-book Co-authored-by: carlocagnetta <c.cagnetta@appliedai.de>	2024-02-09 19:43:10 +01:00
maxhuettenrauch	5fe9aea798	Update and fix dependencies related to mac install (#1044 ) Addresses part of #1015 ### Dependencies - move jsonargparse and docstring-parser to dependencies to run hl examples without dev - create mujoco-py extra for legacy mujoco envs - updated atari extra - removed atari-py and gym dependencies - added ALE-py, autorom, and shimmy - created robotics extra for HER-DDPG ### Mac specific - only install envpool when not on mac - mujoco-py not working on macOS newer than Monterey (https://github.com/openai/mujoco-py/issues/777) - D4RL also fails due to dependency on mujoco-py (https://github.com/Farama-Foundation/D4RL/issues/232) ### Other - reduced training-num/test-num in example files to a number ≤ 20 (examples with 100 led to too many open files) - rendering for Mujoco envs needs to be fixed on gymnasium side (https://github.com/Farama-Foundation/Gymnasium/issues/749) --------- Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de> Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>	2024-02-06 17:06:38 +01:00
Daniel Plop	eb0215cf76	Refactoring/mypy issues test (#1017 ) Improves typing in examples and tests, towards mypy passing there. Introduces the SpaceInfo utility	2024-02-06 14:24:30 +01:00
Dominik Jain	05a8cf4e74	Refactoring, improving class name EnvFactoryGymnasium -> EnvFactoryRegistered	2024-01-16 14:52:31 +01:00
Dominik Jain	c9cb41bf55	Make envpool usage configuration more explicit	2024-01-16 14:52:31 +01:00
Dominik Jain	7fa588309b	Update MuJoCo examples to use Ant-v4 instead of Ant-v3	2024-01-12 17:13:42 +01:00
Dominik Jain	eaab7b0a4b	Improve environment factory abstractions in high-level API: * EnvFactory now uses the creation of a single environment as the basic functionality which the more high-level functions build upon * Introduce enum EnvMode to indicate the purpose for which an env is created, allowing the factory creation process to change its behaviour accordingly * Add EnvFactoryGymnasium to provide direct support for envs that can be created via gymnasium.make - EnvPool is supported via an injectable EnvPoolFactory - Existing EnvFactory implementations are now derived from EnvFactoryGymnasium * Use a separate environment (which uses new EnvMode.WATCH) for watching agent performance after training (instead of using test environments, which the user may want to configure differently)	2024-01-12 17:13:42 +01:00
maxhuettenrauch	522f7fbf98	Feature/dataclasses (#996 ) This PR adds strict typing to the output of `update` and `learn` in all policies. This will likely be the last large refactoring PR before the next release (0.6.0, not 1.0.0), so it requires some attention. Several difficulties were encountered on the path to that goal: 1. The policy hierarchy is actually "broken" in the sense that the keys of dicts that were output by `learn` did not follow the same enhancement (inheritance) pattern as the policies. This is a real problem and should be addressed in the near future. Generally, several aspects of the policy design and hierarchy might deserve a dedicated discussion. 2. Each policy needs to be generic in the stats return type, because one might want to extend it at some point and then also extend the stats. Even within the source code base this pattern is necessary in many places. 3. The interaction between learn and update is a bit quirky, we currently handle it by having update modify special field inside TrainingStats, whereas all other fields are handled by learn. 4. The IQM module is a policy wrapper and required a TrainingStatsWrapper. The latter relies on a bunch of black magic. They were addressed by: 1. Live with the broken hierarchy, which is now made visible by bounds in generics. We use type: ignore where appropriate. 2. Make all policies generic with bounds following the policy inheritance hierarchy (which is incorrect, see above). We experimented a bit with nested TrainingStats classes, but that seemed to add more complexity and be harder to understand. Unfortunately, mypy thinks that the code below is wrong, wherefore we have to add `type: ignore` to the return of each `learn` ```python T = TypeVar("T", bound=int) def f() -> T: return 3 ``` 3. See above 4. Write representative tests for the `TrainingStatsWrapper`. Still, the black magic might cause nasty surprises down the line (I am not proud of it)... Closes #933 --------- Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de> Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>	2023-12-30 11:09:03 +01:00
Michael Panchenko	8d3d1f164b	Support batch_size=None and use it in various scripts (#993 ) Closes #986	2023-11-24 10:13:10 -08:00
Dominik Jain	dae4000cd2	Revert "Depend on sensAI instead of copying its utils (logging, string)" This reverts commit fdb0eba93d81fa5e698770b4f7088c87fc1238da.	2023-11-08 19:11:39 +01:00
Dominik Jain	fdb0eba93d	Depend on sensAI instead of copying its utils (logging, string)	2023-10-27 20:15:58 +02:00
Dominik Jain	c613557740	Apply datetime_tag() in high-level examples	2023-10-26 12:50:08 +02:00
Dominik Jain	dd4a0eb430	Fix: Add MujocoEnvObsRmsPersistence only if obs_norm is enabled	2023-10-24 13:52:30 +02:00
Dominik Jain	b5a891557f	Revert to simplified environment factory, removing unnecessary config object (configuration shall be part of the factory instance)	2023-10-24 13:14:23 +02:00
Dominik Jain	7437131d79	Fix tianshou.highlevel depending on jsonargparse (should be dev dependency only) by introducing a new place where jsonargparse can be configured: logging.run_cli, which is also slightly more convenient	2023-10-19 11:40:49 +02:00
Dominik Jain	6cbee188b8	Change interface of EnvFactory to ensure that configuration of number of environments in SamplingConfig is used (values are now passed to factory method) This is clearer and removes the need to pass otherwise unnecessary configuration to environment factories at construction	2023-10-19 11:37:20 +02:00
Dominik Jain	41bd463a7b	Allow to configure activation function in default networks * Set ReLU as default in all actor and critic factories * Configure non-default in applicable MuJoCo examples	2023-10-18 20:44:18 +02:00
Dominik Jain	ed06ab7ff0	Handle obs_norm setting in MuJoCo envs	2023-10-18 20:44:18 +02:00
Dominik Jain	76e870207d	Improve persistence handling * Add persistence/restoration of Experiment instance * Add file logging in experiment * Allow all persistence/logging to be disabled * Disable persistence in tests	2023-10-18 20:44:18 +02:00
Dominik Jain	3691ed2abc	Support obs_rms persistence for MuJoCo by adding a general mechanism for attaching persistence to Environments instances	2023-10-18 20:44:17 +02:00
Dominik Jain	17ef4dd5eb	Support REDQ in high-level API * Implement example mujoco_redq_hl * Add abstraction CriticEnsembleFactory with default implementations to suit REDQ * Fix type annotation of linear_layer in Net, MLP, Critic (was incompatible with REDQ usage)	2023-10-18 20:44:17 +02:00
Dominik Jain	7af836bd6a	Support TRPO in high-level API and add example mujoco_trpo_hl	2023-10-18 20:44:17 +02:00
Dominik Jain	383a4a6083	Support NPG in high-level API and add example mujoco_npg_hl	2023-10-18 20:44:17 +02:00
Dominik Jain	6bb3abb2f0	Support PG/Reinforce in high-level API * Add example mujoco_reinforce_hl * Extended functionality of ActorFactory to support creation of ModuleOpt	2023-10-18 20:44:17 +02:00
Dominik Jain	4e93c12afa	Remove obsolete configuration files	2023-10-18 20:44:17 +02:00
Dominik Jain	a161a9cf58	Improve type annotations, fix type issues and add checks	2023-10-18 20:44:17 +02:00
Dominik Jain	1243894eb8	Add DistributionFunctionFactory subclasses for discrete/continuous default	2023-10-18 20:44:17 +02:00
Dominik Jain	837ff13c04	Reorder ExperimentBuilder args (EnvFactory first)	2023-10-18 20:44:17 +02:00
Dominik Jain	d269063e6a	Remove 'RL' prefix from class names	2023-10-18 20:44:17 +02:00
Dominik Jain	9f0a410bb1	Log full experiment configuration, adding string representations to relevant classes	2023-10-18 20:44:16 +02:00
Dominik Jain	2671580c6c	Add DDPG high-level API and MuJoCo example	2023-10-18 20:44:16 +02:00
Dominik Jain	cd79cf8661	Add A2C high-level API * Add common base class for A2C and PPO agent factories * Add default for dist_fn parameter, adding corresponding factories * Add example mujoco_a2c_hl	2023-10-18 20:44:16 +02:00
Dominik Jain	78b6dd1f49	Adapt class naming scheme * Use prefix convention (subclasses have superclass names as prefix) to facilitate discoverability of relevant classes via IDE autocompletion * Use dual naming, adding an alternative concise name that omits the precise OO semantics and retains only the essential part of the name (which can be more pleasing to users not accustomed to convoluted OO naming)	2023-10-18 20:44:16 +02:00
Michael Panchenko	5bcf514c55	Add alternative functional interface for environment creation where a persistable configuration object is passed as an argument, as this can help to ensure persistability (making the requirement explicit)	2023-10-18 20:44:16 +02:00
Dominik Jain	e993425aa1	Add high-level API support for TD3 * Created mixins for agent factories to reduce code duplication * Further factorised params & mixins for experiment factories * Additional parameter abstractions * Implement high-level MuJoCo TD3 example	2023-10-18 20:44:16 +02:00
Dominik Jain	367778d37f	Improve high-level policy parametrisation Policy objects are now parametrised by converting the parameter dataclass instances to kwargs, using some injectable conversions along the way	2023-10-18 20:44:16 +02:00
Dominik Jain	37dc07e487	Add high-level experiment builder interface	2023-10-18 20:44:05 +02:00
Dominik Jain	3fd60f9e70	Unify PPO configuration objects, use experiment-specific configuration in mujoco_ppo_hl	2023-10-09 13:02:29 +02:00
Dominik Jain	8ec42009cb	Move RLSamplingConfig to separate module config, fixing cyclic import	2023-10-09 13:02:23 +02:00

100 Commits