Tianshou

Author	SHA1	Message	Date
Michael Panchenko	bf3859a457	Extension of ExpLauncher and DataclassPPrintMixin 1. Launch in main process if only 1 exp is passed 2. Launcher returns a list of stats for successful exps 3. More detailed logging for unsuccessful exps 4. Raise error if all runs were unsuccessful 5. DataclassPPrintMixin allows retrieving a pretty repr string 6. Minor improvements in docstrings	2024-05-07 16:21:50 +02:00
Michael Panchenko	1cd22f1d32	Added and used new VenvType: SUBPROC_SHARED_MEM_AUTO	2024-05-07 14:13:20 +02:00
Dominik Jain	024b80e79c	Improve creation of multiple seeded experiments: * Add class ExperimentCollection to improve usability * Remove parameters from ExperimentBuilder.build * Renamed ExperimentBuilder.build_default_seeded_experiments to build_seeded_collection, changing the return type to ExperimentCollection * Replace temp_config_mutation (which was not appropriate for the public API) with method copy (which performs a safe deep copy)	2024-05-05 22:27:19 +02:00
Dominik Jain	35779696ee	Clean up handling of an Experiment's name (and, by extension, a run's name)	2024-05-05 22:27:19 +02:00
Michael Panchenko	4e38aeb829	Merge branch 'refs/heads/thuml-master' into policy-train-eval # Conflicts: # CHANGELOG.md	2024-05-05 16:03:34 +02:00
Michael Panchenko	f876198870	Formatting	2024-05-05 15:16:53 +02:00
Dominik Jain	ca69e79b4a	Change the way in which deterministic evaluation is controlled: * Remove flag `eval_mode` from Collector.collect * Replace flag `is_eval` in BasePolicy with `is_within_training_step` (negating usages) and set it appropriately in BaseTrainer	2024-05-03 15:18:39 +02:00
Dominik Jain	250a129cc4	SamplingConfig: Improve docstrings of replay_buffer_save_only_last_obs, replay_buffer_stack_num	2024-04-29 18:27:02 +02:00
Dominik Jain	d18ded333e	CriticFactoryReuseActor: Fix the case where we want to reuse an actor's preprocessing network for the critic (must be applied before concatenating the actions)	2024-04-29 18:27:02 +02:00
Michael Panchenko	2eaf1f37c2	Use the new BaseCollector interface for annotations	2024-04-26 17:53:27 +02:00
Michael Panchenko	4b619c51ba	Collector: extracted interface BaseCollector, minor simplifications Renamed is_eval kwarg	2024-04-26 17:39:31 +02:00
Maximilian Huettenrauch	e499bed8b0	add is_eval attribute to policy and set this attribute as well as train mode in appropriate places	2024-04-24 17:06:42 +02:00
maxhuettenrauch	ade85ab32b	Feature/algo eval (#1074 ) # Changes ## Dependencies - New extra "eval" ## Api Extension - `Experiment` and `ExperimentConfig` now have a `name`, that can however be overridden when `Experiment.run()` is called - When building an `Experiment` from an `ExperimentConfig`, the user has the option to add info about seeds to the name. - New method in `ExperimentConfig` called `build_default_seeded_experiments` - `SamplingConfig` has an explicit training seed, `test_seed` is inferred. - New `evaluation` package for repeating the same experiment with multiple seeds and aggregating the results (important extension!). Currently in alpha state. - Loggers can now restore the logged data into python by using the new `restore_logged_data` ## Breaking Changes - `AtariEnvFactory` (in examples) now receives explicit train and test seeds - `EnvFactoryRegistered` now requires an explicit `test_seed` - `BaseLogger.prepare_dict_for_logging` is now abstract --------- Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de> Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de> Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>	2024-04-20 23:25:33 +00:00
maxhuettenrauch	60d1ba1c8f	Fix/reset before collect in procedural examples, tests and hl experiment (#1100 ) Needed due to a breaking change in the Collector which was overlooked in some of the examples	2024-04-16 10:30:21 +02:00
Daniel Plop	8a0629ded6	Fix mypy issues in tests and examples (#1077 ) Closes #952 - `SamplingConfig` supports `batch_size=None`. #1077 - tests and examples are covered by `mypy`. #1077 - `NetBase` is more used, stricter typing by making it generic. #1077 - `utils.net.common.Recurrent` now receives and returns a `RecurrentStateBatch` instead of a dict. #1077 --------- Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>	2024-04-03 18:07:51 +02:00
Erni	bf0d632108	Naming and typing improvements in Actor/Critic/Policy forwards (#1032 ) Closes #917 ### Internal Improvements - Better variable names related to model outputs (logits, dist input etc.). #1032 - Improved typing for actors and critics, using Tianshou classes like `Actor`, `ActorProb`, etc., instead of just `nn.Module`. #1032 - Added interfaces for most `Actor` and `Critic` classes to enforce the presence of `forward` methods. #1032 - Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see associated breaking change). #1032 - Use `.mode` of distribution instead of relying on knowledge of the distribution type. #1032 ### Breaking Changes - Changed interface of `dist_fn` in `PGPolicy` and all subclasses to take a single argument in both continuous and discrete cases. #1032 --------- Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com> Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>	2024-04-01 17:14:17 +02:00
bordeauxred	4f65b131aa	Feat/refactor collector (#1063 ) Closes: #1058 ### Api Extensions - Batch received two new methods: `to_dict` and `to_list_of_dicts`. #1063 - `Collector`s can now be closed, and their reset is more granular. #1063 - Trainers can control whether collectors should be reset prior to training. #1063 - Convenience constructor for `CollectStats` called `with_autogenerated_stats`. #1063 ### Internal Improvements - `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. #1063 - Introduced a first iteration of a naming convention for vars in `Collector`s. #1063 - Generally improved readability of Collector code and associated tests (still quite some way to go). #1063 - Improved typing for `exploration_noise` and within Collector. #1063 ### Breaking Changes - Removed `.data` attribute from `Collector` and its child classes. #1063 - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` explicitly or pass `reset_before_collect=True`. #1063 - VectorEnvs now return an array of info-dicts on reset instead of a list. #1063 - Fixed `iter(Batch(...))` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. #1063 --------- Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>	2024-03-28 18:02:31 +01:00
maxhuettenrauch	e82379c47f	Allow explicit setting of multiprocessing context for SubprocEnvWorker (#1072 ) Running multiple training runs in parallel (with, for example, joblib) fails on macOS due to a change in the standard context for multiprocessing (see [here](https://stackoverflow.com/questions/65098398/why-using-fork-works-but-using-spawn-fails-in-python3-8-multiprocessing) or [here](https://www.reddit.com/r/learnpython/comments/g5372v/multiprocessing_with_fork_on_macos/)). This PR adds the ability to explicitly set a multiprocessing context for the SubProcEnvWorker (similar to gymnasium's [AsyncVecEnv](https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/vector/async_vector_env.py)). --------- Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de> Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>	2024-03-14 11:07:56 +01:00
Dominik Jain	1714c7f2c7	High-level API: Fix number of test episodes being incorrectly scaled by number of envs (#1071 )	2024-03-07 08:57:11 -08:00
maxhuettenrauch	7c970df53f	Fix/add watch env with obs rms (#1061 ) Supports deciding whether to watch the agent performing on the env using high-level interfaces	2024-02-29 15:59:11 +01:00
Dominik Jain	eeb2081ca6	Fix AutoAlphaFactoryDefault using hard-coded Adam optimizer instead of passed factory	2024-02-14 20:43:38 +01:00
Dominik Jain	76cbd7efc2	Make OptimizerFactory more flexible by adding a second method which allows the creation of an optimizer given arbitrary parameters (rather than a module)	2024-02-14 20:42:06 +01:00
Dominik Jain	bf391853dc	Allow to configure number of test episodes in high-level API	2024-02-14 19:14:28 +01:00
Carlo Cagnetta	5fc314bd4b	Docs/use nbqa on notebooks (#1041 ) - Added nbqa to pyproject.toml - Resolved mypy issues on notebooks and related files - Conducting ruff checks on notebooks - Add DataclassPPrintMixin for better stats representation - Improved Notebooks wording and explanations Resolve: #1004 Related to #974	2024-02-07 17:28:16 +01:00
Dominik Jain	20074931d5	Improve docstrings	2024-01-16 14:52:31 +01:00
Dominik Jain	05a8cf4e74	Refactoring, improving class name EnvFactoryGymnasium -> EnvFactoryRegistered	2024-01-16 14:52:31 +01:00
Dominik Jain	c9cb41bf55	Make envpool usage configuration more explicit	2024-01-16 14:52:31 +01:00
Dominik Jain	1e5ebc2a2d	Improve naming of callback classes and related methods/attributes Add EpochStopCallbackRewardThreshold	2024-01-12 17:13:42 +01:00
Dominik Jain	ff398beed9	Move callbacks for setting DQN epsilon values to the library	2024-01-12 17:13:42 +01:00
Dominik Jain	eaab7b0a4b	Improve environment factory abstractions in high-level API: * EnvFactory now uses the creation of a single environment as the basic functionality which the more high-level functions build upon * Introduce enum EnvMode to indicate the purpose for which an env is created, allowing the factory creation process to change its behaviour accordingly * Add EnvFactoryGymnasium to provide direct support for envs that can be created via gymnasium.make - EnvPool is supported via an injectible EnvPoolFactory - Existing EnvFactory implementations are now derived from EnvFactoryGymnasium * Use a separate environment (which uses new EnvMode.WATCH) for watching agent performance after training (instead of using test environments, which the user may want to configure differently)	2024-01-12 17:13:42 +01:00
Dominik Jain	d4e4f4ff63	Experiment builders for DQN and IQN: * Fix: Disable softmax in default models * Add method with_model_factory_default (for DQN)	2024-01-10 15:42:18 +01:00
Dominik Jain	f77d95da04	Fix: Missing type annotation of Experiment.watch_num_episodes	2024-01-08 18:00:37 +01:00
Dominik Jain	97a241a6fc	Fix: DiscreteEnvironments.from_factory used incorrect EnvType	2024-01-08 15:58:41 +01:00
maxhuettenrauch	522f7fbf98	Feature/dataclasses (#996 ) This PR adds strict typing to the output of `update` and `learn` in all policies. This will likely be the last large refactoring PR before the next release (0.6.0, not 1.0.0), so it requires some attention. Several difficulties were encountered on the path to that goal: 1. The policy hierarchy is actually "broken" in the sense that the keys of dicts that were output by `learn` did not follow the same enhancement (inheritance) pattern as the policies. This is a real problem and should be addressed in the near future. Generally, several aspects of the policy design and hierarchy might deserve a dedicated discussion. 2. Each policy needs to be generic in the stats return type, because one might want to extend it at some point and then also extend the stats. Even within the source code base this pattern is necessary in many places. 3. The interaction between learn and update is a bit quirky, we currently handle it by having update modify special field inside TrainingStats, whereas all other fields are handled by learn. 4. The IQM module is a policy wrapper and required a TrainingStatsWrapper. The latter relies on a bunch of black magic. They were addressed by: 1. Live with the broken hierarchy, which is now made visible by bounds in generics. We use type: ignore where appropriate. 2. Make all policies generic with bounds following the policy inheritance hierarchy (which is incorrect, see above). We experimented a bit with nested TrainingStats classes, but that seemed to add more complexity and be harder to understand. Unfortunately, mypy thinks that the code below is wrong, wherefore we have to add `type: ignore` to the return of each `learn` ```python T = TypeVar("T", bound=int) def f() -> T: return 3 ``` 3. See above 4. Write representative tests for the `TrainingStatsWrapper`. Still, the black magic might cause nasty surprises down the line (I am not proud of it)... 
Closes #933 --------- Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de> Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>	2023-12-30 11:09:03 +01:00
Dominik Jain	e8cc80f990	Environments: Add option to use a different factory for test envs to `from_factory` convenience construction mechanisms	2023-12-21 13:13:51 +01:00
Dominik Jain	45a1a3f259	SamplingConfig: Change default of repeat_per_collect to 1 (safest option)	2023-12-21 13:13:51 +01:00
Dominik Jain	408d51f9de	SamplingConfig: Improve/extend docstrings, clearly explaining the parameters	2023-12-21 13:13:51 +01:00
Dominik Jain	1903a72ecb	Improve logging	2023-12-14 19:31:30 +01:00
Dominik Jain	3caa3805f0	Fix: SamplingConfig.start_timesteps_random was not used	2023-12-14 11:47:32 +01:00
Michael Panchenko	a846b52063	Typing: fixed multiple typing issues	2023-12-05 12:04:18 +01:00
Michael Panchenko	2e39a252e3	Docstring: minor changes to let ruff pass	2023-12-04 13:52:46 +01:00
Dominik Jain	6d6c85e594	Fix an issue where policies built with LRSchedulerFactoryLinear were not picklable (#992 ) - [X] I have marked all applicable categories: + [X] exception-raising fix + [ ] algorithm implementation fix + [ ] documentation modification + [ ] new feature - [X] I have reformatted the code using `make format` (required) - [X] I have checked the code using `make commit-checks` (required) - [ ] If applicable, I have mentioned the relevant/related issue(s) - [ ] If applicable, I have listed every items in this Pull Request below The cause was the use of a lambda function in the state of a generated object.	2023-11-14 10:23:18 -08:00
Dominik Jain	dae4000cd2	Revert "Depend on sensAI instead of copying its utils (logging, string)" This reverts commit fdb0eba93d81fa5e698770b4f7088c87fc1238da.	2023-11-08 19:11:39 +01:00
Dominik Jain	ac672f65d1	Add docstring for ActorFactoryTransientStorageDecorator	2023-11-06 17:18:10 +01:00
Dominik Jain	7e6d3d627e	Rename class ActorCriticModuleOpt -> ActorCriticOpt	2023-11-06 16:51:41 +01:00
Dominik Jain	5c8d57a2d2	Fix index error in call to _with_critic_factory_default Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2023-11-06 16:17:14 +01:00
Dominik Jain	fdb0eba93d	Depend on sensAI instead of copying its utils (logging, string)	2023-10-27 20:15:58 +02:00
Dominik Jain	5952993cfe	Add option to disable file logging	2023-10-27 18:59:43 +02:00
Dominik Jain	a3dbe90515	Allow to configure the policy persistence mode, adding a new mode which stores the entire policy (new default), supporting applications where it is desired to be able to load the policy without having to instantiate an environment or recreate a corresponding policy object	2023-10-26 13:19:33 +02:00
Dominik Jain	d684dae6cd	Change default number of environments (train=#CPUs, test=1)	2023-10-26 12:50:08 +02:00

112 Commits