476 Commits

Michael Panchenko
bf3859a457 Extension of ExpLauncher and DataclassPPrintMixin
1. Launch in main process if only 1 exp is passed
2. Launcher returns a list of stats for successful exps
3. More detailed logging for unsuccessful exps
4. Raise error if all runs were unsuccessful
5. DataclassPPrintMixin allows retrieving a pretty repr string
6. Minor improvements in docstrings
2024-05-07 16:21:50 +02:00
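The launcher contract described in points 1–4 can be sketched as follows. This is a toy re-implementation for illustration, not the actual `ExpLauncher` code; `run` and the returned stats are placeholders:

```python
def launch_sketch(experiments, run):
    """Toy sketch of the launcher behavior described above (hypothetical)."""
    if len(experiments) == 1:
        # a single experiment is run directly in the main process
        results = [run(experiments[0])]
    else:
        results = []
        for exp in experiments:
            try:
                results.append(run(exp))
            except Exception as e:
                # the real launcher logs details about the failed experiment here
                print(f"experiment failed: {e!r}")
                results.append(None)
    successful = [r for r in results if r is not None]
    if not successful:
        raise RuntimeError("All runs were unsuccessful")
    # a list of stats, one per successful experiment
    return successful
```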
Michael Panchenko
1cd22f1d32 Added and used new VenvType: SUBPROC_SHARED_MEM_AUTO 2024-05-07 14:13:20 +02:00
Michael Panchenko
e94a5c04cf New context manager: policy_within_training_step
Adjusted notebooks, log messages and docs accordingly. Removed now
obsolete in_eval_mode and the private context manager in Trainer
2024-05-06 19:22:58 +02:00
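The context-manager pattern can be sketched in a few lines. This is a hypothetical re-implementation of the contract with a stand-in policy class, not Tianshou's actual code:

```python
from contextlib import contextmanager


class PolicySketch:
    """Stand-in for a policy carrying the is_within_training_step flag."""

    def __init__(self):
        self.is_within_training_step = False


@contextmanager
def policy_within_training_step(policy):
    # set the flag on entry and restore the previous value on exit,
    # even if the body raises
    previous = policy.is_within_training_step
    policy.is_within_training_step = True
    try:
        yield policy
    finally:
        policy.is_within_training_step = previous
```

Inside the `with` block the policy reports being within a training step; on exit (normal or via exception) the flag reverts.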
Michael Panchenko
6a5b3c837a Docstrings, skip hidden files in autogen_rst 2024-05-05 23:31:20 +02:00
Michael Panchenko
2abb4dac24 Reinstated warning module 2024-05-05 22:27:19 +02:00
Dominik Jain
024b80e79c Improve creation of multiple seeded experiments:
* Add class ExperimentCollection to improve usability
  * Remove parameters from ExperimentBuilder.build
  * Renamed ExperimentBuilder.build_default_seeded_experiments to build_seeded_collection,
    changing the return type to ExperimentCollection
  * Replace temp_config_mutation (which was not appropriate for the public API) with
    method copy (which performs a safe deep copy)
2024-05-05 22:27:19 +02:00
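The `copy` method that replaced `temp_config_mutation` performs a safe deep copy; the pattern, shown here with a hypothetical builder class rather than the real `ExperimentBuilder`, is simply:

```python
import copy
from dataclasses import dataclass, field


@dataclass
class BuilderSketch:
    # hypothetical stand-in for the builder's mutable state
    seeds: list = field(default_factory=list)

    def copy(self) -> "BuilderSketch":
        # deep copy, so mutating the copy can never leak into the original
        return copy.deepcopy(self)
```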
Dominik Jain
35779696ee Clean up handling of an Experiment's name (and, by extension, a run's name) 2024-05-05 22:27:19 +02:00
Michael Panchenko
a8e9df31f7 Bugfix: allow for training_stat to be None instead of asserting not-None 2024-05-05 22:27:19 +02:00
Michael Panchenko
4e38aeb829 Merge branch 'refs/heads/thuml-master' into policy-train-eval
# Conflicts:
#	CHANGELOG.md
2024-05-05 16:03:34 +02:00
Michael Panchenko
82f425e9fe Collector: move @override, removed docstrings from overridden methods 2024-05-05 16:01:52 +02:00
Michael Panchenko
26a6cca76e Improved docstrings, added asserts to make mypy happy 2024-05-05 15:56:06 +02:00
Michael Panchenko
c5d0e169b5 Collector: removed unnecessary no-grad flag from interfaces. Breaking 2024-05-05 15:41:20 +02:00
Michael Panchenko
f876198870 Formatting 2024-05-05 15:16:53 +02:00
Michael Panchenko
6927eadaa7 BatchPolicy: check that self.is_within_training_step is True on update 2024-05-05 15:14:59 +02:00
Dominik Jain
c35be8d07e Establish backward compatibility by implementing __setstate__ 2024-05-03 15:18:39 +02:00
Dominik Jain
ca69e79b4a Change the way in which deterministic evaluation is controlled:
* Remove flag `eval_mode` from Collector.collect
  * Replace flag `is_eval` in BasePolicy with `is_within_training_step` (negating usages)
    and set it appropriately in BaseTrainer
2024-05-03 15:18:39 +02:00
Dominik Jain
ca4dad1139 BaseTrainer: Refactoring
New method training_step, which
    * collects training data (method _collect_training_data)
    * performs "test in train" (method _test_in_train)
    * performs policy update
  The old method train_step performed only the first two steps
  and has now been split into two separate methods
2024-05-03 10:12:35 +02:00
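The refactored structure can be sketched as a skeleton. The two collection/evaluation method names follow the commit message; the update-method name and all bodies are placeholders, not the real `BaseTrainer` code:

```python
class TrainerSketch:
    """Skeleton of the refactored training loop described above."""

    def training_step(self):
        # 1. collect training data
        data = self._collect_training_data()
        # 2. optionally evaluate during training ("test in train")
        self._test_in_train(data)
        # 3. perform the policy update on the collected data
        return self._policy_update(data)

    def _collect_training_data(self):
        return {"transitions": []}  # placeholder

    def _test_in_train(self, data):
        pass  # placeholder

    def _policy_update(self, data):  # hypothetical name
        return {"loss": 0.0}  # placeholder
```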
Dominik Jain
4f16494609 Set torch train mode in BasePolicy.update instead of in each .learn implementation,
as this is less prone to errors
2024-05-02 11:51:08 +02:00
bordeauxred
f31a91df5d
Typo docstring (#1132) 2024-05-01 08:59:00 +02:00
bordeauxred
61426acf07
Improve the documentation of compute_episodic_return in base policy. (#1130) 2024-04-30 14:40:16 +02:00
Dominik Jain
8ac6bf5fbb Improve docstrings 2024-04-29 18:27:02 +02:00
Dominik Jain
250a129cc4 SamplingConfig: Improve docstrings of replay_buffer_save_only_last_obs, replay_buffer_stack_num 2024-04-29 18:27:02 +02:00
Dominik Jain
d18ded333e CriticFactoryReuseActor: Fix the case where we want to reuse an actor's
preprocessing network for the critic (must be applied before concatenating
  the actions)
2024-04-29 18:27:02 +02:00
Dominik Jain
0b494845c9 continuous.Critic: Add flag apply_preprocess_net_to_obs_only to allow the
preprocessing network to be applied to the observations only (without
  the actions concatenated), which is essential for the case where we want
  to reuse the actor's preprocessing network
2024-04-29 18:27:02 +02:00
Dominik Jain
18ed981875 Add pickle/serialisation utils: setstate and getstate 2024-04-29 18:27:02 +02:00
Michael Panchenko
45922712d9 Docstring: add return [skip-ci] 2024-04-26 18:14:20 +02:00
Michael Panchenko
6aa33b1bfe Formatting 2024-04-26 17:54:14 +02:00
Michael Panchenko
2eaf1f37c2 Use the new BaseCollector interface for annotations 2024-04-26 17:53:27 +02:00
Michael Panchenko
69f07a8f12 Tests: fixed typing issues by declaring union types and no longer reusing var names 2024-04-26 17:39:31 +02:00
Michael Panchenko
4b619c51ba Collector: extracted interface BaseCollector, minor simplifications
Renamed is_eval kwarg
2024-04-26 17:39:31 +02:00
Michael Panchenko
7d59302095 Added in_eval/in_train mode contextmanager 2024-04-26 17:39:30 +02:00
Michael Panchenko
829fd9c7a5 Deleted long deprecated functionality, removed unused warning module
There are better ways to deal with deprecations, which we shall use in the future
2024-04-26 14:42:44 +02:00
Maximilian Huettenrauch
e499bed8b0 add is_eval attribute to policy and set this attribute as well as train mode in appropriate places 2024-04-24 17:06:42 +02:00
maxhuettenrauch
ade85ab32b
Feature/algo eval (#1074)
# Changes

## Dependencies

- New extra "eval"

## API Extensions
- `Experiment` and `ExperimentConfig` now have a `name`, which can however
be overridden when `Experiment.run()` is called
- When building an `Experiment` from an `ExperimentConfig`, the user has
the option to add info about seeds to the name.
- New method in `ExperimentConfig` called
`build_default_seeded_experiments`
- `SamplingConfig` has an explicit training seed; `test_seed` is inferred.
- New `evaluation` package for repeating the same experiment with
multiple seeds and aggregating the results (important extension!).
Currently in alpha state.
- Loggers can now restore the logged data into python by using the new
`restore_logged_data`

## Breaking Changes
- `AtariEnvFactory` (in examples) now receives explicit train and test
seeds
- `EnvFactoryRegistered` now requires an explicit `test_seed`
- `BaseLogger.prepare_dict_for_logging` is now abstract

---------

Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>
2024-04-20 23:25:33 +00:00
maxhuettenrauch
9c0b3e7292
use explicit multiprocessing context for creating Pipe in subproc.py (#1102) 2024-04-19 11:08:53 +02:00
maxhuettenrauch
a043711c10
Fix/deterministic action space sampling in SubprocVectorEnv (#1103) 2024-04-18 16:16:57 +02:00
Daniel Plop
6935a111d9
Add non in-place version of Batch.to_torch (#1117)
Closes: https://github.com/aai-institute/tianshou/issues/1116

### API Extensions

- Batch received new method: `to_torch_`. #1117

### Breaking Changes

- The method `to_torch` in `data.utils.batch.Batch` is not in-place
anymore. Instead, a new method `to_torch_` does the conversion in-place.
#1117
2024-04-17 22:07:24 +02:00
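The in-place/non-in-place split follows the trailing-underscore convention. A toy class (not the real `Batch`, and converting to `float` instead of to torch tensors) illustrates the convention:

```python
import copy


class BatchSketch:
    """Toy stand-in for Batch, illustrating the naming convention only."""

    def __init__(self, values):
        self.values = values

    def to_torch_(self):
        # trailing underscore: converts in place, mutating self
        self.values = [float(v) for v in self.values]

    def to_torch(self):
        # no underscore: returns a converted copy, leaving self untouched
        out = copy.deepcopy(self)
        out.to_torch_()
        return out
```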
Daniel Plop
ca4f74f40e
Allow two (same/different) Batch objs to be tested for equality (#1098)
Closes: https://github.com/thu-ml/tianshou/issues/1086

### Api Extensions

- Batch received new method: `to_numpy_`. #1098
- `to_dict` in Batch supports also non-recursive conversion. #1098
- Batch `__eq__` now implemented, semantic equality check of batches is
now possible. #1098

### Breaking Changes

- The method `to_numpy` in `data.utils.batch.Batch` is not in-place
anymore. Instead, a new method `to_numpy_` does the conversion in-place.
#1098
2024-04-16 18:12:48 +02:00
maxhuettenrauch
60d1ba1c8f
Fix/reset before collect in procedural examples, tests and hl experiment (#1100)
Needed due to a breaking change in the Collector which was overlooked in some of the examples
2024-04-16 10:30:21 +02:00
Erni
e2a2a6856d
Changed .keys() to get_keys() in batch class (#1105)
Solves the inconsistency that iter(Batch) is not the same as Batch.keys() by "deprecating" the implicit .keys() method

Closes: #922
2024-04-12 12:15:37 +02:00
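The consistency fix can be illustrated with a toy class (not the real `Batch`): after the change, iterating the object and calling `get_keys()` yield the same thing.

```python
class BatchSketch:
    """Toy stand-in for Batch, illustrating the get_keys contract."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def get_keys(self):
        # the explicit replacement for the deprecated implicit .keys()
        return iter(self._fields)

    def __iter__(self):
        # iter(batch) is now the same as batch.get_keys()
        return self.get_keys()
```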
Daniel Plop
8a0629ded6
Fix mypy issues in tests and examples (#1077)
Closes #952 

- `SamplingConfig` supports `batch_size=None`. #1077
- tests and examples are covered by `mypy`. #1077
- `NetBase` is more used, stricter typing by making it generic. #1077
- `utils.net.common.Recurrent` now receives and returns a
`RecurrentStateBatch` instead of a dict. #1077

---------

Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
2024-04-03 18:07:51 +02:00
Michael Panchenko
55fa6f7f35
Don't raise error on len of empty Batch (#1084) 2024-04-03 13:37:18 +02:00
Erni
bf0d632108
Naming and typing improvements in Actor/Critic/Policy forwards (#1032)
Closes #917 

### Internal Improvements
- Better variable names related to model outputs (logits, dist input
etc.). #1032
- Improved typing for actors and critics, using Tianshou classes like
`Actor`, `ActorProb`, etc.,
instead of just `nn.Module`. #1032
- Added interfaces for most `Actor` and `Critic` classes to enforce the
presence of `forward` methods. #1032
- Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see
associated breaking change). #1032
- Use `.mode` of distribution instead of relying on knowledge of the
distribution type. #1032

### Breaking Changes

- Changed interface of `dist_fn` in `PGPolicy` and all subclasses to
take a single argument in both
continuous and discrete cases. #1032

---------

Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
2024-04-01 17:14:17 +02:00
bordeauxred
4f65b131aa
Feat/refactor collector (#1063)
Closes: #1058 

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`.
#1063
- `Collector`s can now be closed, and their reset is more granular.
#1063
- Trainers can control whether collectors should be reset prior to
training. #1063
- Convenience constructor for `CollectStats` called
`with_autogenerated_stats`. #1063

### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored
explicitly instead of through a `.data` attribute. #1063
- Introduced a first iteration of a naming convention for vars in
`Collector`s. #1063
- Generally improved readability of Collector code and associated tests
(still quite some way to go). #1063
- Improved typing for `exploration_noise` and within Collector. #1063

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes.
#1063
- Collectors no longer reset the environment on initialization. Instead,
the user may have to call `reset`
explicitly or pass `reset_before_collect=True`. #1063
- VectorEnvs now return an array of info-dicts on reset instead of a
list. #1063
- Fixed `iter(Batch(...))`, which now behaves the same way as
`Batch(...).__iter__()`. Can be considered a bugfix. #1063

---------

Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
2024-03-28 18:02:31 +01:00
maxhuettenrauch
e82379c47f
Allow explicit setting of multiprocessing context for SubprocEnvWorker (#1072)
Running multiple training runs in parallel (with, for example, joblib)
fails on macOS due to a change in the standard context for
multiprocessing (see
[here](https://stackoverflow.com/questions/65098398/why-using-fork-works-but-using-spawn-fails-in-python3-8-multiprocessing)
or
[here](https://www.reddit.com/r/learnpython/comments/g5372v/multiprocessing_with_fork_on_macos/)).
This PR adds the ability to explicitly set a multiprocessing context for
the SubProcEnvWorker (similar to gymnasium's
[AsyncVecEnv](https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/vector/async_vector_env.py)).
---------

Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>
2024-03-14 11:07:56 +01:00
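Choosing a context explicitly looks like the following; this uses only the standard library (independent of Tianshou's `SubprocEnvWorker`):

```python
import multiprocessing

# request a specific start method instead of the platform default
# ("fork" historically on macOS/Linux, "spawn" on newer macOS and Windows)
ctx = multiprocessing.get_context("spawn")

# objects created through the context use that start method,
# e.g. the Pipe a subprocess env worker communicates over
parent_conn, child_conn = ctx.Pipe()
```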
Dominik Jain
1714c7f2c7
High-level API: Fix number of test episodes being incorrectly scaled by number of envs (#1071) 2024-03-07 08:57:11 -08:00
Erni
1aee41fa9c
Using dist.mode instead of logits.argmax (#1066)
Changed all occurrences where an action is selected deterministically

- **from**: using the outputs of the actor network.
- **to**: using the mode of the PyTorch distribution.

---------

Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com>
2024-03-03 00:09:39 +01:00
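A toy distribution (not torch's) shows the idea: the mode belongs to the distribution object, so the selection rule no longer needs to know how the actor's outputs map to probabilities.

```python
class CategoricalSketch:
    """Toy categorical distribution exposing .mode, mirroring torch's API."""

    def __init__(self, probs):
        self.probs = probs

    @property
    def mode(self):
        # deterministic action: index of the most probable outcome
        return max(range(len(self.probs)), key=self.probs.__getitem__)


# before: act = logits.argmax()  (relies on knowing the dist is categorical)
# after:  act = dist.mode        (works for any distribution with a mode)
```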
maxhuettenrauch
7c970df53f
Fix/add watch env with obs rms (#1061)
Supports deciding, via the high-level interfaces, whether to watch the agent performing on the env
2024-02-29 15:59:11 +01:00
Dominik Jain
49781e715e
Fix high-level examples (#1060)
The high-level examples were all broken by changes made to make mypy
pass.
This PR fixes them, making a type change in logging.run_cli instead to
make mypy happy.
2024-02-23 23:17:14 +01:00
Dominik Jain
08728ad35e Resolve platform-specific/installation-specific mypy issues
by adding ignores and ignoring unused ignores locally
2024-02-15 11:26:54 +01:00