Tianshou

Author	SHA1	Message	Date
Dominik Jain	18f236167f	Fix invalid kwarg	2024-05-03 10:12:41 +02:00
Dominik Jain	ca4dad1139	BaseTrainer: Refactoring. New method training_step, which (1) collects training data (method _collect_training_data), (2) performs "test in train" (method _test_in_train), and (3) performs the policy update. The old method named train_step performed only the first two points and has now been split into two separate methods.	2024-05-03 10:12:35 +02:00
Dominik Jain	4f16494609	Set torch train mode in BasePolicy.update instead of in each .learn implementation, as this is less prone to errors	2024-05-02 11:51:08 +02:00
Michael Panchenko	a2b9d7c7d8	Changelog [skip-ci]	2024-04-26 18:31:02 +02:00
Michael Panchenko	45922712d9	Docstring: add return [skip-ci]	2024-04-26 18:14:20 +02:00
Michael Panchenko	e2e8a699ea	Changelog [skip-ci]	2024-04-26 18:11:23 +02:00
Michael Panchenko	6aa33b1bfe	Formatting	2024-04-26 17:54:14 +02:00
Michael Panchenko	c28508b3be	Changelog	2024-04-26 17:53:34 +02:00
Michael Panchenko	2eaf1f37c2	Use the new BaseCollector interface for annotations	2024-04-26 17:53:27 +02:00
Michael Panchenko	07a97c7d93	Merge branch 'refs/heads/thuml-master' into policy-train-eval	2024-04-26 17:44:57 +02:00
Michael Panchenko	69f07a8f12	Tests: fixed typing issues by declaring union types and no longer reusing var names	2024-04-26 17:39:31 +02:00
Michael Panchenko	4b619c51ba	Collector: extracted interface BaseCollector, minor simplifications. Renamed is_eval kwarg.	2024-04-26 17:39:31 +02:00
Michael Panchenko	12d4262f80	Tests: removed all instances of `if __name__ == ...` in tests. A test is not a script and should not be used as such. Also marked pistonball test as skipped since it doesn't actually test anything.	2024-04-26 17:39:30 +02:00
Michael Panchenko	7d59302095	Added in_eval/in_train mode contextmanager	2024-04-26 17:39:30 +02:00
Michael Panchenko	829fd9c7a5	Deleted long-deprecated functionality, removed unused warning module. There are better ways to deal with deprecations that we shall use in the future.	2024-04-26 14:42:44 +02:00
Michael Panchenko	081adedc32	Changelog + dependabot bumps (#1124 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-25 08:49:54 -07:00
Maximilian Huettenrauch	49c750fb09	update tests	2024-04-24 17:06:59 +02:00
Maximilian Huettenrauch	8cb17de190	update examples	2024-04-24 17:06:54 +02:00
Maximilian Huettenrauch	e499bed8b0	add is_eval attribute to policy and set this attribute as well as train mode in appropriate places	2024-04-24 17:06:42 +02:00
maxhuettenrauch	ade85ab32b	Feature/algo eval (#1074 ) # Changes ## Dependencies - New extra "eval" ## Api Extension - `Experiment` and `ExperimentConfig` now have a `name`, that can however be overridden when `Experiment.run()` is called - When building an `Experiment` from an `ExperimentConfig`, the user has the option to add info about seeds to the name. - New method in `ExperimentConfig` called `build_default_seeded_experiments` - `SamplingConfig` has an explicit training seed, `test_seed` is inferred. - New `evaluation` package for repeating the same experiment with multiple seeds and aggregating the results (important extension!). Currently in alpha state. - Loggers can now restore the logged data into python by using the new `restore_logged_data` ## Breaking Changes - `AtariEnvFactory` (in examples) now receives explicit train and test seeds - `EnvFactoryRegistered` now requires an explicit `test_seed` - `BaseLogger.prepare_dict_for_logging` is now abstract --------- Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de> Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de> Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>	2024-04-20 23:25:33 +00:00
maxhuettenrauch	9c0b3e7292	use explicit multiprocessing context for creating Pipe in subproc.py (#1102 )	2024-04-19 11:08:53 +02:00
maxhuettenrauch	a043711c10	Fix/deterministic action space sampling in SubprocVectorEnv (#1103 )	2024-04-18 16:16:57 +02:00
Daniel Plop	6935a111d9	Add non in-place version of `Batch.to_torch` (#1117 ) Closes: https://github.com/aai-institute/tianshou/issues/1116 ### API Extensions - Batch received new method: `to_torch_`. #1117 ### Breaking Changes - The method `to_torch` in `data.utils.batch.Batch` is not in-place anymore. Instead, a new method `to_torch_` does the conversion in-place. #1117	2024-04-17 22:07:24 +02:00
Daniel Plop	ca4f74f40e	Allow two (same/different) Batch objs to be tested for equality (#1098 ) Closes: https://github.com/thu-ml/tianshou/issues/1086 ### Api Extensions - Batch received new method: `to_numpy_`. #1098 - `to_dict` in Batch supports also non-recursive conversion. #1098 - Batch `__eq__` now implemented, semantic equality check of batches is now possible. #1098 ### Breaking Changes - The method `to_numpy` in `data.utils.batch.Batch` is not in-place anymore. Instead, a new method `to_numpy_` does the conversion in-place. #1098	2024-04-16 18:12:48 +02:00
Michael Panchenko	049907d9ab	Fix type check in atari wrapper, solves #1111	2024-04-16 10:52:48 +02:00
maxhuettenrauch	60d1ba1c8f	Fix/reset before collect in procedural examples, tests and hl experiment (#1100 ) Needed due to a breaking change in the Collector which was overlooked in some of the examples	2024-04-16 10:30:21 +02:00
Molasses	766f6fedf2	Fix imports in Readme	2024-04-15 11:32:35 +02:00
Erni	e2a2a6856d	Changed .keys() to get_keys() in batch class (#1105 ) Solves the inconsistency that iter(Batch) is not the same as Batch.keys() by "deprecating" the implicit .keys() method Closes: #922	2024-04-12 12:15:37 +02:00
Michael Panchenko	03e9af04b7	Update README.md (removed instability warning) [skip ci]	2024-04-05 12:05:20 +02:00
Michael Panchenko	bab5c634e7	Missing link in README.md [skip ci]	2024-04-05 12:04:27 +02:00
Daniel Plop	8a0629ded6	Fix mypy issues in tests and examples (#1077 ) Closes #952 - `SamplingConfig` supports `batch_size=None`. #1077 - tests and examples are covered by `mypy`. #1077 - `NetBase` is more used, stricter typing by making it generic. #1077 - `utils.net.common.Recurrent` now receives and returns a `RecurrentStateBatch` instead of a dict. #1077 --------- Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>	2024-04-03 18:07:51 +02:00
Michael Panchenko	55fa6f7f35	Don't raise error on len of empty Batch (#1084 )	2024-04-03 13:37:18 +02:00
Erni	bf0d632108	Naming and typing improvements in Actor/Critic/Policy forwards (#1032 ) Closes #917 ### Internal Improvements - Better variable names related to model outputs (logits, dist input etc.). #1032 - Improved typing for actors and critics, using Tianshou classes like `Actor`, `ActorProb`, etc., instead of just `nn.Module`. #1032 - Added interfaces for most `Actor` and `Critic` classes to enforce the presence of `forward` methods. #1032 - Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see associated breaking change). #1032 - Use `.mode` of distribution instead of relying on knowledge of the distribution type. #1032 ### Breaking Changes - Changed interface of `dist_fn` in `PGPolicy` and all subclasses to take a single argument in both continuous and discrete cases. #1032 --------- Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com> Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>	2024-04-01 17:14:17 +02:00
Michael Panchenko	5bf923c9bd	Removed more references to Chinese docs [skip ci]	2024-03-28 18:17:25 +01:00
Michael Panchenko	23a33a9aa3	Removed link to Chinese docs [skip ci]	2024-03-28 18:13:15 +01:00
Michael Panchenko	ecb272c61b	Update CHANGELOG.md [skip ci]	2024-03-28 18:06:00 +01:00
bordeauxred	4f65b131aa	Feat/refactor collector (#1063 ) Closes: #1058 ### Api Extensions - Batch received two new methods: `to_dict` and `to_list_of_dicts`. #1063 - `Collector`s can now be closed, and their reset is more granular. #1063 - Trainers can control whether collectors should be reset prior to training. #1063 - Convenience constructor for `CollectStats` called `with_autogenerated_stats`. #1063 ### Internal Improvements - `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. #1063 - Introduced a first iteration of a naming convention for vars in `Collector`s. #1063 - Generally improved readability of Collector code and associated tests (still quite some way to go). #1063 - Improved typing for `exploration_noise` and within Collector. #1063 ### Breaking Changes - Removed `.data` attribute from `Collector` and its child classes. #1063 - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` explicitly or pass `reset_before_collect=True`. #1063 - VectorEnvs now return an array of info-dicts on reset instead of a list. #1063 - Fixed `iter(Batch(...))` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. #1063 --------- Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>	2024-03-28 18:02:31 +01:00
maxhuettenrauch	edae9e4403	fixed env seeding in test_sac_with_il.py (#1081 )	2024-03-28 12:52:35 +01:00
Michael Panchenko	61bf9adaff	Update CHANGELOG.md [skip ci]	2024-03-20 23:09:26 +01:00
Michael Panchenko	5f96a57bbb	Add CHANGELOG.md	2024-03-20 23:08:34 +01:00
Michael Panchenko	1a4d7deca6	Update publish.yaml, typo [skip ci] v1.0.0	2024-03-20 00:41:46 +01:00
Michael Panchenko	72df9a580d	Update publish.yaml [skip ci]	2024-03-20 00:41:17 +01:00
Michael Panchenko	55e9bee373	Update publish.yaml [skip ci]	2024-03-20 00:39:54 +01:00
Michael Panchenko	e3661c11e3	Update publish.yaml, missing / [skip ci]	2024-03-20 00:26:11 +01:00
maxhuettenrauch	e82379c47f	Allow explicit setting of multiprocessing context for SubprocEnvWorker (#1072 ) Running multiple training runs in parallel (with, for example, joblib) fails on macOS due to a change in the standard context for multiprocessing (see [here](https://stackoverflow.com/questions/65098398/why-using-fork-works-but-using-spawn-fails-in-python3-8-multiprocessing) or [here](https://www.reddit.com/r/learnpython/comments/g5372v/multiprocessing_with_fork_on_macos/)). This PR adds the ability to explicitly set a multiprocessing context for the SubProcEnvWorker (similar to gymnasium's [AsyncVecEnv](https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/vector/async_vector_env.py)). --------- Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de> Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>	2024-03-14 11:07:56 +01:00
Dominik Jain	1714c7f2c7	High-level API: Fix number of test episodes being incorrectly scaled by number of envs (#1071 )	2024-03-07 08:57:11 -08:00
Michael Panchenko	6746a80f6d	Add publish workflow, first preparation for next release (#1067 )	2024-03-04 12:21:49 +01:00
Michael Panchenko	fdb69f1273	Improve README, minor changes in procedural example (#1068 )	2024-03-03 15:07:07 +01:00
Dominik Jain	b6b2c95ac7	Improve README, minor changes in procedural example	2024-03-03 15:06:40 +01:00
Erni	1aee41fa9c	Using dist.mode instead of logits.argmax (#1066 ) changed all the occurrences where an action is selected deterministically - from: using the outputs of the actor network. - to: using the mode of the PyTorch distribution. --------- Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com>	2024-03-03 00:09:39 +01:00


683 Commits