669 Commits

Author SHA1 Message Date
Dominik Jain
d18ded333e CriticFactoryReuseActor: Fix the case where we want to reuse an actor's
preprocessing network for the critic (must be applied before concatenating
the actions)
2024-04-29 18:27:02 +02:00
Dominik Jain
0b494845c9 continuous.Critic: Add flag apply_preprocess_net_to_obs_only to allow the
preprocessing network to be applied to the observations only (without
the actions concatenated), which is essential for the case where we want
to reuse the actor's preprocessing network
2024-04-29 18:27:02 +02:00
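The ordering issue addressed by the two commits above can be sketched in plain Python. This is only an illustration, not Tianshou's `Critic`: `preprocess` and `critic_input` are hypothetical stand-ins, and only the flag name `apply_preprocess_net_to_obs_only` is taken from the commit message. The assumption is that an actor's preprocessing network only accepts observation-sized input, so it has to run before the actions are concatenated.

```python
from __future__ import annotations

from typing import Callable, Sequence


def preprocess(obs: Sequence[float]) -> list[float]:
    """Hypothetical stand-in for an actor's preprocessing network.

    Like a network with a fixed input layer, it only accepts inputs of
    the observation size (2 in this toy example).
    """
    if len(obs) != 2:
        raise ValueError(f"expected input of size 2, got {len(obs)}")
    return [2.0 * x for x in obs]


def critic_input(
    obs: Sequence[float],
    act: Sequence[float],
    net: Callable[[Sequence[float]], list[float]],
    apply_preprocess_net_to_obs_only: bool,
) -> list[float]:
    """Build the critic's input from an observation and an action."""
    if apply_preprocess_net_to_obs_only:
        # Preprocess the observation first, then append the action.
        return net(obs) + list(act)
    # Concatenate first, then preprocess the joint vector.
    return net(list(obs) + list(act))
```

With the flag set, `critic_input([1.0, 2.0], [0.5], preprocess, True)` yields the preprocessed observation followed by the raw action. Without it, the reused network receives a three-element input and rejects it.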
Dominik Jain
18ed981875 Add pickle/serialisation utils: setstate and getstate 2024-04-29 18:27:02 +02:00
Dominik Jain
be1c8cd235 DQN:
* Fix input validation
* Fix output_dim not being set if features_only=True and output_dim_added_layer is not None
2024-04-29 13:37:26 +02:00
Michael Panchenko
081adedc32
Changelog + dependabot bumps (#1124)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-25 08:49:54 -07:00
maxhuettenrauch
ade85ab32b
Feature/algo eval (#1074)
# Changes

## Dependencies

- New extra "eval"

## Api Extension
- `Experiment` and `ExperimentConfig` now have a `name`, which can
be overridden when `Experiment.run()` is called
- When building an `Experiment` from an `ExperimentConfig`, the user has
the option to add info about seeds to the name.
- New method in `ExperimentConfig` called
`build_default_seeded_experiments`
- `SamplingConfig` has an explicit training seed, `test_seed` is
inferred.
- New `evaluation` package for repeating the same experiment with
multiple seeds and aggregating the results (important extension!).
Currently in alpha state.
- Loggers can now restore the logged data into python by using the new
`restore_logged_data`

## Breaking Changes
- `AtariEnvFactory` (in examples) now receives explicit train and test
seeds
- `EnvFactoryRegistered` now requires an explicit `test_seed`
- `BaseLogger.prepare_dict_for_logging` is now abstract

---------

Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>
2024-04-20 23:25:33 +00:00
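The idea behind repeating an experiment with multiple seeds and aggregating the results, as described in the commit above, can be sketched with the standard library alone. This does not use the new `evaluation` package, whose API is not shown in this log. `run_experiment` and `evaluate_over_seeds` are hypothetical names, and the "score" is just a seeded random number.

```python
import random
import statistics
from typing import Iterable


def run_experiment(seed: int) -> float:
    """Hypothetical experiment: returns a noisy score determined by the seed."""
    rng = random.Random(seed)  # local RNG, so runs do not affect each other
    return 100.0 + rng.gauss(0.0, 5.0)


def evaluate_over_seeds(seeds: Iterable[int]) -> dict:
    """Run the same experiment once per seed and aggregate the scores."""
    scores = {seed: run_experiment(seed) for seed in seeds}
    values = list(scores.values())
    return {
        "scores": scores,
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample std, needs >= 2 seeds
    }


if __name__ == "__main__":
    summary = evaluate_over_seeds([0, 1, 2, 3, 4])
    print(f"mean={summary['mean']:.2f} std={summary['std']:.2f}")
```

Because each run owns its random generator, re-running with the same seed list reproduces the same aggregate.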
maxhuettenrauch
9c0b3e7292
use explicit multiprocessing context for creating Pipe in subproc.py (#1102) 2024-04-19 11:08:53 +02:00
maxhuettenrauch
a043711c10
Fix/deterministic action space sampling in SubprocVectorEnv (#1103) 2024-04-18 16:16:57 +02:00
Daniel Plop
6935a111d9
Add non in-place version of Batch.to_torch (#1117)
Closes: https://github.com/aai-institute/tianshou/issues/1116

### API Extensions

- Batch received new method: `to_torch_`. #1117

### Breaking Changes

- The method `to_torch` in `data.utils.batch.Batch` is not in-place
anymore. Instead, a new method `to_torch_` does the conversion in-place.
#1117
2024-04-17 22:07:24 +02:00
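The naming convention introduced by the commit above (plain name returns a converted copy, trailing underscore converts in place) can be sketched with a toy container. `MiniBatch`, `to_float` and `to_float_` are hypothetical and only mirror the pattern; they are not Tianshou's `Batch` methods, and the conversion here is to Python floats rather than torch tensors.

```python
from __future__ import annotations

import copy


class MiniBatch:
    """Hypothetical, minimal stand-in for a Batch-like container."""

    def __init__(self, **data: list) -> None:
        self.data = dict(data)

    def to_float_(self) -> None:
        """In-place variant: mutates this object and returns nothing."""
        for key, values in self.data.items():
            self.data[key] = [float(x) for x in values]

    def to_float(self) -> MiniBatch:
        """Non-in-place variant: leaves this object untouched, returns a copy."""
        result = copy.deepcopy(self)
        result.to_float_()
        return result


if __name__ == "__main__":
    batch = MiniBatch(obs=[1, 2])
    converted = batch.to_float()
    print(batch.data, converted.data)  # the original still holds ints
    batch.to_float_()
    print(batch.data)  # now converted in place
```

Implementing the copying variant on top of the in-place one keeps the two code paths from drifting apart.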
Daniel Plop
ca4f74f40e
Allow two (same/different) Batch objs to be tested for equality (#1098)
Closes: https://github.com/thu-ml/tianshou/issues/1086

### Api Extensions

- Batch received new method: `to_numpy_`. #1098
- `to_dict` in Batch also supports non-recursive conversion. #1098
- Batch `__eq__` now implemented, semantic equality check of batches is
now possible. #1098

### Breaking Changes

- The method `to_numpy` in `data.utils.batch.Batch` is not in-place
anymore. Instead, a new method `to_numpy_` does the conversion in-place.
#1098
2024-04-16 18:12:48 +02:00
Michael Panchenko
049907d9ab Fix type check in atari wrapper, solves #1111 2024-04-16 10:52:48 +02:00
maxhuettenrauch
60d1ba1c8f
Fix/reset before collect in procedural examples, tests and hl experiment (#1100)
Needed due to a breaking change in the Collector which was overlooked in some of the examples
2024-04-16 10:30:21 +02:00
Molasses
766f6fedf2
Fix imports in Readme 2024-04-15 11:32:35 +02:00
Erni
e2a2a6856d
Changed .keys() to get_keys() in batch class (#1105)
Solves the inconsistency that iter(Batch) is not the same as Batch.keys() by "deprecating" the implicit .keys() method

Closes: #922
2024-04-12 12:15:37 +02:00
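The inconsistency mentioned in the commit above can be illustrated with a toy container. For a `dict`, iterating yields the same items as `.keys()`. The sketch below assumes, without confirmation from this log, that iterating a batch yields one entry per index instead, so an explicit `get_keys()` avoids the dict-like expectation. `RowBatch` is a hypothetical class, not Tianshou's `Batch`.

```python
from typing import Iterator, KeysView


class RowBatch:
    """Hypothetical column-oriented container that iterates row by row."""

    def __init__(self, **columns: list) -> None:
        if len({len(values) for values in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        self._columns = dict(columns)

    def get_keys(self) -> KeysView[str]:
        """Explicit accessor for the keys (column names)."""
        return self._columns.keys()

    def __len__(self) -> int:
        first = next(iter(self._columns.values()), [])
        return len(first)

    def __iter__(self) -> Iterator[dict]:
        # Iteration goes over indices (rows), not over keys.
        for i in range(len(self)):
            yield {key: values[i] for key, values in self._columns.items()}


if __name__ == "__main__":
    batch = RowBatch(obs=[1, 2], act=[0, 1])
    print(list(batch.get_keys()))
    print(list(batch))
```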
Michael Panchenko
03e9af04b7
Update README.md (removed instability warning) [skip ci] 2024-04-05 12:05:20 +02:00
Michael Panchenko
bab5c634e7
Missing link in README.md [skip ci] 2024-04-05 12:04:27 +02:00
Daniel Plop
8a0629ded6
Fix mypy issues in tests and examples (#1077)
Closes #952 

- `SamplingConfig` supports `batch_size=None`. #1077
- tests and examples are covered by `mypy`. #1077
- `NetBase` is used more widely, with stricter typing by making it generic. #1077
- `utils.net.common.Recurrent` now receives and returns a
`RecurrentStateBatch` instead of a dict. #1077

---------

Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
2024-04-03 18:07:51 +02:00
Michael Panchenko
55fa6f7f35
Don't raise error on len of empty Batch (#1084) 2024-04-03 13:37:18 +02:00
Erni
bf0d632108
Naming and typing improvements in Actor/Critic/Policy forwards (#1032)
Closes #917 

### Internal Improvements
- Better variable names related to model outputs (logits, dist input
etc.). #1032
- Improved typing for actors and critics, using Tianshou classes like
`Actor`, `ActorProb`, etc.,
instead of just `nn.Module`. #1032
- Added interfaces for most `Actor` and `Critic` classes to enforce the
presence of `forward` methods. #1032
- Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see
associated breaking change). #1032
- Use `.mode` of distribution instead of relying on knowledge of the
distribution type. #1032

### Breaking Changes

- Changed interface of `dist_fn` in `PGPolicy` and all subclasses to
take a single argument in both
continuous and discrete cases. #1032

---------

Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
2024-04-01 17:14:17 +02:00
Michael Panchenko
5bf923c9bd Removed more references to Chinese docs [skip ci] 2024-03-28 18:17:25 +01:00
Michael Panchenko
23a33a9aa3 Removed link to Chinese docs [skip ci] 2024-03-28 18:13:15 +01:00
Michael Panchenko
ecb272c61b
Update CHANGELOG.md [skip ci] 2024-03-28 18:06:00 +01:00
bordeauxred
4f65b131aa
Feat/refactor collector (#1063)
Closes: #1058 

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`.
#1063
- `Collector`s can now be closed, and their reset is more granular.
#1063
- Trainers can control whether collectors should be reset prior to
training. #1063
- Convenience constructor for `CollectStats` called
`with_autogenerated_stats`. #1063

### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored
explicitly instead of through a `.data` attribute. #1063
- Introduced a first iteration of a naming convention for vars in
`Collector`s. #1063
- Generally improved readability of Collector code and associated tests
(still quite some way to go). #1063
- Improved typing for `exploration_noise` and within Collector. #1063

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes.
#1063
- Collectors no longer reset the environment on initialization. Instead,
the user might have to call `reset`
explicitly or pass `reset_before_collect=True`. #1063
- VectorEnvs now return an array of info-dicts on reset instead of a
list. #1063
- Fixed `iter(Batch(...))` which now behaves the same way as
`Batch(...).__iter__()`. Can be considered a bugfix. #1063

---------

Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
2024-03-28 18:02:31 +01:00
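The breaking change above, in which collectors no longer reset the environment on initialization, can be sketched with toy classes. `ToyEnv` and `ToyCollector` are hypothetical and far simpler than Tianshou's `Collector`. Only the names `reset` and `reset_before_collect` come from the commit message, and raising an error when collecting without a prior reset is an assumption made for this example.

```python
class ToyEnv:
    """Hypothetical environment whose state is a simple step counter."""

    def __init__(self) -> None:
        self.state = None

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self) -> int:
        self.state += 1
        return self.state


class ToyCollector:
    """Hypothetical collector that does NOT reset the env in __init__."""

    def __init__(self, env: ToyEnv) -> None:
        self.env = env
        self._is_reset = False  # explicit state instead of an implicit reset

    def reset(self) -> None:
        self.env.reset()
        self._is_reset = True

    def collect(self, n_step: int, reset_before_collect: bool = False) -> list:
        if reset_before_collect:
            self.reset()
        if not self._is_reset:
            # Assumption for this sketch: fail loudly rather than step a stale env.
            raise RuntimeError("call reset() or pass reset_before_collect=True")
        return [self.env.step() for _ in range(n_step)]


if __name__ == "__main__":
    collector = ToyCollector(ToyEnv())
    print(collector.collect(2, reset_before_collect=True))
    print(collector.collect(2))  # continues from the previous state
```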
maxhuettenrauch
edae9e4403
fixed env seeding in test_sac_with_il.py (#1081) 2024-03-28 12:52:35 +01:00
Michael Panchenko
61bf9adaff
Update CHANGELOG.md [skip ci] 2024-03-20 23:09:26 +01:00
Michael Panchenko
5f96a57bbb
Add CHANGELOG.md 2024-03-20 23:08:34 +01:00
Michael Panchenko
1a4d7deca6
Update publish.yaml, typo [skip ci] v1.0.0 2024-03-20 00:41:46 +01:00
Michael Panchenko
72df9a580d
Update publish.yaml [skip ci] 2024-03-20 00:41:17 +01:00
Michael Panchenko
55e9bee373
Update publish.yaml [skip ci] 2024-03-20 00:39:54 +01:00
Michael Panchenko
e3661c11e3
Update publish.yaml, missing / [skip ci] 2024-03-20 00:26:11 +01:00
maxhuettenrauch
e82379c47f
Allow explicit setting of multiprocessing context for SubprocEnvWorker (#1072)
Running multiple training runs in parallel (with, for example, joblib)
fails on macOS due to a change in the standard context for
multiprocessing (see
[here](https://stackoverflow.com/questions/65098398/why-using-fork-works-but-using-spawn-fails-in-python3-8-multiprocessing)
or
[here](https://www.reddit.com/r/learnpython/comments/g5372v/multiprocessing_with_fork_on_macos/)).
This PR adds the ability to explicitly set a multiprocessing context for
the SubProcEnvWorker (similar to gymnasium's
[AsyncVecEnv](https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/vector/async_vector_env.py)).
---------

Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>
2024-03-14 11:07:56 +01:00
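The mechanism behind the commit above is the standard library's `multiprocessing.get_context`, which returns a context object whose `Pipe` and `Process` use a chosen start method instead of the platform default. The sketch below shows only that stdlib call; `make_pipe` is a hypothetical helper and not part of Tianshou's `SubprocEnvWorker`.

```python
import multiprocessing as mp
from typing import Optional


def make_pipe(context: Optional[str] = None):
    """Create a duplex pipe from an explicit multiprocessing context.

    `context` is a start method name such as "spawn" (available on all
    platforms) or "fork" (POSIX only); None selects the platform default.
    """
    ctx = mp.get_context(context)
    return ctx.Pipe()


if __name__ == "__main__":
    parent_conn, child_conn = make_pipe("spawn")
    # Both ends live in this process here; a worker would receive child_conn.
    child_conn.send({"cmd": "reset"})
    print(parent_conn.recv())
    parent_conn.close()
    child_conn.close()
```

Worker processes should be created from the same context object (`ctx.Process`), so that the pipe and the processes agree on the start method.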
Dominik Jain
1714c7f2c7
High-level API: Fix number of test episodes being incorrectly scaled by number of envs (#1071) 2024-03-07 08:57:11 -08:00
Michael Panchenko
6746a80f6d
Add publish workflow, first preparation for next release (#1067) 2024-03-04 12:21:49 +01:00
Michael Panchenko
fdb69f1273
Improve README, minor changes in procedural example (#1068) 2024-03-03 15:07:07 +01:00
Dominik Jain
b6b2c95ac7 Improve README, minor changes in procedural example 2024-03-03 15:06:40 +01:00
Erni
1aee41fa9c
Using dist.mode instead of logits.argmax (#1066)
Changed all occurrences where an action is selected deterministically

- **from**: using the outputs of the actor network.
- **to**: using the mode of the PyTorch distribution.

---------

Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com>
2024-03-03 00:09:39 +01:00
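Why the mode of a distribution is a more uniform interface than the argmax of the network output can be sketched without PyTorch. The two classes below are hypothetical toy distributions, not `torch.distributions` classes. With them, the caller no longer needs to know which distribution type it holds.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class ToyCategorical:
    """Hypothetical discrete distribution parameterised by logits."""

    logits: list[float]

    @property
    def mode(self) -> int:
        # The most likely category is the index of the largest logit.
        return max(range(len(self.logits)), key=self.logits.__getitem__)


@dataclass
class ToyNormal:
    """Hypothetical Gaussian; argmax over its parameters would be meaningless."""

    mean: float
    std: float

    @property
    def mode(self) -> float:
        return self.mean


def deterministic_action(dist: Union[ToyCategorical, ToyNormal]) -> float:
    """Select an action deterministically, independent of the distribution type."""
    return dist.mode


if __name__ == "__main__":
    print(deterministic_action(ToyCategorical([0.1, 2.0, -1.0])))
    print(deterministic_action(ToyNormal(mean=0.3, std=1.0)))
```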
maxhuettenrauch
7c970df53f
Fix/add watch env with obs rms (#1061)
Supports deciding whether to watch the agent performing on the env using high-level interfaces
2024-02-29 15:59:11 +01:00
Dominik Jain
49781e715e
Fix high-level examples (#1060)
The high-level examples were all broken by changes made to make mypy
pass.
This PR fixes them, making a type change in logging.run_cli instead to
make mypy happy.
2024-02-23 23:17:14 +01:00
Ashok Arora
0b61bf8caf
Fix the link to the contributing guide. (#1062) 2024-02-23 23:15:41 +01:00
Carlo Cagnetta
ce371ae736
remove old python versions from poetry classifier (#1059) 2024-02-21 15:27:53 +01:00
Michael Panchenko
9b6cb6903e
Improvements in High-Level API and Poe Tasks (#1055)
* Add an option to SamplingConfig which allows configuring the number of
test episodes
* Make OptimizerFactory more flexible, adding method
`create_optimizer_for_params`
* Fix AutoAlphaFactoryDefault using hard-coded Adam optimizer
* Fix mypy issues that were platform/installation-dependent
* Limit scope of nbqa, resolving issues with files generated by old
versions of the build

Fixes #1054
2024-02-15 12:02:16 +01:00
Dominik Jain
26e210a6ae Apply nbqa only to the docs/ folder and exclude the (old) jupyter_execute folder 2024-02-15 11:39:45 +01:00
Dominik Jain
08728ad35e Resolve platform-specific/installation-specific mypy issues
by adding ignores and ignoring unused ignores locally
2024-02-15 11:26:54 +01:00
Dominik Jain
f2e0fd165d Fix gitignore applying to tianshou/env on platforms with case-insensitive file system 2024-02-15 11:26:39 +01:00
Dominik Jain
eeb2081ca6 Fix AutoAlphaFactoryDefault using hard-coded Adam optimizer instead of passed factory 2024-02-14 20:43:38 +01:00
Dominik Jain
76cbd7efc2 Make OptimizerFactory more flexible by adding a second method which
allows the creation of an optimizer given arbitrary parameters
(rather than a module)
2024-02-14 20:42:06 +01:00
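The design described in the commit above, a factory with a second entry point that accepts arbitrary parameters rather than a module, can be sketched with toy classes. Only the method name `create_optimizer_for_params` appears in this log; the signatures, `ToyModule`, `ToySGD` and `ToyOptimizerFactory` are hypothetical and do not reflect Tianshou's actual classes.

```python
from typing import Iterable, List


class ToyModule:
    """Hypothetical module exposing its parameters, like a tiny nn.Module."""

    def __init__(self, weights: Iterable[float]) -> None:
        self._weights = list(weights)

    def parameters(self) -> List[float]:
        return list(self._weights)


class ToySGD:
    """Hypothetical optimizer that just records what it would optimise."""

    def __init__(self, params: Iterable[float], lr: float) -> None:
        self.params = list(params)
        self.lr = lr


class ToyOptimizerFactory:
    def create_optimizer_for_params(self, params: Iterable[float], lr: float) -> ToySGD:
        """General entry point: works for any collection of parameters."""
        return ToySGD(params, lr)

    def create_optimizer(self, module: ToyModule, lr: float) -> ToySGD:
        """Module-based entry point, implemented via the general one."""
        return self.create_optimizer_for_params(module.parameters(), lr)


if __name__ == "__main__":
    factory = ToyOptimizerFactory()
    print(factory.create_optimizer(ToyModule([1.0, 2.0]), lr=0.1).params)
    # A lone parameter that belongs to no module can use the same factory.
    print(factory.create_optimizer_for_params([0.0], lr=0.01).params)
```

Routing the module-based method through the parameter-based one means a caller holding only a free-standing parameter can still honour the user's chosen optimizer instead of hard-coding one.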
Dominik Jain
bf391853dc Allow configuring the number of test episodes in high-level API 2024-02-14 19:14:28 +01:00
Michael Panchenko
8742e3645c
Docs, js - typo in path 2024-02-14 10:50:06 +01:00
Michael Panchenko
5cc51145da
Docs/hotfix (#1052) 2024-02-12 18:54:38 +01:00
Michael Panchenko
7a30b842b6
Add vega scripts explicitly to config (#1051) 2024-02-12 18:49:32 +01:00