# Changes
## Dependencies
- New extra "eval"
## API Extensions
- `Experiment` and `ExperimentConfig` now have a `name`, which can
be overridden when `Experiment.run()` is called
- When building an `Experiment` from an `ExperimentConfig`, the user has
the option to add info about seeds to the name.
- New method in `ExperimentConfig` called
`build_default_seeded_experiments`
- `SamplingConfig` has an explicit training seed; `test_seed` is
inferred.
- New `evaluation` package for repeating the same experiment with
multiple seeds and aggregating the results (important extension!).
Currently in alpha state.
- Loggers can now restore the logged data into Python by using the new
`restore_logged_data`
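
The multi-seed workflow described above (same experiment, several seeds, aggregated results, seed info in the name) can be sketched roughly as follows. This is a standard-library illustration only, not the `evaluation` package's API: `run_experiment`, `seeded_name`, `run_with_seeds`, `aggregate`, and the `_seed=` naming scheme are all hypothetical.

```python
import random
import statistics


def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for one training run; returns a final score."""
    rng = random.Random(seed)
    return 100.0 + rng.gauss(0.0, 5.0)


def seeded_name(base_name: str, seed: int) -> str:
    """Hypothetical naming scheme that appends seed info to the experiment name."""
    return f"{base_name}_seed={seed}"


def run_with_seeds(base_name: str, seeds: list[int]) -> dict[str, float]:
    """Run the same experiment once per seed and collect the scores by name."""
    return {seeded_name(base_name, seed): run_experiment(seed) for seed in seeds}


def aggregate(scores: dict[str, float]) -> dict[str, float]:
    """Summarize per-seed scores with mean and sample standard deviation."""
    values = list(scores.values())
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "n": float(len(values)),
    }


results = run_with_seeds("ppo_cartpole", seeds=[0, 1, 2, 3, 4])
summary = aggregate(results)
```

Because each run is keyed by a name that includes its seed, individual runs stay distinguishable after aggregation.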
## Breaking Changes
- `AtariEnvFactory` (in examples) now receives explicit train and test
seeds
- `EnvFactoryRegistered` now requires an explicit `test_seed`
- `BaseLogger.prepare_dict_for_logging` is now abstract
---------
Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>
Closes: https://github.com/aai-institute/tianshou/issues/1116
### API Extensions
- Batch received new method: `to_torch_`. #1117
### Breaking Changes
- The method `to_torch` in `data.utils.batch.Batch` is not in-place
anymore. Instead, a new method `to_torch_` does the conversion in-place.
#1117
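
The naming convention behind this change (a trailing underscore marks the in-place variant, the plain name returns a converted copy) can be illustrated with a toy class. `ToyBatch` and its `to_tuple` / `to_tuple_` methods are hypothetical stand-ins that convert lists to tuples; they are not tianshou's `Batch` and do not involve torch.

```python
from __future__ import annotations

import copy


class ToyBatch:
    """Toy stand-in for a batch of named fields; NOT tianshou's Batch."""

    def __init__(self, **fields: list | tuple) -> None:
        self.fields = dict(fields)

    def to_tuple_(self) -> None:
        """In-place conversion: mutates this object and returns None."""
        for key, value in self.fields.items():
            self.fields[key] = tuple(value)

    def to_tuple(self) -> ToyBatch:
        """Out-of-place conversion: returns a converted copy, self is untouched."""
        result = copy.deepcopy(self)
        result.to_tuple_()
        return result


batch = ToyBatch(obs=[1, 2, 3])
converted = batch.to_tuple()  # batch.fields["obs"] is still a list here
batch.to_tuple_()             # now batch itself holds a tuple
```

Under this convention, code that previously relied on the plain method mutating its receiver has to either use the returned object or switch to the underscore variant.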
Closes: https://github.com/thu-ml/tianshou/issues/1086
### API Extensions
- Batch received new method: `to_numpy_`. #1098
- `to_dict` in Batch also supports non-recursive conversion. #1098
- Batch `__eq__` is now implemented, so semantic equality checks of
batches are possible. #1098
### Breaking Changes
- The method `to_numpy` in `data.utils.batch.Batch` is not in-place
anymore. Instead, a new method `to_numpy_` does the conversion in-place.
#1098
Closes #952
- `SamplingConfig` supports `batch_size=None`. #1077
- tests and examples are covered by `mypy`. #1077
- `NetBase` is used more widely, with stricter typing achieved by making it generic. #1077
- `utils.net.common.Recurrent` now receives and returns a
`RecurrentStateBatch` instead of a dict. #1077
---------
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
Closes #917
### Internal Improvements
- Better variable names related to model outputs (logits, dist input
etc.). #1032
- Improved typing for actors and critics, using Tianshou classes like
`Actor`, `ActorProb`, etc.,
instead of just `nn.Module`. #1032
- Added interfaces for most `Actor` and `Critic` classes to enforce the
presence of `forward` methods. #1032
- Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see
associated breaking change). #1032
- Use `.mode` of distribution instead of relying on knowledge of the
distribution type. #1032
### Breaking Changes
- Changed interface of `dist_fn` in `PGPolicy` and all subclasses to
take a single argument in both
continuous and discrete cases. #1032
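
A unified single-argument `dist_fn`, combined with reading `.mode` from the resulting distribution, might look like the sketch below. It uses standard-library stand-ins rather than torch distributions: `ToyNormal`, `ToyCategorical`, and `greedy_action` are hypothetical, and passing the continuous parameters as one `(loc, scale)` tuple is an assumption about how the single argument is packed.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyNormal:
    """Stand-in for a continuous distribution; its mode is the mean."""

    loc: float
    scale: float

    @property
    def mode(self) -> float:
        return self.loc


@dataclass
class ToyCategorical:
    """Stand-in for a discrete distribution; its mode is the most likely index."""

    probs: list[float]

    @property
    def mode(self) -> int:
        return max(range(len(self.probs)), key=self.probs.__getitem__)


def continuous_dist_fn(loc_scale: tuple[float, float]) -> ToyNormal:
    # Single argument: the (loc, scale) pair is unpacked inside the function.
    loc, scale = loc_scale
    return ToyNormal(loc, scale)


def discrete_dist_fn(logits: list[float]) -> ToyCategorical:
    # Single argument: logits are turned into probabilities via softmax.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return ToyCategorical([e / total for e in exps])


def greedy_action(dist_fn, model_output):
    # The caller needs no knowledge of the distribution type: it only reads `.mode`.
    return dist_fn(model_output).mode
```

With both cases sharing one calling convention, the forward pass can call `dist_fn(model_output)` without branching on the action space.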
---------
Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
Closes: #1058
### API Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`.
#1063
- `Collector`s can now be closed, and their reset is more granular.
#1063
- Trainers can control whether collectors should be reset prior to
training. #1063
- Convenience constructor for `CollectStats` called
`with_autogenerated_stats`. #1063
### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored
explicitly instead of through a `.data` attribute. #1063
- Introduced a first iteration of a naming convention for vars in
`Collector`s. #1063
- Generally improved readability of Collector code and associated tests
(still quite some way to go). #1063
- Improved typing for `exploration_noise` and within Collector. #1063
### Breaking Changes
- Removed `.data` attribute from `Collector` and its child classes.
#1063
- Collectors no longer reset the environment on initialization. Instead,
the user might have to call `reset`
explicitly or pass `reset_before_collect=True`. #1063
- VectorEnvs now return an array of info-dicts on reset instead of a
list. #1063
- Fixed `iter(Batch(...))`, which now behaves the same way as
`Batch(...).__iter__()`. Can be considered a bugfix. #1063
---------
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
Closes #1005
## Main changes
2. Load vega-embed things using jupyter-book config
3. Add vega-embed dependencies as part of local code for offline
development
4. Reduced duplication in benchmark.js
5. Update sphinx, docutils, and jupyter-book
Co-authored-by: carlocagnetta <c.cagnetta@appliedai.de>
- Added nbqa to pyproject.toml
- Resolved mypy issues on notebooks and related files
- Conducting ruff checks on notebooks
- Add DataclassPPrintMixin for better stats representation
- Improved Notebooks wording and explanations
Resolve: #1004
Related to #974
This PR adds strict typing to the output of `update` and `learn` in all
policies. This will likely be the last large refactoring PR before the
next release (0.6.0, not 1.0.0), so it requires some attention. Several
difficulties were encountered on the path to that goal:
1. The policy hierarchy is actually "broken" in the sense that the keys
of dicts that were output by `learn` did not follow the same enhancement
(inheritance) pattern as the policies. This is a real problem and should
be addressed in the near future. Generally, several aspects of the
policy design and hierarchy might deserve a dedicated discussion.
2. Each policy needs to be generic in the stats return type, because one
might want to extend it at some point and then also extend the stats.
Even within the source code base this pattern is necessary in many
places.
3. The interaction between `learn` and `update` is a bit quirky. We
currently handle it by having `update` modify a special field inside
`TrainingStats`, whereas all other fields are handled by `learn`.
4. The ICM module is a policy wrapper and required a
`TrainingStatsWrapper`. The latter relies on a bunch of black magic.
They were addressed by:
1. Live with the broken hierarchy, which is now made visible by bounds
in generics. We use type: ignore where appropriate.
2. Make all policies generic with bounds following the policy
inheritance hierarchy (which is incorrect, see above). We experimented a
bit with nested TrainingStats classes, but that seemed to add more
complexity and be harder to understand. Unfortunately, mypy rejects
the code below, so we have to add `type: ignore` to the
return of each `learn`:
```python
from typing import TypeVar

T = TypeVar("T", bound=int)

def f() -> T:
    return 3  # mypy rejects this: T may be bound to a subtype of int
```
3. See above
4. Write representative tests for the `TrainingStatsWrapper`. Still, the
black magic might cause nasty surprises down the line (I am not proud of
it)...
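
The overall pattern from the points above (policies generic in their stats type, bounds following the policy hierarchy, `update` filling one special field, and `type: ignore` on the return of `learn`) can be sketched in a simplified form. The classes `StatsBase`, `PGStats`, `PolicyBase`, and `PGLikePolicy`, as well as the `train_time` and `loss` fields, are hypothetical stand-ins, not the actual tianshou classes.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar


@dataclass
class StatsBase:
    train_time: float = 0.0  # hypothetical special field, filled by `update`


@dataclass
class PGStats(StatsBase):
    loss: float = 0.0  # hypothetical field, filled by `learn`


TStats = TypeVar("TStats", bound=StatsBase)
TPGStats = TypeVar("TPGStats", bound=PGStats)


class PolicyBase(Generic[TStats]):
    def learn(self) -> TStats:
        raise NotImplementedError

    def update(self) -> TStats:
        stats = self.learn()
        stats.train_time = 1.0  # `update` only touches the special field
        return stats


class PGLikePolicy(PolicyBase[TPGStats]):
    def learn(self) -> TPGStats:
        # A subclass could bind TPGStats to a richer stats class, so a type
        # checker cannot accept the concrete PGStats here without an ignore.
        return PGStats(loss=0.5)  # type: ignore[return-value]


stats = PGLikePolicy().update()
```

Keeping the policy generic lets a subclass declare a narrower stats type for `learn` and `update` without redefining the base class, at the cost of the ignore comments.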
Closes #933
---------
Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
- [X] I have added the correct label(s) to this Pull Request or linked
the relevant issue(s)
- [X] I have provided a description of the changes in this Pull Request
- [X] I have added documentation for my changes
- [ ] If applicable, I have added tests to cover my changes.
- [X] I have reformatted the code using `poe format`
- [X] I have checked style and types with `poe lint` and `poe
type-check`
- [ ] (Optional) I ran tests locally with `poe test`
(or a subset of them with `poe test-reduced`), and they pass
- [X] (Optional) I have tested that documentation builds correctly with
`poe doc-build`
Changes in this PR (see individual commits):
* Fix: SamplingConfig.start_timesteps_random was not used
* Environments: Support use of different test environment factory in
convenience constructors `from_factory*`
* SamplingConfig: Improve/extend docstrings, clearly explaining the
parameters
* SamplingConfig: Change default of repeat_per_collect to 1
* Improve logging
* Fix doc-build on Windows