707 Commits

Author SHA1 Message Date
Dominik Jain
024b80e79c Improve creation of multiple seeded experiments:
  * Add class ExperimentCollection to improve usability
  * Remove parameters from ExperimentBuilder.build
  * Rename ExperimentBuilder.build_default_seeded_experiments to build_seeded_collection,
    changing the return type to ExperimentCollection
  * Replace temp_config_mutation (which was not appropriate for the public API) with
    method copy (which performs a safe deep copy)
2024-05-05 22:27:19 +02:00
Dominik Jain
35779696ee Clean up handling of an Experiment's name (and, by extension, a run's name) 2024-05-05 22:27:19 +02:00
Michael Panchenko
a8e9df31f7 Bugfix: allow for training_stat to be None instead of asserting not-None 2024-05-05 22:27:19 +02:00
Michael Panchenko
4e38aeb829 Merge branch 'refs/heads/thuml-master' into policy-train-eval
# Conflicts:
#	CHANGELOG.md
2024-05-05 16:03:34 +02:00
Michael Panchenko
82f425e9fe Collector: move @override, removed docstrings from overridden methods 2024-05-05 16:01:52 +02:00
Michael Panchenko
26a6cca76e Improved docstrings, added asserts to make mypy happy 2024-05-05 15:56:06 +02:00
Michael Panchenko
c5d0e169b5 Collector: removed unnecessary no-grad flag from interfaces. Breaking 2024-05-05 15:41:20 +02:00
Michael Panchenko
f876198870 Formatting 2024-05-05 15:16:53 +02:00
Michael Panchenko
6927eadaa7 BatchPolicy: check that self.is_within_training_step is True on update 2024-05-05 15:14:59 +02:00
dependabot[bot]
2f2d5cb210
Bump tqdm from 4.66.1 to 4.66.3 (#1134) 2024-05-05 15:01:46 +02:00
Dominik Jain
c35be8d07e Establish backward compatibility by implementing __setstate__ 2024-05-03 15:18:39 +02:00
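A minimal sketch of the idea behind this commit, assuming the compatibility problem is an attribute that was removed (`is_eval`) and one that was added (`is_within_training_step`). The class `LegacyAwarePolicy` is a hypothetical stand-in, not the library's class, and the actual implementation may differ:

```python
class LegacyAwarePolicy:
    """Hypothetical stand-in for a policy class; not the real library class."""

    def __init__(self) -> None:
        self.is_within_training_step = False

    def __setstate__(self, state: dict) -> None:
        # Called by pickle when restoring an object. Work on a copy so the
        # caller's dictionary is left untouched.
        state = dict(state)
        # Assumption: the removed attribute is simply dropped.
        state.pop("is_eval", None)
        # Assumption: the new attribute defaults to False if it is absent.
        state.setdefault("is_within_training_step", False)
        self.__dict__.update(state)


# State as an older version might have persisted it (illustrative values).
old_state = {"is_eval": True, "lr": 0.1}

# Mimic what unpickling does: create the object without calling __init__,
# then hand it the stored state.
restored = LegacyAwarePolicy.__new__(LegacyAwarePolicy)
restored.__setstate__(old_state)
```

With this in place, objects persisted before the attribute change can still be restored and carry only the attributes that the current code expects.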
Dominik Jain
ca69e79b4a Change the way in which deterministic evaluation is controlled:
  * Remove flag `eval_mode` from Collector.collect
  * Replace flag `is_eval` in BasePolicy with `is_within_training_step` (negating usages)
    and set it appropriately in BaseTrainer
2024-05-03 15:18:39 +02:00
Dominik Jain
18f236167f Fix invalid kwarg 2024-05-03 10:12:41 +02:00
Dominik Jain
ca4dad1139 BaseTrainer: Refactoring
New method training_step, which
  * collects training data (method _collect_training_data)
  * performs "test in train" (method _test_in_train)
  * performs policy update
The old method named train_step performed only the first two points
and has now been split into two separate methods
2024-05-03 10:12:35 +02:00
Dominik Jain
4f16494609 Set torch train mode in BasePolicy.update instead of in each .learn implementation,
as this is less prone to errors
2024-05-02 11:51:08 +02:00
bordeauxred
f31a91df5d
Typo docstring (#1132) 2024-05-01 08:59:00 +02:00
bordeauxred
61426acf07
Improve the documentation of compute_episodic_return in base policy. (#1130) 2024-04-30 14:40:16 +02:00
Michael Panchenko
a65920fc68
Support Actor preprocessing network reuse for continuous case, fixes in DQN network (#1128)
This PR fixes a bug in DQN and lifts a limitation in reusing the actor's
preprocessing network for continuous environments.

* `atari_network.DQN`:
  * Fix input validation
  * Fix output_dim not being set if features_only=True and
    output_dim_added_layer not None
* `continuous.Critic`:
  * Add flag `apply_preprocess_net_to_obs_only` to allow the
    preprocessing network to be applied to the observations only (without
    the actions concatenated), which is essential for the case where we want
    to reuse the actor's preprocessing network
* CriticFactoryReuseActor: Use the flag, fixing the case where we want
  to reuse an actor's preprocessing network for the critic (must be applied
  before concatenating the actions)
* Minor improvements in docs/docstrings
2024-04-29 23:49:52 +02:00
Dominik Jain
40f772493e Update change log with changes from #1128 2024-04-29 22:30:54 +02:00
Dominik Jain
83083924df Mention CHANGELOG.md in PR template 2024-04-29 22:14:36 +02:00
Dominik Jain
8ac6bf5fbb Improve docstrings 2024-04-29 18:27:02 +02:00
Dominik Jain
250a129cc4 SamplingConfig: Improve docstrings of replay_buffer_save_only_last_obs, replay_buffer_stack_num 2024-04-29 18:27:02 +02:00
Dominik Jain
74737416ff Fix typo 2024-04-29 18:27:02 +02:00
Dominik Jain
d18ded333e CriticFactoryReuseActor: Fix the case where we want to reuse an actor's
preprocessing network for the critic (must be applied before concatenating
the actions)
2024-04-29 18:27:02 +02:00
Dominik Jain
0b494845c9 continuous.Critic: Add flag apply_preprocess_net_to_obs_only to allow the
preprocessing network to be applied to the observations only (without
the actions concatenated), which is essential for the case where we want
to reuse the actor's preprocessing network
2024-04-29 18:27:02 +02:00
Dominik Jain
18ed981875 Add pickle/serialisation utils: setstate and getstate 2024-04-29 18:27:02 +02:00
Dominik Jain
be1c8cd235 DQN:
  * Fix input validation
  * Fix output_dim not being set if features_only=True and output_dim_added_layer not None
2024-04-29 13:37:26 +02:00
Michael Panchenko
a2b9d7c7d8 Changelog [skip-ci] 2024-04-26 18:31:02 +02:00
Michael Panchenko
45922712d9 Docstring add return [skip-ci] 2024-04-26 18:14:20 +02:00
Michael Panchenko
e2e8a699ea Changelog [skip-ci] 2024-04-26 18:11:23 +02:00
Michael Panchenko
6aa33b1bfe Formatting 2024-04-26 17:54:14 +02:00
Michael Panchenko
c28508b3be Changelog 2024-04-26 17:53:34 +02:00
Michael Panchenko
2eaf1f37c2 Use the new BaseCollector interface for annotations 2024-04-26 17:53:27 +02:00
Michael Panchenko
07a97c7d93 Merge branch 'refs/heads/thuml-master' into policy-train-eval 2024-04-26 17:44:57 +02:00
Michael Panchenko
69f07a8f12 Tests: fixed typing issues by declaring union types and no longer reusing var names 2024-04-26 17:39:31 +02:00
Michael Panchenko
4b619c51ba Collector: extracted interface BaseCollector, minor simplifications
Renamed is_eval kwarg
2024-04-26 17:39:31 +02:00
Michael Panchenko
12d4262f80 Tests: removed all instances of if __name__ == ... in tests
A test is not a script and should not be used as such

Also marked pistonball test as skipped since it doesn't actually test anything
2024-04-26 17:39:30 +02:00
Michael Panchenko
7d59302095 Added in_eval/in_train mode contextmanager 2024-04-26 17:39:30 +02:00
Michael Panchenko
829fd9c7a5 Deleted long deprecated functionality, removed unused warning module
There are better ways to deal with deprecations that we shall use in the future
2024-04-26 14:42:44 +02:00
Michael Panchenko
081adedc32
Changelog + dependabot bumps (#1124)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-25 08:49:54 -07:00
Maximilian Huettenrauch
49c750fb09 update tests 2024-04-24 17:06:59 +02:00
Maximilian Huettenrauch
8cb17de190 update examples 2024-04-24 17:06:54 +02:00
Maximilian Huettenrauch
e499bed8b0 add is_eval attribute to policy and set this attribute as well as train mode in appropriate places 2024-04-24 17:06:42 +02:00
maxhuettenrauch
ade85ab32b
Feature/algo eval (#1074)
# Changes

## Dependencies

- New extra "eval"

## API Extension
- `Experiment` and `ExperimentConfig` now have a `name`, which can,
however, be overridden when `Experiment.run()` is called
- When building an `Experiment` from an `ExperimentConfig`, the user has
the option to add info about seeds to the name.
- New method in `ExperimentConfig` called
`build_default_seeded_experiments`
- `SamplingConfig` has an explicit training seed; `test_seed` is
inferred.
- New `evaluation` package for repeating the same experiment with
multiple seeds and aggregating the results (important extension!).
Currently in alpha state.
- Loggers can now restore the logged data into Python by using the new
`restore_logged_data`

## Breaking Changes
- `AtariEnvFactory` (in examples) now receives explicit train and test
seeds
- `EnvFactoryRegistered` now requires an explicit `test_seed`
- `BaseLogger.prepare_dict_for_logging` is now abstract

---------

Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>
2024-04-20 23:25:33 +00:00
maxhuettenrauch
9c0b3e7292
use explicit multiprocessing context for creating Pipe in subproc.py (#1102) 2024-04-19 11:08:53 +02:00
maxhuettenrauch
a043711c10
Fix/deterministic action space sampling in SubprocVectorEnv (#1103) 2024-04-18 16:16:57 +02:00
Daniel Plop
6935a111d9
Add non in-place version of Batch.to_torch (#1117)
Closes: https://github.com/aai-institute/tianshou/issues/1116

### API Extensions

- Batch received new method: `to_torch_`. #1117

### Breaking Changes

- The method `to_torch` in `data.utils.batch.Batch` is not in-place
anymore. Instead, a new method `to_torch_` does the conversion in-place.
#1117
2024-04-17 22:07:24 +02:00
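The breaking change above follows a naming convention in which a trailing underscore marks the in-place variant. A minimal sketch of that convention, using a hypothetical toy container `MiniBatch` with made-up methods `to_float` / `to_float_` (this is not the library's `Batch`, and no library calls are used):

```python
from copy import deepcopy


class MiniBatch:
    """Hypothetical toy container used only to illustrate the convention."""

    def __init__(self, **fields: list) -> None:
        self.fields = dict(fields)

    def to_float_(self) -> None:
        """In-place variant: mutates this object and returns nothing."""
        for key, values in self.fields.items():
            self.fields[key] = [float(v) for v in values]

    def to_float(self) -> "MiniBatch":
        """Non-in-place variant: converts a deep copy and returns it."""
        result = deepcopy(self)
        result.to_float_()
        return result


batch = MiniBatch(obs=[1, 2, 3])

# Non-in-place: the original keeps its integer entries.
converted = batch.to_float()

# In-place: the call returns None and the object itself is changed.
in_place_result = MiniBatch(obs=[4, 5]).to_float_()
```

Code that previously relied on the non-underscore method mutating its receiver would, under this convention, need to either use the returned object or switch to the underscore variant.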
Daniel Plop
ca4f74f40e
Allow two (same/different) Batch objs to be tested for equality (#1098)
Closes: https://github.com/thu-ml/tianshou/issues/1086

### API Extensions

- Batch received new method: `to_numpy_`. #1098
- `to_dict` in Batch now also supports non-recursive conversion. #1098
- Batch `__eq__` is now implemented, so a semantic equality check of
batches is now possible. #1098

### Breaking Changes

- The method `to_numpy` in `data.utils.batch.Batch` is not in-place
anymore. Instead, a new method `to_numpy_` does the conversion in-place.
#1098
2024-04-16 18:12:48 +02:00
Michael Panchenko
049907d9ab Fix type check in atari wrapper, solves #1111 2024-04-16 10:52:48 +02:00
maxhuettenrauch
60d1ba1c8f
Fix/reset before collect in procedural examples, tests and hl experiment (#1100)
Needed due to a breaking change in the Collector which was overlooked in some of the examples
2024-04-16 10:30:21 +02:00