191 Commits

Author SHA1 Message Date
Dominik Jain
ca69e79b4a Change the way in which deterministic evaluation is controlled:
* Remove flag `eval_mode` from Collector.collect
* Replace flag `is_eval` in BasePolicy with `is_within_training_step` (negating usages)
  and set it appropriately in BaseTrainer
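A rough sketch of the intended semantics (not the trainer's actual source; `policy`, `train_collector`, and `test_collector` are assumed to come from a standard Tianshou setup):

```python
# The trainer toggles the flag around training steps; outside of them
# (e.g. during test collection), deterministic evaluation applies.
policy.is_within_training_step = True
train_collector.collect(n_step=1000)      # stochastic/exploratory actions
policy.is_within_training_step = False
test_collector.collect(n_episode=10)      # deterministic evaluation
```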
2024-05-03 15:18:39 +02:00
Dominik Jain
18f236167f Fix invalid kwarg 2024-05-03 10:12:41 +02:00
Michael Panchenko
4b619c51ba Collector: extracted interface BaseCollector, minor simplifications
Renamed is_eval kwarg
2024-04-26 17:39:31 +02:00
Maximilian Huettenrauch
8cb17de190 update examples 2024-04-24 17:06:54 +02:00
maxhuettenrauch
ade85ab32b
Feature/algo eval (#1074)
# Changes

## Dependencies

- New extra "eval"

## API Extension
- `Experiment` and `ExperimentConfig` now have a `name`, which can be
overridden when `Experiment.run()` is called
- When building an `Experiment` from an `ExperimentConfig`, the user has
the option to add seed information to the name.
- New method `build_default_seeded_experiments` in `ExperimentConfig`
- `SamplingConfig` has an explicit training seed; `test_seed` is
inferred.
- New `evaluation` package for repeating the same experiment with
multiple seeds and aggregating the results (important extension!).
Currently in alpha state; a rough usage sketch follows below.
- Loggers can now restore the logged data into Python by using the new
`restore_logged_data` method
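
A rough usage sketch of the seeded-experiment workflow (the method name is taken from the list above; the `num_seeds` parameter and the loop are assumptions, not the exact API):

```python
# Build one experiment per seed from a common config, run each, and
# aggregate the results with the new (alpha) `evaluation` package.
experiments = experiment_config.build_default_seeded_experiments(num_seeds=5)
for experiment in experiments:
    experiment.run()  # the run name may include seed info (see above)
```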

## Breaking Changes
- `AtariEnvFactory` (in examples) now receives explicit train and test
seeds
- `EnvFactoryRegistered` now requires an explicit `test_seed`
- `BaseLogger.prepare_dict_for_logging` is now abstract

---------

Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>
2024-04-20 23:25:33 +00:00
Michael Panchenko
049907d9ab Fix type check in atari wrapper, solves #1111 2024-04-16 10:52:48 +02:00
maxhuettenrauch
60d1ba1c8f
Fix/reset before collect in procedural examples, tests and hl experiment (#1100)
Needed due to a breaking change in the Collector which was overlooked in some of the examples
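
For reference, a minimal sketch of the pattern the fix applies (environment choice and `n_step` value are arbitrary; `policy` is assumed to be any initialized Tianshou policy):

```python
import gymnasium as gym
from tianshou.data import Collector
from tianshou.env import DummyVectorEnv

envs = DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(2)])
collector = Collector(policy, envs)  # `policy`: any initialized policy
collector.reset()                    # required before collecting
collector.collect(n_step=1000)
```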
2024-04-16 10:30:21 +02:00
Daniel Plop
8a0629ded6
Fix mypy issues in tests and examples (#1077)
Closes #952 

- `SamplingConfig` supports `batch_size=None`. #1077
- tests and examples are covered by `mypy`. #1077
- `NetBase` is more used, stricter typing by making it generic. #1077
- `utils.net.common.Recurrent` now receives and returns a
`RecurrentStateBatch` instead of a dict. #1077
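
For the first point, a brief sketch (import path and the exact meaning of `None` at update time are assumptions based on the PR description):

```python
from tianshou.highlevel.config import SamplingConfig

# batch_size=None: do not split updates into mini-batches (assumption:
# the full sample is used as a single batch).
sampling_config = SamplingConfig(batch_size=None)
```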

---------

Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
2024-04-03 18:07:51 +02:00
Erni
bf0d632108
Naming and typing improvements in Actor/Critic/Policy forwards (#1032)
Closes #917 

### Internal Improvements
- Better variable names related to model outputs (logits, dist input
etc.). #1032
- Improved typing for actors and critics, using Tianshou classes like
`Actor`, `ActorProb`, etc.,
instead of just `nn.Module`. #1032
- Added interfaces for most `Actor` and `Critic` classes to enforce the
presence of `forward` methods. #1032
- Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see
associated breaking change). #1032
- Use `.mode` of distribution instead of relying on knowledge of the
distribution type. #1032

### Breaking Changes

- Changed interface of `dist_fn` in `PGPolicy` and all subclasses to
take a single argument in both
continuous and discrete cases. #1032
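
To illustrate the unified interface, a minimal sketch (written for this summary, not taken from the PR itself): a `dist_fn` now receives exactly one argument in both cases, with the continuous case bundling its parameters into a tuple.

```python
import torch
from torch.distributions import Categorical, Independent, Normal


# Discrete case: the single argument is the logits tensor.
def dist_fn_discrete(logits: torch.Tensor) -> Categorical:
    return Categorical(logits=logits)


# Continuous case: the single argument bundles (loc, scale).
def dist_fn_continuous(loc_scale: tuple[torch.Tensor, torch.Tensor]) -> Independent:
    loc, scale = loc_scale
    return Independent(Normal(loc, scale), 1)
```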

---------

Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
2024-04-01 17:14:17 +02:00
maxhuettenrauch
e82379c47f
Allow explicit setting of multiprocessing context for SubprocEnvWorker (#1072)
Running multiple training runs in parallel (with, for example, joblib)
fails on macOS due to a change in the default multiprocessing context
(see
[here](https://stackoverflow.com/questions/65098398/why-using-fork-works-but-using-spawn-fails-in-python3-8-multiprocessing)
or
[here](https://www.reddit.com/r/learnpython/comments/g5372v/multiprocessing_with_fork_on_macos/)).
This PR adds the ability to explicitly set a multiprocessing context for
the SubprocEnvWorker (similar to gymnasium's
[AsyncVectorEnv](https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/vector/async_vector_env.py)).
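
A usage sketch (whether the kwarg is named `context` here is an assumption modeled on the gymnasium analogue):

```python
import gymnasium as gym
from tianshou.env import SubprocVectorEnv

# Explicitly request the "spawn" start method for the subprocess workers.
envs = SubprocVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(4)],
    context="spawn",
)
```
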
---------

Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>
2024-03-14 11:07:56 +01:00
Dominik Jain
b6b2c95ac7 Improve README, minor changes in procedural example 2024-03-03 15:06:40 +01:00
maxhuettenrauch
7c970df53f
Fix/add watch env with obs rms (#1061)
Adds support for deciding whether to watch the agent perform in the environment when using the high-level interfaces
2024-02-29 15:59:11 +01:00
Dominik Jain
49781e715e
Fix high-level examples (#1060)
The high-level examples were all broken by changes made to make mypy
pass.
This PR fixes them, making a type change in logging.run_cli instead to
make mypy happy.
2024-02-23 23:17:14 +01:00
Michael Panchenko
33d241a29b
Docs/html doc issues (#1048)
Closes #1005 

## Main changes

1. Load vega-embed things using jupyter-book config
2. Add vega-embed dependencies as part of local code for offline
development
3. Reduced duplication in benchmark.js
4. Update sphinx, docutils, and jupyter-book

Co-authored-by: carlocagnetta <c.cagnetta@appliedai.de>
2024-02-09 19:43:10 +01:00
maxhuettenrauch
5fe9aea798
Update and fix dependencies related to mac install (#1044)
Addresses part of #1015 

### Dependencies

- move jsonargparse and docstring-parser to dependencies to run hl
examples without dev
- create mujoco-py extra for legacy mujoco envs
- updated atari extra
    - removed atari-py and gym dependencies
    - added ALE-py, autorom, and shimmy
- created robotics extra for HER-DDPG

### Mac specific

- only install envpool when not on mac
- mujoco-py not working on macOS newer than Monterey
(https://github.com/openai/mujoco-py/issues/777)
- D4RL also fails due to dependency on mujoco-py
(https://github.com/Farama-Foundation/D4RL/issues/232)

### Other

- reduced training-num/test-num in example files to a number ≤ 20
(examples with 100 led to too many open files)
- rendering for Mujoco envs needs to be fixed on gymnasium side
(https://github.com/Farama-Foundation/Gymnasium/issues/749)

---------

Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>
2024-02-06 17:06:38 +01:00
Daniel Plop
eb0215cf76
Refactoring/mypy issues test (#1017)
Improves typing in examples and tests, working towards making mypy pass there.

Introduces the SpaceInfo utility
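
A brief usage sketch (the construction method and attribute names are assumptions):

```python
import gymnasium as gym
from tianshou.utils.space_info import SpaceInfo

env = gym.make("CartPole-v1")
space_info = SpaceInfo.from_env(env)   # derive typed info from the env's spaces
action_dim = space_info.action_info.action_dim
```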
2024-02-06 14:24:30 +01:00
Dominik Jain
022cfb7f78 Cleaned up handling of output_dim retrieval, adding exceptions for erroneous cases 2024-01-16 14:52:31 +01:00
Dominik Jain
20074931d5 Improve docstrings 2024-01-16 14:52:31 +01:00
Dominik Jain
05a8cf4e74 Refactoring, improving class name EnvFactoryGymnasium -> EnvFactoryRegistered 2024-01-16 14:52:31 +01:00
Dominik Jain
c9cb41bf55 Make envpool usage configuration more explicit 2024-01-16 14:52:31 +01:00
Dominik Jain
62d58faa02 Add example from README (with minor updates) 2024-01-16 13:43:14 +01:00
Dominik Jain
8d6df2b276 Add high-level discrete example (CartPole) for README 2024-01-12 17:13:50 +01:00
Dominik Jain
1e5ebc2a2d Improve naming of callback classes and related methods/attributes
Add EpochStopCallbackRewardThreshold
2024-01-12 17:13:42 +01:00
Dominik Jain
ff398beed9 Move callbacks for setting DQN epsilon values to the library 2024-01-12 17:13:42 +01:00
Dominik Jain
63269fe198 Implement make_atari_env via AtariEnvFactory, eliminating duplication 2024-01-12 17:13:42 +01:00
Dominik Jain
19a98c3b2a Fix models using scale_obs not being persistable (due to a locally defined class) 2024-01-12 17:13:42 +01:00
Dominik Jain
7fa588309b Update MuJoCo examples to use Ant-v4 instead of Ant-v3 2024-01-12 17:13:42 +01:00
Dominik Jain
eaab7b0a4b Improve environment factory abstractions in high-level API:
* EnvFactory now uses the creation of a single environment as
  the basic functionality which the more high-level functions build
  upon (see the sketch after this list)
* Introduce enum EnvMode to indicate the purpose for which an env
  is created, allowing the factory creation process to change its
  behaviour accordingly
* Add EnvFactoryGymnasium to provide direct support for envs that
  can be created via gymnasium.make
    - EnvPool is supported via an injectable EnvPoolFactory
    - Existing EnvFactory implementations are now derived from
      EnvFactoryGymnasium
* Use a separate environment (which uses new EnvMode.WATCH) for
  watching agent performance after training (instead of using test
  environments, which the user may want to configure differently)
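
An illustrative sketch of the shape of these abstractions (simplified; not the library's exact definitions):

```python
from enum import Enum


class EnvMode(Enum):
    """Purpose for which an environment is created."""
    TRAIN = "train"
    TEST = "test"
    WATCH = "watch"


class EnvFactory:
    def create_env(self, mode: EnvMode):
        """Basic functionality: create a single env, possibly mode-dependent."""
        raise NotImplementedError

    def create_envs(self, num_envs: int, mode: EnvMode) -> list:
        """Higher-level helper built on single-env creation."""
        return [self.create_env(mode) for _ in range(num_envs)]
```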
2024-01-12 17:13:42 +01:00
maxhuettenrauch
522f7fbf98
Feature/dataclasses (#996)
This PR adds strict typing to the output of `update` and `learn` in all
policies. This will likely be the last large refactoring PR before the
next release (0.6.0, not 1.0.0), so it requires some attention. Several
difficulties were encountered on the path to that goal:

1. The policy hierarchy is actually "broken" in the sense that the keys
of dicts that were output by `learn` did not follow the same enhancement
(inheritance) pattern as the policies. This is a real problem and should
be addressed in the near future. Generally, several aspects of the
policy design and hierarchy might deserve a dedicated discussion.
2. Each policy needs to be generic in the stats return type, because one
might want to extend it at some point and then also extend the stats.
Even within the source code base this pattern is necessary in many
places.
3. The interaction between learn and update is a bit quirky; we
currently handle it by having update modify a special field inside
TrainingStats, whereas all other fields are handled by learn.
4. The ICM module is a policy wrapper and required a
TrainingStatsWrapper. The latter relies on a bunch of black magic.

They were addressed by:
1. Live with the broken hierarchy, which is now made visible by bounds
in generics. We use type: ignore where appropriate.
2. Make all policies generic with bounds following the policy
inheritance hierarchy (which is incorrect, see above). We experimented a
bit with nested TrainingStats classes, but that seemed to add more
complexity and be harder to understand. Unfortunately, mypy thinks that
the code below is wrong, wherefore we have to add `type: ignore` to the
return of each `learn`:

```python
from typing import TypeVar

T = TypeVar("T", bound=int)


def f() -> T:
    return 3  # mypy flags this, although T is bounded by int
```

3. See above
4. Write representative tests for the `TrainingStatsWrapper`. Still, the
black magic might cause nasty surprises down the line (I am not proud of
it)...
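
A minimal sketch of the generic-stats pattern from point 2 (names simplified; not the actual class definitions):

```python
from dataclasses import dataclass
from typing import Generic, TypeVar


@dataclass
class TrainingStats:
    train_time: float = 0.0


TTrainingStats = TypeVar("TTrainingStats", bound=TrainingStats)


class BasePolicy(Generic[TTrainingStats]):
    def learn(self, batch) -> TTrainingStats:
        raise NotImplementedError
```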

Closes #933

---------

Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
2023-12-30 11:09:03 +01:00
Michael Panchenko
8d3d1f164b
Support batch_size=None and use it in various scripts (#993)
Closes #986
2023-11-24 10:13:10 -08:00
Michael Panchenko
3a1bc18add
Method to compute actions from observations (#991)
This PR adds a new method for getting actions from an env's observation
and info. This is useful for standard inference and stands in contrast
to batch-based methods that are currently used in training and
evaluation. Without this, users have to do some kind of gymnastics to
actually perform inference with a trained policy. I have also added a
test for the new method.

In future PRs, this method should be included in the examples (in the
"watch" section).

Adding this required improving multiple typing things and, importantly,
_simplifying the signature of `forward` in many policies!_ This is a
**breaking change**, but it will likely affect no users. The `input`
parameter of `forward` was a rather hacky mechanism; I believe it is good
that it's gone now. It will also help with #948.

The main functional change is the addition of `compute_action` to
`BasePolicy`.
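
A minimal usage sketch (the exact signature is not given here, so the single-argument call is an assumption; `policy` is any trained `BasePolicy` instance):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
# Single-observation inference, without batch-based gymnastics:
action = policy.compute_action(obs)
obs, reward, terminated, truncated, info = env.step(action)
```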

Other minor changes:
- Improvements in typing
- Updated PR and Issue templates
- Improved handling of `max_action_num`

Closes #981
2023-11-16 17:27:53 +00:00
Dominik Jain
dae4000cd2 Revert "Depend on sensAI instead of copying its utils (logging, string)"
This reverts commit fdb0eba93d81fa5e698770b4f7088c87fc1238da.
2023-11-08 19:11:39 +01:00
Dominik Jain
fdb0eba93d Depend on sensAI instead of copying its utils (logging, string) 2023-10-27 20:15:58 +02:00
Dominik Jain
c613557740 Apply datetime_tag() in high-level examples 2023-10-26 12:50:08 +02:00
Dominik Jain
da2194eff6 Force kwargs in PolicyWrapperFactoryIntrinsicCuriosity init 2023-10-26 10:43:59 +02:00
Dominik Jain
dd4a0eb430 Fix: Add MujocoEnvObsRmsPersistence only if obs_norm is enabled 2023-10-24 13:52:30 +02:00
Dominik Jain
b5a891557f Revert to simplified environment factory, removing unnecessary config object
(configuration shall be part of the factory instance)
2023-10-24 13:14:23 +02:00
Dominik Jain
7437131d79 Fix tianshou.highlevel depending on jsonargparse
(it should be a dev dependency only) by introducing a new
place where jsonargparse can be configured:
logging.run_cli, which is also slightly more convenient
2023-10-19 11:40:49 +02:00
Dominik Jain
6cbee188b8 Change interface of EnvFactory to ensure that the number of
environments configured in SamplingConfig is used
(values are now passed to the factory method)

This is clearer and removes the need to pass otherwise
unnecessary configuration to environment factories at
construction
2023-10-19 11:37:20 +02:00
Dominik Jain
41bd463a7b Allow configuring the activation function in default networks
* Set ReLU as default in all actor and critic factories
* Configure a non-default activation in applicable MuJoCo examples
2023-10-18 20:44:18 +02:00
Dominik Jain
ed06ab7ff0 Handle obs_norm setting in MuJoCo envs 2023-10-18 20:44:18 +02:00
Dominik Jain
d84e936430 Apply centrally defined callbacks 2023-10-18 20:44:18 +02:00
Dominik Jain
ae4850692f DQNExperimentBuilder: Use IntermediateModuleFactory instead of ActorFactory
(similar to IQN implementation)
2023-10-18 20:44:18 +02:00
Dominik Jain
83048788a1 Add generalised DQN network representation, adding specialised class for feature_only=True 2023-10-18 20:44:18 +02:00
Dominik Jain
4b270eaa2d Add documentation, improve structure of 'module' package 2023-10-18 20:44:18 +02:00
Dominik Jain
76e870207d Improve persistence handling
* Add persistence/restoration of Experiment instance
* Add file logging in experiment
* Allow all persistence/logging to be disabled
* Disable persistence in tests
2023-10-18 20:44:18 +02:00
Dominik Jain
3691ed2abc Support obs_rms persistence for MuJoCo by adding a general mechanism
for attaching persistence to Environment instances
2023-10-18 20:44:17 +02:00
Dominik Jain
686fd555b0 Extend tests, fixing some default behaviour 2023-10-18 20:44:17 +02:00
Dominik Jain
a8a367c42d Support IQN in high-level API
* Add example atari_iqn_hl
* Factor out trainer callbacks to new module atari_callbacks
* Extract base class for DQN-based agent factories
* Improved module factory interface design, achieving higher generality
2023-10-18 20:44:17 +02:00
Dominik Jain
799beb79b4 Support discrete SAC in high-level API
* Changed mechanism for reusing the actor's preprocessing module in critics
  to avoid special handling in AgentFactory implementations, improving
  separation of concerns:
    - Added CriticFactoryReuseActor as the new critic factory
    - Added ActorFactoryTransientStorageDecorator to pass on the actor
      data
    - Added helper classes ActorFuture, ActorFutureProviderProtocol
* Add example atari_sac_hl
2023-10-18 20:44:17 +02:00