Closes#917
### Internal Improvements
- Better variable names related to model outputs (logits, dist input
etc.). #1032
- Improved typing for actors and critics, using Tianshou classes like
`Actor`, `ActorProb`, etc.,
instead of just `nn.Module`. #1032
- Added interfaces for most `Actor` and `Critic` classes to enforce the
presence of `forward` methods. #1032
- Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see
associated breaking change). #1032
- Use `.mode` of distribution instead of relying on knowledge of the
distribution type. #1032
### Breaking Changes
- Changed interface of `dist_fn` in `PGPolicy` and all subclasses to
take a single argument in both
continuous and discrete cases. #1032
---------
Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
* Add example atari_iqn_hl
* Factor out trainer callbacks to new module atari_callbacks
* Extract base class for DQN-based agent factories
* Improved module factory interface design, achieving higher generality
* Changed machanism for reusing actor's preprocessing module in critics
to avoid special handling in AgentFactory implementations, improving
separation of concerns:
- Added CriticFactoryReuseActor as the new critic factory
- Added ActorFactoryTransientStorageDecorator to pass on the actor
data
- Added helper classes ActorFuture, ActorFutureProviderProtocol
* Add example atari_sac_hl
* Implement example mujoco_redq_hl
* Add abstraction CriticEnsembleFactory with default implementations
to suit REDQ
* Fix type annotation of linear_layer in Net, MLP, Critic
(was incompatible with REDQ usage)
* Allow to specify trainer callbacks (train_fn, test_fn, stop_fn)
in high-level API, adding the necessary abstractions and pass-on
mechanisms
* Add example atari_dqn_hl
* Add common based class for A2C and PPO agent factories
* Add default for dist_fn parameter, adding corresponding factories
* Add example mujoco_a2c_hl
* Use prefix convention (subclasses have superclass names as prefix) to
facilitate discoverability of relevant classes via IDE autocompletion
* Use dual naming, adding an alternative concise name that omits the
precise OO semantics and retains only the essential part of the name
(which can be more pleasing to users not accustomed to
convoluted OO naming)
* Created mixins for agent factories to reduce code duplication
* Further factorised params & mixins for experiment factories
* Additional parameter abstractions
* Implement high-level MuJoCo TD3 example