Naming and typing improvements in Actor/Critic/Policy forwards (#1032)

Closes #917 

### Internal Improvements
- Better variable names related to model outputs (logits, dist input
etc.). #1032
- Improved typing for actors and critics, using Tianshou classes like
`Actor`, `ActorProb`, etc.,
instead of just `nn.Module`. #1032
- Added interfaces for most `Actor` and `Critic` classes to enforce the
presence of `forward` methods. #1032
- Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see
associated breaking change). #1032
- Use `.mode` of distribution instead of relying on knowledge of the
distribution type. #1032

### Breaking Changes

- Changed interface of `dist_fn` in `PGPolicy` and all subclasses to
take a single argument in both
continuous and discrete cases. #1032

---------

Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>

2024-04-01 17:14:17 +02:00


# Changelog

## Release 1.1.0

### Api Extensions

- `Batch` received two new methods: `to_dict` and `to_list_of_dicts`. #1063
- Collectors can now be closed, and their reset is more granular. #1063
- Trainers can control whether collectors should be reset prior to training. #1063
- Added a convenience constructor for `CollectStats` called `with_autogenerated_stats`. #1063

### Internal Improvements

- Collectors rely less on state; the few stateful things are stored explicitly instead of through a `.data` attribute. #1063
- Introduced a first iteration of a naming convention for variables in Collectors. #1063
- Generally improved readability of Collector code and associated tests (still quite some way to go). #1063
- Improved typing for `exploration_noise` and within Collector. #1063
- Better variable names related to model outputs (logits, dist input etc.). #1032
- Improved typing for actors and critics, using Tianshou classes like `Actor`, `ActorProb`, etc., instead of just `nn.Module`. #1032
- Added interfaces for most `Actor` and `Critic` classes to enforce the presence of `forward` methods. #1032
- Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see associated breaking change). #1032
- Use `.mode` of distribution instead of relying on knowledge of the distribution type. #1032
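The interface entry above can be illustrated with a minimal, dependency-free sketch of how an abstract base class enforces the presence of a `forward` method. The class names `BaseActorSketch`, `ConstantActor`, and `IncompleteActor`, the helper `can_instantiate`, and the `forward` signature are hypothetical; they are not Tianshou's actual interfaces, which additionally build on `nn.Module`.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseActorSketch(ABC):
    """Hypothetical stand-in for an actor interface (not Tianshou's real class)."""

    @abstractmethod
    def forward(self, obs: Any, state: Any = None) -> tuple[Any, Any]:
        """Map observations to (model output, new hidden state)."""


class ConstantActor(BaseActorSketch):
    """Toy actor that returns the same logits for every observation."""

    def forward(self, obs: Any, state: Any = None) -> tuple[list[float], Any]:
        return [0.1, 0.9], state


class IncompleteActor(BaseActorSketch):
    """Does not implement forward, so it cannot be instantiated."""


def can_instantiate(cls: type) -> bool:
    """Return True if cls() succeeds, False if the ABC machinery rejects it."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

With such an interface, a missing `forward` is reported when the object is created rather than at the first call, and type checkers can rely on the method being present.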

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes. #1063
- Collectors no longer reset the environment on initialization. Instead, users may have to call `reset` explicitly or pass `reset_before_collect=True`. #1063
- VectorEnvs now return an array of info-dicts on reset instead of a list. #1063
- Fixed `iter(Batch(...))`, which now behaves the same way as `Batch(...).__iter__()`. This can be considered a bugfix. #1063
- Changed interface of `dist_fn` in `PGPolicy` and all subclasses to take a single argument in both continuous and discrete cases. #1032
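As a rough illustration of the `dist_fn` change, the sketch below shows one way a single-argument convention could look in both the discrete and the continuous case, together with a deterministic action taken from `.mode`. It uses plain-Python stand-ins instead of real distribution classes. All names (`CategoricalSketch`, `NormalSketch`, `discrete_dist_fn`, `continuous_dist_fn`, `deterministic_action`) are hypothetical, and passing `(loc, scale)` as one tuple is an assumption, since the changelog does not state the exact argument type.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class CategoricalSketch:
    """Stand-in for a categorical distribution over action indices."""

    probs: Sequence[float]

    @property
    def mode(self) -> int:
        # Index of the most probable action (first one wins on ties).
        return max(range(len(self.probs)), key=lambda i: self.probs[i])


@dataclass
class NormalSketch:
    """Stand-in for a one-dimensional Gaussian."""

    loc: float
    scale: float

    @property
    def mode(self) -> float:
        # The mode of a Gaussian is its mean.
        return self.loc


def discrete_dist_fn(probs: Sequence[float]) -> CategoricalSketch:
    """Single argument: the action probabilities."""
    return CategoricalSketch(probs)


def continuous_dist_fn(loc_scale: tuple[float, float]) -> NormalSketch:
    """Single argument: a (loc, scale) tuple, unpacked inside the function."""
    loc, scale = loc_scale
    return NormalSketch(loc, scale)


def deterministic_action(dist_fn: Callable[[Any], Any], model_output: Any) -> Any:
    """Same call shape for both cases; no check of the distribution type."""
    return dist_fn(model_output).mode
```

Because both functions accept exactly one argument and both stand-ins expose `.mode`, `deterministic_action` needs no branching on the action space.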

### Tests

- Fixed env seeding in `test_sac_with_il.py` so that the test doesn't fail randomly. #1081

Started after v1.0.0
