Tianshou/CHANGELOG.md
Michael Panchenko 4e38aeb829 Merge branch 'refs/heads/thuml-master' into policy-train-eval
2024-05-05 16:03:34 +02:00

Changelog

Release 1.1.0

API Extensions

  • Batch received two new methods: to_dict and to_list_of_dicts. #1063
  • Collectors can now be closed, and their reset is more granular. #1063
  • Trainers can control whether collectors should be reset prior to training. #1063
  • Convenience constructor for CollectStats called with_autogenerated_stats. #1063
  • SamplingConfig supports batch_size=None. #1077
  • Batch received new methods: to_numpy_ and to_torch_. #1098, #1117
  • to_dict in Batch also supports non-recursive conversion. #1098
  • Batch __eq__ implemented, semantic equality check of batches is now possible. #1098
  • Batch.keys() is deprecated in favor of Batch.get_keys() (needed to make iteration consistent with naming). #1105
  • Experiment and ExperimentConfig now have a name, which can be overridden when Experiment.run() is called. #1074
  • When building an Experiment from an ExperimentConfig, the user has the option to add info about seeds to the name. #1074
  • New method in ExperimentConfig called build_default_seeded_experiments. #1074
  • SamplingConfig has an explicit training seed; test_seed is inferred. #1074
  • New evaluation package for repeating the same experiment with multiple seeds and aggregating the results (important extension!). Launchers for parallelization currently in alpha state. #1074
  • Loggers can now restore the logged data into Python by using the new restore_logged_data method. #1074
  • continuous.Critic:
    • Add flag apply_preprocess_net_to_obs_only to allow the preprocessing network to be applied to the observations only (without the actions concatenated), which is essential when reusing the actor's preprocessing network. #1128
  • Base class for collectors: BaseCollector #1122
  • Collectors can now explicitly specify whether to use the policy in training or evaluation mode. #1122
  • New util context managers in_eval_mode and in_train_mode for torch modules. #1122
  • reset of Collectors now returns obs and info. #1122
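
As an illustration of the in_eval_mode entry above, here is a minimal sketch of how such a context manager could work. It is not Tianshou's actual implementation: ToyModule is a hypothetical stand-in for torch.nn.Module that only mimics the training flag and the train/eval methods, so the example runs without PyTorch.

```python
from contextlib import contextmanager
from typing import Iterator


class ToyModule:
    """Hypothetical stand-in for torch.nn.Module: only tracks the training flag."""

    def __init__(self) -> None:
        self.training = True

    def train(self, mode: bool = True) -> "ToyModule":
        self.training = mode
        return self

    def eval(self) -> "ToyModule":
        return self.train(False)


@contextmanager
def in_eval_mode(module: ToyModule) -> Iterator[None]:
    """Switch the module to eval mode and restore the previous mode on exit."""
    was_training = module.training
    module.eval()
    try:
        yield
    finally:
        # Restore the earlier mode even if the body raised an exception.
        module.train(was_training)


policy_net = ToyModule()
with in_eval_mode(policy_net):
    mode_inside = policy_net.training  # eval mode is active inside the block
mode_after = policy_net.training  # previous mode is restored afterwards
```

An in_train_mode counterpart would presumably mirror this, calling train() on entry instead of eval().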

Fixes

  • CriticFactoryReuseActor: Enable the Critic flag apply_preprocess_net_to_obs_only for continuous critics, fixing the case where we want to reuse an actor's preprocessing network for the critic (affects usages of the experiment builder method with_critic_factory_use_actor with continuous environments) #1128
  • atari_network.DQN:
    • Fix constructor input validation #1128
    • Fix output_dim not being set if features_only=True and output_dim_added_layer is not None #1128

Internal Improvements

  • Collectors rely less on state, the few stateful things are stored explicitly instead of through a .data attribute. #1063
  • Introduced a first iteration of a naming convention for vars in Collectors. #1063
  • Generally improved readability of Collector code and associated tests (still quite some way to go). #1063
  • Improved typing for exploration_noise and within Collector. #1063
  • Better variable names related to model outputs (logits, dist input etc.). #1032
  • Improved typing for actors and critics, using Tianshou classes like Actor, ActorProb, etc., instead of just nn.Module. #1032
  • Added interfaces for most Actor and Critic classes to enforce the presence of forward methods. #1032
  • Simplified PGPolicy forward by unifying the dist_fn interface (see associated breaking change). #1032
  • Use .mode of distribution instead of relying on knowledge of the distribution type. #1032
  • Exception no longer raised on len of empty Batch. #1084
  • tests and examples are covered by mypy. #1077
  • NetBase is used more widely, with stricter typing achieved by making it generic. #1077
  • Use explicit multiprocessing context for creating Pipe in subproc.py. #1102
  • Removed all if __name__ == "__main__": blocks from tests. #1122
  • Improved typing issues in tests with buffer and collector. #1122
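
The entry on the explicit multiprocessing context (#1102) can be sketched with the standard library alone. The start method "spawn" below is an assumption made for illustration; the changelog does not say which start method subproc.py uses. To keep the example small, both ends of the pipe are used within a single process.

```python
import multiprocessing as mp

# Assumption: "spawn" is picked only for illustration.
ctx = mp.get_context("spawn")

# Create the pipe through the context object instead of the module-level mp.Pipe().
parent_conn, child_conn = ctx.Pipe()

# Send a small message from one end and read it from the other.
parent_conn.send({"cmd": "reset"})
received = child_conn.recv()

parent_conn.close()
child_conn.close()
```

Creating pipes and worker processes from one context object keeps the choice of start method in a single place rather than relying on the platform default.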

Breaking Changes

  • Removed .data attribute from Collector and its child classes. #1063
  • Collectors no longer reset the environment on initialization. Instead, the user might have to call reset explicitly or pass reset_before_collect=True. #1063
  • VectorEnvs now return an array of info-dicts on reset instead of a list. #1063
  • Fixed iter(Batch(...)), which now behaves the same way as Batch(...).__iter__(). Can be considered a bugfix. #1063
  • Changed interface of dist_fn in PGPolicy and all subclasses to take a single argument in both continuous and discrete cases. #1032
  • utils.net.common.Recurrent now receives and returns a RecurrentStateBatch instead of a dict. #1077
  • The methods to_numpy and to_torch in Batch are no longer in-place (use to_numpy_ or to_torch_ instead). #1098, #1117
  • AtariEnvFactory constructor (in examples, so not really breaking) now requires explicit train and test seeds. #1074
  • EnvFactoryRegistered now requires an explicit test_seed in the constructor. #1074
  • BaseLogger.prepare_dict_for_logging is now abstract. #1074
  • Removed deprecated and unused BasicLogger (only affects users who subclassed it). #1074
  • Removed deprecations of 0.5.1 (will likely not affect anyone) and the unused warnings module. #1122
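
The to_numpy / to_numpy_ change above follows the trailing-underscore naming convention for in-place methods. The sketch below illustrates that convention only. ToyBatch, to_tuple and to_tuple_ are hypothetical names that stand in for Batch and its conversion methods, and the return types are choices made for this sketch, not a statement about Tianshou's signatures.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class ToyBatch:
    """Hypothetical stand-in for Batch that stores plain Python sequences."""

    data: dict[str, list[int] | tuple[int, ...]] = field(default_factory=dict)

    def to_tuple_(self) -> None:
        """In-place variant (trailing underscore): converts the stored values."""
        for key, value in self.data.items():
            self.data[key] = tuple(value)

    def to_tuple(self) -> ToyBatch:
        """Out-of-place variant: converts a deep copy and leaves self untouched."""
        result = copy.deepcopy(self)
        result.to_tuple_()
        return result


batch = ToyBatch({"obs": [1, 2, 3]})

converted = batch.to_tuple()  # new object; `batch` is unchanged
original_still_list = isinstance(batch.data["obs"], list)

batch.to_tuple_()  # mutates `batch` itself
original_now_tuple = isinstance(batch.data["obs"], tuple)
```

Code that relied on the old in-place behavior of to_numpy or to_torch would therefore either switch to the underscore variant or use the returned object.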

Tests

  • Fixed env seeding in test_sac_with_il.py so that the test doesn't fail randomly. #1081

Dependencies

  • DeepDiff added to help with diffs of batches in tests. #1098
  • Bumped black, idna, pillow.
  • New extra "eval".

Started after v1.0.0