* Add class ExperimentCollection to improve usability
* Remove parameters from ExperimentBuilder.build
* Rename ExperimentBuilder.build_default_seeded_experiments to build_seeded_collection,
  changing the return type to ExperimentCollection
* Replace temp_config_mutation (which was not appropriate for the public API) with
method copy (which performs a safe deep copy)
* Remove flag `eval_mode` from Collector.collect
* Replace flag `is_eval` in BasePolicy with `is_within_training_step` (negating usages)
and set it appropriately in BaseTrainer
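The safe-copy semantics of the new `copy` method (as opposed to a temporary in-place config mutation) can be sketched as follows. This is a minimal illustration in plain Python; the config classes and field names here are hypothetical stand-ins, not Tianshou's actual classes.

```python
from copy import deepcopy
from dataclasses import dataclass, field


# Hypothetical stand-ins for the builder's configuration objects;
# names are illustrative, not Tianshou's actual classes.
@dataclass
class SamplingConfig:
    num_epochs: int = 100


@dataclass
class ExperimentConfig:
    sampling: SamplingConfig = field(default_factory=SamplingConfig)


class ExperimentBuilder:
    def __init__(self, config: ExperimentConfig) -> None:
        self.config = config

    def copy(self) -> "ExperimentBuilder":
        # Safe deep copy: mutating the copy's config cannot affect the
        # original builder, unlike a temporary in-place mutation.
        return ExperimentBuilder(deepcopy(self.config))


builder = ExperimentBuilder(ExperimentConfig())
clone = builder.copy()
clone.config.sampling.num_epochs = 1  # original builder is untouched
```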
New method training_step, which
  * collects training data (method _collect_training_data)
  * performs "test in train" (method _test_in_train)
  * performs the policy update

The old method train_step performed only the first two steps and has now
been split into two separate methods.
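The restructured loop can be sketched as follows. The method and flag names (`training_step`, `_collect_training_data`, `_test_in_train`, `is_within_training_step`) follow the changelog entries above, but the bodies are illustrative placeholders, not Tianshou's actual implementation.

```python
class Policy:
    def __init__(self) -> None:
        # Replaces the old is_eval flag (with negated meaning).
        self.is_within_training_step = False

    def update(self, batch: list[float]) -> float:
        # Dummy "loss" standing in for a real policy update.
        return sum(batch) / len(batch)


class Trainer:
    def __init__(self, policy: Policy) -> None:
        self.policy = policy

    def _collect_training_data(self) -> list[float]:
        # Stand-in for collecting transitions via the Collector.
        return [1.0, 2.0, 3.0]

    def _test_in_train(self, batch: list[float]) -> None:
        # Optional evaluation during training ("test in train").
        pass

    def training_step(self) -> float:
        # The trainer, not the collector, now sets the policy flag.
        self.policy.is_within_training_step = True
        try:
            batch = self._collect_training_data()  # step 1
            self._test_in_train(batch)             # step 2
            return self.policy.update(batch)       # step 3 (new here)
        finally:
            self.policy.is_within_training_step = False


trainer = Trainer(Policy())
loss = trainer.training_step()
```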
This PR fixes a bug in DQN and lifts a limitation on reusing the actor's
preprocessing network for continuous environments.
* `atari_network.DQN`:
* Fix input validation
  * Fix output_dim not being set if features_only=True and
    output_dim_added_layer is not None
* `continuous.Critic`:
* Add flag `apply_preprocess_net_to_obs_only` to allow the
preprocessing network to be applied to the observations only (without
the actions concatenated), which is essential for the case where we want
to reuse the actor's preprocessing network
* CriticFactoryReuseActor: Use the flag, fixing the case where we want to
  reuse an actor's preprocessing network for the critic (it must be applied
  before concatenating the actions)
* Minor improvements in docs/docstrings
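The effect of the flag can be sketched as follows. The function names here are placeholders for illustration only, not the actual `continuous.Critic` code; the point is the order of preprocessing versus action concatenation.

```python
def preprocess_net(obs: list[float]) -> list[float]:
    # Stand-in for the actor's preprocessing network, which accepts
    # observations only (no actions concatenated).
    return [2.0 * x for x in obs]


def critic_forward(
    obs: list[float], act: list[float], apply_preprocess_net_to_obs_only: bool
) -> list[float]:
    if apply_preprocess_net_to_obs_only:
        # New behaviour: preprocess the observation first, then append
        # the action, so the actor's network can be reused.
        features = preprocess_net(obs)
        return features + act
    # Old behaviour: concatenate first, which a network built for
    # observations alone cannot consume.
    return preprocess_net(obs + act)


out = critic_forward([1.0, 2.0], [0.5], apply_preprocess_net_to_obs_only=True)
```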