Addresses #1122:
* We introduced a new flag `is_within_training_step`, which is enabled by
the training algorithm while within a training step, where a training
step encompasses training data collection and policy updates. Algorithms
now use this flag to decide whether their `deterministic_eval` setting
should apply, instead of the torch training flag (which was previously
abused for this purpose); see the sketch below.
* The policy's training/eval mode (which should only affect torch-level
behaviour such as dropout and batch normalization) no longer needs to be
set in user code in order to control collector behaviour (which never
made sense). The respective calls have been removed.
* The policy should, in fact, always be in evaluation mode when
collecting data, as there is no reason to ever accumulate gradients
during any kind of rollout. We therefore explicitly set the policy to
evaluation mode in Collector.collect. Furthermore, it never makes sense
to compute gradients during collection, so the possibility to pass
`no_grad=False` was removed.
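
For illustration, a minimal sketch of how an algorithm might consult the new flag (class and method names below are placeholders, not the actual implementation):

```python
# Illustrative sketch only; class and method names are placeholders.
class MyPolicySketch:
    def __init__(self, deterministic_eval: bool = True) -> None:
        self.deterministic_eval = deterministic_eval
        # Set to True by the trainer for the duration of a training step
        # (data collection + policy update), False otherwise.
        self.is_within_training_step = False

    def use_deterministic_action(self) -> bool:
        # `deterministic_eval` only applies outside of a training step
        # (i.e. during evaluation/inference); the torch training flag is
        # no longer consulted for this decision.
        return self.deterministic_eval and not self.is_within_training_step
```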
Further changes:
- Base class for collectors: `BaseCollector`
- New util context managers `in_eval_mode` and `in_train_mode` for torch
modules (see the sketch after this list).
- `reset` of collectors now returns `obs` and `info`.
- `no_grad` is no longer accepted as a kwarg of `collect`.
- Removed deprecations of `0.5.1` (will likely not affect anyone) and
the unused `warnings` module.
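
A minimal sketch of what such context managers can look like (assuming standard torch semantics; the actual utilities may differ in detail):

```python
from contextlib import contextmanager
from typing import Iterator

import torch


@contextmanager
def in_eval_mode(module: torch.nn.Module) -> Iterator[None]:
    """Temporarily switch the module to eval mode, restoring the previous mode on exit."""
    was_training = module.training
    module.eval()
    try:
        yield
    finally:
        module.train(was_training)


@contextmanager
def in_train_mode(module: torch.nn.Module) -> Iterator[None]:
    """Temporarily switch the module to train mode, restoring the previous mode on exit."""
    was_training = module.training
    module.train()
    try:
        yield
    finally:
        module.train(was_training)
```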
* Add class ExperimentCollection to improve usability
* Remove parameters from ExperimentBuilder.build
* Rename ExperimentBuilder.build_default_seeded_experiments to build_seeded_collection,
changing the return type to ExperimentCollection
* Replace temp_config_mutation (which was not appropriate for the public API) with
method copy (which performs a safe deep copy)
* Remove flag `eval_mode` from Collector.collect
* Replace flag `is_eval` in BasePolicy with `is_within_training_step` (negating usages)
and set it appropriately in BaseTrainer
New method training_step, which
* collects training data (method _collect_training_data)
* performs "test in train" (method _test_in_train)
* performs policy update
The old method named train_step performed only the first two steps and
has now been split into two separate methods; see the schematic sketch below.
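
Schematically, the new flow looks roughly like this (a sketch only; names other than training_step, _collect_training_data and _test_in_train are placeholders, not the actual trainer code):

```python
# Schematic sketch of the trainer flow; not the actual BaseTrainer code.
class TrainerSketch:
    def training_step(self):
        # Mark the policy as being within a training step so that
        # `deterministic_eval` does not kick in during data collection.
        self.policy.is_within_training_step = True
        try:
            collect_stats = self._collect_training_data()  # rollout into the replay buffer
            should_stop = self._test_in_train()            # optional "test in train" early stopping
            if not should_stop:
                self._update_policy()                      # gradient-based update (placeholder name)
            return collect_stats, should_stop
        finally:
            self.policy.is_within_training_step = False
```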
This PR fixes a bug in DQN and lifts a limitation on reusing the actor's
preprocessing network for continuous environments.
* `atari_network.DQN`:
  * Fix input validation
  * Fix `output_dim` not being set if `features_only=True` and
    `output_dim_added_layer` is not None
* `continuous.Critic`:
  * Add flag `apply_preprocess_net_to_obs_only` to allow the
    preprocessing network to be applied to the observations only (without
    the actions concatenated), which is essential when we want to reuse the
    actor's preprocessing network; see the sketch below
  * CriticFactoryReuseActor: Use the flag, fixing the case where we want
    to reuse an actor's preprocessing network for the critic (the network
    must be applied before concatenating the actions)
* Minor improvements in docs/docstrings
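
For illustration, a simplified sketch of what the flag controls in the critic's forward pass (a schematic stand-in assuming a preprocessing network that maps observations to a feature tensor; not the actual `continuous.Critic` code):

```python
import torch


class CriticSketch(torch.nn.Module):
    """Simplified stand-in illustrating `apply_preprocess_net_to_obs_only`."""

    def __init__(
        self,
        preprocess_net: torch.nn.Module,
        preprocess_output_dim: int,
        action_dim: int,
        apply_preprocess_net_to_obs_only: bool = False,
    ) -> None:
        super().__init__()
        self.preprocess_net = preprocess_net
        self.apply_preprocess_net_to_obs_only = apply_preprocess_net_to_obs_only
        # When the flag is set, actions are concatenated after preprocessing,
        # so the final layer sees preprocess_output_dim + action_dim inputs.
        in_dim = preprocess_output_dim + (action_dim if apply_preprocess_net_to_obs_only else 0)
        self.last = torch.nn.Linear(in_dim, 1)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        if self.apply_preprocess_net_to_obs_only:
            # Apply the (possibly actor-owned) preprocessing network to the
            # observations alone, then concatenate the actions afterwards.
            features = torch.cat([self.preprocess_net(obs), act], dim=1)
        else:
            # Previous behaviour: concatenate obs and actions before preprocessing,
            # which precludes reusing an actor's obs-only preprocessing network.
            features = self.preprocess_net(torch.cat([obs, act], dim=1))
        return self.last(features)
```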