Tianshou

Author	SHA1	Message	Date
Dominik Jain	f77d95da04	Fix: Missing type annotation of Experiment.watch_num_episodes	2024-01-08 18:00:37 +01:00
Dominik Jain	97a241a6fc	Fix: DiscreteEnvironments.from_factory used incorrect EnvType	2024-01-08 15:58:41 +01:00
maxhuettenrauch	522f7fbf98	Feature/dataclasses (#996)

This PR adds strict typing to the output of `update` and `learn` in all policies. This will likely be the last large refactoring PR before the next release (0.6.0, not 1.0.0), so it requires some attention. Several difficulties were encountered on the path to that goal:

1. The policy hierarchy is actually "broken" in the sense that the keys of dicts that were output by `learn` did not follow the same enhancement (inheritance) pattern as the policies. This is a real problem and should be addressed in the near future. Generally, several aspects of the policy design and hierarchy might deserve a dedicated discussion.
2. Each policy needs to be generic in the stats return type, because one might want to extend it at some point and then also extend the stats. Even within the source code base this pattern is necessary in many places.
3. The interaction between learn and update is a bit quirky, we currently handle it by having update modify special field inside TrainingStats, whereas all other fields are handled by learn.
4. The IQM module is a policy wrapper and required a TrainingStatsWrapper. The latter relies on a bunch of black magic.

They were addressed by:

1. Live with the broken hierarchy, which is now made visible by bounds in generics. We use type: ignore where appropriate.
2. Make all policies generic with bounds following the policy inheritance hierarchy (which is incorrect, see above). We experimented a bit with nested TrainingStats classes, but that seemed to add more complexity and be harder to understand. Unfortunately, mypy thinks that the code below is wrong, wherefore we have to add `type: ignore` to the return of each `learn`

   ```python
   T = TypeVar("T", bound=int)

   def f() -> T:
       return 3
   ```

3. See above
4. Write representative tests for the `TrainingStatsWrapper`. Still, the black magic might cause nasty surprises down the line (I am not proud of it)...

Closes #933

---------

Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>	2023-12-30 11:09:03 +01:00
Dominik Jain	e8cc80f990	Environments: Add option to use a different factory for test envs to `from_factory` convenience construction mechanisms	2023-12-21 13:13:51 +01:00
Dominik Jain	45a1a3f259	SamplingConfig: Change default of repeat_per_collect to 1 (safest option)	2023-12-21 13:13:51 +01:00
Dominik Jain	408d51f9de	SamplingConfig: Improve/extend docstrings, clearly explaining the parameters	2023-12-21 13:13:51 +01:00
Dominik Jain	1903a72ecb	Improve logging	2023-12-14 19:31:30 +01:00
Dominik Jain	3caa3805f0	Fix: SamplingConfig.start_timesteps_random was not used	2023-12-14 11:47:32 +01:00
Michael Panchenko	a846b52063	Typing: fixed multiple typing issues	2023-12-05 12:04:18 +01:00
Michael Panchenko	2e39a252e3	Docstring: minor changes to let ruff pass	2023-12-04 13:52:46 +01:00
Dominik Jain	6d6c85e594	Fix an issue where policies built with LRSchedulerFactoryLinear were not picklable (#992) - [X] I have marked all applicable categories: + [X] exception-raising fix + [ ] algorithm implementation fix + [ ] documentation modification + [ ] new feature - [X] I have reformatted the code using `make format` (required) - [X] I have checked the code using `make commit-checks` (required) - [ ] If applicable, I have mentioned the relevant/related issue(s) - [ ] If applicable, I have listed every items in this Pull Request below The cause was the use of a lambda function in the state of a generated object.	2023-11-14 10:23:18 -08:00
Dominik Jain	dae4000cd2	Revert "Depend on sensAI instead of copying its utils (logging, string)" This reverts commit fdb0eba93d81fa5e698770b4f7088c87fc1238da.	2023-11-08 19:11:39 +01:00
Dominik Jain	ac672f65d1	Add docstring for ActorFactoryTransientStorageDecorator	2023-11-06 17:18:10 +01:00
Dominik Jain	7e6d3d627e	Rename class ActorCriticModuleOpt -> ActorCriticOpt	2023-11-06 16:51:41 +01:00
Dominik Jain	5c8d57a2d2	Fix index error in call to _with_critic_factory_default Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2023-11-06 16:17:14 +01:00
Dominik Jain	fdb0eba93d	Depend on sensAI instead of copying its utils (logging, string)	2023-10-27 20:15:58 +02:00
Dominik Jain	5952993cfe	Add option to disable file logging	2023-10-27 18:59:43 +02:00
Dominik Jain	a3dbe90515	Allow to configure the policy persistence mode, adding a new mode which stores the entire policy (new default), supporting applications where it is desired to be able to load the policy without having to instantiate an environment or recreate a corresponding policy object	2023-10-26 13:19:33 +02:00
Dominik Jain	d684dae6cd	Change default number of environments (train=#CPUs, test=1)	2023-10-26 12:50:08 +02:00
Dominik Jain	da2194eff6	Force kwargs in PolicyWrapperFactoryIntrinsicCuriosity init	2023-10-26 10:43:59 +02:00
Dominik Jain	96298eafd8	Add convenient construction mechanisms for Environments (based on factory function for a single environment)	2023-10-25 21:20:07 +02:00
Dominik Jain	b5a891557f	Revert to simplified environment factory, removing unnecessary config object (configuration shall be part of the factory instance)	2023-10-24 13:14:23 +02:00
Dominik Jain	f7f20649e3	ExperimentConfig: Improve docstrings, remove obsolete item 'render'	2023-10-20 17:34:27 +02:00
Dominik Jain	7437131d79	Fix tianshou.highlevel depending on jsonargparse (should be dev dependency only) by introducing a new place where jsonargparse can be configured: logging.run_cli, which is also slightly more convenient	2023-10-19 11:40:49 +02:00
Dominik Jain	6cbee188b8	Change interface of EnvFactory to ensure that configuration of number of environments in SamplingConfig is used (values are now passed to factory method). This is clearer and removes the need to pass otherwise unnecessary configuration to environment factories at construction	2023-10-19 11:37:20 +02:00
Dominik Jain	bbfad01a9f	Improve docstrings	2023-10-18 22:07:40 +02:00
Dominik Jain	41bd463a7b	Allow to configure activation function in default networks * Set ReLU as default in all actor and critic factories * Configure non-default in applicable MuJoCo examples	2023-10-18 20:44:18 +02:00
Dominik Jain	80b1b1ff9d	World.restore_path: Add value check	2023-10-18 20:44:18 +02:00
Dominik Jain	c7d0cbb5d3	Experiment: Fix return type annotation, remove unused type arguments	2023-10-18 20:44:18 +02:00
Dominik Jain	ff451f8373	Add documentation to parameters, improve factorisation	2023-10-18 20:44:18 +02:00
Dominik Jain	e63d8d4147	Use ToStringMixin in dataclasses to detect recurring objects in larger object trees	2023-10-18 20:44:18 +02:00
Dominik Jain	ae4850692f	DQNExperimentBuilder: Use IntermediateModuleFactory instead of ActorFactory (similar to IQN implementation)	2023-10-18 20:44:18 +02:00
Dominik Jain	4b270eaa2d	Add documentation, improve structure of 'module' package	2023-10-18 20:44:18 +02:00
Dominik Jain	97e21b5ddf	Remove obsolete mixin, improve class names	2023-10-18 20:44:18 +02:00
Dominik Jain	90eaacb606	PolicyWrapperFactory: Remove unnecessary input type variable	2023-10-18 20:44:18 +02:00
Dominik Jain	fc695a5394	Use logging to report trainer epoch status	2023-10-18 20:44:18 +02:00
Dominik Jain	3bba192633	Add experiment result	2023-10-18 20:44:18 +02:00
Dominik Jain	023b33c917	Make mypy happy	2023-10-18 20:44:18 +02:00
Dominik Jain	76e870207d	Improve persistence handling * Add persistence/restoration of Experiment instance * Add file logging in experiment * Allow all persistence/logging to be disabled * Disable persistence in tests	2023-10-18 20:44:18 +02:00
Dominik Jain	3691ed2abc	Support obs_rms persistence for MuJoCo by adding a general mechanism for attaching persistence to Environments instances	2023-10-18 20:44:17 +02:00
Dominik Jain	f6d49774a2	Reify policy persistence, introducing World representation	2023-10-18 20:44:17 +02:00
Dominik Jain	686fd555b0	Extend tests, fixing some default behaviour	2023-10-18 20:44:17 +02:00
Dominik Jain	a8a367c42d	Support IQN in high-level API * Add example atari_iqn_hl * Factor out trainer callbacks to new module atari_callbacks * Extract base class for DQN-based agent factories * Improved module factory interface design, achieving higher generality	2023-10-18 20:44:17 +02:00
Dominik Jain	c7d0b6b4b2	Simplify agent factories by making better use of base classes	2023-10-18 20:44:17 +02:00
Dominik Jain	799beb79b4	Support discrete SAC in high-level API * Changed mechanism for reusing actor's preprocessing module in critics to avoid special handling in AgentFactory implementations, improving separation of concerns: - Added CriticFactoryReuseActor as the new critic factory - Added ActorFactoryTransientStorageDecorator to pass on the actor data - Added helper classes ActorFuture, ActorFutureProviderProtocol * Add example atari_sac_hl	2023-10-18 20:44:17 +02:00
Dominik Jain	305b30a6c1	Simplify parameter transformers by applying ParamTransformerChangeValue	2023-10-18 20:44:17 +02:00
Dominik Jain	17ef4dd5eb	Support REDQ in high-level API * Implement example mujoco_redq_hl * Add abstraction CriticEnsembleFactory with default implementations to suit REDQ * Fix type annotation of linear_layer in Net, MLP, Critic (was incompatible with REDQ usage)	2023-10-18 20:44:17 +02:00
Dominik Jain	7af836bd6a	Support TRPO in high-level API and add example mujoco_trpo_hl	2023-10-18 20:44:17 +02:00
Dominik Jain	383a4a6083	Support NPG in high-level API and add example mujoco_npg_hl	2023-10-18 20:44:17 +02:00
Dominik Jain	73a6d15eee	Log Environments	2023-10-18 20:44:17 +02:00


81 Commits