Fixes a small bug where `np.inf` was used instead of torch-based infinity.
Closes #963
---------
Co-authored-by: ivan.rodriguez <ivan.rodriguez@unternehmertum.de>
This PR adds a new method for getting actions from an env's observation
and info. This is useful for standard inference and stands in contrast
to batch-based methods that are currently used in training and
evaluation. Without this, users have to do some kind of gymnastics to
actually perform inference with a trained policy. I have also added a
test for the new method.
In future PRs, this method should be included in the examples (in the
"watch" section).
Adding this required several typing improvements and, importantly,
_simplifying the signature of `forward` in many policies!_ This is a
**breaking change**, but it will likely affect no users. The `input`
parameter of `forward` was a rather hacky mechanism; I believe it is
good that it's gone now. It will also help with #948.
The main functional change is the addition of `compute_action` to
`BasePolicy`.
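For illustration, inference with a trained policy can then look roughly
like this (a minimal sketch: `policy` is assumed to be a trained
`BasePolicy`, and the exact keyword arguments of `compute_action` are an
assumption here):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
done = False
while not done:
    # Single observation in, single action out; no Batch gymnastics.
    action = policy.compute_action(obs, info=info)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```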
Other minor changes:
- improvements in typing
- updated PR and Issue templates
- improved handling of `max_action_num`
Closes #981
- [X] I have marked all applicable categories:
+ [X] exception-raising fix
+ [ ] algorithm implementation fix
+ [ ] documentation modification
+ [ ] new feature
- [X] I have reformatted the code using `make format` (**required**)
- [X] I have checked the code using `make commit-checks` (**required**)
- [ ] If applicable, I have mentioned the relevant/related issue(s)
- [ ] If applicable, I have listed every item in this Pull Request
below
The cause was the use of a lambda function in the state of a generated
object.
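As a general Python illustration (not this repository's actual code):
lambdas are pickled by qualified-name lookup, which fails for inline
definitions, so any object carrying one in its state cannot be
persisted:

```python
import pickle

def add_one(x):
    return x + 1

class Fixed:
    def __init__(self):
        self.fn = add_one  # module-level function: picklable by name

class Broken:
    def __init__(self):
        self.fn = lambda x: x + 1  # lambda in state: not picklable

pickle.dumps(Fixed())     # works
# pickle.dumps(Broken())  # raises a pickling error
```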
which stores the entire policy (new default), supporting applications
where it is desired to be able to load the policy without having
to instantiate an environment or recreate a corresponding policy
object.
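Roughly the contrast, as a sketch assuming plain torch persistence
(file names are illustrative, and the repository's actual persistence
helpers may differ):

```python
import torch

# Weights-only persistence: loading requires re-instantiating the
# policy first, which typically also needs an env to derive the spaces.
torch.save(policy.state_dict(), "policy_weights.pt")

# Whole-object persistence (the new default described above):
torch.save(policy, "policy.pt")
# weights_only=False permits full-object unpickling (newer torch
# versions default to weights_only=True).
restored = torch.load("policy.pt", weights_only=False)
```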
(should be dev dependency only) by introducing a new
place where jsonargparse can be configured:
`logging.run_cli`, which is also slightly more convenient.
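The helper presumably wraps jsonargparse's standard entry point,
roughly as in the sketch below (the exact configuration `run_cli`
applies is an assumption, and the parameters shown are illustrative):

```python
from jsonargparse import CLI

def main(task: str = "PongNoFrameskip-v4", seed: int = 42) -> None:
    """Function parameters become CLI arguments automatically."""
    ...

if __name__ == "__main__":
    CLI(main)  # run_cli is assumed to wrap this, adding logging setup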
of the number of environments in `SamplingConfig` is used
(values are now passed to the factory method).
This is clearer and removes the need to pass otherwise
unnecessary configuration to environment factories at
construction.
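A hypothetical before/after sketch (class and method names are
illustrative only, not the repository's actual interfaces):

```python
class EnvFactory:
    # Before: num_training_envs/num_test_envs had to be passed here,
    # forcing sampling configuration into construction.
    def __init__(self, task: str) -> None:
        self.task = task

    # After: the counts from SamplingConfig arrive at creation time.
    def create_envs(self, num_training_envs: int, num_test_envs: int):
        ...
```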
* Add persistence/restoration of Experiment instance
* Add file logging in experiment
* Allow all persistence/logging to be disabled
* Disable persistence in tests
* Add example `atari_iqn_hl`
* Factor out trainer callbacks to new module `atari_callbacks`
* Extract base class for DQN-based agent factories
* Improved module factory interface design, achieving higher generality