88 Commits

Author SHA1 Message Date
Dominik Jain
20074931d5 Improve docstrings 2024-01-16 14:52:31 +01:00
Dominik Jain
05a8cf4e74 Refactoring, improving class name EnvFactoryGymnasium -> EnvFactoryRegistered 2024-01-16 14:52:31 +01:00
Dominik Jain
c9cb41bf55 Make envpool usage configuration more explicit 2024-01-16 14:52:31 +01:00
Dominik Jain
1e5ebc2a2d Improve naming of callback classes and related methods/attributes
Add EpochStopCallbackRewardThreshold
2024-01-12 17:13:42 +01:00
Dominik Jain
ff398beed9 Move callbacks for setting DQN epsilon values to the library 2024-01-12 17:13:42 +01:00
Dominik Jain
eaab7b0a4b Improve environment factory abstractions in high-level API:
* EnvFactory now uses the creation of a single environment as
  the basic functionality which the more high-level functions build
  upon
* Introduce enum EnvMode to indicate the purpose for which an env
  is created, allowing the factory creation process to change its
  behaviour accordingly
* Add EnvFactoryGymnasium to provide direct support for envs that
  can be created via gymnasium.make
    - EnvPool is supported via an injectable EnvPoolFactory
    - Existing EnvFactory implementations are now derived from
      EnvFactoryGymnasium
* Use a separate environment (which uses new EnvMode.WATCH) for
  watching agent performance after training (instead of using test
  environments, which the user may want to configure differently)
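
The factory design described in the bullets above could be sketched roughly as follows. This is an illustrative sketch only, not the library's actual code: the names `EnvFactory`, `EnvMode` and `EnvMode.WATCH` come from the commit message, while `DummyEnv`, `DummyEnvFactory`, `create_env`, `create_envs` and the `TRAIN`/`TEST` members are assumptions made for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum


class EnvMode(Enum):
    """Purpose for which an environment is created (TRAIN and TEST are assumed members)."""

    TRAIN = "train"
    TEST = "test"
    WATCH = "watch"


@dataclass
class DummyEnv:
    """Hypothetical stand-in for a real environment."""

    render: bool


class EnvFactory(ABC):
    @abstractmethod
    def create_env(self, mode: EnvMode) -> DummyEnv:
        """Create a single environment; this is the basic building block."""

    def create_envs(self, num_envs: int, mode: EnvMode) -> list[DummyEnv]:
        """Higher-level helper built on top of single-environment creation."""
        return [self.create_env(mode) for _ in range(num_envs)]


class DummyEnvFactory(EnvFactory):
    def create_env(self, mode: EnvMode) -> DummyEnv:
        # The factory can change its behaviour depending on the mode,
        # e.g. enable rendering only when watching the trained agent.
        return DummyEnv(render=(mode is EnvMode.WATCH))


factory = DummyEnvFactory()
train_envs = factory.create_envs(4, EnvMode.TRAIN)
watch_env = factory.create_env(EnvMode.WATCH)
```

Because only `create_env` is abstract, a concrete factory needs to describe just one environment, and the watch environment can be configured independently of the test environments.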
2024-01-12 17:13:42 +01:00
Dominik Jain
d4e4f4ff63 Experiment builders for DQN and IQN:
* Fix: Disable softmax in default models
* Add method with_model_factory_default (for DQN)
2024-01-10 15:42:18 +01:00
Dominik Jain
f77d95da04 Fix: Missing type annotation of Experiment.watch_num_episodes 2024-01-08 18:00:37 +01:00
Dominik Jain
97a241a6fc Fix: DiscreteEnvironments.from_factory used incorrect EnvType 2024-01-08 15:58:41 +01:00
maxhuettenrauch
522f7fbf98 Feature/dataclasses (#996)
This PR adds strict typing to the output of `update` and `learn` in all
policies. This will likely be the last large refactoring PR before the
next release (0.6.0, not 1.0.0), so it requires some attention. Several
difficulties were encountered on the path to that goal:

1. The policy hierarchy is actually "broken" in the sense that the keys
of dicts that were output by `learn` did not follow the same enhancement
(inheritance) pattern as the policies. This is a real problem and should
be addressed in the near future. Generally, several aspects of the
policy design and hierarchy might deserve a dedicated discussion.
2. Each policy needs to be generic in the stats return type, because one
might want to extend it at some point and then also extend the stats.
Even within the source code base this pattern is necessary in many
places.
3. The interaction between learn and update is a bit quirky; we
currently handle it by having update modify a special field inside
TrainingStats, whereas all other fields are handled by learn.
4. The ICM module is a policy wrapper and required a
TrainingStatsWrapper. The latter relies on a bunch of black magic.

They were addressed by:
1. Live with the broken hierarchy, which is now made visible by bounds
in generics. We use type: ignore where appropriate.
2. Make all policies generic with bounds following the policy
inheritance hierarchy (which is incorrect, see above). We experimented a
bit with nested TrainingStats classes, but that seemed to add more
complexity and be harder to understand. Unfortunately, mypy thinks that
the code below is wrong, wherefore we have to add `type: ignore` to the
return of each `learn`

```python
from typing import TypeVar

T = TypeVar("T", bound=int)


def f() -> T:
    return 3
```

3. See above
4. Write representative tests for the `TrainingStatsWrapper`. Still, the
black magic might cause nasty surprises down the line (I am not proud of
it)...
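
Point 2 can be illustrated with a minimal sketch of a policy that is generic in its stats type. All class names below (`TrainingStats`, `DQNStats`, `BasePolicy`, `DQNPolicy`) are placeholders chosen for illustration and are not necessarily the library's definitions; the sketch only shows why returning a concrete stats object from `learn` needs an ignore comment.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar


@dataclass
class TrainingStats:
    train_time: float = 0.0


@dataclass
class DQNStats(TrainingStats):
    loss: float = 0.0


# The bounds follow the (assumed) policy inheritance hierarchy.
TStats = TypeVar("TStats", bound=TrainingStats)
TDQNStats = TypeVar("TDQNStats", bound=DQNStats)


class BasePolicy(Generic[TStats]):
    def learn(self) -> TStats:
        raise NotImplementedError


class DQNPolicy(BasePolicy[TDQNStats]):
    def learn(self) -> TDQNStats:
        # A subclass could bind TDQNStats to a more specific type than DQNStats,
        # so a static checker rejects returning the concrete class here;
        # hence the ignore comment. At runtime the value is returned as usual.
        return DQNStats(loss=0.5)  # type: ignore[return-value]


stats = DQNPolicy[DQNStats]().learn()
```

Keeping the stats type as a type parameter lets a derived policy declare an extended stats class without changing the signature of `learn` in its parents.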

Closes #933

---------

Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
2023-12-30 11:09:03 +01:00
Dominik Jain
e8cc80f990 Environments: Add option to use a different factory for test envs
to `from_factory` convenience construction mechanisms
2023-12-21 13:13:51 +01:00
Dominik Jain
45a1a3f259 SamplingConfig: Change default of repeat_per_collect to 1 (safest option) 2023-12-21 13:13:51 +01:00
Dominik Jain
408d51f9de SamplingConfig: Improve/extend docstrings, clearly explaining the parameters 2023-12-21 13:13:51 +01:00
Dominik Jain
1903a72ecb Improve logging 2023-12-14 19:31:30 +01:00
Dominik Jain
3caa3805f0 Fix: SamplingConfig.start_timesteps_random was not used 2023-12-14 11:47:32 +01:00
Michael Panchenko
a846b52063 Typing: fixed multiple typing issues 2023-12-05 12:04:18 +01:00
Michael Panchenko
2e39a252e3 Docstring: minor changes to let ruff pass 2023-12-04 13:52:46 +01:00
Dominik Jain
6d6c85e594 Fix an issue where policies built with LRSchedulerFactoryLinear were not picklable (#992)
- [X] I have marked all applicable categories:
    + [X] exception-raising fix
    + [ ] algorithm implementation fix
    + [ ] documentation modification
    + [ ] new feature
- [X] I have reformatted the code using `make format` (**required**)
- [X] I have checked the code using `make commit-checks` (**required**)
- [ ] If applicable, I have mentioned the relevant/related issue(s)
- [ ] If applicable, I have listed every item in this Pull Request
below

The cause was the use of a lambda function in the state of a generated
object.
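
The general problem can be reproduced with a small sketch: an object that stores a lambda in its state cannot be pickled, whereas one that stores an instance of a module-level callable class can. `Holder` and `LinearDecay` are hypothetical names, and the linear decay formula is an assumption for illustration, not the code changed in this commit.

```python
import pickle


class LinearDecay:
    """Picklable replacement for `lambda epoch: 1 - epoch / max_epoch`."""

    def __init__(self, max_epoch: int) -> None:
        self.max_epoch = max_epoch

    def __call__(self, epoch: int) -> float:
        return 1.0 - epoch / self.max_epoch


class Holder:
    """Hypothetical object that keeps a callable in its state."""

    def __init__(self, fn) -> None:
        self.fn = fn


def is_picklable(obj: object) -> bool:
    """Return True if pickle.dumps succeeds for the given object."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


with_lambda = Holder(lambda epoch: 1.0 - epoch / 10)
with_class = Holder(LinearDecay(max_epoch=10))

# Functions are pickled by qualified name, which a lambda does not have.
lambda_ok = is_picklable(with_lambda)
# The callable instance only carries plain attributes, so pickling is expected to work.
class_ok = is_picklable(with_class)
```

Replacing the lambda with a small callable class keeps the behaviour the same while moving the configuration (`max_epoch`) into ordinary, serialisable instance state.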
2023-11-14 10:23:18 -08:00
Dominik Jain
dae4000cd2 Revert "Depend on sensAI instead of copying its utils (logging, string)"
This reverts commit fdb0eba93d81fa5e698770b4f7088c87fc1238da.
2023-11-08 19:11:39 +01:00
Dominik Jain
ac672f65d1 Add docstring for ActorFactoryTransientStorageDecorator 2023-11-06 17:18:10 +01:00
Dominik Jain
7e6d3d627e Rename class ActorCriticModuleOpt -> ActorCriticOpt 2023-11-06 16:51:41 +01:00
Dominik Jain
5c8d57a2d2 Fix index error in call to _with_critic_factory_default
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2023-11-06 16:17:14 +01:00
Dominik Jain
fdb0eba93d Depend on sensAI instead of copying its utils (logging, string) 2023-10-27 20:15:58 +02:00
Dominik Jain
5952993cfe Add option to disable file logging 2023-10-27 18:59:43 +02:00
Dominik Jain
a3dbe90515 Allow to configure the policy persistence mode, adding a new mode
which stores the entire policy (new default), supporting applications
where it is desired to be able to load the policy without having
to instantiate an environment or recreate a corresponding policy
object
2023-10-26 13:19:33 +02:00
Dominik Jain
d684dae6cd Change default number of environments (train=#CPUs, test=1) 2023-10-26 12:50:08 +02:00
Dominik Jain
da2194eff6 Force kwargs in PolicyWrapperFactoryIntrinsicCuriosity init 2023-10-26 10:43:59 +02:00
Dominik Jain
96298eafd8 Add convenient construction mechanisms for Environments
(based on factory function for a single environment)
2023-10-25 21:20:07 +02:00
Dominik Jain
b5a891557f Revert to simplified environment factory, removing unnecessary config object
(configuration shall be part of the factory instance)
2023-10-24 13:14:23 +02:00
Dominik Jain
f7f20649e3 ExperimentConfig: Improve docstrings, remove obsolete item 'render' 2023-10-20 17:34:27 +02:00
Dominik Jain
7437131d79 Fix tianshou.highlevel depending on jsonargparse
(should be dev dependency only) by introducing a new
place where jsonargparse can be configured:
logging.run_cli, which is also slightly more convenient
2023-10-19 11:40:49 +02:00
Dominik Jain
6cbee188b8 Change interface of EnvFactory to ensure that configuration
of number of environments in SamplingConfig is used
(values are now passed to factory method)

This is clearer and removes the need to pass otherwise
unnecessary configuration to environment factories at
construction
2023-10-19 11:37:20 +02:00
Dominik Jain
bbfad01a9f Improve docstrings 2023-10-18 22:07:40 +02:00
Dominik Jain
41bd463a7b Allow to configure activation function in default networks
* Set ReLU as default in all actor and critic factories
* Configure non-default in applicable MuJoCo examples
2023-10-18 20:44:18 +02:00
Dominik Jain
80b1b1ff9d World.restore_path: Add value check 2023-10-18 20:44:18 +02:00
Dominik Jain
c7d0cbb5d3 Experiment: Fix return type annotation, remove unused type arguments 2023-10-18 20:44:18 +02:00
Dominik Jain
ff451f8373 Add documentation to parameters, improve factorisation 2023-10-18 20:44:18 +02:00
Dominik Jain
e63d8d4147 Use ToStringMixin in dataclasses to detect recurring objects in larger object trees 2023-10-18 20:44:18 +02:00
Dominik Jain
ae4850692f DQNExperimentBuilder: Use IntermediateModuleFactory instead of ActorFactory
(similar to IQN implementation)
2023-10-18 20:44:18 +02:00
Dominik Jain
4b270eaa2d Add documentation, improve structure of 'module' package 2023-10-18 20:44:18 +02:00
Dominik Jain
97e21b5ddf Remove obsolete mixin, improve class names 2023-10-18 20:44:18 +02:00
Dominik Jain
90eaacb606 PolicyWrapperFactory: Remove unnecessary input type variable 2023-10-18 20:44:18 +02:00
Dominik Jain
fc695a5394 Use logging to report trainer epoch status 2023-10-18 20:44:18 +02:00
Dominik Jain
3bba192633 Add experiment result 2023-10-18 20:44:18 +02:00
Dominik Jain
023b33c917 Make mypy happy 2023-10-18 20:44:18 +02:00
Dominik Jain
76e870207d Improve persistence handling
* Add persistence/restoration of Experiment instance
* Add file logging in experiment
* Allow all persistence/logging to be disabled
* Disable persistence in tests
2023-10-18 20:44:18 +02:00
Dominik Jain
3691ed2abc Support obs_rms persistence for MuJoCo by adding a general mechanism
for attaching persistence to Environments instances
2023-10-18 20:44:17 +02:00
Dominik Jain
f6d49774a2 Reify policy persistence, introducing World representation 2023-10-18 20:44:17 +02:00
Dominik Jain
686fd555b0 Extend tests, fixing some default behaviour 2023-10-18 20:44:17 +02:00
Dominik Jain
a8a367c42d Support IQN in high-level API
* Add example atari_iqn_hl
* Factor out trainer callbacks to new module atari_callbacks
* Extract base class for DQN-based agent factories
* Improved module factory interface design, achieving higher generality
2023-10-18 20:44:17 +02:00