509 Commits

Author SHA1 Message Date
Dominik Jain
76e870207d Improve persistence handling
* Add persistence/restoration of Experiment instance
* Add file logging in experiment
* Allow all persistence/logging to be disabled
* Disable persistence in tests
2023-10-18 20:44:18 +02:00
Dominik Jain
ba803296cc Add FileLoggerContext 2023-10-18 20:44:17 +02:00
Dominik Jain
3691ed2abc Support obs_rms persistence for MuJoCo by adding a general mechanism
for attaching persistence to Environments instances
2023-10-18 20:44:17 +02:00
Dominik Jain
f6d49774a2 Reify policy persistence, introducing Wold representation 2023-10-18 20:44:17 +02:00
Dominik Jain
ee3813b09c Ignore temp scripts and temp folder 2023-10-18 20:44:17 +02:00
Dominik Jain
686fd555b0 Extend tests, fixing some default behaviour 2023-10-18 20:44:17 +02:00
Dominik Jain
a8a367c42d Support IQN in high-level API
* Add example atari_iqn_hl
* Factor out trainer callbacks to new module atari_callbacks
* Extract base class for DQN-based agent factories
* Improved module factory interface design, achieving higher generality
2023-10-18 20:44:17 +02:00
Dominik Jain
213e08a846 Add method get_output_dim to BaseActor 2023-10-18 20:44:17 +02:00
Dominik Jain
c7d0b6b4b2 Simplify agent factories by making better use of base classes 2023-10-18 20:44:17 +02:00
Dominik Jain
799beb79b4 Support discrete SAC in high-level API
* Changed machanism for reusing actor's preprocessing module in critics
  to avoid special handling in AgentFactory implementations, improving
  separation of concerns:
    - Added CriticFactoryReuseActor as the new critic factory
    - Added ActorFactoryTransientStorageDecorator to pass on the actor
      data
    - Added helper classes ActorFuture, ActorFutureProviderProtocol
* Add example atari_sac_hl
2023-10-18 20:44:17 +02:00
Dominik Jain
305b30a6c1 Simplify parameter transformers by applying ParamTransformerChangeValue 2023-10-18 20:44:17 +02:00
Dominik Jain
17ef4dd5eb Support REDQ in high-level API
* Implement example mujoco_redq_hl
* Add abstraction CriticEnsembleFactory with default implementations
  to suit REDQ
* Fix type annotation of linear_layer in Net, MLP, Critic
  (was incompatible with REDQ usage)
2023-10-18 20:44:17 +02:00
Dominik Jain
7af836bd6a Support TRPO in high-level API and add example mujoco_trpo_hl 2023-10-18 20:44:17 +02:00
Dominik Jain
383a4a6083 Support NPG in high-level API and add example mujoco_npg_hl 2023-10-18 20:44:17 +02:00
Dominik Jain
73a6d15eee Log Environments 2023-10-18 20:44:17 +02:00
Dominik Jain
a8ea6808c3 Fix ruff type comparison complaint 2023-10-18 20:44:17 +02:00
Dominik Jain
1bb52a6a5c Simplify critic/agent with optimizer generation
After adding a function to create ModuleOpt instances directly from
AgentFactory and CriticFactory,
  * several mixins for AgentFactories are no longer needed (deleted)
  * additional abstractions for ModuleOptFactories are no longer needed (deleted)
2023-10-18 20:44:17 +02:00
Dominik Jain
6bb3abb2f0 Support PG/Reinforce in high-level API
* Add example mujoco_reinforce_hl
* Extended functionality of ActorFactory to support creation of ModuleOpt
2023-10-18 20:44:17 +02:00
Dominik Jain
4e93c12afa Remove obsolete configuration files 2023-10-18 20:44:17 +02:00
Dominik Jain
22dfc4ed2e Fix type annotations of dist_fn 2023-10-18 20:44:17 +02:00
Dominik Jain
a161a9cf58 Improve type annotations, fix type issues and add checks 2023-10-18 20:44:17 +02:00
Dominik Jain
e6716326bd Make mypy ignore copied util modules string & logging 2023-10-18 20:44:17 +02:00
Dominik Jain
7ed6c1d71c Remove obsolete module highlevel.utils 2023-10-18 20:44:17 +02:00
Dominik Jain
1243894eb8 Add DistributionFunctionFactory subclasses for discrete/continuous default 2023-10-18 20:44:17 +02:00
Dominik Jain
a8dc75fbab ExperimentBuilder: Allow experiment_config and sampling_config to be None 2023-10-18 20:44:17 +02:00
Dominik Jain
837ff13c04 Reorder ExperimentBuilder args (EnvFactory first) 2023-10-18 20:44:17 +02:00
Dominik Jain
d269063e6a Remove 'RL' prefix from class names 2023-10-18 20:44:17 +02:00
Dominik Jain
50ac385321 Add some basic tests for high-level experiment builder API 2023-10-18 20:44:16 +02:00
Dominik Jain
b54fcd12cb Change high-level DQN interface to expect an actor instead of a critic,
because that is what is functionally required
2023-10-18 20:44:16 +02:00
Dominik Jain
1cba589bd4 Add DQN support in high-level API
* Allow to specify trainer callbacks (train_fn, test_fn, stop_fn)
  in high-level API, adding the necessary abstractions and pass-on
  mechanisms
* Add example atari_dqn_hl
2023-10-18 20:44:16 +02:00
Dominik Jain
358978c65d Add ToStringMixin to further high-level parameter classes 2023-10-18 20:44:16 +02:00
Dominik Jain
8f67c2e9d9 Disable numba DEBUG logs 2023-10-18 20:44:16 +02:00
Dominik Jain
9f0a410bb1 Log full experiment configuration, adding string representations to relevant classes 2023-10-18 20:44:16 +02:00
Dominik Jain
58bd20f882 Add logging module 2023-10-18 20:44:16 +02:00
Dominik Jain
ce26e25923 Handle ruff complaints in string module 2023-10-18 20:44:16 +02:00
Dominik Jain
de70147752 Add string module from sensAI 2023-10-18 20:44:16 +02:00
Dominik Jain
2671580c6c Add DDPG high-level API and MuJoCo example 2023-10-18 20:44:16 +02:00
Dominik Jain
6b6d9ea609 Add support for discrete PPO
* Refactored module `module` (split into submodules)
* Basic support for discrete environments
* Implement Atari env. factory
* Implement DQN-based actor factory
* Implement notion of reusing agent preprocessing network for critic
* Add example atari_ppo_hl
2023-10-18 20:44:16 +02:00
Dominik Jain
e0e7349b0a Add base class BaseActor with method get_preprocess_net for high-level API 2023-10-18 20:44:16 +02:00
Dominik Jain
cd79cf8661 Add A2C high-level API
* Add common based class for A2C and PPO agent factories
* Add default for dist_fn parameter, adding corresponding factories
* Add example mujoco_a2c_hl
2023-10-18 20:44:16 +02:00
Dominik Jain
acd89fa3b0 Remove parameter transformers from config object state,
composing the list dynamically instead
2023-10-18 20:44:16 +02:00
Dominik Jain
78b6dd1f49 Adapt class naming scheme
* Use prefix convention (subclasses have superclass names as prefix) to
  facilitate discoverability of relevant classes via IDE autocompletion
* Use dual naming, adding an alternative concise name that omits the
  precise OO semantics and retains only the essential part of the name
  (which can be more pleasing to users not accustomed to
  convoluted OO naming)
2023-10-18 20:44:16 +02:00
Michael Panchenko
5bcf514c55 Add alternative functional interface for environment creation
where a persistable configuration object is passed as an
argument, as this can help to ensure persistability (making the
requirement explicit)
2023-10-18 20:44:16 +02:00
Dominik Jain
d4e604b46e Move parameter transformation directly into parameter objects,
achieving greater separation of concerns and improved maintainability
2023-10-18 20:44:16 +02:00
Dominik Jain
38cf982034 Disable Ruff rule D205 (blank-line-after-summary)
because it disallows, in particular, class docstrings that consist
only of a summary line
2023-10-18 20:44:16 +02:00
Dominik Jain
e993425aa1 Add high-level API support for TD3
* Created mixins for agent factories to reduce code duplication
 * Further factorised params & mixins for experiment factories
 * Additional parameter abstractions
 * Implement high-level MuJoCo TD3 example
2023-10-18 20:44:16 +02:00
Dominik Jain
6a739384ee WandbLogger: Use less restrictive type annotation for config 2023-10-18 20:44:16 +02:00
Dominik Jain
367778d37f Improve high-level policy parametrisation
Policy objects are now parametrised by converting the parameter
dataclass instances to kwargs, using some injectable conversions
along the way
2023-10-18 20:44:16 +02:00
Dominik Jain
37dc07e487 Add high-level experiment builder interface 2023-10-18 20:44:05 +02:00
dependabot[bot]
4a51e69265
Bump urllib3 from 2.0.6 to 2.0.7 (#972)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.6 to 2.0.7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/releases">urllib3's
releases</a>.</em></p>
<blockquote>
<h2>2.0.7</h2>
<ul>
<li>Made body stripped from HTTP requests changing the request method to
GET after HTTP 303 &quot;See Other&quot; redirect responses.
(GHSA-g4mx-q9vg-27p4)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/blob/main/CHANGES.rst">urllib3's
changelog</a>.</em></p>
<blockquote>
<h1>2.0.7 (2023-10-17)</h1>
<ul>
<li>Made body stripped from HTTP requests changing the request method to
GET after HTTP 303 &quot;See Other&quot; redirect responses.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="56f01e088d"><code>56f01e0</code></a>
Release 2.0.7</li>
<li><a
href="4e50fbc5db"><code>4e50fbc</code></a>
Merge pull request from GHSA-g4mx-q9vg-27p4</li>
<li><a
href="80808b04bf"><code>80808b0</code></a>
Fix docs build on Python 3.12 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3144">#3144</a>)</li>
<li><a
href="f28deff1cf"><code>f28deff</code></a>
Add 1.26.17 to the current changelog</li>
<li>See full diff in <a
href="https://github.com/urllib3/urllib3/compare/2.0.6...2.0.7">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=urllib3&package-manager=pip&previous-version=2.0.6&new-version=2.0.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/thu-ml/tianshou/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-17 21:13:19 -04:00