492 Commits

Author SHA1 Message Date
Dominik Jain
6bb3abb2f0 Support PG/Reinforce in high-level API
* Add example mujoco_reinforce_hl
* Extended functionality of ActorFactory to support creation of ModuleOpt
2023-10-18 20:44:17 +02:00
Dominik Jain
4e93c12afa Remove obsolete configuration files 2023-10-18 20:44:17 +02:00
Dominik Jain
22dfc4ed2e Fix type annotations of dist_fn 2023-10-18 20:44:17 +02:00
Dominik Jain
a161a9cf58 Improve type annotations, fix type issues and add checks 2023-10-18 20:44:17 +02:00
Dominik Jain
e6716326bd Make mypy ignore copied util modules string & logging 2023-10-18 20:44:17 +02:00
Dominik Jain
7ed6c1d71c Remove obsolete module highlevel.utils 2023-10-18 20:44:17 +02:00
Dominik Jain
1243894eb8 Add DistributionFunctionFactory subclasses for discrete/continuous default 2023-10-18 20:44:17 +02:00
Dominik Jain
a8dc75fbab ExperimentBuilder: Allow experiment_config and sampling_config to be None 2023-10-18 20:44:17 +02:00
Dominik Jain
837ff13c04 Reorder ExperimentBuilder args (EnvFactory first) 2023-10-18 20:44:17 +02:00
Dominik Jain
d269063e6a Remove 'RL' prefix from class names 2023-10-18 20:44:17 +02:00
Dominik Jain
50ac385321 Add some basic tests for high-level experiment builder API 2023-10-18 20:44:16 +02:00
Dominik Jain
b54fcd12cb Change high-level DQN interface to expect an actor instead of a critic,
because that is what is functionally required
2023-10-18 20:44:16 +02:00
Dominik Jain
1cba589bd4 Add DQN support in high-level API
* Allow specifying trainer callbacks (train_fn, test_fn, stop_fn)
  in high-level API, adding the necessary abstractions and pass-on
  mechanisms
* Add example atari_dqn_hl
2023-10-18 20:44:16 +02:00
Dominik Jain
358978c65d Add ToStringMixin to further high-level parameter classes 2023-10-18 20:44:16 +02:00
Dominik Jain
8f67c2e9d9 Disable numba DEBUG logs 2023-10-18 20:44:16 +02:00
Dominik Jain
9f0a410bb1 Log full experiment configuration, adding string representations to relevant classes 2023-10-18 20:44:16 +02:00
Dominik Jain
58bd20f882 Add logging module 2023-10-18 20:44:16 +02:00
Dominik Jain
ce26e25923 Handle ruff complaints in string module 2023-10-18 20:44:16 +02:00
Dominik Jain
de70147752 Add string module from sensAI 2023-10-18 20:44:16 +02:00
Dominik Jain
2671580c6c Add DDPG high-level API and MuJoCo example 2023-10-18 20:44:16 +02:00
Dominik Jain
6b6d9ea609 Add support for discrete PPO
* Refactored module `module` (split into submodules)
* Basic support for discrete environments
* Implement Atari env. factory
* Implement DQN-based actor factory
* Implement notion of reusing agent preprocessing network for critic
* Add example atari_ppo_hl
2023-10-18 20:44:16 +02:00
Dominik Jain
e0e7349b0a Add base class BaseActor with method get_preprocess_net for high-level API 2023-10-18 20:44:16 +02:00
Dominik Jain
cd79cf8661 Add A2C high-level API
* Add common base class for A2C and PPO agent factories
* Add default for dist_fn parameter, adding corresponding factories
* Add example mujoco_a2c_hl
2023-10-18 20:44:16 +02:00
Dominik Jain
acd89fa3b0 Remove parameter transformers from config object state,
composing the list dynamically instead
2023-10-18 20:44:16 +02:00
Dominik Jain
78b6dd1f49 Adapt class naming scheme
* Use prefix convention (subclasses have superclass names as prefix) to
  facilitate discoverability of relevant classes via IDE autocompletion
* Use dual naming, adding an alternative concise name that omits the
  precise OO semantics and retains only the essential part of the name
  (which can be more pleasing to users not accustomed to
  convoluted OO naming)
2023-10-18 20:44:16 +02:00
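The naming scheme described in commit 78b6dd1f49 can be illustrated with a minimal sketch. All class names below are hypothetical and chosen only for illustration; they are not taken from the tianshou code base. The subclass carries the superclass name as a prefix (so typing the prefix in an IDE lists all variants), and a shorter alias is bound to the same class.

```python
from abc import ABC, abstractmethod


class GreetingFactory(ABC):
    """Hypothetical base class; subclasses use its name as a prefix."""

    @abstractmethod
    def create(self) -> str:
        """Return the greeting produced by this factory."""


class GreetingFactoryFormal(GreetingFactory):
    """Prefix convention: typing 'GreetingFactory' in an IDE finds this class."""

    def create(self) -> str:
        return "Good day"


# Dual naming: a concise alias that keeps only the essential part of the name.
FormalGreeting = GreetingFactoryFormal

greeting = FormalGreeting().create()
```

Both names refer to the same class object, so either can be used interchangeably.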
Michael Panchenko
5bcf514c55 Add alternative functional interface for environment creation
where a persistable configuration object is passed as an
argument, as this can help to ensure persistability (making the
requirement explicit)
2023-10-18 20:44:16 +02:00
Dominik Jain
d4e604b46e Move parameter transformation directly into parameter objects,
achieving greater separation of concerns and improved maintainability
2023-10-18 20:44:16 +02:00
Dominik Jain
38cf982034 Disable Ruff rule D205 (blank-line-after-summary)
because it disallows, in particular, class docstrings that consist
only of a summary line
2023-10-18 20:44:16 +02:00
Dominik Jain
e993425aa1 Add high-level API support for TD3
* Created mixins for agent factories to reduce code duplication
* Further factorised params & mixins for experiment factories
* Additional parameter abstractions
* Implement high-level MuJoCo TD3 example
2023-10-18 20:44:16 +02:00
Dominik Jain
6a739384ee WandbLogger: Use less restrictive type annotation for config 2023-10-18 20:44:16 +02:00
Dominik Jain
367778d37f Improve high-level policy parametrisation
Policy objects are now parametrised by converting the parameter
dataclass instances to kwargs, using some injectable conversions
along the way
2023-10-18 20:44:16 +02:00
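The parametrisation idea in commit 367778d37f (dataclass instances converted to kwargs, with injectable conversions applied on the way) might look roughly like the following sketch. The names `PolicyParams`, `rename_lr`, `to_kwargs` and `DummyPolicy` are assumptions made for illustration and do not reflect the actual tianshou classes.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, Sequence

# A conversion receives the current kwargs and returns the adjusted kwargs.
Conversion = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class PolicyParams:
    """Hypothetical parameter object for a policy."""

    lr: float = 1e-3
    gamma: float = 0.99


def rename_lr(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Example conversion: expose 'lr' under the name the policy expects."""
    converted = dict(kwargs)
    converted["learning_rate"] = converted.pop("lr")
    return converted


def to_kwargs(params: Any, conversions: Sequence[Conversion] = ()) -> Dict[str, Any]:
    """Turn a dataclass instance into kwargs, applying each conversion in order."""
    kwargs = asdict(params)
    for convert in conversions:
        kwargs = convert(kwargs)
    return kwargs


class DummyPolicy:
    """Hypothetical policy that accepts keyword arguments only."""

    def __init__(self, *, learning_rate: float, gamma: float) -> None:
        self.learning_rate = learning_rate
        self.gamma = gamma


policy = DummyPolicy(**to_kwargs(PolicyParams(), [rename_lr]))
```

Keeping the conversions as a separate, injectable list means the parameter dataclass itself stays a plain data holder.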
Dominik Jain
37dc07e487 Add high-level experiment builder interface 2023-10-18 20:44:05 +02:00
dependabot[bot]
4a51e69265
Bump urllib3 from 2.0.6 to 2.0.7 (#972)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.6 to 2.0.7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/releases">urllib3's
releases</a>.</em></p>
<blockquote>
<h2>2.0.7</h2>
<ul>
<li>Made body stripped from HTTP requests changing the request method to
GET after HTTP 303 &quot;See Other&quot; redirect responses.
(GHSA-g4mx-q9vg-27p4)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/blob/main/CHANGES.rst">urllib3's
changelog</a>.</em></p>
<blockquote>
<h1>2.0.7 (2023-10-17)</h1>
<ul>
<li>Made body stripped from HTTP requests changing the request method to
GET after HTTP 303 &quot;See Other&quot; redirect responses.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="56f01e088d"><code>56f01e0</code></a>
Release 2.0.7</li>
<li><a
href="4e50fbc5db"><code>4e50fbc</code></a>
Merge pull request from GHSA-g4mx-q9vg-27p4</li>
<li><a
href="80808b04bf"><code>80808b0</code></a>
Fix docs build on Python 3.12 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3144">#3144</a>)</li>
<li><a
href="f28deff1cf"><code>f28deff</code></a>
Add 1.26.17 to the current changelog</li>
<li>See full diff in <a
href="https://github.com/urllib3/urllib3/compare/2.0.6...2.0.7">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=urllib3&package-manager=pip&previous-version=2.0.6&new-version=2.0.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/thu-ml/tianshou/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-17 21:13:19 -04:00
Fahmid Morshed Fahid
bf7841078d
Fixed the mapolicy train issue (#968)
The trained MARL policies were not performing as expected because the
parent class (MultiAgentPolicyManager) needed a train function.

Fixes thu-ml/tianshou#967
2023-10-16 17:52:07 -07:00
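As a rough illustration of the kind of problem described in #968, the toy sketch below shows a manager that holds sub-policies in a plain dict and therefore has to forward `train()` to them explicitly. `ToyPolicy` and `ToyManager` are hypothetical stand-ins, not the actual `MultiAgentPolicyManager` implementation, and the sketch deliberately avoids any real library API.

```python
from typing import Dict


class ToyPolicy:
    """Hypothetical policy with a training/evaluation mode flag."""

    def __init__(self) -> None:
        self.training = True

    def train(self, mode: bool = True) -> "ToyPolicy":
        self.training = mode
        return self


class ToyManager(ToyPolicy):
    """Hypothetical manager: sub-policies live in a dict, so the mode
    switch only reaches them if train() forwards it explicitly."""

    def __init__(self, policies: Dict[str, ToyPolicy]) -> None:
        super().__init__()
        self.policies = dict(policies)

    def train(self, mode: bool = True) -> "ToyManager":
        super().train(mode)
        for sub_policy in self.policies.values():
            sub_policy.train(mode)
        return self


manager = ToyManager({"agent_0": ToyPolicy(), "agent_1": ToyPolicy()})
manager.train(False)
sub_modes = [p.training for p in manager.policies.values()]
```

Without the overridden `train`, only the manager's own flag would change and the sub-policies would stay in their previous mode.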
Michael Panchenko
66b7fc542b
Minor dep update (#961)
Support gymnasium >=0.28, small extension of readme
2023-10-09 22:10:09 +02:00
Dominik Jain
4d53d345d6 Ignore Ruff rule RET505, because it sacrifices visual discernibility
of control flow paths for brevity (regarding return statements)
2023-10-09 13:03:19 +02:00
Dominik Jain
3fd60f9e70 Unify PPO configuration objects, use experiment-specific configuration
in mujoco_ppo_hl
2023-10-09 13:02:29 +02:00
Dominik Jain
8ec42009cb Move RLSamplingConfig to separate module config, fixing cyclic import 2023-10-09 13:02:23 +02:00
Dominik Jain
d26b8cb40c Use experiment-specific config in mujoco_sac_hl, adding auto-alpha 2023-10-09 13:02:18 +02:00
Dominik Jain
adc324038a Remove LoggerConfig 2023-10-09 13:02:13 +02:00
Dominik Jain
997b520580 Refactoring, dropping package config 2023-10-09 13:02:07 +02:00
Dominik Jain
316eb3c579 Add SAC high-level interface 2023-10-09 13:02:01 +02:00
Dominik Jain
2a1cc6bb55 Enable ruff setting ignore-init-module-imports 2023-10-09 13:01:53 +02:00
Dominik Jain
25c6bbd38c Ignore D106: Missing docstring in public nested class 2023-10-09 13:01:44 +02:00
Dominik Jain
16ed5fd2a5 Initial high-level interfaces, demonstrated in mujoco_ppo_hl 2023-10-09 13:01:35 +02:00
Michael Panchenko
a54aade730 Addition of dataclasses based config for scripts, major refactoring
So far only for one script (mujoco_ppo_cfg), extension will follow

Conflicts:
	examples/mujoco/mujoco_env.py
	examples/mujoco/mujoco_ppo.py
	setup.py
2023-10-09 13:01:27 +02:00
Dominik Jain
42fc181d74 Add dev dependencies jsonargparse and docstring_parser 2023-10-09 13:01:11 +02:00
Michael Panchenko
b900fdf6f2
Remove kwargs in policy init (#950)
Closes #947 

This removes all kwargs from all policy constructors. While doing that,
I also improved several names and added a whole lot of TODOs.

## Functional changes:

1. Added possibility to pass None as `critic2` and `critic2_optim`. In
fact, the default behavior then should cover the absolute majority of
cases
2. Added a function called `clone_optimizer` as a temporary measure to
support passing `critic2_optim=None`

## Breaking changes:

1. `action_space` is no longer optional. In fact, it already was
non-optional, as there was a ValueError in `BasePolicy.__init__`. So now
several examples were fixed to reflect that
2. `reward_normalization` removed from DDPG and children. It was never
allowed to pass it as `True` there, an error would have been raised in
`compute_n_step_reward`. Now I removed it from the interface
3. renamed `critic1` and similar to `critic`, in order to have uniform
interfaces. Note that the `critic` in DDPG was optional for the sole
reason that child classes used `critic1`. I removed this optionality
(DDPG can't do anything with `critic=None`)
4. Several renamings of fields (mostly private to public, so backwards
compatible)

## Additional changes:

1. Removed type and default declaration from docstring. This kind of
duplication is really not necessary
2. Policy constructors are now only called using named arguments, not a
fragile mixture of positional and named as before
3. Minor beautifications in typing and code
4. Generally shortened docstrings and made them uniform across all
policies (hopefully)

## Comment:

With these changes, several problems in tianshou's inheritance hierarchy
become more apparent. I tried highlighting them for future work.

---------

Co-authored-by: Dominik Jain <d.jain@appliedai.de>
2023-10-08 08:57:03 -07:00
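Two of the points in #950 (constructors called with named arguments only, and `critic2` being allowed to be `None`) can be sketched in plain Python. `ToyCritic` and `ToyPolicy` are hypothetical, and falling back to a deep copy of `critic` is an assumption made for this sketch; the commit message does not state what the actual default behaviour is.

```python
import copy
from typing import Any, Optional


class ToyCritic:
    """Hypothetical critic holding a single weight."""

    def __init__(self, weight: float) -> None:
        self.weight = weight


class ToyPolicy:
    """Hypothetical policy; the bare '*' makes every argument keyword-only."""

    def __init__(
        self,
        *,
        critic: ToyCritic,
        action_space: Any,  # required, no default
        critic2: Optional[ToyCritic] = None,
    ) -> None:
        self.critic = critic
        self.action_space = action_space
        # Assumption for this sketch: a missing second critic is replaced
        # by an independent copy of the first one.
        self.critic2 = critic2 if critic2 is not None else copy.deepcopy(critic)


policy = ToyPolicy(critic=ToyCritic(0.5), action_space="Discrete(2)")

# A positional call is rejected by Python itself with a TypeError.
try:
    ToyPolicy(ToyCritic(0.5), "Discrete(2)")
except TypeError:
    positional_call_rejected = True
else:
    positional_call_rejected = False
```

Keyword-only signatures make call sites robust against later reordering of parameters.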
dependabot[bot]
bc7ec9c149
Bump pillow from 10.0.0 to 10.0.1 (#958)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 10.0.0 to
10.0.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/python-pillow/Pillow/releases">pillow's
releases</a>.</em></p>
<blockquote>
<h2>10.0.1</h2>
<p><a
href="https://pillow.readthedocs.io/en/stable/releasenotes/10.0.1.html">https://pillow.readthedocs.io/en/stable/releasenotes/10.0.1.html</a></p>
<h2>Changes</h2>
<ul>
<li>Updated libwebp to 1.3.2 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7395">#7395</a>
[<a
href="https://github.com/radarhere"><code>@radarhere</code></a>]</li>
<li>Updated zlib to 1.3 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7344">#7344</a>
[<a
href="https://github.com/radarhere"><code>@radarhere</code></a>]</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst">pillow's
changelog</a>.</em></p>
<blockquote>
<h2>10.0.1 (2023-09-15)</h2>
<ul>
<li>
<p>Updated libwebp to 1.3.2 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7395">#7395</a>
[radarhere]</p>
</li>
<li>
<p>Updated zlib to 1.3 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7344">#7344</a>
[radarhere]</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e34d346f10"><code>e34d346</code></a>
Updated order</li>
<li><a
href="a62f2402a6"><code>a62f240</code></a>
10.0.1 version bump</li>
<li><a
href="d50250d9ea"><code>d50250d</code></a>
Added release notes for 10.0.1</li>
<li><a
href="b4c7d4b8b2"><code>b4c7d4b</code></a>
Update CHANGES.rst [ci skip]</li>
<li><a
href="730f74600e"><code>730f746</code></a>
Updated libwebp to 1.3.2</li>
<li><a
href="b0e28048d6"><code>b0e2804</code></a>
Updated zlib to 1.3</li>
<li>See full diff in <a
href="https://github.com/python-pillow/Pillow/compare/10.0.0...10.0.1">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-03 20:31:57 -07:00
dependabot[bot]
b24f270a74
Bump urllib3 from 1.26.16 to 1.26.17 (#957)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.16 to
1.26.17.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/releases">urllib3's
releases</a>.</em></p>
<blockquote>
<h2>1.26.17</h2>
<ul>
<li>Added the <code>Cookie</code> header to the list of headers to strip
from requests when redirecting to a different host. As before, different
headers can be set via <code>Retry.remove_headers_on_redirect</code>.
(GHSA-v845-jxx5-vc9f)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/blob/main/CHANGES.rst">urllib3's
changelog</a>.</em></p>
<blockquote>
<h1>1.26.17 (2023-10-02)</h1>
<ul>
<li>Added the <code>Cookie</code> header to the list of headers to strip
from requests when redirecting to a different host. As before, different
headers can be set via <code>Retry.remove_headers_on_redirect</code>.
(<code>[#3139](https://github.com/urllib3/urllib3/issues/3139)
&lt;https://github.com/urllib3/urllib3/pull/3139&gt;</code>_)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c9016bf464"><code>c9016bf</code></a>
Release 1.26.17</li>
<li><a
href="01220354d3"><code>0122035</code></a>
Backport GHSA-v845-jxx5-vc9f (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3139">#3139</a>)</li>
<li><a
href="e63989f97d"><code>e63989f</code></a>
Fix installing <code>brotli</code> extra on Python 2.7</li>
<li><a
href="2e7a24d087"><code>2e7a24d</code></a>
[1.26] Configure OS for RTD to fix building docs</li>
<li><a
href="57181d6ea9"><code>57181d6</code></a>
[1.26] Improve error message when calling urllib3.request() (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3058">#3058</a>)</li>
<li><a
href="3c0148048a"><code>3c01480</code></a>
[1.26] Run coverage even with failed jobs</li>
<li>See full diff in <a
href="https://github.com/urllib3/urllib3/compare/1.26.16...1.26.17">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-03 23:58:26 +00:00