476 Commits

Author SHA1 Message Date
Dominik Jain
58bd20f882 Add logging module 2023-10-18 20:44:16 +02:00
Dominik Jain
ce26e25923 Handle ruff complaints in string module 2023-10-18 20:44:16 +02:00
Dominik Jain
de70147752 Add string module from sensAI 2023-10-18 20:44:16 +02:00
Dominik Jain
2671580c6c Add DDPG high-level API and MuJoCo example 2023-10-18 20:44:16 +02:00
Dominik Jain
6b6d9ea609 Add support for discrete PPO
* Refactored module `module` (split into submodules)
* Basic support for discrete environments
* Implement Atari env. factory
* Implement DQN-based actor factory
* Implement notion of reusing agent preprocessing network for critic
* Add example atari_ppo_hl
2023-10-18 20:44:16 +02:00
Dominik Jain
e0e7349b0a Add base class BaseActor with method get_preprocess_net for high-level API 2023-10-18 20:44:16 +02:00
Dominik Jain
cd79cf8661 Add A2C high-level API
* Add common base class for A2C and PPO agent factories
* Add default for dist_fn parameter, adding corresponding factories
* Add example mujoco_a2c_hl
2023-10-18 20:44:16 +02:00
Dominik Jain
acd89fa3b0 Remove parameter transformers from config object state,
composing the list dynamically instead
2023-10-18 20:44:16 +02:00
Dominik Jain
78b6dd1f49 Adapt class naming scheme
* Use prefix convention (subclasses have superclass names as prefix) to
  facilitate discoverability of relevant classes via IDE autocompletion
* Use dual naming, adding an alternative concise name that omits the
  precise OO semantics and retains only the essential part of the name
  (which can be more pleasing to users not accustomed to
  convoluted OO naming)
2023-10-18 20:44:16 +02:00
Michael Panchenko
5bcf514c55 Add alternative functional interface for environment creation
where a persistable configuration object is passed as an
argument, as this can help to ensure persistability (making the
requirement explicit)
2023-10-18 20:44:16 +02:00
Dominik Jain
d4e604b46e Move parameter transformation directly into parameter objects,
achieving greater separation of concerns and improved maintainability
2023-10-18 20:44:16 +02:00
Dominik Jain
38cf982034 Disable Ruff rule D205 (blank-line-after-summary)
because it disallows, in particular, class docstrings that consist
only of a summary line
2023-10-18 20:44:16 +02:00
Dominik Jain
e993425aa1 Add high-level API support for TD3
* Created mixins for agent factories to reduce code duplication
* Further factorised params & mixins for experiment factories
* Additional parameter abstractions
* Implement high-level MuJoCo TD3 example
2023-10-18 20:44:16 +02:00
Dominik Jain
6a739384ee WandbLogger: Use less restrictive type annotation for config 2023-10-18 20:44:16 +02:00
Dominik Jain
367778d37f Improve high-level policy parametrisation
Policy objects are now parametrised by converting the parameter
dataclass instances to kwargs, using some injectable conversions
along the way
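
A minimal sketch of this idea, not the actual implementation: a parameter dataclass is turned into keyword arguments, and injectable conversion steps adjust the result along the way. All names here (`PolicyParams`, `ParamTransformer`, `DropParam`, `RenameParam`, `create_kwargs`) are hypothetical and chosen only for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any, Protocol


class ParamTransformer(Protocol):
    """Hypothetical interface for one injectable conversion step."""

    def transform(self, kwargs: dict[str, Any]) -> None: ...


@dataclass
class PolicyParams:
    """Hypothetical parameter object; the field names are illustrative."""

    lr: float = 1e-3
    gamma: float = 0.99
    lr_decay: bool = True


class DropParam:
    """Removes a key that the target constructor does not accept."""

    def __init__(self, key: str) -> None:
        self.key = key

    def transform(self, kwargs: dict[str, Any]) -> None:
        kwargs.pop(self.key, None)


class RenameParam:
    """Renames a key to match the target constructor's argument name."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def transform(self, kwargs: dict[str, Any]) -> None:
        if self.old in kwargs:
            kwargs[self.new] = kwargs.pop(self.old)


def create_kwargs(params: Any, *transformers: ParamTransformer) -> dict[str, Any]:
    """Converts a dataclass instance to kwargs, applying each conversion in order."""
    kwargs = asdict(params)
    for transformer in transformers:
        transformer.transform(kwargs)
    return kwargs


kwargs = create_kwargs(
    PolicyParams(),
    DropParam("lr_decay"),
    RenameParam("gamma", "discount_factor"),
)
print(kwargs)
```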
2023-10-18 20:44:16 +02:00
Dominik Jain
37dc07e487 Add high-level experiment builder interface 2023-10-18 20:44:05 +02:00
dependabot[bot]
4a51e69265
Bump urllib3 from 2.0.6 to 2.0.7 (#972)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.6 to 2.0.7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/releases">urllib3's
releases</a>.</em></p>
<blockquote>
<h2>2.0.7</h2>
<ul>
<li>Made body stripped from HTTP requests changing the request method to
GET after HTTP 303 &quot;See Other&quot; redirect responses.
(GHSA-g4mx-q9vg-27p4)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/blob/main/CHANGES.rst">urllib3's
changelog</a>.</em></p>
<blockquote>
<h1>2.0.7 (2023-10-17)</h1>
<ul>
<li>Made body stripped from HTTP requests changing the request method to
GET after HTTP 303 &quot;See Other&quot; redirect responses.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="56f01e088d"><code>56f01e0</code></a>
Release 2.0.7</li>
<li><a
href="4e50fbc5db"><code>4e50fbc</code></a>
Merge pull request from GHSA-g4mx-q9vg-27p4</li>
<li><a
href="80808b04bf"><code>80808b0</code></a>
Fix docs build on Python 3.12 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3144">#3144</a>)</li>
<li><a
href="f28deff1cf"><code>f28deff</code></a>
Add 1.26.17 to the current changelog</li>
<li>See full diff in <a
href="https://github.com/urllib3/urllib3/compare/2.0.6...2.0.7">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=urllib3&package-manager=pip&previous-version=2.0.6&new-version=2.0.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/thu-ml/tianshou/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-17 21:13:19 -04:00
Fahmid Morshed Fahid
bf7841078d
Fixed the mapolicy train issue (#968)
The trained MARL policies were not performing as expected because the
parent class (MultiAgentPolicyManager) needed a train function.
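
One way to picture such a fix, as a sketch with stand-in classes rather than the real Tianshou or PyTorch types: assuming the manager keeps its sub-policies in a plain dictionary, it has to forward `train(mode)` to each of them itself. `ToyPolicy` and `ToyPolicyManager` are hypothetical names.

```python
class ToyPolicy:
    """Hypothetical stand-in for a policy with a train/eval mode flag."""

    def __init__(self) -> None:
        self.training = True

    def train(self, mode: bool = True) -> "ToyPolicy":
        self.training = mode
        return self


class ToyPolicyManager(ToyPolicy):
    """Hypothetical manager that holds one sub-policy per agent."""

    def __init__(self, policies: dict[str, ToyPolicy]) -> None:
        super().__init__()
        self.policies = policies

    def train(self, mode: bool = True) -> "ToyPolicyManager":
        # Switch the manager itself, then forward the mode to every sub-policy.
        super().train(mode)
        for policy in self.policies.values():
            policy.train(mode)
        return self


manager = ToyPolicyManager({"agent_1": ToyPolicy(), "agent_2": ToyPolicy()})
manager.train(False)
modes = [policy.training for policy in manager.policies.values()]
print(modes)
```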

Fixes thu-ml/tianshou#967
2023-10-16 17:52:07 -07:00
Michael Panchenko
66b7fc542b
Minor dep update (#961)
Support gymnasium >=0.28, small extension of readme
2023-10-09 22:10:09 +02:00
Dominik Jain
4d53d345d6 Ignore Ruff rule RET505, because it sacrifices visual discernability
of control flow paths for brevity (regarding return statements)
2023-10-09 13:03:19 +02:00
Dominik Jain
3fd60f9e70 Unify PPO configuration objects, use experiment-specific configuration
in mujoco_ppo_hl
2023-10-09 13:02:29 +02:00
Dominik Jain
8ec42009cb Move RLSamplingConfig to separate module config, fixing cyclic import 2023-10-09 13:02:23 +02:00
Dominik Jain
d26b8cb40c Use experiment-specific config in mujoco_sac_hl, adding auto-alpha 2023-10-09 13:02:18 +02:00
Dominik Jain
adc324038a Remove LoggerConfig 2023-10-09 13:02:13 +02:00
Dominik Jain
997b520580 Refactoring, dropping package config 2023-10-09 13:02:07 +02:00
Dominik Jain
316eb3c579 Add SAC high-level interface 2023-10-09 13:02:01 +02:00
Dominik Jain
2a1cc6bb55 Enable ruff setting ignore-init-module-imports 2023-10-09 13:01:53 +02:00
Dominik Jain
25c6bbd38c Ignore D106: Missing docstring in public nested class 2023-10-09 13:01:44 +02:00
Dominik Jain
16ed5fd2a5 Initial high-level interfaces, demonstrated in mujoco_ppo_hl 2023-10-09 13:01:35 +02:00
Michael Panchenko
a54aade730 Addition of dataclasses based config for scripts, major refactoring
So far only for one script (mujoco_ppo_cfg), extension will follow

Conflicts:
	examples/mujoco/mujoco_env.py
	examples/mujoco/mujoco_ppo.py
	setup.py
2023-10-09 13:01:27 +02:00
Dominik Jain
42fc181d74 Add dev dependencies jsonargparse and docstring_parser 2023-10-09 13:01:11 +02:00
Michael Panchenko
b900fdf6f2
Remove kwargs in policy init (#950)
Closes #947 

This removes all kwargs from all policy constructors. While doing that,
I also improved several names and added a whole lot of TODOs.

## Functional changes:

1. Added possibility to pass None as `critic2` and `critic2_optim`. In
fact, the default behavior then should cover the absolute majority of
cases
2. Added a function called `clone_optimizer` as a temporary measure to
support passing `critic2_optim=None`
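
How `clone_optimizer` works is not spelled out here. A minimal sketch of one possible approach, assuming an optimizer that stores its hyperparameters in a `defaults` dict and accepts them as constructor keywords (as PyTorch optimizers generally do). `ToyOptimizer` and `clone_optimizer_sketch` are hypothetical names, used so that the example runs without PyTorch.

```python
from typing import Any, Iterable


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer with a `defaults` dict."""

    def __init__(
        self, params: Iterable[Any], lr: float = 1e-3, momentum: float = 0.0
    ) -> None:
        self.params = list(params)
        self.defaults = {"lr": lr, "momentum": momentum}


def clone_optimizer_sketch(optim: Any, new_params: Iterable[Any]) -> Any:
    """Builds a new optimizer of the same class and settings for other parameters."""
    return type(optim)(new_params, **optim.defaults)


critic_optim = ToyOptimizer(["critic_weights"], lr=3e-4)
# With critic2_optim=None, a second optimizer can be derived from the first.
critic2_optim = clone_optimizer_sketch(critic_optim, ["critic2_weights"])
print(critic2_optim.defaults)
```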

## Breaking changes:

1. `action_space` is no longer optional. In fact, it already was
non-optional, as there was a ValueError in BasePolicy.init. So now
several examples were fixed to reflect that
2. `reward_normalization` removed from DDPG and children. It was never
allowed to pass it as `True` there, an error would have been raised in
`compute_n_step_reward`. Now I removed it from the interface
3. renamed `critic1` and similar to `critic`, in order to have uniform
interfaces. Note that the `critic` in DDPG was optional for the sole
reason that child classes used `critic1`. I removed this optionality
(DDPG can't do anything with `critic=None`)
4. Several renamings of fields (mostly private to public, so backwards
compatible)

## Additional changes:

1. Removed type and default declaration from docstring. This kind of
duplication is really not necessary
2. Policy constructors are now only called using named arguments, not a
fragile mixture of positional and named as before
3. Minor beautifications in typing and code
4. Generally shortened docstrings and made them uniform across all
policies (hopefully)

## Comment:

With these changes, several problems in tianshou's inheritance hierarchy
become more apparent. I tried highlighting them for future work.

---------

Co-authored-by: Dominik Jain <d.jain@appliedai.de>
2023-10-08 08:57:03 -07:00
dependabot[bot]
bc7ec9c149
Bump pillow from 10.0.0 to 10.0.1 (#958)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 10.0.0 to
10.0.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/python-pillow/Pillow/releases">pillow's
releases</a>.</em></p>
<blockquote>
<h2>10.0.1</h2>
<p><a
href="https://pillow.readthedocs.io/en/stable/releasenotes/10.0.1.html">https://pillow.readthedocs.io/en/stable/releasenotes/10.0.1.html</a></p>
<h2>Changes</h2>
<ul>
<li>Updated libwebp to 1.3.2 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7395">#7395</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Updated zlib to 1.3 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7344">#7344</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst">pillow's
changelog</a>.</em></p>
<blockquote>
<h2>10.0.1 (2023-09-15)</h2>
<ul>
<li>
<p>Updated libwebp to 1.3.2 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7395">#7395</a>
[radarhere]</p>
</li>
<li>
<p>Updated zlib to 1.3 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/7344">#7344</a>
[radarhere]</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e34d346f10"><code>e34d346</code></a>
Updated order</li>
<li><a
href="a62f2402a6"><code>a62f240</code></a>
10.0.1 version bump</li>
<li><a
href="d50250d9ea"><code>d50250d</code></a>
Added release notes for 10.0.1</li>
<li><a
href="b4c7d4b8b2"><code>b4c7d4b</code></a>
Update CHANGES.rst [ci skip]</li>
<li><a
href="730f74600e"><code>730f746</code></a>
Updated libwebp to 1.3.2</li>
<li><a
href="b0e28048d6"><code>b0e2804</code></a>
Updated zlib to 1.3</li>
<li>See full diff in <a
href="https://github.com/python-pillow/Pillow/compare/10.0.0...10.0.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pillow&package-manager=pip&previous-version=10.0.0&new-version=10.0.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-03 20:31:57 -07:00
dependabot[bot]
b24f270a74
Bump urllib3 from 1.26.16 to 1.26.17 (#957)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.16 to
1.26.17.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/releases">urllib3's
releases</a>.</em></p>
<blockquote>
<h2>1.26.17</h2>
<ul>
<li>Added the <code>Cookie</code> header to the list of headers to strip
from requests when redirecting to a different host. As before, different
headers can be set via <code>Retry.remove_headers_on_redirect</code>.
(GHSA-v845-jxx5-vc9f)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/blob/main/CHANGES.rst">urllib3's
changelog</a>.</em></p>
<blockquote>
<h1>1.26.17 (2023-10-02)</h1>
<ul>
<li>Added the <code>Cookie</code> header to the list of headers to strip
from requests when redirecting to a different host. As before, different
headers can be set via <code>Retry.remove_headers_on_redirect</code>.
(<code>[#3139](https://github.com/urllib3/urllib3/issues/3139)
&lt;https://github.com/urllib3/urllib3/pull/3139&gt;</code>_)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c9016bf464"><code>c9016bf</code></a>
Release 1.26.17</li>
<li><a
href="01220354d3"><code>0122035</code></a>
Backport GHSA-v845-jxx5-vc9f (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3139">#3139</a>)</li>
<li><a
href="e63989f97d"><code>e63989f</code></a>
Fix installing <code>brotli</code> extra on Python 2.7</li>
<li><a
href="2e7a24d087"><code>2e7a24d</code></a>
[1.26] Configure OS for RTD to fix building docs</li>
<li><a
href="57181d6ea9"><code>57181d6</code></a>
[1.26] Improve error message when calling urllib3.request() (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3058">#3058</a>)</li>
<li><a
href="3c0148048a"><code>3c01480</code></a>
[1.26] Run coverage even with failed jobs</li>
<li>See full diff in <a
href="https://github.com/urllib3/urllib3/compare/1.26.16...1.26.17">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=urllib3&package-manager=pip&previous-version=1.26.16&new-version=1.26.17)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-03 23:58:26 +00:00
dependabot[bot]
d11a5a3d99
Bump gitpython from 3.1.33 to 3.1.35 (#953)
Bumps [gitpython](https://github.com/gitpython-developers/GitPython)
from 3.1.33 to 3.1.35.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/gitpython-developers/GitPython/releases">gitpython's
releases</a>.</em></p>
<blockquote>
<h2>3.1.35 - a fix for CVE-2023-41040</h2>
<h2>What's Changed</h2>
<ul>
<li>Bump actions/checkout from 3 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/gitpython-developers/GitPython/pull/1643">gitpython-developers/GitPython#1643</a></li>
<li>Fix 'Tree' object has no attribute '_name' when submodule path is
normal path by <a
href="https://github.com/CosmosAtlas"><code>@​CosmosAtlas</code></a> in
<a
href="https://redirect.github.com/gitpython-developers/GitPython/pull/1645">gitpython-developers/GitPython#1645</a></li>
<li>Fix CVE-2023-41040 by <a
href="https://github.com/facutuesca"><code>@​facutuesca</code></a> in <a
href="https://redirect.github.com/gitpython-developers/GitPython/pull/1644">gitpython-developers/GitPython#1644</a></li>
<li>Only make config more permissive in tests that need it by <a
href="https://github.com/EliahKagan"><code>@​EliahKagan</code></a> in <a
href="https://redirect.github.com/gitpython-developers/GitPython/pull/1648">gitpython-developers/GitPython#1648</a></li>
<li>Added test for PR <a
href="https://redirect.github.com/gitpython-developers/GitPython/issues/1645">#1645</a>
submodule path by <a
href="https://github.com/CosmosAtlas"><code>@​CosmosAtlas</code></a> in
<a
href="https://redirect.github.com/gitpython-developers/GitPython/pull/1647">gitpython-developers/GitPython#1647</a></li>
<li>Fix Windows environment variable upcasing bug by <a
href="https://github.com/EliahKagan"><code>@​EliahKagan</code></a> in <a
href="https://redirect.github.com/gitpython-developers/GitPython/pull/1650">gitpython-developers/GitPython#1650</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/CosmosAtlas"><code>@​CosmosAtlas</code></a>
made their first contribution in <a
href="https://redirect.github.com/gitpython-developers/GitPython/pull/1645">gitpython-developers/GitPython#1645</a></li>
<li><a
href="https://github.com/facutuesca"><code>@​facutuesca</code></a> made
their first contribution in <a
href="https://redirect.github.com/gitpython-developers/GitPython/pull/1644">gitpython-developers/GitPython#1644</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/gitpython-developers/GitPython/compare/3.1.34...3.1.35">https://github.com/gitpython-developers/GitPython/compare/3.1.34...3.1.35</a></p>
<h2>3.1.34 - fix resource leaking</h2>
<h2>What's Changed</h2>
<ul>
<li>util: close lockfile after opening successfully by <a
href="https://github.com/skshetry"><code>@​skshetry</code></a> in <a
href="https://redirect.github.com/gitpython-developers/GitPython/pull/1639">gitpython-developers/GitPython#1639</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/skshetry"><code>@​skshetry</code></a>
made their first contribution in <a
href="https://redirect.github.com/gitpython-developers/GitPython/pull/1639">gitpython-developers/GitPython#1639</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/gitpython-developers/GitPython/compare/3.1.33...3.1.34">https://github.com/gitpython-developers/GitPython/compare/3.1.33...3.1.34</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c8e303ffd3"><code>c8e303f</code></a>
prepare next release</li>
<li><a
href="09e1b3dbae"><code>09e1b3d</code></a>
Merge pull request <a
href="https://redirect.github.com/gitpython-developers/GitPython/issues/1650">#1650</a>
from EliahKagan/envcase</li>
<li><a
href="8017421ade"><code>8017421</code></a>
Merge pull request <a
href="https://redirect.github.com/gitpython-developers/GitPython/issues/1647">#1647</a>
from CosmosAtlas/master</li>
<li><a
href="fafb4f6651"><code>fafb4f6</code></a>
updated docs to better describe testing procedure with new repo</li>
<li><a
href="9da24d46c6"><code>9da24d4</code></a>
add test for submodule path not owned by submodule case</li>
<li><a
href="eebdb25ee6"><code>eebdb25</code></a>
Eliminate duplication of git.util.cwd logic</li>
<li><a
href="c7fad20be5"><code>c7fad20</code></a>
Fix Windows env var upcasing regression</li>
<li><a
href="7296e5c021"><code>7296e5c</code></a>
Make test helper script a file, for readability</li>
<li><a
href="d88372a11a"><code>d88372a</code></a>
Add test for Windows env var upcasing regression</li>
<li><a
href="11839ab5ce"><code>11839ab</code></a>
Merge pull request <a
href="https://redirect.github.com/gitpython-developers/GitPython/issues/1648">#1648</a>
from EliahKagan/file-protocol</li>
<li>Additional commits viewable in <a
href="https://github.com/gitpython-developers/GitPython/compare/3.1.33...3.1.35">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=gitpython&package-manager=pip&previous-version=3.1.33&new-version=3.1.35)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-03 07:52:57 +00:00
Anas BELFADIL
c30b4abb8f
Add calibration to CQL as in CalQL paper arXiv:2303.05479 (#915)
- [X] I have marked all applicable categories:
    + [ ] exception-raising fix
    + [ ] algorithm implementation fix
    + [ ] documentation modification
    + [X] new feature
- [X] I have reformatted the code using `make format` (**required**)
- [X] I have checked the code using `make commit-checks` (**required**)
- [X] If applicable, I have mentioned the relevant/related issue(s)
- [X] If applicable, I have listed every item in this Pull Request
below
2023-10-02 22:54:34 -07:00
Jiayi Weng
6449a43261
Fix documentation build (#951)
Close #941 
rtfd build link:
https://readthedocs.org/projects/tianshou/builds/22019877/

Also -- fix two small issues reported by users, see #928 and #930

Note: I created the branch in thu-ml:tianshou instead of
Trinkle23897:tianshou to quickly check the rtfd build. It's not a good
process, since every commit triggers the CI pipeline twice :(
2023-09-26 08:24:08 -07:00
Michael Panchenko
c8e7d02cba
Minor: use Self type where appropriate (#942)
Small typing improvement, related to
https://github.com/thu-ml/tianshou/pull/915#discussion_r1329734222
2023-09-19 15:40:32 -07:00
Michael Panchenko
2cc34fb72b
Poetry install, remove gym, bump python (#925)
Closes #914 

Additional changes:

- Deprecate Python below 3.11
- Remove 3rd party and throughput tests. This simplifies install and
test pipeline
- Remove gym compatibility and shimmy
- Format with 3.11 conventions. In particular, add `zip(...,
strict=True/False)` where possible

Since the additional tests and gym were complicating the CI pipeline
(flaky and dist-dependent), it didn't make sense to work on fixing the
current tests in this PR to then just delete them in the next one. So
this PR changes the build and removes these tests at the same time.
2023-09-05 14:34:23 -07:00
Michael Panchenko
600f4bbd55
Python 3.9, black + ruff formatting (#921)
Preparation for #914 and #920

Changes formatting to ruff and black. Remove python 3.8

## Additional Changes

- Removed flake8 dependencies
- Adjusted pre-commit. Now CI and Make use pre-commit, reducing the
duplication of linting calls
- Removed check-docstyle option (ruff is doing that)
- Merged format and lint. In CI the format-lint step fails if any
changes are done, so it fulfills the lint functionality.

---------

Co-authored-by: Jiayi Weng <jiayi@openai.com>
2023-08-25 14:40:56 -07:00
Michael Panchenko
07702fc007
Improved typing and reduced duplication (#912)
# Goals of the PR

The PR introduces **no changes to functionality**, apart from improved
input validation here and there. The main goals are to reduce some
complexity of the code, to improve types and IDE completions, and to
extend documentation and block comments where appropriate. Because of
the change to the trainer interfaces, many files are affected (more
details below), but still the overall changes are "small" in a certain
sense.

## Major Change 1 - BatchProtocol

**TL;DR:** One can now annotate which fields the batch is expected to
have on input params and which fields a returned batch has. Should be
useful for reading the code, getting meaningful IDE support, and
catching bugs with mypy. This annotation strategy will continue to work
if Batch is replaced by TensorDict or by something else.

**In more detail:** Batch itself has no fields and using it for
annotations is of limited informational power. Batches with fields are
not separate classes but instead instances of Batch directly, so there
is no type that could be used for annotation. Fortunately, Python's
`Protocol` comes to the rescue. With these changes we can now do
things like

```python
class ActionBatchProtocol(BatchProtocol):
    logits: Sequence[Union[tuple, torch.Tensor]]
    dist: torch.distributions.Distribution
    act: torch.Tensor
    state: Optional[torch.Tensor]


class RolloutBatchProtocol(BatchProtocol):
    obs: torch.Tensor
    obs_next: torch.Tensor
    info: Dict[str, Any]
    rew: torch.Tensor
    terminated: torch.Tensor
    truncated: torch.Tensor

class PGPolicy(BasePolicy):
    ...

    def forward(
        self,
        batch: RolloutBatchProtocol,
        state: Optional[Union[dict, Batch, np.ndarray]] = None,
        **kwargs: Any,
    ) -> ActionBatchProtocol:
        ...
```

The IDE and mypy are now very helpful in finding errors and in
auto-completion, whereas before the tools couldn't assist in that at
all.
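
The same pattern can be tried without Tianshou or PyTorch. The following is a minimal, self-contained sketch: `Bag` is a hypothetical stand-in for a container with dynamically assigned fields, and `ObsRewProtocol` and `scaled_obs` are illustrative names, not part of the library.

```python
from typing import Any, Protocol, cast


class Bag:
    """Hypothetical stand-in for a Batch-like container with dynamic fields."""

    def __init__(self, **fields: Any) -> None:
        self.__dict__.update(fields)


class ObsRewProtocol(Protocol):
    """Declares the fields a function expects; used by type checkers only."""

    obs: list[float]
    rew: float


def scaled_obs(batch: ObsRewProtocol) -> list[float]:
    # A type checker knows here that `obs` is a list of floats and `rew` a float.
    return [o * batch.rew for o in batch.obs]


# cast() has no runtime effect; it only tells the checker which fields exist.
batch = cast(ObsRewProtocol, Bag(obs=[1.0, 2.0], rew=0.5))
result = scaled_obs(batch)
print(result)
```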

## Major Change 2 - remove duplication in trainer package

**TL;DR:** There was a lot of duplication between `BaseTrainer` and its
subclasses. Even worse, it was almost-duplication. There was also
interface fragmentation through things like `onpolicy_trainer`. Now this
duplication is gone and all downstream code was adjusted.

**In more detail:** Since this change affects a lot of code, I would
like to explain why I thought it to be necessary.

1. The subclasses of `BaseTrainer` just duplicated docstrings and
constructors. What's worse, they changed the order of args there, even
turning some kwargs of BaseTrainer into args. They also had the arg
`learning_type` which was passed as kwarg to the base class and was
unused there. This made things difficult to maintain, and in fact some
errors were already present in the duplicated docstrings.
2. The "functions" a la `onpolicy_trainer`, which just called the
`OnpolicyTrainer.run`, not only introduced interface fragmentation but
also completely obfuscated the docstring and interfaces. They themselves
had no docstring and the interface was just `*args, **kwargs`, which
makes it impossible to understand what they do and which things can be
passed without reading their implementation, then reading the docstring
of the associated class, etc. Needless to say, mypy and IDEs provide no
support with such functions. Nevertheless, they were used everywhere in
the code-base. I didn't find the sacrifices in clarity and complexity
justified just for the sake of not having to write `.run()` after
instantiating a trainer.
3. The trainers are all very similar to each other. Since I needed a
new trainer for my application, I wanted to understand their
structure. The similarity, however, was hard to discover since they were
all in separate modules and there was so much duplication. I kept
staring at the constructors for a while until I figured out that
essentially no changes to the superclass were introduced. Now they are
all in the same module and the similarities/differences between them are
much easier to grasp (in my opinion).
4. Because of (1), I had to manually change and check a lot of code,
which was very tedious and boring. This kind of work won't be necessary
in the future, since now IDEs can be used for changing signatures,
renaming args and kwargs, changing class names and so on.

I have some more reasons, but maybe the above ones are convincing
enough.
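The point about opaque `*args, **kwargs` wrappers in (2) can be shown with a small, self-contained sketch. `OnpolicyTrainerSketch` and `onpolicy_trainer_sketch` are hypothetical stand-ins, not the real Tianshou classes, and their parameters are chosen only for illustration.

```python
import inspect
from typing import Any, Dict


class OnpolicyTrainerSketch:
    """Hypothetical trainer with an explicit, documented constructor."""

    def __init__(self, max_epoch: int, step_per_epoch: int = 1000) -> None:
        self.max_epoch = max_epoch
        self.step_per_epoch = step_per_epoch

    def run(self) -> Dict[str, int]:
        # Placeholder for the training loop; returns summary statistics.
        return {
            "epochs": self.max_epoch,
            "steps": self.max_epoch * self.step_per_epoch,
        }


def onpolicy_trainer_sketch(*args: Any, **kwargs: Any) -> Dict[str, int]:
    # Old-style wrapper: it saves the caller a `.run()` but hides the signature.
    return OnpolicyTrainerSketch(*args, **kwargs).run()


# What tools (and readers) can see of each interface.
wrapper_params = list(inspect.signature(onpolicy_trainer_sketch).parameters)
class_params = list(inspect.signature(OnpolicyTrainerSketch).parameters)

# The explicit style costs the caller only the trailing `.run()`.
result = OnpolicyTrainerSketch(max_epoch=2, step_per_epoch=10).run()
```

Here `wrapper_params` contains only `args` and `kwargs`, whereas `class_params` exposes the actual parameter names, which is what IDE completion and signature refactoring rely on.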

## Minor changes: improved input validation and types

I added input validation for things like `state` and `action_scaling`
(which only makes sense for continuous envs). After adding this, some
tests failed to pass this validation. There I added
`action_scaling=isinstance(env.action_space, Box)`, after which tests
were green. I don't know why the tests were green before, since action
scaling doesn't make sense for discrete actions. I guess some aspect was
not tested and didn't crash.

I also added `Literal` in some places, in particular for
`action_bound_method`. Passing an empty string is no longer allowed;
one should pass `None` instead. Here, too, there is input
validation with clear error messages.
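A minimal sketch of what such validation could look like follows. This is not the code from the PR: the function `validate_action_options`, the stand-in classes `Box` and `Discrete`, and the allowed values `"clip"` and `"tanh"` are assumptions made for illustration.

```python
from typing import Literal, Optional, get_args

# Assumed set of allowed values; `None` means "no bounding".
ActionBoundMethod = Literal["clip", "tanh"]


class Box:
    """Hypothetical stand-in for a continuous action space."""


class Discrete:
    """Hypothetical stand-in for a discrete action space."""


def validate_action_options(
    action_space: object,
    action_scaling: bool,
    action_bound_method: Optional[ActionBoundMethod],
) -> None:
    allowed = get_args(ActionBoundMethod)
    # Reject anything outside the Literal, including the empty string.
    if action_bound_method is not None and action_bound_method not in allowed:
        raise ValueError(
            f"action_bound_method must be one of {allowed} or None, "
            f"got {action_bound_method!r}"
        )
    # Scaling only makes sense for continuous action spaces.
    if action_scaling and not isinstance(action_space, Box):
        raise ValueError(
            "action_scaling=True requires a continuous (Box) action space, "
            f"got {type(action_space).__name__}"
        )


# Mirrors the fix applied in the tests: enable scaling only for Box spaces.
space = Discrete()
validate_action_options(space, isinstance(space, Box), None)
```

The `Literal` annotation lets mypy reject invalid strings statically, while the runtime check covers callers that are not type-checked.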

@Trinkle23897 The functional tests are green. I didn't want to fix the
formatting, since it will change in the next PR that will solve #914
anyway. I also found a whole bunch of code in `docs/_static`, which I
just deleted (shouldn't it be copied from the sources during docs build
instead of committed?). I also haven't adjusted the documentation yet,
which atm still mentions the trainers of the type
`onpolicy_trainer(...)` instead of `OnpolicyTrainer(...).run()`.

## Breaking Changes

The adjustments to the trainer package introduce breaking changes, as
duplicated interfaces are deleted. However, it should be very easy for
users to adjust to them.

---------

Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
2023-08-22 09:54:46 -07:00
Anas BELFADIL
80a698be52
Custom keys support in ReplayBuffer (#903)
Issue: Custom keys support in ReplayBuffer #902
Modified `ReplayBuffer` `add` and `__getitem__` methods.
Added `test_custom_key()` to test_buffer.py
2023-08-10 16:06:10 -07:00
Jiayi Weng
61182450b6
add py.typed, drop 3.6/3.7, support 3.11 (#910)
closing #892 #901
2023-08-10 14:13:46 -07:00
Błażej Osiński
864ee3df2f
Make monitor_gym configurable in WandbLogger. (#896)
At the moment, WandbLogger is always using wandb.init with monitor_gym =
True.
This fails when OpenAI's gym is not installed, which doesn't make sense
after the transition to Gymnasium.

I am using Tianshou with non-standard RL environments, which adhere to
the Gymnasium API, and the current code is throwing exceptions.

I suggest making it a controllable parameter. I left the default value
as True (to keep it functionally the same for people using gym). It may
also make sense to change the default to False.
2023-08-09 15:13:25 -07:00
Błażej Osiński
cd218dc12d
Add assert description. (#894)
**The assert was missing a description, I fixed it.**

Please note: there is an error in the documentation, but it does not
seem to be related to my changes.
2023-08-09 15:12:42 -07:00
Anas BELFADIL
cb8551f315
Fix master branch test issues (#908) 2023-08-09 10:27:18 -07:00
Zhenjie Zhao
f8808d236f
fix a problem of the atari dqn example (#861) 2023-04-30 08:44:27 -07:00
Gen
7ce62a6ad4
actor critic share head bug for example code without sharing head - unify code style (#860) 2023-04-28 21:43:22 -07:00
ChenDRAG
1423eeb3b2
Add warnings for duplicate usage of action-bounded actor and action scaling method (#850)
- Fix the current bug discussed in #844 in `test_ppo.py`.
- Add warning for `ActorProb` if both `max_action` and
`unbounded=True` are used for model initialization.
- Add warning for `PGPolicy` and `DDPGPolicy` if they find duplicate usage
of action-bounded actor and action scaling method.
2023-04-23 16:03:31 -07:00
wckwan
e7c2c3711e
Update gail.py (#849)
Remove repeated description of lr_scheduler in the doc string.
2023-04-13 07:25:57 -07:00