724 Commits

Author SHA1 Message Date
Michael Panchenko
1cd22f1d32 Added and used new VenvType: SUBPROC_SHARED_MEM_AUTO 2024-05-07 14:13:20 +02:00
dependabot[bot]
d58ae163f2
Bump jinja2 from 3.1.3 to 3.1.4 (#1139)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pallets/jinja/releases">jinja2's
releases</a>.</em></p>
<blockquote>
<h2>3.1.4</h2>
<p>This is the Jinja 3.1.4 security release, which fixes security issues
and bugs but does not otherwise change behavior and should not result in
breaking changes.</p>
<p>PyPI: <a
href="https://pypi.org/project/Jinja2/3.1.4/">https://pypi.org/project/Jinja2/3.1.4/</a>
Changes: <a
href="https://jinja.palletsprojects.com/en/3.1.x/changes/#version-3-1-4">https://jinja.palletsprojects.com/en/3.1.x/changes/#version-3-1-4</a></p>
<ul>
<li>The <code>xmlattr</code> filter does not allow keys with
<code>/</code> solidus, <code>&gt;</code> greater-than sign, or
<code>=</code> equals sign, in addition to disallowing spaces.
Regardless of any validation done by Jinja, user input should never be
used as keys to this filter, or must be separately validated first.
GHSA-h75v-3vvj-5mfj</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pallets/jinja/blob/main/CHANGES.rst">jinja2's
changelog</a>.</em></p>
<blockquote>
<h2>Version 3.1.4</h2>
<p>Released 2024-05-05</p>
<ul>
<li>The <code>xmlattr</code> filter does not allow keys with
<code>/</code> solidus, <code>&gt;</code>
greater-than sign, or <code>=</code> equals sign, in addition to
disallowing spaces.
Regardless of any validation done by Jinja, user input should never be
used
as keys to this filter, or must be separately validated first.
:ghsa:<code>h75v-3vvj-5mfj</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="dd4a8b5466"><code>dd4a8b5</code></a>
release version 3.1.4</li>
<li><a
href="0668239dc6"><code>0668239</code></a>
Merge pull request from GHSA-h75v-3vvj-5mfj</li>
<li><a
href="d655030770"><code>d655030</code></a>
disallow invalid characters in keys to xmlattr filter</li>
<li><a
href="a7863ba9d3"><code>a7863ba</code></a>
add ghsa links</li>
<li><a
href="b5c98e78c2"><code>b5c98e7</code></a>
start version 3.1.4</li>
<li><a
href="da3a9f0b80"><code>da3a9f0</code></a>
update project files (<a
href="https://redirect.github.com/pallets/jinja/issues/1968">#1968</a>)</li>
<li><a
href="0ee5eb41d1"><code>0ee5eb4</code></a>
satisfy formatter, linter, and strict mypy</li>
<li><a
href="20477c6357"><code>20477c6</code></a>
update project files (<a
href="https://redirect.github.com/pallets/jinja/issues/5457">#5457</a>)</li>
<li><a
href="e491223739"><code>e491223</code></a>
update pyyaml dev dependency</li>
<li><a
href="36f98854c7"><code>36f9885</code></a>
fix pr link</li>
<li>Additional commits viewable in <a
href="https://github.com/pallets/jinja/compare/3.1.3...3.1.4">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-07 00:16:21 +02:00
dependabot[bot]
aa77f5549a
Bump werkzeug from 3.0.1 to 3.0.3 (#1138)
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.0.1 to
3.0.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pallets/werkzeug/releases">werkzeug's
releases</a>.</em></p>
<blockquote>
<h2>3.0.3</h2>
<p>This is the Werkzeug 3.0.3 security release, which fixes security
issues and bugs but does not otherwise change behavior and should not
result in breaking changes.</p>
<p>PyPI: <a
href="https://pypi.org/project/Werkzeug/3.0.3/">https://pypi.org/project/Werkzeug/3.0.3/</a>
Changes: <a
href="https://werkzeug.palletsprojects.com/en/3.0.x/changes/#version-3-0-3">https://werkzeug.palletsprojects.com/en/3.0.x/changes/#version-3-0-3</a>
Milestone: <a
href="https://github.com/pallets/werkzeug/milestone/35?closed=1">https://github.com/pallets/werkzeug/milestone/35?closed=1</a></p>
<ul>
<li>Only allow <code>localhost</code>, <code>.localhost</code>,
<code>127.0.0.1</code>, or the specified hostname when running the dev
server, to make debugger requests. Additional hosts can be added by
using the debugger middleware directly. The debugger UI makes requests
using the full URL rather than only the path. GHSA-2g68-c3qc-8985</li>
<li>Make reloader more robust when <code>&quot;&quot;</code> is in
<code>sys.path</code>. <a
href="https://redirect.github.com/pallets/werkzeug/issues/2823">#2823</a></li>
<li>Better TLS cert format with <code>adhoc</code> dev certs. <a
href="https://redirect.github.com/pallets/werkzeug/issues/2891">#2891</a></li>
<li>Inform Python &lt; 3.12 how to handle <code>itms-services</code>
URIs correctly, rather than using an overly-broad workaround in Werkzeug
that caused some redirect URIs to be passed on without encoding. <a
href="https://redirect.github.com/pallets/werkzeug/issues/2828">#2828</a></li>
<li>Type annotation for <code>Rule.endpoint</code> and other uses of
<code>endpoint</code> is <code>Any</code>. <a
href="https://redirect.github.com/pallets/werkzeug/issues/2836">#2836</a></li>
</ul>
<h2>3.0.2</h2>
<p>This is a fix release for the 3.0.x feature branch.</p>
<ul>
<li>Changes: <a
href="https://werkzeug.palletsprojects.com/en/3.0.x/changes/#version-3-0-2">https://werkzeug.palletsprojects.com/en/3.0.x/changes/#version-3-0-2</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pallets/werkzeug/blob/main/CHANGES.rst">werkzeug's
changelog</a>.</em></p>
<blockquote>
<h2>Version 3.0.3</h2>
<p>Released 2024-05-05</p>
<ul>
<li>
<p>Only allow <code>localhost</code>, <code>.localhost</code>,
<code>127.0.0.1</code>, or the specified
hostname when running the dev server, to make debugger requests.
Additional
hosts can be added by using the debugger middleware directly. The
debugger
UI makes requests using the full URL rather than only the path.
:ghsa:<code>2g68-c3qc-8985</code></p>
</li>
<li>
<p>Make reloader more robust when <code>&quot;&quot;</code> is in
<code>sys.path</code>. :pr:<code>2823</code></p>
</li>
<li>
<p>Better TLS cert format with <code>adhoc</code> dev certs.
:pr:<code>2891</code></p>
</li>
<li>
<p>Inform Python &lt; 3.12 how to handle <code>itms-services</code> URIs
correctly, rather
than using an overly-broad workaround in Werkzeug that caused some
redirect
URIs to be passed on without encoding. :issue:<code>2828</code></p>
</li>
<li>
<p>Type annotation for <code>Rule.endpoint</code> and other uses of
<code>endpoint</code> is
<code>Any</code>. :issue:<code>2836</code></p>
</li>
</ul>
<h2>Version 3.0.2</h2>
<p>Released 2024-04-01</p>
<ul>
<li>Ensure setting <code>merge_slashes</code> to <code>False</code>
results in <code>NotFound</code> for
repeated-slash requests against single slash routes.
:issue:<code>2834</code></li>
<li>Fix handling of <code>TypeError</code> in
<code>TypeConversionDict.get()</code> to match
<code>ValueError</code>. :issue:<code>2843</code></li>
<li>Fix <code>response_wrapper</code> type check in test client.
:issue:<code>2831</code></li>
<li>Make the return type of <code>MultiPartParser.parse</code> more
precise.
:issue:<code>2840</code></li>
<li>Raise an error if converter arguments cannot be parsed.
:issue:<code>2822</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="f9995e9679"><code>f9995e9</code></a>
release version 3.0.3</li>
<li><a
href="3386395b24"><code>3386395</code></a>
Merge pull request from GHSA-2g68-c3qc-8985</li>
<li><a
href="890b6b6263"><code>890b6b6</code></a>
only require trusted host for evalex</li>
<li><a
href="71b69dfb7d"><code>71b69df</code></a>
restrict debugger trusted hosts</li>
<li><a
href="d2d3869525"><code>d2d3869</code></a>
endpoint type is Any (<a
href="https://redirect.github.com/pallets/werkzeug/issues/2895">#2895</a>)</li>
<li><a
href="7080b55acd"><code>7080b55</code></a>
endpoint type is Any</li>
<li><a
href="7555eff296"><code>7555eff</code></a>
remove iri_to_uri redirect workaround (<a
href="https://redirect.github.com/pallets/werkzeug/issues/2894">#2894</a>)</li>
<li><a
href="97fb2f7222"><code>97fb2f7</code></a>
remove _invalid_iri_to_uri workaround</li>
<li><a
href="249527ff98"><code>249527f</code></a>
make cn field a valid single hostname, and use wildcard in SANs field.
(<a
href="https://redirect.github.com/pallets/werkzeug/issues/2892">#2892</a>)</li>
<li><a
href="793be472c9"><code>793be47</code></a>
update adhoc tls dev cert format</li>
<li>Additional commits viewable in <a
href="https://github.com/pallets/werkzeug/compare/3.0.1...3.0.3">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-07 00:16:02 +02:00
Michael Panchenko
26b867e442
Adjust locations of setting the policy in train/eval mode (#1123)
Addresses #1122:
* We introduced a new flag `is_within_training_step` which is enabled by
the training algorithm when within a training step, where a training
step encompasses training data collection and policy updates. This flag
is now used by algorithms to decide whether their `deterministic_eval`
setting should indeed apply instead of the torch training flag (which
was abused!).
* The policy's training/eval mode (which should control torch-level
learning only) no longer needs to be set in user code in order to
control collector behaviour (this didn't make sense!). The respective
calls have been removed.
* The policy should, in fact, always be in evaluation mode when applying
data collection, as there is no reason to ever have gradient
accumulation enabled for any type of rollout. We thus specifically set
the policy to evaluation mode in Collector.collect. Further, it never
makes sense to compute gradients during collection, so the possibility
to pass `no_grad=False` was removed.

Further changes:
- Base class for collectors: `BaseCollector`
- New util context managers `in_eval_mode` and `in_train_mode` for torch
modules.
- `reset` of `Collectors` now returns `obs` and `info`.
- `no_grad` no longer accepted as kwarg of `collect`
- Removed deprecations of `0.5.1` (will likely not affect anyone) and
the unused `warnings` module.
2024-05-06 20:38:19 +02:00
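The `in_eval_mode` and `in_train_mode` context managers mentioned above could look roughly like the following. This is a minimal sketch, not the code from the commit: it assumes only that the module exposes a `training` attribute and a `train(mode)` method, as `torch.nn.Module` does. `SupportsTrainMode` and `ToyModule` are hypothetical stand-ins so the example runs without PyTorch.

```python
from collections.abc import Iterator
from contextlib import contextmanager
from typing import Protocol


class SupportsTrainMode(Protocol):
    """Hypothetical structural type: anything with torch.nn.Module-like mode switching."""

    training: bool

    def train(self, mode: bool = True) -> object: ...


@contextmanager
def in_eval_mode(module: SupportsTrainMode) -> Iterator[None]:
    """Switch `module` to eval mode and restore its previous mode on exit."""
    was_training = module.training
    module.train(False)
    try:
        yield
    finally:
        module.train(was_training)


@contextmanager
def in_train_mode(module: SupportsTrainMode) -> Iterator[None]:
    """Switch `module` to train mode and restore its previous mode on exit."""
    was_training = module.training
    module.train(True)
    try:
        yield
    finally:
        module.train(was_training)


class ToyModule:
    """Stand-in for torch.nn.Module (assumption, for illustration only)."""

    def __init__(self) -> None:
        self.training = True

    def train(self, mode: bool = True) -> "ToyModule":
        self.training = mode
        return self
```

Restoring the previous mode in a `finally` clause means the module does not stay in the wrong mode if the wrapped code raises.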
Michael Panchenko
e94a5c04cf New context manager: policy_within_training_step
Adjusted notebooks, log messages and docs accordingly. Removed now
obsolete in_eval_mode and the private context manager in Trainer
2024-05-06 19:22:58 +02:00
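A context manager such as `policy_within_training_step` might toggle the `is_within_training_step` flag as sketched below. The signature, the `ToyPolicy` class and its `select_action` method are assumptions made for illustration and are not taken from the commit. The sketch also shows how a `deterministic_eval` setting could take effect only outside a training step.

```python
from collections.abc import Iterator
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class ToyPolicy:
    """Hypothetical policy holding the two flags discussed in the log."""

    deterministic_eval: bool = True
    is_within_training_step: bool = False

    def select_action(self, mean: float, sample: float) -> float:
        # Outside a training step, deterministic_eval applies: use the mean.
        if self.deterministic_eval and not self.is_within_training_step:
            return mean
        # Within a training step (or without deterministic_eval): use the sample.
        return sample


@contextmanager
def policy_within_training_step(policy: ToyPolicy, enabled: bool = True) -> Iterator[None]:
    """Set the flag for the duration of the block, then restore its old value."""
    previous = policy.is_within_training_step
    policy.is_within_training_step = enabled
    try:
        yield
    finally:
        policy.is_within_training_step = previous
```

In this sketch, a trainer would wrap data collection and policy updates in `with policy_within_training_step(policy):`, while test-time rollouts run outside the block and therefore act deterministically.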
Michael Panchenko
78ea013956 Tests: fixed test_psrl.py: use args.reward_threshold instead of spec
For some reason, env.spec.reward_threshold is now None (some change in upstream code)

Also added better pytest skip message
2024-05-06 16:16:20 +02:00
Michael Panchenko
6a5b3c837a Docstrings, skip hidden files in autogen_rst 2024-05-05 23:31:20 +02:00
Michael Panchenko
f059b65103 Merge branch 'refs/heads/thuml-master' into policy-train-eval
# Conflicts:
#	CHANGELOG.md
2024-05-05 22:33:51 +02:00
Michael Panchenko
d8e5631567 Extended changelog, slightly improved structure 2024-05-05 22:28:57 +02:00
Michael Panchenko
2abb4dac24 Reinstated warning module 2024-05-05 22:27:19 +02:00
Dominik Jain
024b80e79c Improve creation of multiple seeded experiments:
* Add class ExperimentCollection to improve usability
* Remove parameters from ExperimentBuilder.build
* Rename ExperimentBuilder.build_default_seeded_experiments to build_seeded_collection,
  changing the return type to ExperimentCollection
* Replace temp_config_mutation (which was not appropriate for the public API) with
  method copy (which performs a safe deep copy)
2024-05-05 22:27:19 +02:00
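The idea of building a seeded collection from deep copies, rather than temporarily mutating a shared configuration, can be sketched as follows. All names here (`ToyExperiment`, `ToyExperimentCollection`, the free function `build_seeded_collection`) and the consecutive seed scheme are hypothetical and do not reproduce the library's API.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyExperiment:
    """Hypothetical experiment: a name, a seed and a mutable config."""

    name: str
    seed: int
    config: dict = field(default_factory=dict)

    def copy(self) -> "ToyExperiment":
        # A deep copy, so later changes to the copy never leak into the original.
        return copy.deepcopy(self)


@dataclass
class ToyExperimentCollection:
    """Hypothetical container for running several experiments together."""

    experiments: list[ToyExperiment]

    def run(self) -> list[str]:
        # Placeholder for real execution: report what would be run.
        return [f"{e.name} (seed={e.seed})" for e in self.experiments]


def build_seeded_collection(base: ToyExperiment, num_experiments: int) -> ToyExperimentCollection:
    """Create independent copies of `base`, each with its own seed and name."""
    experiments = []
    for i in range(num_experiments):
        exp = base.copy()
        exp.seed = base.seed + i  # assumed seeding scheme: consecutive seeds
        exp.name = f"{base.name}_seed{exp.seed}"
        experiments.append(exp)
    return ToyExperimentCollection(experiments)
```

Because each experiment is an independent deep copy, changing one run's configuration cannot affect the base experiment or its siblings.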
Dominik Jain
35779696ee Clean up handling of an Experiment's name (and, by extension, a run's name) 2024-05-05 22:27:19 +02:00
Michael Panchenko
a8e9df31f7 Bugfix: allow for training_stat to be None instead of asserting not-None 2024-05-05 22:27:19 +02:00
Michael Panchenko
9fbf28ef6e
Improvements pertaining to the handling of multi-experiment creation (#1131)
Description of changes: see individual commits; merged without squashing.

Co-authored by: @maxhuettenrauch 

Partly addressed #1129
2024-05-05 21:41:53 +02:00
Michael Panchenko
0a7fd1ee8e
Merge branch 'master' into feature/multi-experiment 2024-05-05 16:21:26 +02:00
Michael Panchenko
4e38aeb829 Merge branch 'refs/heads/thuml-master' into policy-train-eval
# Conflicts:
#	CHANGELOG.md
2024-05-05 16:03:34 +02:00
Michael Panchenko
82f425e9fe Collector: move @override, removed docstrings from overridden methods 2024-05-05 16:01:52 +02:00
Michael Panchenko
26a6cca76e Improved docstrings, added asserts to make mypy happy 2024-05-05 15:56:06 +02:00
Michael Panchenko
c5d0e169b5 Collector: removed unnecessary no-grad flag from interfaces. Breaking 2024-05-05 15:41:20 +02:00
Michael Panchenko
f876198870 Formatting 2024-05-05 15:16:53 +02:00
Michael Panchenko
6927eadaa7 BatchPolicy: check that self.is_within_training_step is True on update 2024-05-05 15:14:59 +02:00
dependabot[bot]
2f2d5cb210
Bump tqdm from 4.66.1 to 4.66.3 (#1134) 2024-05-05 15:01:46 +02:00
Dominik Jain
c35be8d07e Establish backward compatibility by implementing __setstate__ 2024-05-03 15:18:39 +02:00
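One way `__setstate__` can keep old pickles loadable after a flag rename is sketched below. It assumes the rename of `is_eval` to `is_within_training_step` with negated meaning, as described in the next entry of this log. The `ToyPolicy` class and the exact mapping are illustrative assumptions, not the committed implementation.

```python
class ToyPolicy:
    """Hypothetical policy whose attribute `is_eval` was renamed and negated."""

    def __init__(self) -> None:
        self.is_within_training_step = False

    def __setstate__(self, state: dict) -> None:
        # pickle calls this with the saved __dict__ when restoring an object.
        # State saved before the rename still carries `is_eval`; translate it.
        if "is_eval" in state:
            state = dict(state)  # do not mutate the caller's dict
            state["is_within_training_step"] = not state.pop("is_eval")
        self.__dict__.update(state)
```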
Dominik Jain
ca69e79b4a Change the way in which deterministic evaluation is controlled:
* Remove flag `eval_mode` from Collector.collect
* Replace flag `is_eval` in BasePolicy with `is_within_training_step` (negating usages)
  and set it appropriately in BaseTrainer
2024-05-03 15:18:39 +02:00
Dominik Jain
18f236167f Fix invalid kwarg 2024-05-03 10:12:41 +02:00
Dominik Jain
ca4dad1139 BaseTrainer: Refactoring
New method training_step, which
  * collects training data (method _collect_training_data)
  * performs "test in train" (method _test_in_train)
  * performs policy update
The old method named train_step performed only the first two points
and has now been split into two separate methods
2024-05-03 10:12:35 +02:00
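The structure described in this commit can be pictured as a method that composes three smaller steps. In the sketch below, `ToyTrainer`, the `_update_policy` name, the missing return values and the `calls` list are assumptions for illustration. Only `training_step`, `_collect_training_data` and `_test_in_train` are names from the commit message.

```python
class ToyTrainer:
    """Hypothetical trainer that records the order in which the steps run."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def _collect_training_data(self) -> None:
        self.calls.append("collect")

    def _test_in_train(self) -> None:
        self.calls.append("test_in_train")

    def _update_policy(self) -> None:  # hypothetical name for the update step
        self.calls.append("update")

    def training_step(self) -> None:
        # One training step: data collection, "test in train", then the policy update.
        self._collect_training_data()
        self._test_in_train()
        self._update_policy()
```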
Dominik Jain
4f16494609 Set torch train mode in BasePolicy.update instead of in each .learn implementation,
as this is less prone to errors
2024-05-02 11:51:08 +02:00
bordeauxred
f31a91df5d
Typo docstring (#1132) 2024-05-01 08:59:00 +02:00
Michael Panchenko
606128f29a
Merge branch 'master' into feature/multi-experiment 2024-04-30 22:52:45 +02:00
Dominik Jain
393e55aa58 Improve change log #1129 2024-04-30 17:47:06 +02:00
Dominik Jain
ea0c4f1a30 Update change log with changes from #1131 2024-04-30 17:31:48 +02:00
Dominik Jain
f8cca8b07c Improve creation of multiple seeded experiments:
* Add class ExperimentCollection to improve usability
* Remove parameters from ExperimentBuilder.build
* Rename ExperimentBuilder.build_default_seeded_experiments to build_seeded_collection,
  changing the return type to ExperimentCollection
* Replace temp_config_mutation (which was not appropriate for the public API) with
  method copy (which performs a safe deep copy)
2024-04-30 17:22:11 +02:00
Dominik Jain
2b1594a1c8 Clean up handling of an Experiment's name (and, by extension, a run's name) 2024-04-30 16:24:46 +02:00
bordeauxred
61426acf07
Improve the documentation of compute_episodic_return in base policy. (#1130) 2024-04-30 14:40:16 +02:00
Michael Panchenko
a65920fc68
Support Actor preprocessing network reuse for continuous case, fixes in DQN network (#1128)
This PR fixes a bug in DQN and lifts a limitation in reusing the actor's
preprocessing network for continuous environments.

* `atari_network.DQN`:
  * Fix input validation
  * Fix `output_dim` not being set if `features_only=True` and
    `output_dim_added_layer` is not None
* `continuous.Critic`:
  * Add flag `apply_preprocess_net_to_obs_only` to allow the
    preprocessing network to be applied to the observations only (without
    the actions concatenated), which is essential for the case where we want
    to reuse the actor's preprocessing network
* `CriticFactoryReuseActor`: Use the flag, fixing the case where we want
  to reuse an actor's preprocessing network for the critic (must be applied
  before concatenating the actions)
* Minor improvements in docs/docstrings
2024-04-29 23:49:52 +02:00
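Why the order of preprocessing and concatenation matters can be shown with a framework-free sketch. Plain Python lists stand in for tensors here, and `critic_input` and `actor_preprocess` are hypothetical helpers. Only the flag name `apply_preprocess_net_to_obs_only` comes from the PR description.

```python
from collections.abc import Callable, Sequence

Vector = list[float]


def critic_input(
    obs: Sequence[float],
    act: Sequence[float],
    preprocess_net: Callable[[Vector], Vector],
    apply_preprocess_net_to_obs_only: bool = False,
) -> Vector:
    """Build the features that the critic's final layers would consume."""
    if apply_preprocess_net_to_obs_only:
        # Preprocess the observation first, then append the action.
        return preprocess_net(list(obs)) + list(act)
    # Otherwise: concatenate first, then preprocess the joint vector.
    return preprocess_net(list(obs) + list(act))


def actor_preprocess(x: Vector) -> Vector:
    """Toy actor preprocessing network that only accepts 2-dimensional observations."""
    if len(x) != 2:
        raise ValueError(f"expected input of size 2, got {len(x)}")
    return [2.0 * v for v in x]
```

An actor's preprocessing network is sized for observations alone. Feeding it the concatenated observation and action fails on the input dimension, which is why the reuse case needs the flag.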
Dominik Jain
40f772493e Update change log with changes from #1128 2024-04-29 22:30:54 +02:00
Dominik Jain
83083924df Mention CHANGELOG.md in PR template 2024-04-29 22:14:36 +02:00
Dominik Jain
8ac6bf5fbb Improve docstrings 2024-04-29 18:27:02 +02:00
Dominik Jain
250a129cc4 SamplingConfig: Improve docstrings of replay_buffer_save_only_last_obs, replay_buffer_stack_num 2024-04-29 18:27:02 +02:00
Dominik Jain
74737416ff Fix typo 2024-04-29 18:27:02 +02:00
Dominik Jain
d18ded333e CriticFactoryReuseActor: Fix the case where we want to reuse an actor's
preprocessing network for the critic (must be applied before concatenating
the actions)
2024-04-29 18:27:02 +02:00
Dominik Jain
0b494845c9 continuous.Critic: Add flag apply_preprocess_net_to_obs_only to allow the
preprocessing network to be applied to the observations only (without
the actions concatenated), which is essential for the case where we want
to reuse the actor's preprocessing network
2024-04-29 18:27:02 +02:00
Dominik Jain
18ed981875 Add pickle/serialisation utils: setstate and getstate 2024-04-29 18:27:02 +02:00
Dominik Jain
be1c8cd235 DQN:
* Fix input validation
* Fix output_dim not being set if features_only=True and output_dim_added_layer not None
2024-04-29 13:37:26 +02:00
Michael Panchenko
a2b9d7c7d8 Changelog [skip-ci] 2024-04-26 18:31:02 +02:00
Michael Panchenko
45922712d9 Docstring: add return [skip-ci] 2024-04-26 18:14:20 +02:00
Michael Panchenko
e2e8a699ea Changelog [skip-ci] 2024-04-26 18:11:23 +02:00
Michael Panchenko
6aa33b1bfe Formatting 2024-04-26 17:54:14 +02:00
Michael Panchenko
c28508b3be Changelog 2024-04-26 17:53:34 +02:00
Michael Panchenko
2eaf1f37c2 Use the new BaseCollector interface for annotations 2024-04-26 17:53:27 +02:00