Tianshou

Author	SHA1	Message	Date
Dominik Jain	e0e7349b0a	Add base class BaseActor with method get_preprocess_net for high-level API	2023-10-18 20:44:16 +02:00
Dominik Jain	cd79cf8661	Add A2C high-level API * Add common base class for A2C and PPO agent factories * Add default for dist_fn parameter, adding corresponding factories * Add example mujoco_a2c_hl	2023-10-18 20:44:16 +02:00
Dominik Jain	acd89fa3b0	Remove parameter transformers from config object state, composing the list dynamically instead	2023-10-18 20:44:16 +02:00
Dominik Jain	78b6dd1f49	Adapt class naming scheme * Use prefix convention (subclasses have superclass names as prefix) to facilitate discoverability of relevant classes via IDE autocompletion * Use dual naming, adding an alternative concise name that omits the precise OO semantics and retains only the essential part of the name (which can be more pleasing to users not accustomed to convoluted OO naming)	2023-10-18 20:44:16 +02:00
Michael Panchenko	5bcf514c55	Add alternative functional interface for environment creation where a persistable configuration object is passed as an argument, as this can help to ensure persistability (making the requirement explicit)	2023-10-18 20:44:16 +02:00
Dominik Jain	d4e604b46e	Move parameter transformation directly into parameter objects, achieving greater separation of concerns and improved maintainability	2023-10-18 20:44:16 +02:00
Dominik Jain	38cf982034	Disable Ruff rule D205 (blank-line-after-summary) because it disallows, in particular, class docstrings that consist only of a summary line	2023-10-18 20:44:16 +02:00
Dominik Jain	e993425aa1	Add high-level API support for TD3 * Created mixins for agent factories to reduce code duplication * Further factorised params & mixins for experiment factories * Additional parameter abstractions * Implement high-level MuJoCo TD3 example	2023-10-18 20:44:16 +02:00
Dominik Jain	6a739384ee	WandbLogger: Use less restrictive type annotation for config	2023-10-18 20:44:16 +02:00
Dominik Jain	367778d37f	Improve high-level policy parametrisation Policy objects are now parametrised by converting the parameter dataclass instances to kwargs, using some injectable conversions along the way	2023-10-18 20:44:16 +02:00
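The parametrisation approach described in this commit (parameter dataclass instances converted to kwargs, with injectable conversions applied along the way) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Tianshou's implementation: `ExampleParams`, `split_hidden`, `to_kwargs` and `ExamplePolicy` are hypothetical names invented for this sketch.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable

# Hypothetical signature: a transformer rewrites the kwargs dict in place.
Transformer = Callable[[dict[str, Any]], None]


@dataclass
class ExampleParams:
    """Hypothetical parameter object; field names are illustrative only."""

    lr: float = 1e-3
    gamma: float = 0.99
    hidden: str = "64,64"


def split_hidden(kwargs: dict[str, Any]) -> None:
    """Example conversion: turn a comma-separated string into a tuple of ints."""
    kwargs["hidden"] = tuple(int(x) for x in kwargs["hidden"].split(","))


def to_kwargs(params: Any, *transformers: Transformer) -> dict[str, Any]:
    """Convert a dataclass instance to kwargs, applying each injected conversion."""
    kwargs = asdict(params)
    for transform in transformers:
        transform(kwargs)
    return kwargs


class ExamplePolicy:
    """Hypothetical policy that only accepts named arguments."""

    def __init__(self, *, lr: float, gamma: float, hidden: tuple[int, ...]) -> None:
        self.lr, self.gamma, self.hidden = lr, gamma, hidden


policy = ExamplePolicy(**to_kwargs(ExampleParams(), split_hidden))
```

Keeping the conversions as separate callables means the parameter object stays a plain, persistable dataclass while the policy constructor receives values in the form it expects.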
Dominik Jain	37dc07e487	Add high-level experiment builder interface	2023-10-18 20:44:05 +02:00
Dominik Jain	4d53d345d6	Ignore Ruff rule RET505, because it sacrifices visual discernability of control flow paths for brevity (regarding return statements)	2023-10-09 13:03:19 +02:00
Dominik Jain	3fd60f9e70	Unify PPO configuration objects, use experiment-specific configuration in mujoco_ppo_hl	2023-10-09 13:02:29 +02:00
Dominik Jain	8ec42009cb	Move RLSamplingConfig to separate module config, fixing cyclic import	2023-10-09 13:02:23 +02:00
Dominik Jain	d26b8cb40c	Use experiment-specific config in mujoco_sac_hl, adding auto-alpha	2023-10-09 13:02:18 +02:00
Dominik Jain	adc324038a	Remove LoggerConfig	2023-10-09 13:02:13 +02:00
Dominik Jain	997b520580	Refactoring, dropping package config	2023-10-09 13:02:07 +02:00
Dominik Jain	316eb3c579	Add SAC high-level interface	2023-10-09 13:02:01 +02:00
Dominik Jain	2a1cc6bb55	Enable ruff setting ignore-init-module-imports	2023-10-09 13:01:53 +02:00
Dominik Jain	25c6bbd38c	Ignore D106: Missing docstring in public nested class	2023-10-09 13:01:44 +02:00
Dominik Jain	16ed5fd2a5	Initial high-level interfaces, demonstrated in mujoco_ppo_hl	2023-10-09 13:01:35 +02:00
Michael Panchenko	a54aade730	Addition of dataclasses based config for scripts, major refactoring So far only for one script (mujoco_ppo_cfg), extension will follow Conflicts: examples/mujoco/mujoco_env.py examples/mujoco/mujoco_ppo.py setup.py	2023-10-09 13:01:27 +02:00
Dominik Jain	42fc181d74	Add dev dependencies jsonargparse and docstring_parser	2023-10-09 13:01:11 +02:00
Michael Panchenko	b900fdf6f2	Remove kwargs in policy init (#950) Closes #947 This removes all kwargs from all policy constructors. While doing that, I also improved several names and added a whole lot of TODOs. ## Functional changes: 1. Added possibility to pass None as `critic2` and `critic2_optim`. In fact, the default behavior then should cover the absolute majority of cases 2. Added a function called `clone_optimizer` as a temporary measure to support passing `critic2_optim=None` ## Breaking changes: 1. `action_space` is no longer optional. In fact, it already was non-optional, as there was a ValueError in `BasePolicy.__init__`. So now several examples were fixed to reflect that 2. `reward_normalization` removed from DDPG and children. It was never allowed to pass it as `True` there, an error would have been raised in `compute_n_step_reward`. Now I removed it from the interface 3. renamed `critic1` and similar to `critic`, in order to have uniform interfaces. Note that the `critic` in DDPG was optional for the sole reason that child classes used `critic1`. I removed this optionality (DDPG can't do anything with `critic=None`) 4. Several renamings of fields (mostly private to public, so backwards compatible) ## Additional changes: 1. Removed type and default declaration from docstring. This kind of duplication is really not necessary 2. Policy constructors are now only called using named arguments, not a fragile mixture of positional and named as before 3. Minor beautifications in typing and code 4. Generally shortened docstrings and made them uniform across all policies (hopefully) ## Comment: With these changes, several problems in tianshou's inheritance hierarchy become more apparent. I tried highlighting them for future work. --------- Co-authored-by: Dominik Jain <d.jain@appliedai.de>	2023-10-08 08:57:03 -07:00
dependabot[bot]	bc7ec9c149	Bump pillow from 10.0.0 to 10.0.1 (#958) Bumps [pillow](https://github.com/python-pillow/Pillow) from 10.0.0 to 10.0.1. Release notes for 10.0.1 (2023-09-15): updated libwebp to 1.3.2 (#7395); updated zlib to 1.3 (#7344). Full diff: https://github.com/python-pillow/Pillow/compare/10.0.0...10.0.1 Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-03 20:31:57 -07:00
dependabot[bot]	b24f270a74	Bump urllib3 from 1.26.16 to 1.26.17 (#957) Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.16 to 1.26.17. Release notes for 1.26.17 (2023-10-02): added the `Cookie` header to the list of headers to strip from requests when redirecting to a different host; as before, different headers can be set via `Retry.remove_headers_on_redirect` (GHSA-v845-jxx5-vc9f, #3139). Full diff: https://github.com/urllib3/urllib3/compare/1.26.16...1.26.17 Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-03 23:58:26 +00:00
dependabot[bot]	d11a5a3d99	Bump gitpython from 3.1.33 to 3.1.35 (#953) Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.33 to 3.1.35. Release notes for 3.1.35 (a fix for CVE-2023-41040): bump actions/checkout from 3 to 4 (GitPython#1643); fix 'Tree' object has no attribute '_name' when submodule path is normal path (GitPython#1645), with an added test (GitPython#1647); fix CVE-2023-41040 (GitPython#1644); only make config more permissive in tests that need it (GitPython#1648); fix Windows environment variable upcasing bug (GitPython#1650). Release notes for 3.1.34 (fix resource leaking): util: close lockfile after opening successfully (GitPython#1639). Full diff: https://github.com/gitpython-developers/GitPython/compare/3.1.33...3.1.35 Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-03 07:52:57 +00:00
Anas BELFADIL	c30b4abb8f	Add calibration to CQL as in CalQL paper arXiv:2303.05479 (#915 ) - [X] I have marked all applicable categories: + [ ] exception-raising fix + [ ] algorithm implementation fix + [ ] documentation modification + [X] new feature - [X] I have reformatted the code using `make format` (required) - [X] I have checked the code using `make commit-checks` (required) - [X] If applicable, I have mentioned the relevant/related issue(s) - [X] If applicable, I have listed every items in this Pull Request below	2023-10-02 22:54:34 -07:00
Jiayi Weng	6449a43261	Fix documentation build (#951 ) Close #941 rtfd build link: https://readthedocs.org/projects/tianshou/builds/22019877/ Also -- fix two small issues reported by users, see #928 and #930 Note: I created the branch in thu-ml:tianshou instead of Trinkle23897:tianshou to quickly check the rtfd build. It's not a good process since every commit would trigger twice CI pipelines :(	2023-09-26 08:24:08 -07:00
Michael Panchenko	c8e7d02cba	Minor: use Self type where appropriate (#942 ) Small typing improvement, related to https://github.com/thu-ml/tianshou/pull/915#discussion_r1329734222	2023-09-19 15:40:32 -07:00
Michael Panchenko	2cc34fb72b	Poetry install, remove gym, bump python (#925 ) Closes #914 Additional changes: - Deprecate python below 11 - Remove 3rd party and throughput tests. This simplifies install and test pipeline - Remove gym compatibility and shimmy - Format with 3.11 conventions. In particular, add `zip(..., strict=True/False)` where possible Since the additional tests and gym were complicating the CI pipeline (flaky and dist-dependent), it didn't make sense to work on fixing the current tests in this PR to then just delete them in the next one. So this PR changes the build and removes these tests at the same time.	2023-09-05 14:34:23 -07:00
Michael Panchenko	600f4bbd55	Python 3.9, black + ruff formatting (#921 ) Preparation for #914 and #920 Changes formatting to ruff and black. Remove python 3.8 ## Additional Changes - Removed flake8 dependencies - Adjusted pre-commit. Now CI and Make use pre-commit, reducing the duplication of linting calls - Removed check-docstyle option (ruff is doing that) - Merged format and lint. In CI the format-lint step fails if any changes are done, so it fulfills the lint functionality. --------- Co-authored-by: Jiayi Weng <jiayi@openai.com>	2023-08-25 14:40:56 -07:00
Michael Panchenko	07702fc007	Improved typing and reduced duplication (#912) # Goals of the PR The PR introduces no changes to functionality, apart from improved input validation here and there. The main goals are to reduce some complexity of the code, to improve types and IDE completions, and to extend documentation and block comments where appropriate. Because of the change to the trainer interfaces, many files are affected (more details below), but still the overall changes are "small" in a certain sense. ## Major Change 1 - BatchProtocol TL;DR: One can now annotate which fields the batch is expected to have on input params and which fields a returned batch has. Should be useful for reading the code, getting meaningful IDE support, and catching bugs with mypy. This annotation strategy will continue to work if Batch is replaced by TensorDict or by something else. In more detail: Batch itself has no fields and using it for annotations is of limited informational power. Batches with fields are not separate classes but instead instances of Batch directly, so there is no type that could be used for annotation. Fortunately, python `Protocol` is here for the rescue. With these changes we can now do things like
```python
class ActionBatchProtocol(BatchProtocol):
    logits: Sequence[Union[tuple, torch.Tensor]]
    dist: torch.distributions.Distribution
    act: torch.Tensor
    state: Optional[torch.Tensor]


class RolloutBatchProtocol(BatchProtocol):
    obs: torch.Tensor
    obs_next: torch.Tensor
    info: Dict[str, Any]
    rew: torch.Tensor
    terminated: torch.Tensor
    truncated: torch.Tensor


class PGPolicy(BasePolicy):
    ...

    def forward(
        self,
        batch: RolloutBatchProtocol,
        state: Optional[Union[dict, Batch, np.ndarray]] = None,
        **kwargs: Any,
    ) -> ActionBatchProtocol:
        ...
```
The IDE and mypy are now very helpful in finding errors and in auto-completion, whereas before the tools couldn't assist in that at all. 
## Major Change 2 - remove duplication in trainer package TL;DR: There was a lot of duplication between `BaseTrainer` and its subclasses. Even worse, it was almost-duplication. There was also interface fragmentation through things like `onpolicy_trainer`. Now this duplication is gone and all downstream code was adjusted. In more detail: Since this change affects a lot of code, I would like to explain why I thought it to be necessary. 1. The subclasses of `BaseTrainer` just duplicated docstrings and constructors. What's worse, they changed the order of args there, even turning some kwargs of BaseTrainer into args. They also had the arg `learning_type` which was passed as kwarg to the base class and was unused there. This made things difficult to maintain, and in fact some errors were already present in the duplicated docstrings. 2. The "functions" a la `onpolicy_trainer`, which just called the `OnpolicyTrainer.run`, not only introduced interface fragmentation but also completely obfuscated the docstring and interfaces. They themselves had no docstring and the interface was just `*args, **kwargs`, which makes it impossible to understand what they do and which things can be passed without reading their implementation, then reading the docstring of the associated class, etc. Needless to say, mypy and IDEs provide no support with such functions. Nevertheless, they were used everywhere in the code-base. I didn't find the sacrifices in clarity and complexity justified just for the sake of not having to write `.run()` after instantiating a trainer. 3. The trainers are all very similar to each other. As for my application I needed a new trainer, I wanted to understand their structure. The similarity, however, was hard to discover since they were all in separate modules and there was so much duplication. I kept staring at the constructors for a while until I figured out that essentially no changes to the superclass were introduced. 
Now they are all in the same module and the similarities/differences between them are much easier to grasp (in my opinion) 4. Because of (1), I had to manually change and check a lot of code, which was very tedious and boring. This kind of work won't be necessary in the future, since now IDEs can be used for changing signatures, renaming args and kwargs, changing class names and so on. I have some more reasons, but maybe the above ones are convincing enough. ## Minor changes: improved input validation and types I added input validation for things like `state` and `action_scaling` (which only makes sense for continuous envs). After adding this, some tests failed to pass this validation. There I added `action_scaling=isinstance(env.action_space, Box)`, after which tests were green. I don't know why the tests were green before, since action scaling doesn't make sense for discrete actions. I guess some aspect was not tested and didn't crash. I also added Literal in some places, in particular for `action_bound_method`. Now it is no longer allowed to pass an empty string, instead one should pass `None`. Also here there is input validation with clear error messages. @Trinkle23897 The functional tests are green. I didn't want to fix the formatting, since it will change in the next PR that will solve #914 anyway. I also found a whole bunch of code in `docs/_static`, which I just deleted (shouldn't it be copied from the sources during docs build instead of committed?). I also haven't adjusted the documentation yet, which atm still mentions the trainers of the type `onpolicy_trainer(...)` instead of `OnpolicyTrainer(...).run()` ## Breaking Changes The adjustments to the trainer package introduce breaking changes as duplicated interfaces are deleted. However, it should be very easy for users to adjust to them --------- Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>	2023-08-22 09:54:46 -07:00
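The annotation strategy described under "Major Change 1" can be sketched without any Tianshou or PyTorch dependency. This is a simplified illustration under stated assumptions: `Batch` here is a hypothetical stand-in for a dynamic attribute container, and `RolloutLike`, `ReturnLike` and `total_reward` are invented names, not the protocols from the PR.

```python
from typing import Any, Protocol, cast


class Batch:
    """Hypothetical stand-in for a dynamic batch: fields are set at runtime."""

    def __init__(self, **fields: Any) -> None:
        self.__dict__.update(fields)


class RolloutLike(Protocol):
    """Declares the fields a function expects on its input batch."""

    obs: list[float]
    rew: list[float]


class ReturnLike(Protocol):
    """Declares the field a function promises on its output batch."""

    ret: float


def total_reward(batch: RolloutLike) -> ReturnLike:
    # A type checker can validate `batch.rew` against the protocol,
    # and callers can see from the signature that `.ret` will exist.
    return cast(ReturnLike, Batch(ret=sum(batch.rew)))


# The cast tells the type checker which fields this dynamic batch carries.
rollout = cast(RolloutLike, Batch(obs=[0.0, 1.0], rew=[1.0, 2.5]))
result = total_reward(rollout)
```

Because the protocols only describe attributes, the same annotations would keep working if the container class were swapped for another one with the same fields.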
Anas BELFADIL	80a698be52	Custom keys support in ReplayBuffer (#903 ) Issue: Custom keys support in ReplayBuffer #902 Modified `ReplayBuffer` `add` and `__getitem__` methods. Added `test_custom_key()` to test_buffer.py	2023-08-10 16:06:10 -07:00
Jiayi Weng	61182450b6	add py.typed, drop 3.6/3.7, support 3.11 (#910 ) closing #892 #901	2023-08-10 14:13:46 -07:00
Błażej Osiński	864ee3df2f	Make monitor_gym configurable in WandbLogger. (#896 ) At the moment, WandbLogger is always using wandb.init with monitor_gym = True. This fails when OpenAI's gym is not installed, which doesn't make sense after the transition to Gymnasium. I am using Tianshou with non-standard RL environment, which adhere to Gymnasium API, and the current code is throwing exceptions. I suggest to make it a controllable parameter. I left the default value to True (to make it functionally the same for people using gym). It may also make sense to change the default to False.	2023-08-09 15:13:25 -07:00
Błażej Osiński	cd218dc12d	Add assert description. (#894 ) The assert was missing a description, I fixed it. Please note: there is an error in the documentations, but it does not seem to be related to my changes.	2023-08-09 15:12:42 -07:00
Anas BELFADIL	cb8551f315	Fix master branch test issues (#908 )	2023-08-09 10:27:18 -07:00
Zhenjie Zhao	f8808d236f	fix a problem of the atari dqn example (#861 )	2023-04-30 08:44:27 -07:00
Gen	7ce62a6ad4	actor critic share head bug for example code without sharing head - unify code style (#860 )	2023-04-28 21:43:22 -07:00
ChenDRAG	1423eeb3b2	Add warnings for duplicate usage of action-bounded actor and action scaling method (#850 ) - Fix the current bug discussed in #844 in `test_ppo.py`. - Add warning for `ActorProb ` if both `max_action ` and `unbounded=True` are used for model initializations. - Add warning for PGpolicy and DDPGpolicy if they find duplicate usage of action-bounded actor and action scaling method.	2023-04-23 16:03:31 -07:00
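The kind of check this commit describes (warning when an action-bounded actor is combined with an action scaling method) might look roughly like the sketch below. The function `warn_on_double_bounding` and its parameters are hypothetical and only illustrate the pattern with the standard `warnings` module; they are not the checks added to `ActorProb`, `PGPolicy` or `DDPGPolicy`.

```python
import warnings


def warn_on_double_bounding(actor_bounds_output: bool, action_scaling: bool) -> bool:
    """Warn if actions would be bounded by the actor and rescaled again by the policy."""
    if actor_bounds_output and action_scaling:
        warnings.warn(
            "Actor output is already bounded and action scaling is enabled; "
            "actions may be bounded twice.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


# Record warnings so the behaviour can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_on_double_bounding(actor_bounds_output=True, action_scaling=True)
    clean = warn_on_double_bounding(actor_bounds_output=True, action_scaling=False)
```

A warning rather than an exception keeps existing configurations running while still making the redundant setup visible.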
wckwan	e7c2c3711e	Update gail.py (#849) Remove repeated description of lr_scheduler in the docstring.	2023-04-13 07:25:57 -07:00
Quoding	4ac407c78f	Remove test_fn and train_fn as they are not used in PPO PistonBall example for PettingZoo (#840) Specifically, BasePolicy.set_eps seems to be a remnant from using DQN in other examples. * Removed unused functions (test_fn and train_fn) from the pettingzoo example with PistonBall. These functions use set_eps which is not available for PPO and is not even called once in the file.	2023-03-31 10:43:21 -07:00
Jiayi Weng	7f8fa241dd	making pettingzoo a core dep instead of optional req (#837) close #831	2023-03-25 22:01:09 -07:00
Jiayi Weng	d5d521b329	fix conda installation command (#830) close #828	2023-03-19 17:40:47 -07:00
Jiayi Weng	efdf72cb31	fix sphinx itemlist render error	2023-03-12 22:27:39 -07:00
Jiayi Weng	f0afdeaf6a	update version to 0.5.0 (#826) v0.5.0	2023-03-12 22:07:16 -07:00
Oren Zeev-Ben-Mordehai	73600edc58	fix a bug in batch._is_batch_set (#825)

- [ ] I have marked all applicable categories:
    + [x] exception-raising fix
    + [ ] algorithm implementation fix
    + [ ] documentation modification
    + [ ] new feature
- [ ] I have reformatted the code using `make format` (required)
- [ ] I have checked the code using `make commit-checks` (required)
- [ ] If applicable, I have mentioned the relevant/related issue(s)
- [ ] If applicable, I have listed every item in this Pull Request below

I'm developing a new PettingZoo environment. It is a two-player, turn-based board game.

```
obs_space = dict(
    board = gym.spaces.MultiBinary([8, 8]),
    player = gym.spaces.Tuple([gym.spaces.Discrete(8)] * 2),
    other_player = gym.spaces.Tuple([gym.spaces.Discrete(8)] * 2)
)
self._observation_space = gym.spaces.Dict(spaces=obs_space)
self._action_space = gym.spaces.Tuple([gym.spaces.Discrete(8)] * 2)
...

# this cache ensures that same space object is returned for the same agent
# allows action space seeding to work as expected
@functools.lru_cache(maxsize=None)
def observation_space(self, agent):
    # gymnasium spaces are defined and documented here: https://gymnasium.farama.org/api/spaces/
    return self._observation_space

@functools.lru_cache(maxsize=None)
def action_space(self, agent):
    return self._action_space
```

My test is:

```
def test_with_tianshou():
    action = None

    # env = gym.make('qwertyenv/CollectCoins-v0', pieces=['rock', 'rock'])
    env = CollectCoinsEnv(pieces=['rock', 'rock'], with_mask=True)

    def another_action_taken(action_taken):
        nonlocal action
        action = action_taken

    # Wrapping the original environment as to make sure a valid action will be taken.
    env = EnsureValidAction(
        env,
        env.check_action_valid,
        env.provide_alternative_valid_action,
        another_action_taken
    )
    env = PettingZooEnv(env)

    policies = MultiAgentPolicyManager([RandomPolicy(), RandomPolicy()], env)

    env = DummyVectorEnv([lambda: env])

    collector = Collector(policies, env)

    result = collector.collect(n_step=200, render=0.1)
```

I also have a wrapper that may be redundant given Tianshou's ability to handle action_mask, yet it is still part of the code:

```
from typing import TypeVar, Callable

import gymnasium as gym
from pettingzoo.utils.wrappers import BaseWrapper

Action = TypeVar("Action")


class ActionWrapper(BaseWrapper):
    def __init__(self, env: gym.Env):
        super().__init__(env)

    def step(self, action):
        action = self.action(action)
        self.env.step(action)

    def action(self, action):
        pass

    def render(self, *args, **kwargs):
        self.env.render(*args, **kwargs)


class EnsureValidAction(ActionWrapper):
    """
    A gym environment wrapper to help with the case that the agent wants to take invalid actions.
    For example consider a Chess game, where you let the action_space be any piece moving to any
    square on the board, but then when a wrong move is taken, instead of returning a big negative
    reward, you just take another action, this time a valid one. To make sure the learning
    algorithm is aware of the action taken, a callback should be provided.
    """
    def __init__(self, env: gym.Env,
                 check_action_valid: Callable[[Action], bool],
                 provide_alternative_valid_action: Callable[[Action], Action],
                 alternative_action_cb: Callable[[Action], None]):
        super().__init__(env)
        self.check_action_valid = check_action_valid
        self.provide_alternative_valid_action = provide_alternative_valid_action
        self.alternative_action_cb = alternative_action_cb

    def action(self, action: Action) -> Action:
        if self.check_action_valid(action):
            return action
        alternative_action = self.provide_alternative_valid_action(action)
        self.alternative_action_cb(alternative_action)
        return alternative_action
```

To make the above work, I had to patch PettingZoo a bit (I opened a pull request there) and add a small patch here (this PR). Maybe I'm doing something wrong, yet I fail to see it. With both of my fixes, to PZ and to Tianshou, I have two tests: one of the environment by itself, and the other as shown above.	2023-03-12 17:58:09 -07:00
sunkafei	bc222e87a6	Fix #811 (#817)	2023-03-03 16:57:04 -08:00
Jiayi Weng	c8be85b240	fix readthedocs build error	2023-02-03 14:55:53 -08:00

418 Commits