Merge remote-tracking branch 'origin/master' into feat/high-level-api

Conflicts:
  poetry.lock
Dominik Jain 2023-10-18 20:44:45 +02:00
commit 9c5ee55644
4 changed files with 933 additions and 853 deletions


@@ -6,12 +6,24 @@
[![PyPI](https://img.shields.io/pypi/v/tianshou)](https://pypi.org/project/tianshou/) [![Conda](https://img.shields.io/conda/vn/conda-forge/tianshou)](https://github.com/conda-forge/tianshou-feedstock) [![Read the Docs](https://img.shields.io/readthedocs/tianshou)](https://tianshou.readthedocs.io/en/master) [![Read the Docs](https://img.shields.io/readthedocs/tianshou-docs-zh-cn?label=%E4%B8%AD%E6%96%87%E6%96%87%E6%A1%A3)](https://tianshou.readthedocs.io/zh/master/) [![Unittest](https://github.com/thu-ml/tianshou/actions/workflows/pytest.yml/badge.svg)](https://github.com/thu-ml/tianshou/actions) [![codecov](https://img.shields.io/codecov/c/gh/thu-ml/tianshou)](https://codecov.io/gh/thu-ml/tianshou) [![GitHub issues](https://img.shields.io/github/issues/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/issues) [![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers) [![GitHub forks](https://img.shields.io/github/forks/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/network) [![GitHub license](https://img.shields.io/github/license/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/blob/master/LICENSE)
> ⚠️️ **Transition to Gymnasium**: The maintainers of OpenAI Gym have recently released [Gymnasium](http://github.com/Farama-Foundation/Gymnasium),
> which is where future maintenance of OpenAI Gym will be taking place.
> Tianshou has transitioned to internally using Gymnasium environments. If you want to continue using OpenAI Gym with
> Tianshou, you need to manually install Gym and [Shimmy](https://github.com/Farama-Foundation/Shimmy) (the compatibility layer).
> ⚠️️ **Dropped support of Gym**:
> Tianshou no longer supports `gym`, and we recommend that you transition to
> [Gymnasium](http://github.com/Farama-Foundation/Gymnasium).
> If you absolutely have to use gym, you can try using [Shimmy](https://github.com/Farama-Foundation/Shimmy)
> (the compatibility layer), but Tianshou provides no guarantees that things will work in that case
> (a rough sketch of such a setup follows this note).
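For illustration only, here is a rough sketch of how a legacy Gym environment might be bridged to the Gymnasium API that Tianshou expects. The wrapper name `GymV26CompatibilityV0` and its `env_id` argument are assumptions about Shimmy's compatibility wrappers, and this setup comes with no guarantees.

```python
# Sketch only: requires `gym` and `shimmy` to be installed alongside Tianshou.
from shimmy import GymV26CompatibilityV0  # assumed Shimmy wrapper for legacy gym environments

# Wrap a legacy OpenAI Gym environment so it exposes the Gymnasium API.
env = GymV26CompatibilityV0(env_id="CartPole-v1")
obs, info = env.reset(seed=0)  # Gymnasium-style reset signature
```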
**Tianshou** ([天授](https://baike.baidu.com/item/%E5%A4%A9%E6%8E%88)) is a reinforcement learning platform based on pure PyTorch. Unlike existing reinforcement learning libraries, which are mainly based on TensorFlow, have many nested classes, unfriendly API, or slow-speed, Tianshou provides a fast-speed modularized framework and pythonic API for building the deep reinforcement learning agent with the least number of lines of code. The supported interface algorithms currently include:
> ⚠️️ **Current Status**: the tianshou master branch is currently under heavy development,
> moving towards more features, improved interfaces, more documentation, and better compatibility with
> other RL libraries. You can view the relevant issues in the corresponding
> [milestone](https://github.com/thu-ml/tianshou/milestone/1).
> Stay tuned! (And expect breaking changes until the release is done.)
> ⚠️️ **Installing PyTorch**: Because of a problem with how current PyTorch releases are packaged and how poetry handles them,
> the newest version of PyTorch is not included in the Tianshou dependencies.
> You can still install the newest PyTorch with `pip` after installing Tianshou with `poetry`.
> [Here](https://github.com/python-poetry/poetry/issues/7902#issuecomment-1747400255) is a discussion between the torch and poetry devs, who are trying to resolve it.
**Tianshou** ([天授](https://baike.baidu.com/item/%E5%A4%A9%E6%8E%88)) is a reinforcement learning platform based on pure PyTorch. Unlike several existing reinforcement learning libraries, which are mainly based on TensorFlow, have many nested classes, offer unfriendly APIs, or run slowly, Tianshou provides a fast, modularized framework and a pythonic API for building deep reinforcement learning agents with the fewest possible lines of code. The supported interface algorithms currently include:
- [Deep Q-Network (DQN)](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)
- [Double DQN](https://arxiv.org/pdf/1509.06461.pdf)
@@ -48,7 +60,7 @@
Here are Tianshou's other features:
- Elegant framework, using only ~4000 lines of code
- Elegant framework, using few lines of code in the core abstractions
- State-of-the-art [MuJoCo benchmark](https://github.com/thu-ml/tianshou/tree/master/examples/mujoco) for REINFORCE/A2C/TRPO/PPO/DDPG/TD3/SAC algorithms
- Support vectorized environments (synchronous or asynchronous) for all algorithms [Usage](https://tianshou.readthedocs.io/en/master/tutorials/cheatsheet.html#parallel-sampling)
- Support the super-fast vectorized environment [EnvPool](https://github.com/sail-sg/envpool/) for all algorithms [Usage](https://tianshou.readthedocs.io/en/master/tutorials/cheatsheet.html#envpool-integration) (see the sketch below)
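As a minimal sketch of the vectorized-environment bullets above (environment id and worker count are arbitrary; the EnvPool factory name is an assumption, so consult the linked usage pages for the authoritative API):

```python
import gymnasium as gym
from tianshou.env import DummyVectorEnv, SubprocVectorEnv

# Synchronous vectorization: 8 CartPole copies stepped in the current process.
train_envs = DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(8)])

# Asynchronous vectorization: each copy runs in its own subprocess.
test_envs = SubprocVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(8)])

obs, info = train_envs.reset()  # batched observations, one row per sub-environment

# EnvPool integration (assumed factory name; see the EnvPool usage link above):
# import envpool
# envs = envpool.make_gymnasium("CartPole-v1", num_envs=8)
```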
@@ -158,14 +170,15 @@ The example scripts are under [test/](https://github.com/thu-ml/tianshou/blob/ma
Tianshou comes with its own tests. Unlike other platforms, **the tests include the full agent training procedure for all of the implemented algorithms**. A test fails if an algorithm cannot train an agent to perform well enough within a limited number of epochs on a toy scenario. The tests thus safeguard the reproducibility of our platform. Check out the [GitHub Actions](https://github.com/thu-ml/tianshou/actions) page for more detail.
The Atari/Mujoco benchmark results are under [examples/atari/](examples/atari/) and [examples/mujoco/](examples/mujoco/) folders. **Our Mujoco result can beat most of existing benchmark.**
The Atari/MuJoCo benchmark results are under the [examples/atari/](examples/atari/) and [examples/mujoco/](examples/mujoco/) folders. **Our MuJoCo results beat most existing benchmarks.**
### Modularized Policy
We decouple all of the algorithms roughly into the following parts:
We decouple all algorithms roughly into the following parts (a schematic sketch follows the list):
- `__init__`: initialize the policy;
- `forward`: compute actions for the given observations;
- `process_buffer`: process the initial buffer, which is useful for some offline learning algorithms;
- `process_fn`: preprocess data from the replay buffer (since we have reformulated all algorithms as replay-buffer based algorithms);
- `learn`: learn from a given batch of data;
- `post_process_fn`: update the replay buffer from the learning process (e.g., the prioritized replay buffer needs to update the weights);
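To picture how these parts fit together, here is an illustrative, self-contained skeleton; the class and method bodies are placeholders mirroring the list above, not the actual `BasePolicy` signatures.

```python
# Illustrative skeleton only: placeholder names mirroring the decomposition above,
# not the real signatures of tianshou.policy.BasePolicy.
class SketchPolicy:
    def __init__(self, model, optim):
        # initialize the policy (networks, optimizer, hyperparameters)
        self.model, self.optim = model, optim

    def forward(self, batch):
        # compute actions for the given observations
        return {"act": self.model(batch["obs"])}

    def process_buffer(self, buffer):
        # process the initial buffer (used by some offline algorithms)
        return buffer

    def process_fn(self, batch, buffer, indices):
        # preprocess data sampled from the replay buffer, e.g. compute returns
        return batch

    def learn(self, batch):
        # update the model from a batch of data and report statistics
        return {"loss": 0.0}

    def post_process_fn(self, batch, buffer, indices):
        # write information back to the buffer, e.g. prioritized-replay weights
        pass

    def update(self, sample_size, buffer):
        # a typical call order tying the pieces together
        batch, indices = buffer.sample(sample_size)
        batch = self.process_fn(batch, buffer, indices)
        result = self.learn(batch)
        self.post_process_fn(batch, buffer, indices)
        return result
```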
@@ -314,6 +327,9 @@ If you find Tianshou useful, please cite it in your publications.
## Acknowledgment
Tianshou is supported by the [appliedAI Institute for Europe](https://www.appliedai-institute.de/en/),
which is committed to providing long-term support and development.
Tianshou was previously a reinforcement learning platform based on TensorFlow. You can check out the branch [`priv`](https://github.com/thu-ml/tianshou/tree/priv) for more detail. Many thanks to [Haosheng Zou](https://github.com/HaoshengZou) for his pioneering work on Tianshou before version 0.1.1.
We would like to thank [TSAIL](http://ml.cs.tsinghua.edu.cn/) and [Institute for Artificial Intelligence, Tsinghua University](http://ml.cs.tsinghua.edu.cn/thuai/) for providing such an excellent AI research platform.

poetry.lock (generated, 1740 changed lines)

File diff suppressed because it is too large


@@ -27,7 +27,7 @@ exclude = ["test/*", "examples/*", "docs/*"]
[tool.poetry.dependencies]
python = "^3.11"
gymnasium = "^0.29.0"
gymnasium = "^0.28.0"
h5py = "^3.9.0"
numba = "^0.57.1"
numpy = "^1"
@@ -36,7 +36,7 @@ packaging = "*"
pettingzoo = "^1.22"
tensorboard = "^2.5.0"
# Torch 2.0.1 causes problems, see https://github.com/pytorch/pytorch/issues/100974
torch = "^2.0.0, !=2.0.1"
torch = "^2.0.0, !=2.0.1, !=2.1.0"
tqdm = "*"
virtualenv = [
# special sauce b/c of a flaky bug in poetry on windows


@@ -1,4 +1,4 @@
from typing import Any, Literal
from typing import Any, Literal, Self
import numpy as np
@@ -230,3 +230,11 @@ class MultiAgentPolicyManager(BasePolicy):
                for k, v in out.items():
                    results[agent_id + "/" + k] = v
        return results
    # Need a train method that sets all sub-policies to train mode.
    # No need for a similar eval method, as eval() internally uses the train function.
    def train(self, mode: bool = True) -> Self:
        """Set each internal policy in training mode."""
        for policy in self.policies.values():
            policy.train(mode)
        return self
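As a hypothetical usage note for the method above (`manager` stands for an already-constructed `MultiAgentPolicyManager`; the `.training` flag comes from `torch.nn.Module`):

```python
manager.train()       # put every sub-policy into training mode
manager.train(False)  # switch every sub-policy to evaluation mode before testing
assert all(not policy.training for policy in manager.policies.values())
```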