diff --git a/.readthedocs.yaml b/.readthedocs.yaml
new file mode 100644
index 0000000..6d1ba8b
--- /dev/null
+++ b/.readthedocs.yaml
@@ -0,0 +1,24 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the version of Python and other tools you might need
+build:
+ os: ubuntu-22.04
+ tools:
+ python: "3.11"
+ jobs:
+ pre_build:
+ - pip install .
+
+# Build documentation in the docs/ directory with Sphinx
+sphinx:
+ configuration: docs/conf.py
+# We recommend specifying your dependencies to enable reproducible builds:
+# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+ install:
+ - requirements: docs/requirements.txt
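
For reference, the build this configuration describes can be approximated locally. The sketch below is not part of the patch; it assumes it runs from the repository root and that `sphinx-build` is available once the pinned doc requirements are installed.

```python
# Rough local equivalent of the Read the Docs steps configured above (assumed paths).
import subprocess

subprocess.run(["pip", "install", "."], check=True)                             # pre_build job
subprocess.run(["pip", "install", "-r", "docs/requirements.txt"], check=True)   # python.install
subprocess.run(["sphinx-build", "-b", "html", "docs", "docs/_build/html"], check=True)  # sphinx: docs/conf.py
```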
diff --git a/README.md b/README.md
index 536ddbd..0069512 100644
--- a/README.md
+++ b/README.md
@@ -8,9 +8,7 @@
> ⚠️️ **Transition to Gymnasium**: The maintainers of OpenAI Gym have recently released [Gymnasium](http://github.com/Farama-Foundation/Gymnasium),
> which is where future maintenance of OpenAI Gym will be taking place.
-> Tianshou has transitioned to internally using Gymnasium environments. You can still use OpenAI Gym environments with
-> Tianshou vector environments, but they will be wrapped in a compatibility layer, which could be a source of issues.
-> We recommend that you update your environment code to Gymnasium. If you want to continue using OpenAI Gym with
+> Tianshou has transitioned to internally using Gymnasium environments. If you want to continue using OpenAI Gym with
> Tianshou, you need to manually install Gym and [Shimmy](https://github.com/Farama-Foundation/Shimmy) (the compatibility layer).
**Tianshou** ([天授](https://baike.baidu.com/item/%E5%A4%A9%E6%8E%88)) is a reinforcement learning platform based on pure PyTorch. Unlike existing reinforcement learning libraries, which are mainly based on TensorFlow, have many nested classes, unfriendly APIs, or slow speed, Tianshou provides a fast, modularized framework and a pythonic API for building deep reinforcement learning agents with the least number of lines of code. The supported interface algorithms currently include:
@@ -69,7 +67,7 @@ In Chinese, Tianshou means divinely ordained and is derived to the gift of being
## Installation
-Tianshou is currently hosted on [PyPI](https://pypi.org/project/tianshou/) and [conda-forge](https://github.com/conda-forge/tianshou-feedstock). It requires Python >= 3.8.
+Tianshou is currently hosted on [PyPI](https://pypi.org/project/tianshou/) and [conda-forge](https://github.com/conda-forge/tianshou-feedstock). It requires Python >= 3.11.
You can simply install Tianshou from PyPI with the following command:
@@ -234,13 +232,21 @@ test_collector = ts.data.Collector(policy, test_envs, exploration_noise=True) #
Let's train it:
```python
-result = ts.trainer.offpolicy_trainer(
- policy, train_collector, test_collector, epoch, step_per_epoch, step_per_collect,
- test_num, batch_size, update_per_step=1 / step_per_collect,
+result = ts.trainer.OffpolicyTrainer(
+ policy=policy,
+ train_collector=train_collector,
+ test_collector=test_collector,
+ max_epoch=epoch,
+ step_per_epoch=step_per_epoch,
+ step_per_collect=step_per_collect,
+ episode_per_test=test_num,
+ batch_size=batch_size,
+ update_per_step=1 / step_per_collect,
train_fn=lambda epoch, env_step: policy.set_eps(eps_train),
test_fn=lambda epoch, env_step: policy.set_eps(eps_test),
stop_fn=lambda mean_rewards: mean_rewards >= env.spec.reward_threshold,
- logger=logger)
+ logger=logger,
+).run()
print(f'Finished training! Use {result["duration"]}')
```
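
As a follow-up to the trainer call above, the trained `policy` is an ordinary `torch.nn.Module`, so the standard PyTorch save/load pattern applies (a minimal sketch; `'dqn.pth'` is an arbitrary file name):

```python
import torch

# Save the trained policy's weights and restore them later.
torch.save(policy.state_dict(), 'dqn.pth')
policy.load_state_dict(torch.load('dqn.pth'))
```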
diff --git a/docs/index.rst b/docs/index.rst
index 7f557fa..758b4d4 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -63,7 +63,7 @@ Here is Tianshou's other features:
Installation
------------
-Tianshou is currently hosted on `PyPI <https://pypi.org/project/tianshou/>`_ and `conda-forge <https://github.com/conda-forge/tianshou-feedstock>`_. It requires Python >= 3.8.
+Tianshou is currently hosted on `PyPI <https://pypi.org/project/tianshou/>`_ and `conda-forge <https://github.com/conda-forge/tianshou-feedstock>`_. It requires Python >= 3.11.
You can simply install Tianshou from PyPI with the following command:
diff --git a/docs/requirements.txt b/docs/requirements.txt
index 56a21fd..c4cb056 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -1,7 +1,6 @@
-gym
numba
numpy>=1.20
-sphinx
+sphinx<7
sphinxcontrib-bibtex
sphinx_rtd_theme>=0.5.1
tensorboard
diff --git a/docs/tutorials/dqn.rst b/docs/tutorials/dqn.rst
index b2c5844..87c84d6 100644
--- a/docs/tutorials/dqn.rst
+++ b/docs/tutorials/dqn.rst
@@ -181,19 +181,25 @@ The main function of collector is the collect function, which can be summarized
Train Policy with a Trainer
---------------------------
-Tianshou provides :func:`~tianshou.trainer.onpolicy_trainer`, :func:`~tianshou.trainer.offpolicy_trainer`, and :func:`~tianshou.trainer.offline_trainer`. The trainer will automatically stop training when the policy reach the stop condition ``stop_fn`` on test collector. Since DQN is an off-policy algorithm, we use the :func:`~tianshou.trainer.offpolicy_trainer` as follows:
+Tianshou provides :class:`~tianshou.trainer.OnpolicyTrainer`, :class:`~tianshou.trainer.OffpolicyTrainer`,
+and :class:`~tianshou.trainer.OfflineTrainer`. The trainer will automatically stop training when the policy
+reaches the stop condition ``stop_fn`` on the test collector. Since DQN is an off-policy algorithm, we use the
+:class:`~tianshou.trainer.OffpolicyTrainer` as follows:
::
- result = ts.trainer.offpolicy_trainer(
- policy, train_collector, test_collector,
+ result = ts.trainer.OffpolicyTrainer(
+ policy=policy,
+ train_collector=train_collector,
+ test_collector=test_collector,
max_epoch=10, step_per_epoch=10000, step_per_collect=10,
update_per_step=0.1, episode_per_test=100, batch_size=64,
train_fn=lambda epoch, env_step: policy.set_eps(0.1),
test_fn=lambda epoch, env_step: policy.set_eps(0.05),
- stop_fn=lambda mean_rewards: mean_rewards >= env.spec.reward_threshold)
+ stop_fn=lambda mean_rewards: mean_rewards >= env.spec.reward_threshold
+ ).run()
print(f'Finished training! Use {result["duration"]}')
-The meaning of each parameter is as follows (full description can be found at :func:`~tianshou.trainer.offpolicy_trainer`):
+The meaning of each parameter is as follows (full description can be found at :class:`~tianshou.trainer.OffpolicyTrainer`):
* ``max_epoch``: The maximum number of epochs for training. The training process might finish before reaching ``max_epoch``;
* ``step_per_epoch``: The number of environment steps (a.k.a. transitions) collected per epoch;
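
The ``train_fn``/``test_fn`` hooks in the hunk above receive the current epoch and environment step, which makes exploration schedules straightforward. The following is only an illustration, not part of the tutorial; ``policy`` comes from the surrounding example and the constants are assumed:

```python
# Illustrative epsilon schedule for the train_fn hook: decay exploration
# linearly from 0.1 down to 0.05 over the first 50,000 environment steps.
def train_fn(epoch: int, env_step: int) -> None:
    eps_start, eps_end, decay_steps = 0.1, 0.05, 50_000  # assumed values
    frac = min(env_step / decay_steps, 1.0)
    policy.set_eps(eps_start + frac * (eps_end - eps_start))
```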
diff --git a/examples/inverse/irl_gail.py b/examples/inverse/irl_gail.py
index 606c228..afa4f77 100644
--- a/examples/inverse/irl_gail.py
+++ b/examples/inverse/irl_gail.py
@@ -15,7 +15,7 @@ from torch.optim.lr_scheduler import LambdaLR
from torch.utils.tensorboard import SummaryWriter
from tianshou.data import Batch, Collector, ReplayBuffer, VectorReplayBuffer
-from tianshou.env import SubprocVectorEnv
+from tianshou.env import SubprocVectorEnv, VectorEnvNormObs
from tianshou.policy import GAILPolicy
from tianshou.trainer import OnpolicyTrainer
from tianshou.utils import TensorboardLogger
@@ -97,15 +97,12 @@ def test_gail(args=get_args()):
# train_envs = gym.make(args.task)
train_envs = SubprocVectorEnv(
[lambda: NoRewardEnv(gym.make(args.task)) for _ in range(args.training_num)],
- norm_obs=True,
)
+ train_envs = VectorEnvNormObs(train_envs)
# test_envs = gym.make(args.task)
- test_envs = SubprocVectorEnv(
- [lambda: gym.make(args.task) for _ in range(args.test_num)],
- norm_obs=True,
- obs_rms=train_envs.obs_rms,
- update_obs_rms=False,
- )
+ test_envs = SubprocVectorEnv([lambda: gym.make(args.task) for _ in range(args.test_num)])
+ test_envs = VectorEnvNormObs(test_envs, update_obs_rms=False)
+ test_envs.set_obs_rms(train_envs.get_obs_rms())
# seed
np.random.seed(args.seed)
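
The removed ``norm_obs``/``obs_rms`` constructor arguments are replaced by the wrapper pattern shown above: normalize observations on the training envs, then hand the frozen running statistics to the test envs. A minimal sketch of the same pattern, assuming a CartPole task instead of the MuJoCo task used in the example:

```python
import gymnasium as gym

from tianshou.env import DummyVectorEnv, VectorEnvNormObs

# Wrap the vectorized training envs so observations are normalized online.
train_envs = VectorEnvNormObs(
    DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])
)
# Test envs reuse the training statistics and keep them frozen.
test_envs = VectorEnvNormObs(
    DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(2)]),
    update_obs_rms=False,
)
test_envs.set_obs_rms(train_envs.get_obs_rms())
```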