Fix documentation build (#951)
Close #941. Read the Docs build link: https://readthedocs.org/projects/tianshou/builds/22019877/

Also fixes two small issues reported by users; see #928 and #930.

Note: I created the branch in thu-ml:tianshou instead of Trinkle23897:tianshou to quickly check the Read the Docs build. It's not a good process, since every commit triggers the CI pipelines twice :(
This commit is contained in:
parent
c8e7d02cba
commit
6449a43261
24
.readthedocs.yaml
Normal file
@@ -0,0 +1,24 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the version of Python and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.11"
+  jobs:
+    pre_build:
+      - pip install .
+
+# Build documentation in the docs/ directory with Sphinx
+sphinx:
+  configuration: docs/conf.py
+# We recommend specifying your dependencies to enable reproducible builds:
+# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+  install:
+    - requirements: docs/requirements.txt
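Note on the config above: the `jobs.pre_build: pip install .` step installs tianshou itself before Sphinx runs, presumably so that autodoc can import the package. A rough local equivalent of what Read the Docs executes, as a sketch (not part of the commit; paths assume the repository root, and `docs/_build/html` is just an illustrative output directory):

```python
# Minimal local reproduction of the RTD build steps above (a sketch):
# install the package and the docs requirements, then build with Sphinx.
import subprocess

subprocess.run(["pip", "install", "."], check=True)  # mirrors jobs.pre_build
subprocess.run(["pip", "install", "-r", "docs/requirements.txt"], check=True)  # mirrors python.install
subprocess.run(
    ["python", "-m", "sphinx", "-b", "html", "docs", "docs/_build/html"],
    check=True,  # fail loudly, like the RTD build would
)
```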
22
README.md
@@ -8,9 +8,7 @@
 
 > ⚠️️ **Transition to Gymnasium**: The maintainers of OpenAI Gym have recently released [Gymnasium](http://github.com/Farama-Foundation/Gymnasium),
 > which is where future maintenance of OpenAI Gym will be taking place.
-> Tianshou has transitioned to internally using Gymnasium environments. You can still use OpenAI Gym environments with
-> Tianshou vector environments, but they will be wrapped in a compatibility layer, which could be a source of issues.
-> We recommend that you update your environment code to Gymnasium. If you want to continue using OpenAI Gym with
+> Tianshou has transitioned to internally using Gymnasium environments. If you want to continue using OpenAI Gym with
 > Tianshou, you need to manually install Gym and [Shimmy](https://github.com/Farama-Foundation/Shimmy) (the compatibility layer).
 
 **Tianshou** ([天授](https://baike.baidu.com/item/%E5%A4%A9%E6%8E%88)) is a reinforcement learning platform based on pure PyTorch. Unlike existing reinforcement learning libraries, which are mainly based on TensorFlow, have many nested classes, unfriendly API, or slow-speed, Tianshou provides a fast-speed modularized framework and pythonic API for building the deep reinforcement learning agent with the least number of lines of code. The supported interface algorithms currently include:
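Per the wording removed above, old-style Gym environments are still accepted and get wrapped in the Shimmy compatibility layer once `gym` and Shimmy are installed. A small sketch of that usage (assuming `gym`, `shimmy`, and tianshou are installed; `CartPole-v1` is only illustrative):

```python
# Sketch: passing a legacy OpenAI Gym env to a Tianshou vector env.
# Tianshou detects the old API and wraps it via the Shimmy compatibility
# layer, per the README note above.
import gym  # the legacy package, not gymnasium
import tianshou as ts

# Construction alone triggers the wrapping; no Gym-specific code is needed
# on the Tianshou side afterwards.
venv = ts.env.DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(2)])
print(venv.env_num)  # -> 2
```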
@@ -69,7 +67,7 @@ In Chinese, Tianshou means divinely ordained and is derived to the gift of being
 
 ## Installation
 
-Tianshou is currently hosted on [PyPI](https://pypi.org/project/tianshou/) and [conda-forge](https://github.com/conda-forge/tianshou-feedstock). It requires Python >= 3.8.
+Tianshou is currently hosted on [PyPI](https://pypi.org/project/tianshou/) and [conda-forge](https://github.com/conda-forge/tianshou-feedstock). It requires Python >= 3.11.
 
 You can simply install Tianshou from PyPI with the following command:
 
@@ -234,13 +232,21 @@ test_collector = ts.data.Collector(policy, test_envs, exploration_noise=True)  #
 Let's train it:
 
 ```python
-result = ts.trainer.offpolicy_trainer(
-    policy, train_collector, test_collector, epoch, step_per_epoch, step_per_collect,
-    test_num, batch_size, update_per_step=1 / step_per_collect,
+result = ts.trainer.OffpolicyTrainer(
+    policy=policy,
+    train_collector=train_collector,
+    test_collector=test_collector,
+    max_epoch=epoch,
+    step_per_epoch=step_per_epoch,
+    step_per_collect=step_per_collect,
+    episode_per_test=test_num,
+    batch_size=batch_size,
+    update_per_step=1 / step_per_collect,
     train_fn=lambda epoch, env_step: policy.set_eps(eps_train),
     test_fn=lambda epoch, env_step: policy.set_eps(eps_test),
     stop_fn=lambda mean_rewards: mean_rewards >= env.spec.reward_threshold,
-    logger=logger)
+    logger=logger,
+).run()
 print(f'Finished training! Use {result["duration"]}')
 ```
 
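Besides `.run()`, the class-based trainer added above can also be stepped one epoch at a time, which the old functional `offpolicy_trainer` could not. A sketch under that assumption about the 0.5-era iterator API, reusing the names from the README example (everything here relies on the surrounding setup code):

```python
# Sketch: epoch-by-epoch control with the class-based trainer. All names
# (policy, collectors, hyperparameters) come from the README example above.
trainer = ts.trainer.OffpolicyTrainer(
    policy=policy,
    train_collector=train_collector,
    test_collector=test_collector,
    max_epoch=epoch,
    step_per_epoch=step_per_epoch,
    step_per_collect=step_per_collect,
    episode_per_test=test_num,
    batch_size=batch_size,
    update_per_step=1 / step_per_collect,
    train_fn=lambda epoch, env_step: policy.set_eps(eps_train),
    test_fn=lambda epoch, env_step: policy.set_eps(eps_test),
    logger=logger,
)
for epoch_num, epoch_stat, info in trainer:  # one iteration per epoch
    print(f"epoch {epoch_num}: {epoch_stat}")
```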
@@ -63,7 +63,7 @@ Here is Tianshou's other features:
 Installation
 ------------
 
-Tianshou is currently hosted on `PyPI <https://pypi.org/project/tianshou/>`_ and `conda-forge <https://github.com/conda-forge/tianshou-feedstock>`_. It requires Python >= 3.8.
+Tianshou is currently hosted on `PyPI <https://pypi.org/project/tianshou/>`_ and `conda-forge <https://github.com/conda-forge/tianshou-feedstock>`_. It requires Python >= 3.11.
 
 You can simply install Tianshou from PyPI with the following command:
 
@@ -1,7 +1,6 @@
-gym
 numba
 numpy>=1.20
-sphinx
+sphinx<7
 sphinxcontrib-bibtex
 sphinx_rtd_theme>=0.5.1
 tensorboard
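The `sphinx<7` pin is presumably what unbreaks the build: around the time of this commit, `sphinx_rtd_theme` did not yet support Sphinx 7, so an unpinned `sphinx` resolved to 7.x (this rationale is an inference, not stated in the commit). A quick sanity check one could run in the docs environment:

```python
# Sketch: assert the resolved Sphinx version matches the new pin above.
import sphinx

major = int(sphinx.__version__.split(".")[0])
assert major < 7, f"docs requirements expect Sphinx < 7, got {sphinx.__version__}"
```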
@@ -181,19 +181,25 @@ The main function of collector is the collect function, which can be summarized
 Train Policy with a Trainer
 ---------------------------
 
-Tianshou provides :func:`~tianshou.trainer.onpolicy_trainer`, :func:`~tianshou.trainer.offpolicy_trainer`, and :func:`~tianshou.trainer.offline_trainer`. The trainer will automatically stop training when the policy reach the stop condition ``stop_fn`` on test collector. Since DQN is an off-policy algorithm, we use the :func:`~tianshou.trainer.offpolicy_trainer` as follows:
+Tianshou provides :class:`~tianshou.trainer.OnpolicyTrainer`, :class:`~tianshou.trainer.OffpolicyTrainer`,
+and :class:`~tianshou.trainer.OfflineTrainer`. The trainer will automatically stop training when the policy
+reaches the stop condition ``stop_fn`` on test collector. Since DQN is an off-policy algorithm, we use the
+:class:`~tianshou.trainer.OffpolicyTrainer` as follows:
 ::
 
-    result = ts.trainer.offpolicy_trainer(
-        policy, train_collector, test_collector,
+    result = ts.trainer.OffpolicyTrainer(
+        policy=policy,
+        train_collector=train_collector,
+        test_collector=test_collector,
         max_epoch=10, step_per_epoch=10000, step_per_collect=10,
         update_per_step=0.1, episode_per_test=100, batch_size=64,
         train_fn=lambda epoch, env_step: policy.set_eps(0.1),
         test_fn=lambda epoch, env_step: policy.set_eps(0.05),
-        stop_fn=lambda mean_rewards: mean_rewards >= env.spec.reward_threshold)
+        stop_fn=lambda mean_rewards: mean_rewards >= env.spec.reward_threshold
+    ).run()
     print(f'Finished training! Use {result["duration"]}')
 
-The meaning of each parameter is as follows (full description can be found at :func:`~tianshou.trainer.offpolicy_trainer`):
+The meaning of each parameter is as follows (full description can be found at :class:`~tianshou.trainer.OffpolicyTrainer`):
 
 * ``max_epoch``: The maximum of epochs for training. The training process might be finished before reaching the ``max_epoch``;
 * ``step_per_epoch``: The number of environment step (a.k.a. transition) collected per epoch;
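One relationship worth spelling out from the tutorial hunk above: `update_per_step` is the number of gradient updates per environment step collected, so with `step_per_collect=10` and `update_per_step=0.1` each collect round triggers a single network update. In numbers:

```python
# Worked example of the update schedule in the DQN tutorial snippet above
# (the off-policy trainer rounds the product to an integer update count).
step_per_collect = 10   # transitions gathered per collect round
update_per_step = 0.1   # gradient updates per collected transition
updates_per_round = round(update_per_step * step_per_collect)
assert updates_per_round == 1  # one update every 10 environment steps
```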
@@ -15,7 +15,7 @@ from torch.optim.lr_scheduler import LambdaLR
 from torch.utils.tensorboard import SummaryWriter
 
 from tianshou.data import Batch, Collector, ReplayBuffer, VectorReplayBuffer
-from tianshou.env import SubprocVectorEnv
+from tianshou.env import SubprocVectorEnv, VectorEnvNormObs
 from tianshou.policy import GAILPolicy
 from tianshou.trainer import OnpolicyTrainer
 from tianshou.utils import TensorboardLogger
@@ -97,15 +97,12 @@ def test_gail(args=get_args()):
     # train_envs = gym.make(args.task)
     train_envs = SubprocVectorEnv(
         [lambda: NoRewardEnv(gym.make(args.task)) for _ in range(args.training_num)],
-        norm_obs=True,
     )
+    train_envs = VectorEnvNormObs(train_envs)
     # test_envs = gym.make(args.task)
-    test_envs = SubprocVectorEnv(
-        [lambda: gym.make(args.task) for _ in range(args.test_num)],
-        norm_obs=True,
-        obs_rms=train_envs.obs_rms,
-        update_obs_rms=False,
-    )
+    test_envs = SubprocVectorEnv([lambda: gym.make(args.task) for _ in range(args.test_num)])
+    test_envs = VectorEnvNormObs(test_envs, update_obs_rms=False)
+    test_envs.set_obs_rms(train_envs.get_obs_rms())
 
     # seed
     np.random.seed(args.seed)
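The refactor above replaces the removed `norm_obs=True` / `obs_rms=...` constructor flags with an explicit wrapper. The pattern in isolation, as a sketch (`CartPole-v1` and `DummyVectorEnv` are illustrative stand-ins for the script's SubprocVectorEnv setup):

```python
# Sketch of the VectorEnvNormObs pattern introduced in this diff: normalize
# observations on the training envs, then share the frozen running stats
# with the evaluation envs.
import gym
from tianshou.env import DummyVectorEnv, VectorEnvNormObs

train_envs = VectorEnvNormObs(
    DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])
)
test_envs = VectorEnvNormObs(
    DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(2)]),
    update_obs_rms=False,  # keep eval statistics fixed
)
test_envs.set_obs_rms(train_envs.get_obs_rms())  # reuse training statistics
```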