Add high-level example to README
This commit is contained in: parent 8d6df2b276, commit 961e9a7801 (README.md)

## Quick Start

This is an example of a Deep Q Network (DQN) agent. You can also run the full script at [test/discrete/test_dqn.py](https://github.com/thu-ml/tianshou/blob/master/test/discrete/test_dqn.py).

Tianshou provides two API levels:

* the high-level interface, which provides ease of use for end users seeking to run deep reinforcement learning applications
* the procedural interface, which provides a maximum of control, especially for very advanced users and developers of reinforcement learning algorithms.

In the following, let us consider an example application using the *CartPole* gymnasium environment.
We shall apply the deep Q network (DQN) learning algorithm using both APIs.

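For orientation, the environment itself can be created directly with gymnasium; the following is only an illustrative, standalone sketch, since the Tianshou experiment builders shown below create and vectorize the environments for us.

```python
import gymnasium as gym

# Standalone look at the CartPole task (illustration only; not required when
# using Tianshou's environment factories, which handle creation for us).
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
print(env.observation_space)  # 4-dimensional continuous observation (Box)
print(env.action_space)       # 2 discrete actions: push the cart left or right
```
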
### High-Level API

To get started, we need some imports.

```python
from tianshou.highlevel.config import SamplingConfig
from tianshou.highlevel.env import (
    EnvFactoryGymnasium,
    VectorEnvType,
)
from tianshou.highlevel.experiment import DQNExperimentBuilder, ExperimentConfig
from tianshou.highlevel.params.policy_params import DQNParams
from tianshou.highlevel.trainer import (
    EpochStopCallbackRewardThreshold,
    EpochTestCallbackDQNSetEps,
    EpochTrainCallbackDQNSetEps,
)
```

In the high-level API, the basis for an RL experiment is an `ExperimentBuilder`
with which we can build the experiment we then seek to run.
Since we want to use DQN, we use the specialization `DQNExperimentBuilder`.
The other imports serve to provide configuration options for our experiment.

The high-level API provides largely declarative semantics, i.e. the code is
almost exclusively concerned with configuration that controls what to do
(rather than how to do it).

```python
experiment = (
    DQNExperimentBuilder(
        EnvFactoryGymnasium(task="CartPole-v1", seed=0, venv_type=VectorEnvType.DUMMY),
        ExperimentConfig(
            persistence_enabled=False,
            watch=True,
            watch_render=1 / 35,
            watch_num_episodes=100,
        ),
        SamplingConfig(
            num_epochs=10,
            step_per_epoch=10000,
            batch_size=64,
            num_train_envs=10,
            num_test_envs=100,
            buffer_size=20000,
            step_per_collect=10,
            update_per_step=1 / 10,
        ),
    )
    .with_dqn_params(
        DQNParams(
            lr=1e-3,
            discount_factor=0.9,
            estimation_step=3,
            target_update_freq=320,
        ),
    )
    .with_model_factory_default(hidden_sizes=(64, 64))
    .with_epoch_train_callback(EpochTrainCallbackDQNSetEps(0.3))
    .with_epoch_test_callback(EpochTestCallbackDQNSetEps(0.0))
    .with_epoch_stop_callback(EpochStopCallbackRewardThreshold(195))
    .build()
)
experiment.run()
```

The experiment builder takes three arguments:

* the environment factory for the creation of environments. In this case,
  we use an existing factory implementation for gymnasium environments.
* the experiment configuration, which controls persistence and the overall
  experiment flow. In this case, we have specified that we want to observe
  the agent's behavior after it is trained (`watch=True`) for a number of
  episodes (`watch_num_episodes=100`). We have disabled persistence, because
  we do not want to save training logs, the agent, or its configuration for
  future use.
* the sampling configuration, which controls fundamental training parameters,
  such as the total number of epochs we run the experiment for (`num_epochs=10`)
  and the number of environment steps each epoch shall consist of
  (`step_per_epoch=10000`).
  Every epoch consists of a series of data collection (rollout) steps and
  training steps.
  The parameter `step_per_collect` controls the amount of data that is
  collected in each collection step; after each collection step, we
  perform a training step, applying a gradient-based update based on a sample
  of data (`batch_size=64`) taken from the buffer of data that has been
  collected. For further details, see the documentation of `SamplingConfig`;
  a rough sanity check of the schedule these values imply is sketched below.
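
To make these numbers concrete, here is a back-of-the-envelope check of the training schedule implied by the sampling configuration above. This is plain arithmetic only, not Tianshou code; the trainer performs the actual bookkeeping.

```python
step_per_epoch = 10_000
step_per_collect = 10
update_per_step = 1 / 10
num_epochs = 10

collect_steps_per_epoch = step_per_epoch // step_per_collect  # 1000 collection steps per epoch
updates_per_epoch = int(step_per_epoch * update_per_step)     # 1000 gradient updates per epoch
total_env_steps = num_epochs * step_per_epoch                  # 100000 environment steps overall
print(collect_steps_per_epoch, updates_per_epoch, total_env_steps)
```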

We then proceed to configure some of the parameters of the DQN algorithm itself
and of the neural network model we want to use.
A DQN-specific detail is the use of callbacks to configure the algorithm's
epsilon parameter for exploration. We want to use random exploration during rollouts
(train callback), but we disable it when evaluating the agent's performance in the test
environments (test callback).
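
For intuition, the epsilon values set by these callbacks govern epsilon-greedy action selection. The following is a minimal standalone sketch of that rule, not Tianshou's implementation:

```python
import numpy as np

def epsilon_greedy(q_values: np.ndarray, eps: float, rng: np.random.Generator) -> int:
    """With probability eps pick a random action, otherwise the greedy (argmax) one."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
epsilon_greedy(np.array([0.2, 0.5]), eps=0.3, rng=rng)  # training: some random exploration
epsilon_greedy(np.array([0.2, 0.5]), eps=0.0, rng=rng)  # testing: purely greedy behavior
```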

Find the script in [examples/discrete/discrete_dqn_hl.py](examples/discrete/discrete_dqn_hl.py).

### Procedural API

Let us now consider an analogous example in the procedural API.
Find the full script from which the snippets below were derived at [test/discrete/test_dqn.py](https://github.com/thu-ml/tianshou/blob/master/test/discrete/test_dqn.py).

First, import some relevant packages: