Add high-level example to README

Dominik Jain 2024-01-11 18:12:22 +01:00
parent 8d6df2b276
commit 961e9a7801

README.md

@@ -188,7 +188,113 @@ Within this API, we can interact with different policies conveniently.
## Quick Start
Tianshou provides two API levels:
* the high-level interface, which provides ease of use for end users seeking to run deep reinforcement learning applications
* the procedural interface, which provides maximum control and is geared towards advanced users and developers of reinforcement learning algorithms.
In the following, let us consider an example application using the *CartPole* gymnasium environment.
We shall apply the deep Q network (DQN) learning algorithm using both APIs.
### High-Level API
To get started, we need some imports.
```python
from tianshou.highlevel.config import SamplingConfig
from tianshou.highlevel.env import (
    EnvFactoryGymnasium,
    VectorEnvType,
)
from tianshou.highlevel.experiment import DQNExperimentBuilder, ExperimentConfig
from tianshou.highlevel.params.policy_params import DQNParams
from tianshou.highlevel.trainer import (
    EpochStopCallbackRewardThreshold,
    EpochTestCallbackDQNSetEps,
    EpochTrainCallbackDQNSetEps,
)
```
In the high-level API, the basis for an RL experiment is an `ExperimentBuilder`,
which we use to configure and then build the experiment we want to run.
Since we want to use DQN, we use the specialization `DQNExperimentBuilder`.
The other imports provide configuration options for our experiment.
The high-level API provides largely declarative semantics, i.e. the code is
almost exclusively concerned with configuration that controls what to do
(rather than how to do it).
```python
experiment = (
    DQNExperimentBuilder(
        # factory for the (vectorized) CartPole environments
        EnvFactoryGymnasium(task="CartPole-v1", seed=0, venv_type=VectorEnvType.DUMMY),
        # general experiment settings (persistence, watching the trained agent)
        ExperimentConfig(
            persistence_enabled=False,
            watch=True,
            watch_render=1 / 35,
            watch_num_episodes=100,
        ),
        # fundamental training parameters
        SamplingConfig(
            num_epochs=10,
            step_per_epoch=10000,
            batch_size=64,
            num_train_envs=10,
            num_test_envs=100,
            buffer_size=20000,
            step_per_collect=10,
            update_per_step=1 / 10,
        ),
    )
    # algorithm-specific parameters
    .with_dqn_params(
        DQNParams(
            lr=1e-3,
            discount_factor=0.9,
            estimation_step=3,
            target_update_freq=320,
        ),
    )
    # MLP with two hidden layers of 64 units for the Q-network
    .with_model_factory_default(hidden_sizes=(64, 64))
    # epsilon-greedy exploration during training, greedy evaluation
    .with_epoch_train_callback(EpochTrainCallbackDQNSetEps(0.3))
    .with_epoch_test_callback(EpochTestCallbackDQNSetEps(0.0))
    # stop early once the mean test reward reaches the threshold
    .with_epoch_stop_callback(EpochStopCallbackRewardThreshold(195))
    .build()
)
experiment.run()
```
The experiment builder takes three arguments:
* the environment factory for the creation of environments. In this case,
  we use an existing factory implementation for gymnasium environments.
* the experiment configuration, which controls persistence and the overall
  experiment flow. In this case, we have configured that we want to observe
  the agent's behavior after it is trained (`watch=True`) for a number of
  episodes (`watch_num_episodes=100`). We have disabled persistence, because
  we do not want to save training logs, the agent, or its configuration for
  future use.
* the sampling configuration, which controls fundamental training parameters,
  such as the total number of epochs we run the experiment for (`num_epochs=10`)
  and the number of environment steps each epoch shall consist of
  (`step_per_epoch=10000`).
  Every epoch consists of a series of data collection (rollout) steps and
  training steps.
  The parameter `step_per_collect` controls the amount of data that is
  collected in each collection step; after each collection step, we perform
  a training step, applying a gradient-based update based on a sample of data
  (`batch_size=64`) taken from the buffer of collected data (see the
  arithmetic sketch after this list). For further details, see the
  documentation of `SamplingConfig`.
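To make these quantities concrete, here is a back-of-the-envelope sketch of
how they relate for the configuration above (assuming, as documented for
Tianshou's off-policy trainer, that `update_per_step` denotes the number of
gradient updates per collected environment step):

```python
# Rough arithmetic for the SamplingConfig above (a sketch, not library code)
step_per_epoch = 10000
step_per_collect = 10
update_per_step = 1 / 10
batch_size = 64

collect_steps_per_epoch = step_per_epoch // step_per_collect  # 1000 rollout phases
gradient_steps_per_epoch = int(step_per_epoch * update_per_step)  # 1000 updates
samples_drawn_per_epoch = gradient_steps_per_epoch * batch_size  # 64000 transitions

print(collect_steps_per_epoch, gradient_steps_per_epoch, samples_drawn_per_epoch)
```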
We then proceed to configure some of the parameters of the DQN algorithm itself
and of the neural network model we want to use.
A DQN-specific detail is the use of callbacks to configure the algorithm's
epsilon parameter for exploration. We want random exploration during rollouts
(train callback), but no randomness when evaluating the agent's performance in
the test environments (test callback).
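If a fixed exploration rate is too coarse, the train callback can be swapped
for a decaying schedule. As a hedged sketch (the class name and constructor
arguments below are assumptions and should be verified against
`tianshou.highlevel.trainer`):

```python
from tianshou.highlevel.trainer import EpochTrainCallbackDQNEpsLinearDecay

# `builder` stands for the DQNExperimentBuilder chain shown above; this would
# replace the fixed-epsilon train callback with a linear decay from 0.3 down
# to 0.05 (class name and arguments are assumptions, not verified API).
builder.with_epoch_train_callback(
    EpochTrainCallbackDQNEpsLinearDecay(eps_train=0.3, eps_train_final=0.05)
)
```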
Find the script in [examples/discrete/discrete_dqn_hl.py](examples/discrete/discrete_dqn_hl.py).
### Procedural API
Let us now consider an analogous example in the procedural API.
Find the full script from which the snippets below were derived at [test/discrete/test_dqn.py](https://github.com/thu-ml/tianshou/blob/master/test/discrete/test_dqn.py).
First, import some relevant packages:
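As a sketch, the imports typically look as follows (the authoritative list is
in the linked script):

```python
import gymnasium as gym
import torch
from torch.utils.tensorboard import SummaryWriter

import tianshou as ts
```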