From 7f23748347d6bf4aebce3931f7e57291012cd98d Mon Sep 17 00:00:00 2001
From: ChenDRAG <40993476+ChenDRAG@users.noreply.github.com>
Date: Wed, 27 Apr 2022 21:10:45 +0800
Subject: [PATCH] Compare Atari results with dopamine and OpenAI Baselines (#616)

---
 docs/spelling_wordlist.txt   |  3 +++
 docs/tutorials/benchmark.rst | 38 +++++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index cf78b00..31e53fc 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -154,3 +154,6 @@ IPendulum
 Reacher
 Runtime
 Nvidia
+Enduro
+Qbert
+Seaquest
diff --git a/docs/tutorials/benchmark.rst b/docs/tutorials/benchmark.rst
index f1cb6ba..b67ccbf 100644
--- a/docs/tutorials/benchmark.rst
+++ b/docs/tutorials/benchmark.rst
@@ -94,7 +94,9 @@ TRPO 16 7min 62.9 26.5 10.1 0.6
 Atari Benchmark
 ---------------

-Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/atari
+Tianshou also provides a reliable and reproducible Atari 10M benchmark.
+
+Every experiment is run with 10 random seeds for 10M steps. Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/atari for the source code and to https://wandb.ai/tianshou/atari.benchmark/reports/Atari-Benchmark--VmlldzoxOTA1NzA5 for detailed results hosted on wandb.

 .. raw:: html

@@ -105,3 +107,37 @@ Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/atari
+
+The table below compares the performance of Tianshou against published results on Atari games. We use the max average return in 10M timesteps as the reward metric **(to be consistent with MuJoCo)**. ``/`` means results are not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include Google Dopamine and OpenAI Baselines.
+
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|Task                    |Pong          |Breakout        |Enduro            |Qbert               |MsPacman      |Seaquest           |SpaceInvaders     |
++=======+================+==============+================+==================+====================+==============+===================+==================+
+|DQN    |Tianshou        |**20.2 ± 2.3**|**133.5 ± 44.6**|997.9 ± 180.6     |**11620.2 ± 786.1** |2324.8 ± 359.8|**3213.9 ± 381.6** |947.9 ± 155.3     |
++       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|       |Dopamine        |9.8           |92.2            |**2126.9**        |6836.7              |**2451.3**    |1406.6             |**1559.1**        |
++       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|       |OpenAI Baselines|16.5          |131.5           |479.8             |3254.8              |/             |1164.1             |1129.5 ± 145.3    |
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|C51    |Tianshou        |**20.6 ± 2.4**|**412.9 ± 35.8**|**940.8 ± 133.9** |**12513.2 ± 1274.6**|2254.9 ± 201.2|**3305.4 ± 1524.3**|557.3             |
++       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|       |Dopamine        |17.4          |222.4           |665.3             |9924.5              |**2860.4**    |1706.6             |**604.6 ± 157.5** |
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|Rainbow|Tianshou        |**20.2 ± 3.0**|**440.4 ± 50.1**|1496.1 ± 112.3    |14224.8 ± 1230.1    |2524.2 ± 338.8|1934.6 ± 376.4     |**1178.4**        |
++       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|       |Dopamine        |19.1          |47.9            |**2185.1**        |**15682.2**         |**3161.7**    |**3328.9**         |459.9             |
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|IQN    |Tianshou        |**20.7 ± 2.9**|**355.9 ± 22.7**|**1252.7 ± 118.1**|**14409.2 ± 808.6** |2228.6 ± 253.1|5341.2 ± 670.2     |667.8 ± 81.5      |
++       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|       |Dopamine        |19.6          |96.3            |1227.6            |12496.7             |**4422.7**    |**16418**          |**1358.2 ± 267.6**|
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|PPO    |Tianshou        |**20.3 ± 1.2**|**283.0 ± 74.3**|**1098.9 ± 110.5**|**12341.8 ± 1760.7**|1699.4 ± 248.0|1035.2 ± 353.6     |1641.3            |
++       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|       |OpenAI Baselines|13.7          |114.3           |350.2             |7012.1              |/             |**1218.9**         |**1787.5 ± 340.8**|
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|QR-DQN |Tianshou        |20.7 ± 2.0    |228.3 ± 27.3    |951.7 ± 333.5     |14761.5 ± 862.9     |2259.3 ± 269.2|4187.6 ± 725.7     |1114.7 ± 116.9    |
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+|FQF    |Tianshou        |20.4 ± 2.5    |382.6 ± 29.5    |1816.8 ± 314.3    |15301.2 ± 684.1     |2506.6 ± 402.5|8051.5 ± 3155.6    |2558.3            |
++-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
+
+Please note that the comparison tables for both benchmarks can NOT be used to prove which implementation is "better". The hyperparameters of the algorithms vary across implementations. Also, the reward metrics are not strictly the same (e.g. Tianshou uses the max average return within 10M steps, while OpenAI Baselines only reports the average return at 10M steps, so the numbers are not directly comparable). Lastly, Tianshou always uses 10 random seeds, while others might use fewer. The comparison is here only to show Tianshou's reliability.
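
Reviewer note: the gap between the two reward metrics named in the closing paragraph can be illustrated with a minimal Python sketch. The function names `mean_curve`, `max_average_return`, and `final_average_return`, the checkpoint layout, and all numbers are hypothetical and are not taken from Tianshou, Dopamine, or OpenAI Baselines. The sketch assumes that each seed logs an average test return at the same checkpoints and that "max average return" means the maximum of the seed-averaged curve, which is one plausible reading of the metric rather than a confirmed definition.

```python
from statistics import mean


def mean_curve(curves):
    """Average the per-seed evaluation curves checkpoint by checkpoint."""
    return [mean(values) for values in zip(*curves)]


def max_average_return(curves):
    """Best value of the seed-averaged curve over the whole run (assumed metric)."""
    return max(mean_curve(curves))


def final_average_return(curves):
    """Value of the seed-averaged curve at the last checkpoint only."""
    return mean_curve(curves)[-1]


# Hypothetical returns for 2 seeds at 4 evaluation checkpoints.
curves = [
    [10.0, 30.0, 50.0, 40.0],
    [20.0, 40.0, 70.0, 50.0],
]

print(max_average_return(curves))    # 60.0
print(final_average_return(curves))  # 45.0
```

On the same curves the max-based metric can never be lower than the final-checkpoint metric, which is one reason the table entries are not directly comparable across libraries.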