Tianshou/docs/tutorials/benchmark.rst
2022-04-24 20:44:54 +08:00

Benchmark
=========

MuJoCo Benchmark
----------------

Tianshou's MuJoCo benchmark reports results that match or exceed published baselines on most tasks.
Each experiment is run with 10 random seeds for 1M to 10M steps. Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco for source code and detailed results.

.. raw:: html

    <center>
        <select id="env-mujoco" onchange="showMujocoEnv(this)"></select>
        <br>
        <div id="vis-mujoco"></div>
        <br>
    </center>

The table below compares the performance of Tianshou against published results on OpenAI Gym MuJoCo benchmarks. The metric is the maximum average return within 1M timesteps. ``~`` means the result is approximated from plots because quantitative results are not provided. ``/`` means results are not provided. Within each algorithm group, the best result on each task is highlighted in boldface. Referenced baselines include `TD3 paper <https://arxiv.org/pdf/1802.09477.pdf>`_, `SAC paper <https://arxiv.org/pdf/1812.05905.pdf>`_, `PPO paper <https://arxiv.org/pdf/1707.06347.pdf>`_, `ACKTR paper <https://arxiv.org/abs/1708.05144>`_, `OpenAI Baselines <https://github.com/openai/baselines>`_ and `Spinning Up <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_.

+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|Task                      |Ant       |HalfCheetah|Hopper    |Walker2d  |Swimmer  |Humanoid  |Reacher |IPendulum |IDPendulum|
+=========+================+==========+===========+==========+==========+=========+==========+========+==========+==========+
|DDPG     |Tianshou        |990.4     |**11718.7**|**2197.0**|1400.6    |**144.1**|**177.3** |**-3.3**|**1000.0**|8364.3    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |**1005.3**|3305.6     |**2020.5**|1843.6    |/        |/         |-6.5    |**1000.0**|**9355.5**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper (Our) |888.8     |8577.3     |1860.0    |**3098.1**|/        |/         |-4.0    |**1000.0**|8370.0    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~840      |~11000     |~1800     |~1950     |~137     |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|TD3      |Tianshou        |**5116.4**|**10201.2**|3472.2    |3982.4    |**104.2**|**5189.5**|**-2.7**|**1000.0**|**9349.2**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |4372.4    |9637.0     |**3564.1**|**4682.8**|/        |/         |-3.6    |**1000.0**|9337.5    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~3800     |~9750      |~2860     |~4000     |~78      |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|SAC      |Tianshou        |**5850.2**|**12138.8**|**3542.2**|**5007.0**|**44.4** |**5488.5**|**-2.6**|**1000.0**|**9359.5**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |SAC Paper       |~3720     |~10400     |~3370     |~3740     |/        |~5200     |/       |/         |/         |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |655.4     |2347.2     |2996.7    |1283.7    |/        |/         |-4.4    |**1000.0**|8487.2    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~3980     |~11520     |~3150     |~4250     |~41.7    |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|A2C      |Tianshou        |**3485.4**|**1829.9** |**1253.2**|**1091.6**|**36.6** |**1726.0**|**-6.7**|**1000.0**|**9257.7**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~1000      |~900      |~850      |~31      |/         |~-24    |**~1000** |~7100     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper (TR)  |/         |~930       |~1220     |~700      |**~36**  |/         |~-27    |**~1000** |~8100     |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|PPO      |Tianshou        |**3258.4**|**5783.9** |**2609.3**|3588.5    |66.7     |**787.1** |**-4.1**|**1000.0**|**9231.3**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~1800      |~2330     |~3460     |~108     |/         |~-7     |**~1000** |~8000     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |1083.2    |1795.4     |2164.7    |3317.7    |/        |/         |-6.2    |**1000.0**|8977.9    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |OpenAI Baselines|/         |~1700      |~2400     |~3510     |~111     |/         |~-6     |~940      |~7350     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~650      |~1670      |~1850     |~1230     |**~120** |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|TRPO     |Tianshou        |**2866.7**|**4471.2** |2046.0    |**3826.7**|40.9     |**810.1** |**-5.1**|**1000.0**|**8435.2**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |ACKTR Paper     |~0        |~400       |~1400     |~550      |~40      |/         |-8      |**~1000** |~800      |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~0         |~2100     |~1100     |**~121** |/         |~-115   |**~1000** |~200      |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |-75.9     |-15.6      |**2471.3**|2321.5    |/        |/         |-111.4  |985.4     |205.9     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |OpenAI Baselines|/         |~1350      |**~2200** |~2350     |~95      |/         |**~-5** |~910      |~7000     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up (TF)|~150      |~850       |~1200     |~600      |~85      |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+

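
The "max average return in 1M timesteps" metric can be sketched in a few lines. This is a minimal illustration, not Tianshou's own evaluation code: it assumes that returns are first averaged over seeds at each evaluation point and that the maximum is then taken over the points within the 1M-timestep budget. The function name ``max_average_return`` and the toy data are hypothetical.

.. code-block:: python

    from statistics import mean


    def max_average_return(timesteps, returns_per_seed, budget=1_000_000):
        """Return the best seed-averaged return among points within the budget.

        timesteps: environment steps at which each evaluation was taken.
        returns_per_seed: one list of returns per seed, aligned with timesteps.
        """
        best = None
        for i, step in enumerate(timesteps):
            if step > budget:
                continue  # ignore evaluations past the timestep budget
            avg = mean(seed_returns[i] for seed_returns in returns_per_seed)
            if best is None or avg > best:
                best = avg
        return best


    # Hypothetical toy data: 3 seeds, evaluated at 4 checkpoints.
    timesteps = [250_000, 500_000, 1_000_000, 2_000_000]
    returns_per_seed = [
        [100.0, 300.0, 280.0, 900.0],
        [120.0, 260.0, 310.0, 950.0],
        [80.0, 340.0, 250.0, 850.0],
    ]
    result = max_average_return(timesteps, returns_per_seed)

In this toy data the last checkpoint has the highest average but lies beyond the 1M budget, so it does not count towards the metric.
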
Runtime averaged over 8 MuJoCo benchmark tasks is listed below. The percentage columns give the share of the total runtime spent in each phase. All results are obtained using a single NVIDIA TITAN X GPU and
up to 48 CPU cores (at most one CPU core per thread).

========= ========= ============ ============== ============ ============== ==========
Algorithm # of Envs 1M timesteps Collecting (%) Updating (%) Evaluating (%) Others (%)
========= ========= ============ ============== ============ ============== ==========
DDPG      1         2.9h         12.0           80.2         2.4            5.4
TD3       1         3.3h         11.4           81.7         1.7            5.2
SAC       1         5.2h         10.9           83.8         1.8            3.5
REINFORCE 64        4min         84.9           1.8          12.5           0.8
A2C       16        7min         62.5           28.0         6.6            2.9
PPO       64        24min        11.4           85.3         3.2            0.2
NPG       16        7min         65.1           24.9         9.5            0.6
TRPO      16        7min         62.9           26.5         10.1           0.6
========= ========= ============ ============== ============ ============== ==========

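
To read the percentages as absolute times, multiply each share by the total runtime. The snippet below is a small sketch of that arithmetic using a few rows copied from the runtime table. The helper ``phase_minutes`` and the ``runtime`` dictionary are hypothetical names introduced for illustration only.

.. code-block:: python

    PHASES = ("collecting", "updating", "evaluating", "others")

    # Rows copied from the runtime table:
    # (total minutes per 1M timesteps, collecting %, updating %, evaluating %, others %)
    runtime = {
        "DDPG": (2.9 * 60, 12.0, 80.2, 2.4, 5.4),
        "SAC": (5.2 * 60, 10.9, 83.8, 1.8, 3.5),
        "A2C": (7.0, 62.5, 28.0, 6.6, 2.9),
        "PPO": (24.0, 11.4, 85.3, 3.2, 0.2),
    }


    def phase_minutes(total_minutes, *percentages):
        """Split a total runtime in minutes into per-phase minutes."""
        return {
            phase: total_minutes * pct / 100
            for phase, pct in zip(PHASES, percentages)
        }


    breakdown = {algo: phase_minutes(*row) for algo, row in runtime.items()}

    for algo, phases in breakdown.items():
        summary = ", ".join(f"{name}: {mins:.1f} min" for name, mins in phases.items())
        print(f"{algo:<5} {summary}")

For DDPG, for instance, about 140 of the 174 minutes go to network updates, whereas A2C spends most of its 7 minutes collecting data.
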
Atari Benchmark
---------------
Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/atari

.. raw:: html

    <center>
        <select id="env-atari" onchange="showAtariEnv(this)"></select>
        <br>
        <div id="vis-atari"></div>
        <br>
    </center>