Benchmark
=========


MuJoCo Benchmark
----------------

Tianshou's MuJoCo benchmark contains state-of-the-art results.

Every experiment is run with 10 random seeds for 1-10M steps. Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco for source code and detailed results.

.. raw:: html

    <center>
        <select id="env-mujoco" onchange="showMujocoEnv(this)"></select>
        <br>
        <div id="vis-mujoco"></div>
        <br>
    </center>

The table below compares Tianshou's performance against published results on the OpenAI Gym MuJoCo benchmarks. The reward metric is the maximum average return within 1M timesteps. ``~`` means the value is approximated from plots because numeric results are not provided. ``/`` means results are not provided. The best result within each algorithm group on each task is highlighted in boldface. Referenced baselines include `TD3 paper <https://arxiv.org/pdf/1802.09477.pdf>`_, `SAC paper <https://arxiv.org/pdf/1812.05905.pdf>`_, `PPO paper <https://arxiv.org/pdf/1707.06347.pdf>`_, `ACKTR paper <https://arxiv.org/abs/1708.05144>`_, `OpenAI Baselines <https://github.com/openai/baselines>`_ and `Spinning Up <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_.

+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|Task                      |Ant       |HalfCheetah|Hopper    |Walker2d  |Swimmer  |Humanoid  |Reacher |IPendulum |IDPendulum|
+=========+================+==========+===========+==========+==========+=========+==========+========+==========+==========+
|DDPG     |Tianshou        |990.4     |**11718.7**|**2197.0**|1400.6    |**144.1**|**177.3** |**-3.3**|**1000.0**|8364.3    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |**1005.3**|3305.6     |**2020.5**|1843.6    |/        |/         |-6.5    |**1000.0**|**9355.5**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper (Our) |888.8     |8577.3     |1860.0    |**3098.1**|/        |/         |-4.0    |**1000.0**|8370.0    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~840      |~11000     |~1800     |~1950     |~137     |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|TD3      |Tianshou        |**5116.4**|**10201.2**|3472.2    |3982.4    |**104.2**|**5189.5**|**-2.7**|**1000.0**|**9349.2**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |4372.4    |9637.0     |**3564.1**|**4682.8**|/        |/         |-3.6    |**1000.0**|9337.5    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~3800     |~9750      |~2860     |~4000     |~78      |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|SAC      |Tianshou        |**5850.2**|**12138.8**|**3542.2**|**5007.0**|**44.4** |**5488.5**|**-2.6**|**1000.0**|**9359.5**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |SAC Paper       |~3720     |~10400     |~3370     |~3740     |/        |~5200     |/       |/         |/         |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |655.4     |2347.2     |2996.7    |1283.7    |/        |/         |-4.4    |**1000.0**|8487.2    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~3980     |~11520     |~3150     |~4250     |~41.7    |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|A2C      |Tianshou        |**3485.4**|**1829.9** |**1253.2**|**1091.6**|**36.6** |**1726.0**|**-6.7**|**1000.0**|**9257.7**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~1000      |~900      |~850      |~31      |/         |~-24    |**~1000** |~7100     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper (TR)  |/         |~930       |~1220     |~700      |**~36**  |/         |~-27    |**~1000** |~8100     |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|PPO      |Tianshou        |**3258.4**|**5783.9** |**2609.3**|3588.5    |66.7     |**787.1** |**-4.1**|**1000.0**|**9231.3**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~1800      |~2330     |~3460     |~108     |/         |~-7     |**~1000** |~8000     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |1083.2    |1795.4     |2164.7    |3317.7    |/        |/         |-6.2    |**1000.0**|8977.9    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |OpenAI Baselines|/         |~1700      |~2400     |~3510     |~111     |/         |~-6     |~940      |~7350     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~650      |~1670      |~1850     |~1230     |**~120** |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|TRPO     |Tianshou        |**2866.7**|**4471.2** |2046.0    |**3826.7**|40.9     |**810.1** |**-5.1**|**1000.0**|**8435.2**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |ACKTR paper     |~0        |~400       |~1400     |~550      |~40      |/         |-8      |**~1000** |~800      |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~0         |~2100     |~1100     |**~121** |/         |~-115   |**~1000** |~200      |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 paper       |-75.9     |-15.6      |**2471.3**|2321.5    |/        |/         |-111.4  |985.4     |205.9     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |OpenAI Baselines|/         |~1350      |**~2200** |~2350     |~95      |/         |**~-5** |~910      |~7000     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up (TF)|~150      |~850       |~1200     |~600      |~85      |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+

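As an illustration of the reward metric used in the table above, the following sketch computes a maximum average return from per-seed evaluation curves. It is not taken from Tianshou's code: the function name ``max_average_return``, its arguments, and the sample numbers are hypothetical, and it assumes that returns are first averaged across seeds at each evaluation step and that the maximum is then taken over the evaluation steps up to 1M.

.. code-block:: python

    from statistics import mean


    def max_average_return(returns_per_seed, steps, max_step=1_000_000):
        """Return the best seed-averaged evaluation return within ``max_step``.

        ``returns_per_seed`` holds one list of evaluation returns per seed,
        each aligned with ``steps`` (the environment step of every evaluation).
        This layout is an assumption made for the example.
        """
        if not returns_per_seed:
            raise ValueError("at least one seed is required")
        best = None
        for index, step in enumerate(steps):
            if step > max_step:
                continue  # ignore evaluations beyond the 1M-step budget
            average = mean(run[index] for run in returns_per_seed)
            if best is None or average > best:
                best = average
        return best


    # Hypothetical data: two seeds, evaluated at 0.5M, 1M and 1.5M steps.
    example_steps = [500_000, 1_000_000, 1_500_000]
    example_returns = [
        [100.0, 300.0, 900.0],
        [200.0, 100.0, 900.0],
    ]
    example_metric = max_average_return(example_returns, example_steps)

Averaging across seeds before taking the maximum keeps a single lucky seed from dominating the reported number; taking the per-seed maximum first would generally give a higher value.
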
Runtime averaged over 8 MuJoCo benchmark tasks is listed below. All results were obtained using a single Nvidia TITAN X GPU and
up to 48 CPU cores (at most one CPU core per thread).

========= ========= ============ ============== ============ ============== ==========
Algorithm # of Envs 1M timesteps Collecting (%) Updating (%) Evaluating (%) Others (%)
========= ========= ============ ============== ============ ============== ==========
DDPG      1         2.9h         12.0           80.2         2.4            5.4
TD3       1         3.3h         11.4           81.7         1.7            5.2
SAC       1         5.2h         10.9           83.8         1.8            3.5
REINFORCE 64        4min         84.9           1.8          12.5           0.8
A2C       16        7min         62.5           28.0         6.6            2.9
PPO       64        24min        11.4           85.3         3.2            0.2
NPG       16        7min         65.1           24.9         9.5            0.6
TRPO      16        7min         62.9           26.5         10.1           0.6
========= ========= ============ ============== ============ ============== ==========

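The percentage columns in the runtime table can be converted into absolute time by multiplying the total time for 1M timesteps by each share. The sketch below does this for two rows copied from the table; the helper names ``to_hours`` and ``phase_hours`` and the ``RUNTIME`` dictionary are hypothetical and exist only for this example.

.. code-block:: python

    def to_hours(text):
        """Convert a duration written like '2.9h' or '24min' into hours."""
        if text.endswith("min"):
            return float(text[:-3]) / 60
        if text.endswith("h"):
            return float(text[:-1])
        raise ValueError(f"unsupported duration: {text!r}")


    # Two rows copied from the runtime table: total time and phase shares in %.
    RUNTIME = {
        "DDPG": ("2.9h", {"collecting": 12.0, "updating": 80.2,
                          "evaluating": 2.4, "others": 5.4}),
        "PPO": ("24min", {"collecting": 11.4, "updating": 85.3,
                          "evaluating": 3.2, "others": 0.2}),
    }


    def phase_hours(algorithm):
        """Return the hours spent in each phase for one algorithm."""
        total, shares = RUNTIME[algorithm]
        hours = to_hours(total)
        return {phase: hours * percent / 100 for phase, percent in shares.items()}


    for name in RUNTIME:
        breakdown = phase_hours(name)
        print(name, {phase: round(value, 3) for phase, value in breakdown.items()})

Because the published shares are rounded, the per-phase hours may not sum exactly to the total.
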
Atari Benchmark
---------------

Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/atari for source code and detailed results.

.. raw:: html

    <center>
        <select id="env-atari" onchange="showAtariEnv(this)"></select>
        <br>
        <div id="vis-atari"></div>
        <br>
    </center>