Update Mujoco Benchmark's webpage (#606)

ChenDRAG 2022-04-24 01:11:33 +08:00 committed by GitHub
parent e01385ea30
commit 5c9afe72f3
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 79 additions and 3 deletions

@ -150,3 +150,7 @@ ppo
Jupyter
Colab
Colaboratory
IPendulum
Reacher
Runtime
Nvidia

@ -5,9 +5,9 @@ Benchmark
MuJoCo Benchmark
----------------
Tianshou's MuJoCo benchmark contains state-of-the-art results (even better than `SpinningUp <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_!).
Tianshou's MuJoCo benchmark contains state-of-the-art results.
Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
Every experiment is conducted under 10 random seeds for 1-10M steps. Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco for source code and detailed results.
.. raw:: html
@ -18,6 +18,78 @@ Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
<br>
</center>
The table below compares the performance of Tianshou against published results on OpenAI Gym MuJoCo benchmarks. The metric is the maximum average return within 1M timesteps. ``~`` means the result is approximated from plots because exact numbers are not provided. ``/`` means no result is provided. For each algorithm, the best result on each task is highlighted in boldface. Referenced baselines include `TD3 paper <https://arxiv.org/pdf/1802.09477.pdf>`_, `SAC paper <https://arxiv.org/pdf/1812.05905.pdf>`_, `PPO paper <https://arxiv.org/pdf/1707.06347.pdf>`_, `ACKTR paper <https://arxiv.org/abs/1708.05144>`_, `OpenAI Baselines <https://github.com/openai/baselines>`_ and `Spinning Up <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_.

+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|Task                      |Ant       |HalfCheetah|Hopper    |Walker2d  |Swimmer  |Humanoid  |Reacher |IPendulum |IDPendulum|
+=========+================+==========+===========+==========+==========+=========+==========+========+==========+==========+
|DDPG     |Tianshou        |990.4     |**11718.7**|**2197.0**|1400.6    |**144.1**|**177.3** |**-3.3**|**1000.0**|8364.3    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |**1005.3**|3305.6     |**2020.5**|1843.6    |/        |/         |-6.5    |**1000.0**|**9355.5**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper (Our) |888.8     |8577.3     |1860.0    |**3098.1**|/        |/         |-4.0    |**1000.0**|8370.0    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~840      |~11000     |~1800     |~1950     |~137     |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|TD3      |Tianshou        |**5116.4**|**10201.2**|3472.2    |3982.4    |**104.2**|**5189.5**|**-2.7**|**1000.0**|**9349.2**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |4372.4    |9637.0     |**3564.1**|**4682.8**|/        |/         |-3.6    |**1000.0**|9337.5    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~3800     |~9750      |~2860     |~4000     |~78      |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|SAC      |Tianshou        |**5850.2**|**12138.8**|**3542.2**|**5007.0**|**44.4** |**5488.5**|**-2.6**|**1000.0**|**9359.5**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |SAC Paper       |~3720     |~10400     |~3370     |~3740     |/        |~5200     |/       |/         |/         |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |655.4     |2347.2     |2996.7    |1283.7    |/        |/         |-4.4    |**1000.0**|8487.2    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~3980     |~11520     |~3150     |~4250     |~41.7    |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|A2C      |Tianshou        |**3485.4**|**1829.9** |**1253.2**|**1091.6**|**36.6** |**1726.0**|**-6.7**|**1000.0**|**9257.7**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~1000      |~900      |~850      |~31      |/         |~-24    |**~1000** |~7100     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper (TR)  |/         |~930       |~1220     |~700      |**~36**  |/         |~-27    |**~1000** |~8100     |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|PPO      |Tianshou        |**3258.4**|**5783.9** |**2609.3**|3588.5    |66.7     |**787.1** |**-4.1**|**1000.0**|**9231.3**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~1800      |~2330     |~3460     |~108     |/         |~-7     |**~1000** |~8000     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |1083.2    |1795.4     |2164.7    |3317.7    |/        |/         |-6.2    |**1000.0**|8977.9    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |OpenAI Baselines|/         |~1700      |~2400     |~3510     |~111     |/         |~-6     |~940      |~7350     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~650      |~1670      |~1850     |~1230     |**~120** |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|TRPO     |Tianshou        |**2866.7**|**4471.2** |2046.0    |**3826.7**|40.9     |**810.1** |**-5.1**|**1000.0**|**8435.2**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |ACKTR paper     |~0        |~400       |~1400     |~550      |~40      |/         |-8      |**~1000** |~800      |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~0         |~2100     |~1100     |**~121** |/         |~-115   |**~1000** |~200      |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 paper       |-75.9     |-15.6      |**2471.3**|2321.5    |/        |/         |-111.4  |985.4     |205.9     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |OpenAI Baselines|/         |~1350      |**~2200** |~2350     |~95      |/         |**~-5** |~910      |~7000     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up (TF)|~150      |~850       |~1200     |~600      |~85      |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+

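
The reward metric used in the table above (max average return within 1M timesteps, over several random seeds) can be sketched as follows. This is only one plausible reading of the metric, not Tianshou's evaluation code: the function name ``max_average_return``, the data layout (one list of evaluation returns per seed) and the toy numbers are all assumptions made for illustration, as is the use of the population standard deviation.

```python
from statistics import mean, pstdev


def max_average_return(returns_per_seed, timesteps, budget=1_000_000):
    """Return (mean, std) at the evaluation point with the highest seed-averaged return.

    returns_per_seed: one list of evaluation returns per random seed (hypothetical layout)
    timesteps:        environment steps at which each evaluation was run
    budget:           only evaluation points at or below this step count are considered
    """
    candidates = [i for i, t in enumerate(timesteps) if t <= budget]
    if not candidates:
        raise ValueError("no evaluation point falls within the step budget")

    def seed_average(i):
        # Average return across seeds at evaluation point i.
        return mean([run[i] for run in returns_per_seed])

    best = max(candidates, key=seed_average)
    column = [run[best] for run in returns_per_seed]
    # Population standard deviation across seeds (an assumption of this sketch).
    return mean(column), pstdev(column)


# Made-up returns for three seeds, evaluated at four checkpoints.
timesteps = [250_000, 500_000, 1_000_000, 2_000_000]
returns_per_seed = [
    [100.0, 300.0, 250.0, 900.0],
    [200.0, 500.0, 350.0, 700.0],
    [150.0, 400.0, 300.0, 800.0],
]
best_mean, best_std = max_average_return(returns_per_seed, timesteps)
print(f"{best_mean:.1f}±{best_std:.1f}")
```

In this toy data the last checkpoint has the highest average but lies beyond the 1M-step budget, so it is ignored and the second checkpoint is reported instead.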
The table below lists the runtime averaged over 8 MuJoCo benchmark tasks. All results were obtained using a single Nvidia TITAN X GPU and
up to 48 CPU cores (at most one CPU core per thread).

========= ========= ============ ============== ============ ============== ==========
Algorithm # of Envs 1M timesteps Collecting (%) Updating (%) Evaluating (%) Others (%)
========= ========= ============ ============== ============ ============== ==========
DDPG      1         2.9h         12.0           80.2         2.4            5.4
TD3       1         3.3h         11.4           81.7         1.7            5.2
SAC       1         5.2h         10.9           83.8         1.8            3.5
REINFORCE 64        4min         84.9           1.8          12.5           0.8
A2C       16        7min         62.5           28.0         6.6            2.9
PPO       64        24min        11.4           85.3         3.2            0.2
NPG       16        7min         65.1           24.9         9.5            0.6
TRPO      16        7min         62.9           26.5         10.1           0.6
========= ========= ============ ============== ============ ============== ==========

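
To read the percentage columns of the runtime table as absolute times, multiply each share by the total time for 1M timesteps. The snippet below is a small illustrative sketch: the helper name ``phase_durations`` is hypothetical, and the inputs are copied from the DDPG and PPO rows above.

```python
def phase_durations(total_hours, shares_percent):
    """Split a total wall-clock time into per-phase hours using percentage shares."""
    return {phase: total_hours * pct / 100.0 for phase, pct in shares_percent.items()}


# Values copied from the DDPG row (2.9h) and the PPO row (24min) of the runtime table.
ddpg = phase_durations(
    2.9,
    {"collecting": 12.0, "updating": 80.2, "evaluating": 2.4, "others": 5.4},
)
ppo = phase_durations(
    24 / 60,
    {"collecting": 11.4, "updating": 85.3, "evaluating": 3.2, "others": 0.2},
)

for name, phases in (("DDPG", ddpg), ("PPO", ppo)):
    for phase, hours in phases.items():
        print(f"{name:5s} {phase:11s} {hours * 60:6.1f} min")
```

For DDPG this puts roughly 2.3 of the 2.9 hours into network updates, while PPO spends about 20 of its 24 minutes there.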
Atari Benchmark
---------------

@ -247,7 +247,7 @@ For pretrained agents, detailed graphs (single agent, single game) and log detai
### TRPO
| Environment | Tianshou (1M) | [ACKTR paper](https://arxiv.org/pdf/1708.05144.pdf) | [PPO paper](https://arxiv.org/pdf/1707.06347.pdf) | [OpenAI Baselines](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm) | [Spinning Up (PyTorch)](https://spinningup.openai.com/en/latest/spinningup/bench.html) |
| Environment | Tianshou (1M) | [ACKTR paper](https://arxiv.org/pdf/1708.05144.pdf) | [PPO paper](https://arxiv.org/pdf/1707.06347.pdf) | [OpenAI Baselines](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm) | [Spinning Up (TensorFlow)](https://spinningup.openai.com/en/latest/spinningup/bench.html) |
| :--------------------: | :---------------: | :-------------------------------------------------: | :-----------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| Ant | **2866.7±707.9** | ~0 | N | N | ~150 |
| HalfCheetah | **4471.2±804.9** | ~400 | ~0 | ~1350 | ~850 |