										
									 
								 
							 
							
								
							 
							
								 
							
							
Benchmark
=========
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
MuJoCo Benchmark
----------------
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
Tianshou's MuJoCo benchmark contains state-of-the-art results.
 
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
Every experiment is run with 10 random seeds for 1M to 10M steps. See https://github.com/thu-ml/tianshou/tree/master/examples/mujoco for the source code and detailed results.
 
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
.. raw:: html

    <center>
        <select id="env-mujoco" onchange="showMujocoResults(this)"></select>
        <br>
        <div id="vis-mujoco"></div>
        <br>
    </center>
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
The table below compares the performance of Tianshou against published results on OpenAI Gym MuJoCo benchmarks. We use the max average return in 1M timesteps as the reward metric. ``~`` means the result is approximated from the plots because quantitative results are not provided. ``/`` means results are not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include `TD3 paper <https://arxiv.org/pdf/1802.09477.pdf>`_, `SAC paper <https://arxiv.org/pdf/1812.05905.pdf>`_, `PPO paper <https://arxiv.org/pdf/1707.06347.pdf>`_, `ACKTR paper <https://arxiv.org/abs/1708.05144>`_, `OpenAI Baselines <https://github.com/openai/baselines>`_ and `Spinning Up <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|Task                      |Ant       |HalfCheetah|Hopper    |Walker2d  |Swimmer  |Humanoid  |Reacher |IPendulum |IDPendulum|
+=========+================+==========+===========+==========+==========+=========+==========+========+==========+==========+
|DDPG     |Tianshou        |990.4     |**11718.7**|**2197.0**|1400.6    |**144.1**|**177.3** |**-3.3**|**1000.0**|8364.3    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |**1005.3**|3305.6     |**2020.5**|1843.6    |/        |/         |-6.5    |**1000.0**|**9355.5**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper (Our) |888.8     |8577.3     |1860.0    |**3098.1**|/        |/         |-4.0    |**1000.0**|8370.0    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~840      |~11000     |~1800     |~1950     |~137     |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|TD3      |Tianshou        |**5116.4**|**10201.2**|3472.2    |3982.4    |**104.2**|**5189.5**|**-2.7**|**1000.0**|**9349.2**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |4372.4    |9637.0     |**3564.1**|**4682.8**|/        |/         |-3.6    |**1000.0**|9337.5    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~3800     |~9750      |~2860     |~4000     |~78      |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|SAC      |Tianshou        |**5850.2**|**12138.8**|**3542.2**|**5007.0**|**44.4** |**5488.5**|**-2.6**|**1000.0**|**9359.5**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |SAC Paper       |~3720     |~10400     |~3370     |~3740     |/        |~5200     |/       |/         |/         |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |655.4     |2347.2     |2996.7    |1283.7    |/        |/         |-4.4    |**1000.0**|8487.2    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~3980     |~11520     |~3150     |~4250     |~41.7    |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|A2C      |Tianshou        |**3485.4**|**1829.9** |**1253.2**|**1091.6**|**36.6** |**1726.0**|**-6.7**|**1000.0**|**9257.7**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~1000      |~900      |~850      |~31      |/         |~-24    |**~1000** |~7100     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper (TR)  |/         |~930       |~1220     |~700      |**~36**  |/         |~-27    |**~1000** |~8100     |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|PPO      |Tianshou        |**3258.4**|**5783.9** |**2609.3**|3588.5    |66.7     |**787.1** |**-4.1**|**1000.0**|**9231.3**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~1800      |~2330     |~3460     |~108     |/         |~-7     |**~1000** |~8000     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |1083.2    |1795.4     |2164.7    |3317.7    |/        |/         |-6.2    |**1000.0**|8977.9    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |OpenAI Baselines|/         |~1700      |~2400     |~3510     |~111     |/         |~-6     |~940      |~7350     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~650      |~1670      |~1850     |~1230     |**~120** |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|TRPO     |Tianshou        |**2866.7**|**4471.2** |2046.0    |**3826.7**|40.9     |**810.1** |**-5.1**|**1000.0**|**8435.2**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |ACKTR paper     |~0        |~400       |~1400     |~550      |~40      |/         |-8      |**~1000** |~800      |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~0         |~2100     |~1100     |**~121** |/         |~-115   |**~1000** |~200      |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 paper       |-75.9     |-15.6      |**2471.3**|2321.5    |/        |/         |-111.4  |985.4     |205.9     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |OpenAI Baselines|/         |~1350      |**~2200** |~2350     |~95      |/         |**~-5** |~910      |~7000     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up (TF)|~150      |~850       |~1200     |~600      |~85      |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
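
The cell notation used in the table above (boldface markers, ``~`` for values read off plots, ``/`` for missing results) can be parsed mechanically. The following is a minimal sketch, not part of Tianshou: the helper names ``parse_cell`` and ``best_implementation`` are hypothetical, and the example column is copied from the Ant results of the DDPG group.

```python
def parse_cell(cell):
    """Return the numeric value of a table cell, or None when it is "/"."""
    text = cell.strip().strip("*").strip()  # drop boldface markers
    if text == "/":
        return None                         # result not provided
    text = text.lstrip("~")                 # "~" marks a value read off a plot
    mean_part = text.split("±")[0].strip()  # keep the value before any "±"
    return float(mean_part)


def best_implementation(column):
    """Pick the implementation with the highest return for one task.

    `column` maps an implementation name to its raw cell text.
    Implementations without a reported result are ignored.
    """
    scored = {name: parse_cell(cell) for name, cell in column.items()}
    scored = {name: value for name, value in scored.items() if value is not None}
    return max(scored, key=scored.get)


# Ant column of the DDPG group, copied from the table above.
ant_ddpg = {
    "Tianshou": "990.4",
    "TD3 Paper": "**1005.3**",
    "TD3 Paper (Our)": "888.8",
    "Spinning Up": "~840",
}
print(best_implementation(ant_ddpg))  # TD3 Paper
```

Higher returns are treated as better for every task, including Reacher, where all returns are negative.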
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
The runtime, averaged over 8 MuJoCo benchmark tasks, is listed below. All results were obtained using a single NVIDIA TITAN X GPU and up to 48 CPU cores (at most one CPU core per thread).
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
========= ========= ============ ============== ============ ============== ==========
Algorithm # of Envs 1M timesteps Collecting (%) Updating (%) Evaluating (%) Others (%)
========= ========= ============ ============== ============ ============== ==========
DDPG      1         2.9h         12.0           80.2         2.4            5.4
TD3       1         3.3h         11.4           81.7         1.7            5.2
SAC       1         5.2h         10.9           83.8         1.8            3.5
REINFORCE 64        4min         84.9           1.8          12.5           0.8
A2C       16        7min         62.5           28.0         6.6            2.9
PPO       64        24min        11.4           85.3         3.2            0.2
NPG       16        7min         65.1           24.9         9.5            0.6
TRPO      16        7min         62.9           26.5         10.1           0.6
========= ========= ============ ============== ============ ============== ==========
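
The percentage columns can be turned into absolute time per phase by multiplying each share by the total runtime. The sketch below is illustrative only: ``to_minutes`` and ``phase_minutes`` are hypothetical helpers, and the numbers are the DDPG and PPO rows of the table above.

```python
def to_minutes(duration):
    """Convert a table entry such as "2.9h" or "24min" to minutes."""
    if duration.endswith("min"):
        return float(duration[:-3])
    if duration.endswith("h"):
        return float(duration[:-1]) * 60.0
    raise ValueError(f"unrecognized duration: {duration!r}")


def phase_minutes(duration, shares):
    """Split a total runtime into minutes per phase from percentage shares."""
    total = to_minutes(duration)
    return {phase: total * pct / 100.0 for phase, pct in shares.items()}


# Rows copied from the runtime table above.
ddpg = phase_minutes(
    "2.9h",
    {"collecting": 12.0, "updating": 80.2, "evaluating": 2.4, "others": 5.4},
)
ppo = phase_minutes(
    "24min",
    {"collecting": 11.4, "updating": 85.3, "evaluating": 3.2, "others": 0.2},
)

for name, phases in (("DDPG", ddpg), ("PPO", ppo)):
    summary = ", ".join(f"{phase}: {minutes:.1f} min" for phase, minutes in phases.items())
    print(f"{name}: {summary}")
```

For DDPG this gives roughly 140 of the 174 minutes spent on updating, whereas PPO spends about 20 of its 24 minutes there.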
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
							 
							
								 
							
							
Atari Benchmark
---------------
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
Tianshou also provides a reliable and reproducible Atari 10M benchmark.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
Every experiment is run with 10 random seeds for 10M steps. See https://github.com/thu-ml/tianshou/tree/master/examples/atari for the source code and https://wandb.ai/tianshou/atari.benchmark/reports/Atari-Benchmark--VmlldzoxOTA1NzA5 for detailed results hosted on wandb.
 
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
.. raw:: html

    <center>
        <select id="env-atari" onchange="showAtariResults(this)"></select>
        <br>
        <div id="vis-atari"></div>
        <br>
    </center>
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
The table below compares the performance of Tianshou against published results on Atari games. We use the max average return in 10M timesteps as the reward metric **(to be consistent with MuJoCo)**. ``/`` means results are not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include `Google Dopamine <https://github.com/google/dopamine/tree/master/baselines/atari>`_ and `OpenAI Baselines <https://github.com/openai/baselines>`_.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|Task                    |Pong          |Breakout        |Enduro            |Qbert               |MsPacman      |Seaquest           |SpaceInvaders     |
+=======+================+==============+================+==================+====================+==============+===================+==================+
|DQN    |Tianshou        |**20.2 ± 2.3**|**133.5 ± 44.6**|997.9 ± 180.6     |**11620.2 ± 786.1** |2324.8 ± 359.8|**3213.9 ± 381.6** |947.9 ± 155.3     |
+       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|       |Dopamine        |9.8           |92.2            |**2126.9**        |6836.7              |**2451.3**    |1406.6             |**1559.1**        |
+       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|       |OpenAI Baselines|16.5          |131.5           |479.8             |3254.8              |/             |1164.1             |1129.5 ± 145.3    |
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|C51    |Tianshou        |**20.6 ± 2.4**|**412.9 ± 35.8**|**940.8 ± 133.9** |**12513.2 ± 1274.6**|2254.9 ± 201.2|**3305.4 ± 1524.3**|557.3             |
+       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|       |Dopamine        |17.4          |222.4           |665.3             |9924.5              |**2860.4**    |1706.6             |**604.6 ± 157.5** |
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|Rainbow|Tianshou        |**20.2 ± 3.0**|**440.4 ± 50.1**|1496.1 ± 112.3    |14224.8 ± 1230.1    |2524.2 ± 338.8|1934.6 ± 376.4     |**1178.4**        |
+       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|       |Dopamine        |19.1          |47.9            |**2185.1**        |**15682.2**         |**3161.7**    |**3328.9**         |459.9             |
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|IQN    |Tianshou        |**20.7 ± 2.9**|**355.9 ± 22.7**|**1252.7 ± 118.1**|**14409.2 ± 808.6** |2228.6 ± 253.1|5341.2 ± 670.2     |667.8 ± 81.5      |
+       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|       |Dopamine        |19.6          |96.3            |1227.6            |12496.7             |**4422.7**    |**16418**          |**1358.2 ± 267.6**|
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|PPO    |Tianshou        |**20.3 ± 1.2**|**283.0 ± 74.3**|**1098.9 ± 110.5**|**12341.8 ± 1760.7**|1699.4 ± 248.0|1035.2 ± 353.6     |1641.3            |
+       +----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|       |OpenAI Baselines|13.7          |114.3           |350.2             |7012.1              |/             |**1218.9**         |**1787.5 ± 340.8**|
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|QR-DQN |Tianshou        |20.7 ± 2.0    |228.3 ± 27.3    |951.7 ± 333.5     |14761.5 ± 862.9     |2259.3 ± 269.2|4187.6 ± 725.7     |1114.7 ± 116.9    |
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
|FQF    |Tianshou        |20.4 ± 2.5    |382.6 ± 29.5    |1816.8 ± 314.3    |15301.2 ± 684.1     |2506.6 ± 402.5|8051.5 ± 3155.6    |2558.3            |
+-------+----------------+--------------+----------------+------------------+--------------------+--------------+-------------------+------------------+
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
Please note that the comparison tables for the two benchmarks can NOT be used to prove which implementation is "better". The hyperparameters of the algorithms vary across implementations. Also, the reward metric is not strictly the same (e.g. Tianshou uses the max average return within 10M steps, while OpenAI Baselines only reports the average return at 10M steps, so the comparison is not like-for-like). Lastly, Tianshou always uses 10 random seeds, while others might use fewer. The comparison is included only to show Tianshou's reliability.
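
The gap between the two metrics can be seen on a toy example. The sketch below assumes that "max average return" means the maximum, over evaluation checkpoints, of the return averaged across seeds; this page does not spell out the exact aggregation, so treat that reading as an assumption. The function names and the three-seed curves are hypothetical and are not benchmark data.

```python
from statistics import mean


def average_curve(curves):
    """Average per-seed return curves checkpoint by checkpoint.

    Every seed is assumed to be evaluated at the same checkpoints.
    """
    return [mean(values) for values in zip(*curves)]


def max_average_return(curves):
    """Best seed-averaged return reached at any checkpoint (assumed metric)."""
    return max(average_curve(curves))


def final_average_return(curves):
    """Seed-averaged return at the last checkpoint only."""
    return average_curve(curves)[-1]


# Hypothetical evaluation returns: 3 seeds, 4 checkpoints each.
curves = [
    [10.0, 30.0, 50.0, 40.0],
    [20.0, 40.0, 40.0, 30.0],
    [0.0, 20.0, 60.0, 50.0],
]

print(max_average_return(curves))    # 50.0
print(final_average_return(curves))  # 40.0
```

Because the maximum over checkpoints can never be lower than the final value, a number reported as a max average return is at least as high as the final average return of the same runs.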