SPO outperforms PPO in all environments when the network deepens (five random seeds):

Training

The experiments run on Gymnasium. Install the dependencies for the environment suite you need using the commands below:

MuJoCo

Installation

pip install gymnasium
pip install gymnasium[mujoco]
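
To confirm the install worked before starting a run, a small check like the one below may help. It is only a sketch: the helper name `missing_packages` is hypothetical, and it assumes the import names are `gymnasium` and `mujoco` (pass `ale_py` as well if you also installed the Atari extras).

```python
import importlib.util


def missing_packages(names=("gymnasium", "mujoco")):
    """Return the top-level import names that cannot be found (hypothetical helper)."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

An empty list means every listed package can be imported from the current environment.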

Reminder

Please change line 593 of venv\Lib\site-packages\gymnasium\envs\mujoco\mujoco_rendering.py from

self.add_overlay(bottomleft, "Solver iterations", str(self.data.solver_iter + 1))

to

self.add_overlay(bottomleft, "Solver iterations", str(self.data.solver_niter + 1))

to resolve the error. The only difference is the attribute name: solver_iter becomes solver_niter.
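
If you prefer not to edit the file by hand, the same rename can be scripted. The following is a minimal sketch, not part of this repository: the function name `patch_solver_attr` is hypothetical, it assumes the old attribute appears in the file exactly as `self.data.solver_iter`, and you supply the path to `mujoco_rendering.py` in your own environment.

```python
from pathlib import Path

OLD = "self.data.solver_iter"
NEW = "self.data.solver_niter"


def patch_solver_attr(path):
    """Rename the attribute in the given file; return how many occurrences were replaced."""
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    count = text.count(OLD)
    if count:
        # Only rewrite the file when there is something to change.
        file.write_text(text.replace(OLD, NEW), encoding="utf-8")
    return count
```

Running it a second time returns 0, because the new name no longer matches the old pattern. Consider backing up the file first, since the edit is made in place.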

Running

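import gymnasium as gym

# Open a render window and play random-action episodes until interrupted (Ctrl+C).
env = gym.make('Humanoid-v4', render_mode='human')
try:
    while True:
        s, _ = env.reset()  # start a new episode
        done = False
        while not done:
            a = env.action_space.sample()  # sample a random action
            s_next, r, terminated, truncated, info = env.step(a)
            done = terminated or truncated  # the episode ends on either flag
finally:
    env.close()  # release the render window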

Atari

Installation

pip install gymnasium[atari]
pip install gymnasium[accept-rom-license]

Reminder

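The -v4 and -v5 suffixes are environment versions, not library versions. IDs such as Breakout-v4 are the legacy Atari IDs inherited from gym, while ALE/Breakout-v5 is the newer ID registered under the ALE/ namespace, and it is the one used here. The v5 environments change some defaults; for example, sticky actions are enabled.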

Running

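import gymnasium as gym

# Open a render window and play random-action episodes until interrupted (Ctrl+C).
env = gym.make('ALE/Breakout-v5', render_mode='human')
try:
    while True:
        s, _ = env.reset()  # start a new episode
        done = False
        while not done:
            a = env.action_space.sample()  # sample a random action
            s_next, r, terminated, truncated, info = env.step(a)
            done = terminated or truncated  # the episode ends on either flag
finally:
    env.close()  # release the render window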
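
The random-action loops above run until they are interrupted. For a quick smoke test it can be more convenient to play a single bounded episode and report its return. The sketch below assumes only the Gymnasium-style interface used above (`reset`, `step` returning five values, and `action_space.sample`); the function name `run_episode` and the `max_steps` parameter are hypothetical.

```python
def run_episode(env, max_steps=1000):
    """Play one episode with random actions; return (total_reward, steps)."""
    env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = env.action_space.sample()  # random policy, as in the loops above
        _, reward, terminated, truncated, _ = env.step(action)
        total_reward += float(reward)
        steps += 1
        if terminated or truncated:
            break  # the episode ended before the step limit
    return total_reward, steps
```

For example, create the environment with gym.make('ALE/Breakout-v5'), pass it to run_episode, and call env.close() afterwards.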