SPO outperforms PPO in all environments when the network deepens (five random seeds):
Training
The experiments use gymnasium. Run the following commands to install the dependencies:
MuJoCo
Installation
pip install gymnasium
pip install gymnasium[mujoco]
Reminder
In line 593 of venv\Lib\site-packages\gymnasium\envs\mujoco\mujoco_rendering.py (the exact line may vary with the installed gymnasium version), change
self.add_overlay(bottomleft, "Solver iterations", str(self.data.solver_iter + 1))
to
self.add_overlay(bottomleft, "Solver iterations", str(self.data.solver_niter + 1))
This resolves the error raised when rendering, since newer MuJoCo releases name this attribute solver_niter.
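If editing the file by hand is inconvenient, the same one-line change can be scripted. The following is a minimal sketch, not part of this repository: the helper names `patch_source`, `find_rendering_file`, and `apply_patch` are hypothetical, and it assumes the installed gymnasium keeps the file at `envs/mujoco/mujoco_rendering.py` inside the package directory.

```python
import importlib.util
from pathlib import Path
from typing import Optional

# Old and new expressions from the Reminder above.
OLD = "self.data.solver_iter + 1"
NEW = "self.data.solver_niter + 1"


def patch_source(text: str) -> str:
    """Return `text` with the old attribute name replaced by the new one."""
    return text.replace(OLD, NEW)


def find_rendering_file() -> Optional[Path]:
    """Locate mujoco_rendering.py in the installed gymnasium package, if present."""
    spec = importlib.util.find_spec("gymnasium")
    if spec is None or not spec.submodule_search_locations:
        return None  # gymnasium is not installed in this interpreter
    package_dir = Path(list(spec.submodule_search_locations)[0])
    path = package_dir / "envs" / "mujoco" / "mujoco_rendering.py"
    return path if path.exists() else None


def apply_patch() -> bool:
    """Patch the file in place; return True only if something was changed."""
    target = find_rendering_file()
    if target is None:
        return False
    original = target.read_text(encoding="utf-8")
    patched = patch_source(original)
    if patched == original:
        return False  # already patched, or the line is not present
    target.write_text(patched, encoding="utf-8")
    return True
```

Calling `apply_patch()` rewrites the installed file in place, so making a backup copy first is a reasonable precaution. The replacement is idempotent: running it twice leaves the file unchanged the second time.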
Running
import gymnasium as gym

env = gym.make('Humanoid-v4', render_mode='human')
while True:
    s, _ = env.reset()
    done = False
    while not done:
        a = env.action_space.sample()  # random action
        s_next, r, dw, tr, info = env.step(a)
        done = (dw or tr)  # stop on termination (dw) or truncation (tr)
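The loop above runs until it is interrupted. For a quick check that an environment works, a bounded variant can be handy. The sketch below is an illustration under stated assumptions, not code from this repository: `run_random_episodes` is a hypothetical helper that accepts any object following the gymnasium `reset`/`step`/`close` interface, and `StubEnv` is a made-up stand-in so the example runs without a simulator installed.

```python
import random


def run_random_episodes(env, episodes):
    """Run `episodes` random-action episodes and return each episode's total reward."""
    returns = []
    for _ in range(episodes):
        s, _info = env.reset()
        done, total = False, 0.0
        while not done:
            a = env.action_space.sample()  # random action
            s, r, dw, tr, _info = env.step(a)
            total += float(r)
            done = dw or tr  # termination or truncation ends the episode
        returns.append(total)
    env.close()  # release the window / simulator resources
    return returns


class _StubSpace:
    """Hypothetical action space with two discrete actions."""

    def sample(self):
        return random.choice([0, 1])


class StubEnv:
    """Hypothetical stand-in that truncates every episode after `horizon` steps."""

    action_space = _StubSpace()

    def __init__(self, horizon=3):
        self.horizon, self.t, self.closed = horizon, 0, False

    def reset(self):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= self.horizon
        return self.t, 1.0, False, truncated, {}  # reward of 1.0 per step

    def close(self):
        self.closed = True
```

With gymnasium and MuJoCo installed, the same helper could be called as `run_random_episodes(gym.make('Humanoid-v4'), 5)`.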
Atari
Installation
pip install gymnasium[atari]
pip install gymnasium[accept-rom-license]
Reminder
The v4 environment IDs (e.g. Breakout-v4) date from the older gym library, while the v5 IDs (e.g. ALE/Breakout-v5) are the ones used with its successor, gymnasium, which provides similar functionality with some improvements.
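To make the version suffix concrete, the sketch below splits an environment ID into its optional namespace, name, and version. It is an illustration only: `parse_env_id` and the regular expression are assumptions written for this example and are not gymnasium's own parser.

```python
import re

# Assumed ID layout: [namespace/]name[-vN], e.g. "ALE/Breakout-v5".
ENV_ID_RE = re.compile(
    r"^(?:(?P<namespace>[\w:-]+)/)?(?P<name>[\w:.-]+?)(?:-v(?P<version>\d+))?$"
)


def parse_env_id(env_id):
    """Return (namespace, name, version); missing parts are None."""
    match = ENV_ID_RE.match(env_id)
    if match is None:
        raise ValueError(f"Unrecognized environment ID: {env_id!r}")
    version = match.group("version")
    return (
        match.group("namespace"),
        match.group("name"),
        int(version) if version is not None else None,
    )


if __name__ == "__main__":
    for env_id in ("ALE/Breakout-v5", "Breakout-v4", "Humanoid-v4"):
        print(env_id, "->", parse_env_id(env_id))
```

Here the `ALE/` prefix is the namespace and the trailing number is the environment version, which is the part that differs between the v4 and v5 IDs.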
Running
import gymnasium as gym

env = gym.make('ALE/Breakout-v5', render_mode='human')
while True:
    s, _ = env.reset()
    done = False
    while not done:
        a = env.action_space.sample()  # random action
        s_next, r, dw, tr, info = env.step(a)
        done = (dw or tr)  # stop on termination (dw) or truncation (tr)