2024-05-03 13:50:48 +08:00

# SPO outperforms PPO in all environments as the network gets deeper (five random seeds):
![MuJoCo](https://github.com/MyRepositories-hub/Simple-Policy-Optimization/blob/main/draw_return_mujoco.png)