Simple-Policy-Optimization/README.md

# SPO outperforms PPO in all environments as the network gets deeper (five random seeds)
![MuJoCo](https://github.com/MyRepositories-hub/Simple-Policy-Optimization/blob/main/draw_return_mujoco.png)

# Training

**The experiments run on `gymnasium`. Install the dependencies for the environment suite you need with the commands below:**

## MuJoCo

### Installation
```bash
pip install gymnasium
pip install "gymnasium[mujoco]"
```

### Reminder
Please change the code from 
```python
self.add_overlay(bottomleft, "Solver iterations", str(self.data.solver_iter + 1))
```
to 
```python
self.add_overlay(bottomleft, "Solver iterations", str(self.data.solver_niter + 1))
```
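on line 593 of `venv\Lib\site-packages\gymnasium\envs\mujoco\mujoco_rendering.py` to resolve the rendering error. Recent MuJoCo releases renamed the `solver_iter` attribute to `solver_niter`, so the old name is no longer found.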
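
If editing the installed file by hand is inconvenient, the same one-word change can be scripted. The sketch below is only an illustration: `patch_solver_attr`, `find_rendering_file`, and `patch_file` are hypothetical helper names, not part of `gymnasium`, and the file location is assumed from the path given above.

```python
import importlib.util
import re
from pathlib import Path
from typing import Optional


def patch_solver_attr(source: str) -> str:
    """Return `source` with the attribute name `solver_iter` replaced by `solver_niter`."""
    # Word boundaries avoid touching longer identifiers that merely contain the name.
    return re.sub(r"\bsolver_iter\b", "solver_niter", source)


def find_rendering_file() -> Optional[Path]:
    """Locate mujoco_rendering.py inside the installed gymnasium package, if any."""
    spec = importlib.util.find_spec("gymnasium")
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    # Assumed layout, taken from the path quoted in this README.
    path = package_dir / "envs" / "mujoco" / "mujoco_rendering.py"
    return path if path.exists() else None


def patch_file(path: Path) -> bool:
    """Patch the file in place, keeping a .bak copy. Returns True if it changed."""
    original = path.read_text(encoding="utf-8")
    patched = patch_solver_attr(original)
    if patched == original:
        return False  # nothing to do (already patched or name not present)
    path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
    path.write_text(patched, encoding="utf-8")
    return True
```

Calling `patch_file(find_rendering_file())` after checking that the result of `find_rendering_file()` is not `None` applies the change. Running it a second time leaves the file untouched.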

### Running
```python
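import gymnasium as gym

# Open a render window and drive the agent with random actions.
env = gym.make('Humanoid-v4', render_mode='human')
while True:  # runs until interrupted (Ctrl+C)
    s, _ = env.reset()
    done = False
    while not done:
        a = env.action_space.sample()  # random action
        # step() returns: observation, reward, terminated, truncated, info
        s_next, r, dw, tr, info = env.step(a)
        done = (dw or tr)  # the episode ends on termination or truncation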
```
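
The loop above runs until it is interrupted. A bounded variant is sketched here with a hypothetical helper, `run_random_episodes`, which caps the number of episodes and steps and returns the undiscounted return of each episode. It assumes only the `reset` / `step` / `action_space.sample` interface already used above.

```python
def run_random_episodes(env, num_episodes=3, max_steps=1000):
    """Roll out a random policy and return one total reward per episode."""
    returns = []
    for _ in range(num_episodes):
        env.reset()
        episode_return, done, steps = 0.0, False, 0
        while not done and steps < max_steps:
            action = env.action_space.sample()
            _, reward, terminated, truncated, _ = env.step(action)
            episode_return += float(reward)
            done = terminated or truncated
            steps += 1
        returns.append(episode_return)
    return returns


# Usage (needs the MuJoCo extra installed):
#   import gymnasium as gym
#   env = gym.make('Humanoid-v4')
#   print(run_random_episodes(env))
#   env.close()
```

Dropping `render_mode='human'` in the usage lines keeps the rollout headless, which is usually faster when only the returns are of interest.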

## Atari

### Installation
```bash
pip install "gymnasium[atari]"
pip install "gymnasium[accept-rom-license]"
```
### Reminder
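The `v4` and `v5` suffixes are environment versions, not library versions. Both are created through `gymnasium`, the maintained successor to `gym`. IDs in the `ALE/` namespace with `-v5` (such as `ALE/Breakout-v5`) are the newer Atari registrations, while the `-v4` IDs are the older ones and use different default settings.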
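
As a small illustration of how an ID such as `ALE/Breakout-v5` breaks down into an optional namespace, a name, and a version, here is a minimal sketch. `split_env_id` and `EnvId` are hypothetical names, and the regular expression is an approximation written for this example, not the parser `gymnasium` uses internally.

```python
import re
from typing import NamedTuple, Optional


class EnvId(NamedTuple):
    namespace: Optional[str]
    name: str
    version: Optional[int]


# Optional "namespace/", then a name, then an optional "-v<number>" suffix.
_ENV_ID = re.compile(
    r"^(?:(?P<namespace>[\w:-]+)/)?(?P<name>[\w:.-]+?)(?:-v(?P<version>\d+))?$"
)


def split_env_id(env_id: str) -> EnvId:
    """Split an environment ID string into (namespace, name, version)."""
    match = _ENV_ID.match(env_id)
    if match is None:
        raise ValueError(f"not a recognisable environment ID: {env_id!r}")
    version = match.group("version")
    return EnvId(
        match.group("namespace"),
        match.group("name"),
        int(version) if version is not None else None,
    )


print(split_env_id("ALE/Breakout-v5"))
print(split_env_id("Humanoid-v4"))
```

The version number belongs to the environment definition, so `Humanoid-v4` and `ALE/Breakout-v5` can both be created from the same installed library.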

### Running
```python
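import gymnasium as gym

# Open a render window and play Breakout with random actions.
env = gym.make('ALE/Breakout-v5', render_mode='human')
while True:  # runs until interrupted (Ctrl+C)
    s, _ = env.reset()
    done = False
    while not done:
        a = env.action_space.sample()  # random action
        # step() returns: observation, reward, terminated, truncated, info
        s_next, r, dw, tr, info = env.step(a)
        done = (dw or tr)  # the episode ends on termination or truncation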
```