# ViZDoom
[ViZDoom](https://github.com/mwydmuch/ViZDoom) is a popular RL environment based on the famous first-person shooter Doom. Here we provide some results and intuitions for this scenario.

## EnvPool

We highly recommend using EnvPool to run the following experiments. To install it on a Linux machine, run:

```bash
pip install envpool
```
After that, `make_vizdoom_env` will automatically switch to EnvPool's ViZDoom environment. EnvPool's implementation is much faster than a Python vectorized env implementation (about 2-3x faster in pure execution speed, and about 1.5x faster for the overall RL training pipeline).
For more information, please refer to EnvPool's [GitHub](https://github.com/sail-sg/envpool/) and [Docs](https://envpool.readthedocs.io/en/latest/api/vizdoom.html).
## Train

To train an agent:

```bash
python3 vizdoom_c51.py --task {D1_basic|D2_navigation|D3_battle|D4_battle2}
```
D1 (health gathering) should finish training (no deaths) in fewer than 500k env steps (5 epochs);

D3 can reach a reward of 1600+ (a kill count of 75+ in 5 minutes);

D4 can reach a reward of 700+. Here are the results:

(Episode length; the maximum is 2625 because we use frameskip=4, i.e. 10500/4 = 2625.)



(Episode reward)



To evaluate an agent's performance:
```bash
python3 vizdoom_c51.py --test-num 100 --resume-path policy.pth --watch --task {D1_basic|D3_battle|D4_battle2}
```
To save `.lmp` files for recording:

```bash
python3 vizdoom_c51.py --save-lmp --test-num 100 --resume-path policy.pth --watch --task {D1_basic|D3_battle|D4_battle2}
```
This will store the `.lmp` files in the `lmps/` directory. To watch these `.lmp` files (for example, a D3 recording):

```bash
python3 replay.py maps/D3_battle.cfg episode_8_25.lmp
```
We provide two `.lmp` files (the best D3 and D4 episodes) under `results/c51`; you can watch them with the following commands:

```bash
python3 replay.py maps/D3_battle.cfg results/c51/d3.lmp
python3 replay.py maps/D4_battle2.cfg results/c51/d4.lmp
```
## Maps

See [maps/README.md](maps/README.md).

## Reward
1. A living reward hurts performance.
2. Combo-actions (pressing several buttons in one discrete action, e.g. attacking while turning) are really important.
3. A negative reward for losing health and AMMO2 is really helpful for D3/D4 (see the sketch after this list).
4. Using only a positive reward for health is really helpful for D1.
5. Removing MOVE_BACKWARD may make training converge faster, but the final performance may be lower.
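As an illustration of points 3 and 4, reward shaping based on game-variable deltas might look like the sketch below (the coefficients and the way variables are read are illustrative, not the exact code in `env.py`):

```python
# Illustrative reward shaping on top of the raw ViZDoom API: penalize losing
# HEALTH and AMMO2 (useful for D3/D4), or reward only health gains (D1).
# The shaping scale is made up; the actual values in env.py may differ.
import vizdoom as vzd


def shape_reward(game: vzd.DoomGame, base_reward: float,
                 last_health: float, last_ammo: float,
                 health_only: bool = False):
    health = game.get_game_variable(vzd.GameVariable.HEALTH)
    ammo = game.get_game_variable(vzd.GameVariable.AMMO2)
    reward = base_reward
    if health_only:
        # D1-style: only reward picking up health, never punish.
        reward += max(health - last_health, 0.0)
    else:
        # D3/D4-style: negative reward for losing health or wasting ammo.
        reward += min(health - last_health, 0.0)
        reward += min(ammo - last_ammo, 0.0)
    return reward, health, ammo
```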
## Algorithms
The setting is exactly the same as for Atari. You can also try the other algorithms listed in the Atari example.
### C51 (single run)

| task | best reward | reward curve | parameters |
| --------------------------- | ----------- | ------------------------------------- | ------------------------------------------------------------ |
| D2_navigation | 747.52 |  | `python3 vizdoom_c51.py --task "D2_navigation"` |
| D3_battle | 1855.29 |  | `python3 vizdoom_c51.py --task "D3_battle"` |
### PPO (single run)

| task | best reward | reward curve | parameters |
| --------------------------- | ----------- | ------------------------------------- | ------------------------------------------------------------ |
| D2_navigation | 770.75 |  | `python3 vizdoom_ppo.py --task "D2_navigation"` |
| D3_battle | 320.59 |  | `python3 vizdoom_ppo.py --task "D3_battle"` |
### PPO with ICM (single run)

| task | best reward | reward curve | parameters |
| --------------------------- | ----------- | ------------------------------------- | ------------------------------------------------------------ |
| D2_navigation | 844.99 |  | `python3 vizdoom_ppo.py --task "D2_navigation" --icm-lr-scale 10` |
| D3_battle | 547.08 |  | `python3 vizdoom_ppo.py --task "D3_battle" --icm-lr-scale 10` |