# ViZDoom
ViZDoom is a popular RL environment based on the famous first-person shooter Doom. Here we provide some results and intuitions for this scenario.
## Train
To train an agent:

```bash
python3 vizdoom_c51.py --task {D1_basic|D3_battle|D4_battle2}
```
D1 (health gathering) should finish training (no deaths) in fewer than 500k env steps (5 epochs);

D3 can reach 1600+ reward (a 75+ kill count in 5 minutes);

D4 can reach 700+ reward. Here are the results:
(plot: episode length; the maximum length is 2625 because we use frameskip=4, i.e., 10500/4 = 2625)

(plot: episode reward)
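The 2625-step cap comes directly from the action repeat: each agent step advances the game by 4 tics, so an episode of 10500 tics yields at most 10500/4 = 2625 agent steps. A minimal sketch of this mechanism using ViZDoom's Python API (the no-op action and the assumption that the config sets a 10500-tic episode timeout are for illustration only):

```python
import vizdoom as vzd

FRAMESKIP = 4  # each agent step repeats the chosen action for 4 game tics

game = vzd.DoomGame()
game.load_config("maps/D3_battle.cfg")  # assumed to set a 10500-tic episode timeout
game.init()

game.new_episode()
steps = 0
while not game.is_episode_finished():
    noop = [0] * game.get_available_buttons_size()
    # make_action(action, tics) repeats the action for `tics` tics
    # and returns the accumulated reward
    game.make_action(noop, FRAMESKIP)
    steps += 1
game.close()

print(steps)  # at most 10500 / 4 = 2625
```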
To evaluate an agent's performance:

```bash
python3 vizdoom_c51.py --test-num 100 --resume-path policy.pth --watch --task {D1_basic|D3_battle|D4_battle2}
```
To save `.lmp` files for recording:

```bash
python3 vizdoom_c51.py --save-lmp --test-num 100 --resume-path policy.pth --watch --task {D1_basic|D3_battle|D4_battle2}
```
It will store the `.lmp` files in the `lmps/` directory.
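Under the hood, ViZDoom records a demo whenever an episode is started with a file path, which is presumably what `--save-lmp` relies on. A minimal hand-rolled sketch (the file name and the no-op policy are just for illustration; it assumes the `lmps/` directory already exists):

```python
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("maps/D3_battle.cfg")
game.init()

# Passing a file path to new_episode() makes ViZDoom record this episode
# as a .lmp demo that can be replayed later.
game.new_episode("lmps/episode_0.lmp")
while not game.is_episode_finished():
    game.make_action([0] * game.get_available_buttons_size(), 4)
game.close()
```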
To watch these `.lmp` files (for example, the D3 lmp):
```bash
python3 replay.py maps/D3_battle.cfg episode_8_25.lmp
```
We provide two `.lmp` files (D3 best and D4 best) under `results/c51`; you can use the following commands to watch them:
```bash
python3 replay.py maps/D3_battle.cfg results/c51/d3.lmp
python3 replay.py maps/D4_battle2.cfg results/c51/d4.lmp
```
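If you want to replay `.lmp` files without `replay.py`, ViZDoom exposes the same functionality directly; a minimal sketch of such a replay loop:

```python
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("maps/D3_battle.cfg")
game.init()

game.replay_episode("results/c51/d3.lmp")
while not game.is_episode_finished():
    game.advance_action()  # step through the recorded actions tic by tic
print("total reward:", game.get_total_reward())
game.close()
```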
## Maps

See [maps/README.md](maps/README.md).
## Algorithms
The setup is exactly the same as in the Atari examples, so you can also try the other algorithms listed there.
## Reward
- A living reward hurts performance.
- Combo-actions (button combinations treated as single discrete actions) are really important.
- A negative reward for losing health and ammo2 really helps on D3/D4 (see the sketch after this list).
- Using only a positive reward for health really helps on D1 (also covered in the sketch below).
- Removing MOVE_BACKWARD may make training converge faster, but the final performance may be lower.
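A sketch of how the health/ammo2 intuitions above can be implemented as delta-based reward shaping. The coefficients, the `d1_mode` switch, and the `shaped_step` helper are hypothetical, for illustration only, and not necessarily the exact values used by `vizdoom_c51.py`:

```python
import vizdoom as vzd

# Hypothetical shaping coefficients, not the script's actual values.
HEALTH_COEF = 1.0
AMMO2_COEF = 1.0


def shaped_step(game: vzd.DoomGame, action, frameskip=4, d1_mode=False):
    """Take one env step and add a shaping term based on health/ammo2 deltas."""
    health_before = game.get_game_variable(vzd.GameVariable.HEALTH)
    ammo_before = game.get_game_variable(vzd.GameVariable.AMMO2)
    reward = game.make_action(action, frameskip)
    if game.is_episode_finished():
        return reward
    d_health = game.get_game_variable(vzd.GameVariable.HEALTH) - health_before
    d_ammo = game.get_game_variable(vzd.GameVariable.AMMO2) - ammo_before
    if d1_mode:
        # D1: only reward health gains (positive part of the delta).
        reward += HEALTH_COEF * max(d_health, 0.0)
    else:
        # D3/D4: penalize health and ammo2 losses (negative part of the deltas).
        reward += HEALTH_COEF * min(d_health, 0.0) + AMMO2_COEF * min(d_ammo, 0.0)
    return reward
```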