
Naming and typing improvements in Actor/Critic/Policy forwards (#1032)

Closes #917 

### Internal Improvements
- Better variable names related to model outputs (logits, dist input
etc.). #1032
- Improved typing for actors and critics, using Tianshou classes like
`Actor`, `ActorProb`, etc.,
instead of just `nn.Module`. #1032
- Added interfaces for most `Actor` and `Critic` classes to enforce the
presence of `forward` methods. #1032
- Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see
associated breaking change). #1032
- Use `.mode` of distribution instead of relying on knowledge of the
distribution type. #1032

### Breaking Changes

- Changed interface of `dist_fn` in `PGPolicy` and all subclasses to
take a single argument in both
continuous and discrete cases. #1032

---------

Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>

2024-04-01 17:14:17 +02:00

results/gail

Implement Generative Adversarial Imitation Learning (GAIL) (#550)

2022-03-06 23:57:15 +08:00

irl_gail.py

Naming and typing improvements in Actor/Critic/Policy forwards (#1032)

2024-04-01 17:14:17 +02:00

README.md

Implement Generative Adversarial Imitation Learning (GAIL) (#550)

2022-03-06 23:57:15 +08:00

README.md

Inverse Reinforcement Learning

In the inverse reinforcement learning setting, the agent learns a policy from two sources: interaction with an environment that provides no reward, and a fixed dataset collected with an expert policy.

Continuous control

Once the dataset is collected, it is not changed during training. We use d4rl datasets to train agents for continuous control. Refer to d4rl for details on how to use its datasets.

We provide an implementation of the GAIL algorithm for continuous control.
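Because the environment provides no reward, GAIL-style methods train a discriminator to tell expert transitions from policy transitions and derive a surrogate reward from its output. The sketch below illustrates one common form, r = -log(1 - D(s, a)), under the assumption that D is the discriminator's estimated probability that a state-action pair came from the expert. The function names are hypothetical and the exact reward form used by `GAILPolicy` should be checked against its source.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def surrogate_reward(disc_logit: float) -> float:
    """Surrogate reward -log(1 - D), with D = sigmoid(disc_logit).

    Assumption: a larger logit means the discriminator considers the
    (state, action) pair more expert-like. Algebraically,
    -log(1 - sigmoid(x)) equals softplus(x), which avoids overflow.
    """
    return softplus(disc_logit)


# Pairs that look more expert-like receive a larger reward.
for logit in (-2.0, 0.0, 2.0):
    print(f"logit={logit:+.1f}  reward={surrogate_reward(logit):.4f}")
```

The reward is always positive and grows with the discriminator's confidence that a pair is expert-like, so the policy is pushed toward behaviour the discriminator cannot distinguish from the expert data.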

Train

You can parse d4rl datasets into a `ReplayBuffer` and pass it as the `expert_buffer` parameter of `GAILPolicy`. `irl_gail.py` is an example of inverse RL using a d4rl dataset.

To train an agent with the GAIL algorithm:
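The parsing step can be sketched as a mapping from d4rl's transition keys to replay-buffer field names. This is a minimal illustration, not the code in `irl_gail.py`: it assumes the key layout returned by `d4rl.qlearning_dataset` (`observations`, `actions`, `rewards`, `terminals`, `next_observations`), uses a tiny synthetic dictionary in place of a real dataset, and the helper name `d4rl_to_buffer_fields` is hypothetical.

```python
import numpy as np


def d4rl_to_buffer_fields(dataset: dict) -> dict:
    """Map a d4rl-style transition dict to replay-buffer field names.

    Assumes the dataset has no time-limit truncation information, so
    `truncated` is all False and `done` mirrors `terminals`.
    """
    terminals = np.asarray(dataset["terminals"], dtype=bool)
    return {
        "obs": np.asarray(dataset["observations"]),
        "act": np.asarray(dataset["actions"]),
        "rew": np.asarray(dataset["rewards"]),
        "terminated": terminals,
        "truncated": np.zeros_like(terminals),
        "done": terminals,
        "obs_next": np.asarray(dataset["next_observations"]),
    }


# Synthetic stand-in for an expert dataset: 3 transitions,
# 2-dimensional observations, 1-dimensional actions.
toy_dataset = {
    "observations": np.zeros((3, 2)),
    "actions": np.ones((3, 1)),
    "rewards": np.array([0.1, 0.2, 0.3]),
    "terminals": np.array([0, 0, 1]),
    "next_observations": np.ones((3, 2)),
}
fields = d4rl_to_buffer_fields(toy_dataset)

# With d4rl and Tianshou installed, the real dataset would come from
# d4rl.qlearning_dataset(gym.make("halfcheetah-expert-v2")), and the
# buffer could be built with ReplayBuffer.from_data(**fields)
# (assumed API; check the installed Tianshou version).
print({name: value.shape for name, value in fields.items()})
```

Keeping the mapping separate from buffer construction makes it easy to inspect shapes and dtypes before handing the data to the policy.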

python irl_gail.py --task HalfCheetah-v2 --expert-data-task halfcheetah-expert-v2

GAIL (single run)

| task | best reward | command |
| --- | --- | --- |
| HalfCheetah-v2 | 5177.07 | `python3 irl_gail.py --task "HalfCheetah-v2" --expert-data-task "halfcheetah-expert-v2"` |
| Hopper-v2 | 1761.44 | `python3 irl_gail.py --task "Hopper-v2" --expert-data-task "hopper-expert-v2"` |
| Walker2d-v2 | 2020.77 | `python3 irl_gail.py --task "Walker2d-v2" --expert-data-task "walker2d-expert-v2"` |