# Inverse Reinforcement Learning
In the inverse reinforcement learning setting, the agent learns a policy from interaction with a reward-free environment, together with a fixed dataset collected by an expert policy.
## Continuous control
Once the dataset is collected, it is not changed during training. We use d4rl datasets to train the agent for continuous control; refer to the d4rl documentation for how to use d4rl datasets.
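A minimal sketch of loading a d4rl dataset, assuming the standard `gym.make` + `get_dataset` workflow from the d4rl README (the dataset name below matches the training command at the end of this section):

```python
import gym

import d4rl  # noqa: F401 -- importing d4rl registers its environments with gym

# The environment name encodes both the task and the dataset quality level.
env = gym.make("halfcheetah-expert-v2")

# get_dataset() returns a dict of numpy arrays keyed by
# "observations", "actions", "rewards", "terminals", ...
dataset = env.get_dataset()
print(dataset["observations"].shape, dataset["actions"].shape)
```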
We provide an implementation of the GAIL algorithm for continuous control.
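For reference, GAIL (Ho & Ermon, 2016) trains the policy against a learned discriminator; the saddle-point objective below is the standard formulation from the paper, not something specific to this example:

```math
\min_{\pi}\max_{D}\;
\mathbb{E}_{\pi}\bigl[\log D(s,a)\bigr]
+ \mathbb{E}_{\pi_E}\bigl[\log\bigl(1 - D(s,a)\bigr)\bigr]
- \lambda H(\pi)
```

where $\pi_E$ is the expert policy and $H(\pi)$ is the causal entropy of the policy $\pi$.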
### Train
You can parse d4rl datasets into a `ReplayBuffer`, and set it as the `expert_buffer` parameter of `GAILPolicy`. `irl_gail.py` is an example of inverse RL using a d4rl dataset; a sketch of the conversion follows.
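A minimal sketch of that conversion, assuming d4rl's `qlearning_dataset` helper and Tianshou's `ReplayBuffer`/`Batch` API. `load_expert_buffer` is a hypothetical helper name, and depending on your Tianshou version the transition may also need `terminated`/`truncated` flags:

```python
import gym

import d4rl  # noqa: F401 -- registers d4rl environments with gym
from tianshou.data import Batch, ReplayBuffer


def load_expert_buffer(expert_data_task: str) -> ReplayBuffer:
    # qlearning_dataset returns aligned (s, a, r, s', done) numpy arrays.
    dataset = d4rl.qlearning_dataset(gym.make(expert_data_task))
    size = dataset["rewards"].shape[0]
    buf = ReplayBuffer(size)
    # Loop kept for clarity; some Tianshou versions provide
    # ReplayBuffer.from_data to do this without a Python loop.
    for i in range(size):
        buf.add(
            Batch(
                obs=dataset["observations"][i],
                act=dataset["actions"][i],
                rew=dataset["rewards"][i],
                done=dataset["terminals"][i],
                obs_next=dataset["next_observations"][i],
            )
        )
    return buf


# The resulting buffer can then be passed as the expert_buffer
# argument of GAILPolicy.
```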
To train an agent with the GAIL algorithm:
```bash
python irl_gail.py --task HalfCheetah-v2 --expert-data-task halfcheetah-expert-v2
```