# Inverse Reinforcement Learning

In the inverse reinforcement learning setting, the agent learns a policy from interaction with an environment that provides no reward signal, together with a fixed dataset collected by an expert policy.

## Continuous control

Once the dataset is collected, it is not changed during training. We use [d4rl](https://github.com/rail-berkeley/d4rl) datasets to train the agent for continuous control. Refer to [d4rl](https://github.com/rail-berkeley/d4rl) to see how to use d4rl datasets.
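
Below is a minimal sketch of loading an expert dataset with d4rl (assuming d4rl and the corresponding MuJoCo environments are installed; the dataset name is just an example):

```python
import gym

import d4rl  # noqa: F401 -- importing d4rl registers the offline datasets with gym

# Build the dataset environment; the data is downloaded on first use.
env = gym.make("halfcheetah-expert-v2")

# qlearning_dataset returns a dict of NumPy arrays with keys
# "observations", "actions", "rewards", "terminals", "next_observations".
dataset = d4rl.qlearning_dataset(env)
print(dataset["observations"].shape, dataset["actions"].shape)
```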

We provide an implementation of the GAIL algorithm for continuous control.

### Train

You can parse a d4rl dataset into a `ReplayBuffer` and set it as the `expert_buffer` parameter of `GAILPolicy`. `irl_gail.py` is an example of inverse RL using a d4rl dataset.
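
A minimal sketch of that conversion, following the pattern used in `irl_gail.py` (it assumes d4rl and an older Tianshou API where `ReplayBuffer.add` accepts a single-transition `Batch` with a `done` flag; networks, optimizers, and the remaining `GAILPolicy` arguments are omitted):

```python
import d4rl
import gym

from tianshou.data import Batch, ReplayBuffer

# Load the expert transitions from d4rl.
dataset = d4rl.qlearning_dataset(gym.make("halfcheetah-expert-v2"))
size = dataset["rewards"].size

# Copy every transition into a Tianshou ReplayBuffer.
expert_buffer = ReplayBuffer(size)
for i in range(size):
    expert_buffer.add(
        Batch(
            obs=dataset["observations"][i],
            act=dataset["actions"][i],
            rew=dataset["rewards"][i],
            done=dataset["terminals"][i],
            obs_next=dataset["next_observations"][i],
        )
    )

# The buffer is then passed to the policy, e.g.
# policy = GAILPolicy(..., expert_buffer=expert_buffer, ...)
```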

To train an agent with the GAIL algorithm:

```bash
python irl_gail.py --task HalfCheetah-v2 --expert-data-task halfcheetah-expert-v2
```

## GAIL (single run)

| task           | best reward | reward curve | parameters                                                                               |
| -------------- | ----------- | ------------ | ---------------------------------------------------------------------------------------- |
| HalfCheetah-v2 | 5177.07     |              | `python3 irl_gail.py --task "HalfCheetah-v2" --expert-data-task "halfcheetah-expert-v2"` |
| Hopper-v2      | 1761.44     |              | `python3 irl_gail.py --task "Hopper-v2" --expert-data-task "hopper-expert-v2"`           |
| Walker2d-v2    | 2020.77     |              | `python3 irl_gail.py --task "Walker2d-v2" --expert-data-task "walker2d-expert-v2"`       |