Truly Proximal Policy Optimization
Source code for the paper: Truly Proximal Policy Optmization. The original code was forked from OpenAI baselines.
Method is tested on MuJoCo continuous control tasks and Atari discrete game tasks in OpenAI gym. Networks are trained using tensorflow1.10 and Python 3.6.
Installation
git clone --recursive https://github.com/wangyuhuix/TrulyPPO
cd TrulyPPO
pip install -r requirements.txt
Usage
Command Line arguments
- env: environment ID
- seed: random seed
- num_timesteps: number of timesteps
Continuous Task
python -m baselines.ppo2_AdaClip.run --alg=trulyppo --env=InvertedPendulum-v2 --seed=0
You can try --alg=pporb
for PPO-RB and --alg-trppo
for TR-PPO.
Discrete Task
python -m baselines.ppo2_AdaClip.run --alg=trulyppo --env=BeamRiderNoFrameskip-v4 --seed=0 --isatari
Description
Languages
Python
100%