Truly Proximal Policy Optimization

Source code for the paper: Truly Proximal Policy Optimization. The original code was forked from OpenAI baselines.

The method is tested on MuJoCo continuous-control tasks and Atari discrete game tasks in OpenAI Gym. Networks are trained with TensorFlow 1.10 and Python 3.6.

Installation

git clone --recursive https://github.com/wangyuhuix/TrulyPPO
cd TrulyPPO
pip install -r requirements.txt
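
If dependency resolution fails on your system, pinning the TensorFlow release stated above may help (a hedged suggestion; requirements.txt may already pin it):

pip install tensorflow==1.10.0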

Usage

Command-line arguments

  • env: Gym environment ID (e.g. InvertedPendulum-v2)
  • seed: random seed for reproducibility
  • num_timesteps: total number of training timesteps
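
For instance, a full invocation combining these arguments might look like the following (Hopper-v2 and the 1e6 value are illustrative assumptions; value parsing follows the forked OpenAI baselines convention):

python -m baselines.ppo2_AdaClip.run --alg=trulyppo --env=Hopper-v2 --seed=0 --num_timesteps=1e6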

Continuous Task

python -m baselines.ppo2_AdaClip.run --alg=trulyppo --env=InvertedPendulum-v2 --seed=0

You can try --alg=pporb for PPO-RB and --alg=trppo for TR-PPO.
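
For example, the same entry point with only the algorithm flag swapped:

python -m baselines.ppo2_AdaClip.run --alg=pporb --env=InvertedPendulum-v2 --seed=0
python -m baselines.ppo2_AdaClip.run --alg=trppo --env=InvertedPendulum-v2 --seed=0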

Discrete Task

python -m baselines.ppo2_AdaClip.run --alg=trulyppo --env=BeamRiderNoFrameskip-v4 --seed=0 --isatari
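
The --alg=pporb and --alg=trppo variants should work here as well (an assumption; the README only shows trulyppo for Atari), e.g.:

python -m baselines.ppo2_AdaClip.run --alg=trppo --env=BeamRiderNoFrameskip-v4 --seed=0 --isatari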