195 Commits

Author SHA1 Message Date
Yi Su
9ce0a554dc
Add Atari SAC examples (#657)
- Add Atari (discrete) SAC examples;
- Fix a bug in Discrete SAC evaluation; default to deterministic mode.
2022-06-04 13:26:08 +08:00
Jiayi Weng
5ecea2402e
Fix save_checkpoint_fn return value (#659)
- Fix save_checkpoint_fn return value to checkpoint_path;
- Fix wrong link in doc;
- Fix an off-by-one bug in trainer iterator.
2022-06-03 01:07:07 +08:00
Jiayi Weng
109875d43d
Fix num_envs=test_num (#653)
* fix num_envs=test_num

* fix mypy
2022-05-30 12:38:47 +08:00
Michal Gregor
c87b9f49bc
Add show_progress option for trainer (#641)
- A DummyTqdm class added to utils: it replicates the interface used by trainers, but does not show the progress bar;
- Added a show_progress argument to the base trainer: when show_progress == True, dummy_tqdm is used in place of tqdm.
2022-05-17 23:41:59 +08:00
Anas BELFADIL
53e6b0408d
Add BranchingDQN for large discrete action spaces (#618) 2022-05-15 21:40:32 +08:00
Jiayi Weng
bf8f63ffc3
use envpool in vizdoom example, update doc (#634) 2022-05-09 00:42:16 +08:00
Jiayi Weng
2a7c151738
Add vecenv wrappers for obs_norm to support running mujoco experiment with envpool (#628)
- add VectorEnvWrapper and VectorEnvNormObs
- obs_rms store in policy save/load
- align mujoco scripts with atari: obs_norm, envpool, wandb and README
2022-05-05 19:55:15 +08:00
Yi Su
a7c789f851
Improve data loading from D4RL and convert RL Unplugged to D4RL format (#624) 2022-05-04 04:37:52 +08:00
Yi Su
dd16818ce4
implement REDQ based on original contribution by @Jimenius (#623)
Co-authored-by: Minhui Li
 <limh@lamda.nju.edu.cn>
2022-05-01 00:06:00 +08:00
Yi Su
41afc2584a
Convert RL Unplugged Atari datasets to tianshou ReplayBuffer (#621) 2022-04-29 19:33:28 +08:00
Chengqi Duan
5eab7dc218
Add Atari Results (#600) 2022-04-24 20:44:54 +08:00
ChenDRAG
5c9afe72f3
Update Mujoco Bemchmark's webpage (#606) 2022-04-24 01:11:33 +08:00
Alex Nikulkov
92456cdb68
Add learning rate scheduler to BasePolicy (#598) 2022-04-17 23:52:30 +08:00
Jiayi Weng
2a9c9289e5
rename save_fn to save_best_fn to avoid ambiguity (#575)
This PR also introduces `tianshou.utils.deprecation` for a unified deprecation wrapper.
2022-03-22 04:29:27 +08:00
Yi Su
9cb74e60c9
Add imitation baselines for offline RL (#566)
add imitation baselines for offline RL; make the choice of env/task and D4RL dataset explicit; on expert datasets, IL easily outperforms; after reading the D4RL paper, I'll rerun the exps on medium data
2022-03-12 21:33:54 +08:00
Chengqi Duan
ad2e1eaea0 Fix WandbLogger import error in Atari examples (#562) 2022-03-08 08:38:56 -05:00
Costa Huang
df3d7f582b
Update WandbLogger implementation (#558)
* Use `global_step` as the x-axis for wandb
* Use Tensorboard SummaryWritter as core with `wandb.init(..., sync_tensorboard=True)`
* Update all atari examples with wandb

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-03-07 06:40:47 +08:00
Yi Su
2377f2f186
Implement Generative Adversarial Imitation Learning (GAIL) (#550)
Implement GAIL based on PPO and provide example script and sample (i.e., most likely not the best) results with Mujoco tasks. (#531, #173)
2022-03-06 23:57:15 +08:00
Yi Su
97df511a13
Add VizDoom PPO example and results (#533)
* update vizdoom ppo example

* update README with results
2022-02-25 09:33:34 +08:00
Chengqi Duan
23fbc3b712
upgrade gym version to >=0.21, fix related CI and update examples/atari (#534)
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-02-25 07:40:33 +08:00
Yi Su
d29188ee77
update atari ppo slots (#529) 2022-02-13 04:04:21 +08:00
Yi Su
40289b8b0e
Add atari ppo example (#523)
I needed a policy gradient baseline myself and it has been requested several times (#497, #374, #440). I used https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py as a reference for hyper-parameters.

Note that using lr=2.5e-4 will result in "Invalid Value" error for 2 games. The fix is to reduce the learning rate. That's why I set the default lr to 1e-4. See discussion in https://github.com/DLR-RM/rl-baselines3-zoo/issues/156.
2022-02-11 06:45:06 +08:00
ChenDRAG
c25926dd8f
Formalize variable names (#509)
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-01-30 00:53:56 +08:00
Bernard Tan
bc53ead273
Implement CQLPolicy and offline_cql example (#506) 2022-01-16 05:30:21 +08:00
Yi Su
a59d96d041
Add Intrinsic Curiosity Module (#503) 2022-01-15 02:43:48 +08:00
Yi Su
3592f45446
Fix critic network for Discrete CRR (#485)
- Fixes an inconsistency in the implementation of Discrete CRR. Now it uses `Critic` class for its critic, following conventions in other actor-critic policies;
- Updates several offline policies to use `ActorCritic` class for its optimizer to eliminate randomness caused by parameter sharing between actor and critic;
- Add `writer.flush()` in TensorboardLogger to ensure real-time result;
- Enable `test_collector=None` in 3 trainers to turn off testing during training;
- Updates the Atari offline results in README.md;
- Moves Atari offline RL examples to `examples/offline`; tests to `test/offline` per review comments.
2021-11-28 23:10:28 +08:00
Bernard Tan
5c5a3db94e
Implement BCQPolicy and offline_bcq example (#480)
This PR implements BCQPolicy, which could be used to train an offline agent in the environment of continuous action space. An experimental result 'halfcheetah-expert-v1' is provided, which is a d4rl environment (for Offline Reinforcement Learning).
Example usage is in the examples/offline/offline_bcq.py.
2021-11-22 22:21:02 +08:00
Jiayi Weng
098d466467
fix atari wrapper to be deterministic (#467) 2021-10-19 22:26:11 +08:00
Ayush Chaurasia
22d7bf38c8
Improve W&B logger (#441)
- rename WandBLogger -> WandbLogger
- add save_data and restore_data
- allow more input arguments for wandb init
- integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py
- documentation update
2021-09-24 21:52:23 +08:00
Jiayi Weng
e8f8cdfa41
fix logger.write error in atari script (#444)
- fix a bug in #427: logger.write should pass a dict
- change SubprocVectorEnv to ShmemVectorEnv in atari
- increase logger interval for eps
2021-09-09 00:51:39 +08:00
n+e
fc251ab0b8
bump to v0.4.3 (#432)
* add makefile
* bump version
* add isort and yapf
* update contributing.md
* update PR template
* spelling check
2021-09-03 05:05:04 +08:00
Andriy Drozdyuk
8a5e2190f7
Add Weights and Biases Logger (#427)
- rename BasicLogger to TensorboardLogger
- refactor logger code
- add WandbLogger

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2021-08-30 22:35:02 +08:00
Yi Su
291be08d43
Add Rainbow DQN (#386)
- add RainbowPolicy
- add `set_beta` method in prio_buffer
- add NoisyLinear in utils/network
2021-08-29 23:34:59 +08:00
n+e
ebaca6f8da
add vizdoom example, bump version to 0.4.2 (#384) 2021-06-26 18:08:41 +08:00
Yi Su
c0bc8e00ca
Add Fully-parameterized Quantile Function (#376) 2021-06-15 11:59:02 +08:00
Yi Su
21b2b22cd7
update iqn results and reward plots (#377) 2021-06-10 09:05:25 +08:00
Yi Su
f3169b4c1f
Add Implicit Quantile Network (#371) 2021-05-29 09:44:23 +08:00
n+e
458028a326
fix docs (#373)
- fix css style error
- fix mujoco benchmark result
2021-05-23 12:43:03 +08:00
Yi Su
8f7bc65ac7
Add discrete Critic Regularized Regression (#367) 2021-05-19 13:29:56 +08:00
Yi Su
b5c3ddabfa
Add discrete Conservative Q-Learning for offline RL (#359)
Co-authored-by: Yi Su <yi.su@antgroup.com>
Co-authored-by: Yi Su <yi.su@antfin.com>
2021-05-12 09:24:48 +08:00
Ark
84f58636eb
Make trainer resumable (#350)
- specify tensorboard >= 2.5.0
- add `save_checkpoint_fn` and `resume_from_log` in trainer

Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>
2021-05-06 08:53:53 +08:00
ChenDRAG
bbc3c3e32d
Add numerical analysis tool and interactive plot (#341)
Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>
2021-04-22 12:49:54 +08:00
ChenDRAG
844d7703c3
NPG Mujoco benchmark release (#347) 2021-04-21 16:31:20 +08:00
n+e
c059f98abf
fix atari_bcq (#345) 2021-04-20 22:59:21 +08:00
ChenDRAG
a57503c0aa
TRPO benchmark release (#340) 2021-04-19 17:05:06 +08:00
ChenDRAG
333b8fbd66
add plotter (#335)
Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>
2021-04-14 14:06:36 +08:00
ChenDRAG
dd4a01132c
Fix SAC loss explode (#333)
* change SAC action_bound_method to "clip" (tanh is hardcoded in forward)

* docstring update

* modelbase -> modelbased
2021-04-04 17:33:35 +08:00
ChenDRAG
6426a39796
ppo benchmark (#330) 2021-03-30 11:50:35 +08:00
ChenDRAG
1730a9008a
A2C benchmark for mujoco (#325) 2021-03-28 13:12:43 +08:00
ChenDRAG
3ac67d9974
refactor A2C/PPO, change behavior of value normalization (#321) 2021-03-25 10:12:39 +08:00