Costa Huang
df3d7f582b
Update WandbLogger implementation ( #558 )
...
* Use `global_step` as the x-axis for wandb
* Use the TensorBoard `SummaryWriter` as the core, via `wandb.init(..., sync_tensorboard=True)` (see the sketch after this entry)
* Update all Atari examples with wandb
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-03-07 06:40:47 +08:00
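A minimal sketch of the logging pattern adopted above, assuming a configured wandb account; the project name, log directory, and scalar tag are illustrative:

```python
import wandb
from torch.utils.tensorboard import SummaryWriter

# With sync_tensorboard=True, wandb mirrors everything written through
# the TensorBoard SummaryWriter, so global_step stays the x-axis.
wandb.init(project="tianshou", sync_tensorboard=True)  # project name is illustrative
writer = SummaryWriter("log/demo")

for global_step in range(100):
    writer.add_scalar("train/reward", float(global_step), global_step)

writer.close()
wandb.finish()
```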
Yi Su
2377f2f186
Implement Generative Adversarial Imitation Learning (GAIL) ( #550 )
...
Implement GAIL based on PPO, and provide an example script and sample (i.e., most likely not the best) results on MuJoCo tasks (#531, #173). A reward-shaping sketch follows this entry.
2022-03-06 23:57:15 +08:00
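For context, GAIL feeds the policy a surrogate reward derived from a discriminator trained to separate expert transitions from policy transitions. A minimal sketch of one common reward shaping, assuming the discriminator outputs raw logits (not necessarily the exact form used in the PR):

```python
import torch
import torch.nn.functional as F

def gail_reward(disc_logits: torch.Tensor) -> torch.Tensor:
    """Surrogate reward -log(1 - D(s, a)) from raw discriminator logits.

    Since 1 - sigmoid(x) = sigmoid(-x), this equals -logsigmoid(-x),
    which is numerically stable.
    """
    return -F.logsigmoid(-disc_logits)

print(gail_reward(torch.tensor([0.0])))  # -log(0.5) ~= 0.693
```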
Anas BELFADIL
d976a5aa91
Fixed hardcoded reward_threshold ( #548 )
2022-03-04 10:35:39 +08:00
Jiayi Weng
c248b4f87e
fix conda support and keep API compatibility ( #536 )
...
* loosen constraints
* fix nni issue (#478 )
* fix coverage
v0.4.6.post1
2022-02-26 00:05:02 +08:00
Yi Su
97df511a13
Add VizDoom PPO example and results ( #533 )
...
* update vizdoom ppo example
* update README with results
v0.4.6
2022-02-25 09:33:34 +08:00
Chengqi Duan
23fbc3b712
upgrade gym version to >=0.21, fix related CI and update examples/atari ( #534 )
...
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-02-25 07:40:33 +08:00
Mohammad Mahdi Rahimi
c7e2e56fac
Pettingzoo support ( #494 )
...
Co-authored-by: Rodrigo de Lazcano <r.l.p.v96@gmail.com>
Co-authored-by: J K Terry <justinkterry@gmail.com>
2022-02-15 22:56:45 +08:00
Chengqi Duan
d85bc19269
update dqn tutorial and add envpool to docs ( #526 )
...
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-02-15 06:39:47 +08:00
Yi Su
d29188ee77
update atari ppo slots ( #529 )
2022-02-13 04:04:21 +08:00
Yi Su
40289b8b0e
Add atari ppo example ( #523 )
...
I needed a policy-gradient baseline myself, and it has been requested several times (#497, #374, #440). I used https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py as a reference for hyper-parameters.
Note that lr=2.5e-4 results in an "Invalid Value" error for 2 games; the fix is to reduce the learning rate, which is why the default lr is set to 1e-4 (see the sketch after this entry). See the discussion in https://github.com/DLR-RM/rl-baselines3-zoo/issues/156 .
2022-02-11 06:45:06 +08:00
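A hedged sketch of the default described above; the argparse flag name is illustrative, not a quote of the example script:

```python
import argparse

parser = argparse.ArgumentParser()
# 2.5e-4 (the CleanRL reference value) produced "Invalid Value" errors
# on two games, hence the lowered default learning rate.
parser.add_argument("--lr", type=float, default=1e-4)
args = parser.parse_args([])
print(args.lr)  # 0.0001
```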
Jiayi Weng
3d697aa4c6
make unit test faster ( #522 )
...
* test cache expert data in offline training
* faster cql test
* faster tests
* use dummy
* test ray dependency
2022-02-09 00:24:52 +08:00
Chengqi Duan
9c100e0705
Enable venvs.reset() concurrent execution ( #517 )
...
- rename the internal worker API: send_action -> send, get_result -> recv (to align with envpool; a toy sketch follows this entry)
- add a timing test for venvs.reset() to verify the concurrent execution
- change the venvs.reset() logic
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-02-08 00:40:01 +08:00
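A toy stand-in showing the shape of the renamed interface; this is not the real tianshou EnvWorker:

```python
# Toy worker with the send/recv API shape (aligned with envpool).
class ToyWorker:
    def __init__(self, env):
        self.env = env
        self._result = None

    def send(self, action):  # was: send_action
        self._result = self.env.step(action)

    def recv(self):          # was: get_result
        return self._result
```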
Kenneth Schröder
cd7654bfd5
Fix casts to int performed by `to_torch_as(...)` calls in policies when using discrete actions ( #521 )
2022-02-07 03:42:46 +08:00
ChenDRAG
c25926dd8f
Formalize variable names ( #509 )
...
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-01-30 00:53:56 +08:00
Bernard Tan
bc53ead273
Implement CQLPolicy and offline_cql example ( #506 )
2022-01-16 05:30:21 +08:00
Yi Su
a59d96d041
Add Intrinsic Curiosity Module ( #503 )
2022-01-15 02:43:48 +08:00
Markus28
a2d76d1276
Remove reset_buffer() from reset method ( #501 )
2022-01-12 16:46:28 -08:00
Yi Su
3592f45446
Fix critic network for Discrete CRR ( #485 )
...
- Fix an inconsistency in the implementation of Discrete CRR: it now uses the `Critic` class for its critic, following the conventions of other actor-critic policies;
- Update several offline policies to use the `ActorCritic` class for their optimizers, eliminating the randomness caused by parameter sharing between actor and critic (see the sketch after this entry);
- Add `writer.flush()` in TensorboardLogger to ensure real-time results;
- Enable `test_collector=None` in 3 trainers to turn off testing during training;
- Update the Atari offline results in README.md;
- Move the Atari offline RL examples to `examples/offline` and the tests to `test/offline`, per review comments.
v0.4.5
2021-11-28 23:10:28 +08:00
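On the shared-parameter point above: a single optimizer over the union of actor and critic parameters (deduplicated) is what the `ActorCritic` wrapper provides. A toy illustration of the underlying PyTorch behavior:

```python
import torch
from torch import nn

# Toy actor/critic sharing a trunk. Wrapping both in one nn.Module and
# building a single optimizer over .parameters() avoids double-counting
# the shared weights, since .parameters() de-duplicates by identity.
trunk = nn.Linear(4, 16)
actor = nn.Sequential(trunk, nn.Linear(16, 2))
critic = nn.Sequential(trunk, nn.Linear(16, 1))
model = nn.ModuleList([actor, critic])

optim = torch.optim.Adam(model.parameters(), lr=1e-3)
print(sum(p.numel() for p in model.parameters()))  # 131, not 211
```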
Bernard Tan
5c5a3db94e
Implement BCQPolicy and offline_bcq example ( #480 )
...
This PR implements BCQPolicy, which can be used to train an offline agent in environments with continuous action spaces. An experimental result on 'halfcheetah-expert-v1', a d4rl environment (for offline reinforcement learning), is provided.
Example usage is in examples/offline/offline_bcq.py; a conceptual sketch of BCQ's action selection follows this entry.
2021-11-22 22:21:02 +08:00
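A conceptual sketch of BCQ's continuous-control action selection for a single observation, with toy tensors standing in for the learned VAE, perturbation network, and critic:

```python
import torch

# Sample candidate actions from a generator trained on the batch data,
# add a small constrained perturbation, and take the argmax over Q.
candidates = torch.rand(10, 6) * 2 - 1           # stand-in for VAE samples
perturb = 0.05 * torch.tanh(torch.randn(10, 6))  # stand-in perturbation net
actions = (candidates + perturb).clamp(-1.0, 1.0)
q_values = -(actions ** 2).sum(dim=1)            # stand-in critic Q(s, a)
best_action = actions[q_values.argmax()]
print(best_action.shape)  # torch.Size([6])
```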
Jiayi Weng
94d3b27db9
fix tqdm issue ( #481 )
2021-11-19 00:17:44 +08:00
Markus28
8f19a86966
Implements set_env_attr and get_env_attr for vector environments ( #478 )
...
close #473
2021-11-03 00:08:00 +08:00
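A minimal usage sketch, assuming the signatures added in this PR; `my_flag` is a made-up attribute name, while `spec` is a standard gym attribute:

```python
import gym
from tianshou.env import DummyVectorEnv

venv = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(2)])
print(venv.get_env_attr("spec"))           # one value per sub-environment
venv.set_env_attr("my_flag", True)         # set on all sub-environments
print(venv.get_env_attr("my_flag", id=0))  # query sub-environment 0 only
```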
Jiayi Weng
098d466467
fix atari wrapper to be deterministic ( #467 )
2021-10-19 22:26:11 +08:00
Jiayi Weng
b9eedc516e
bump to 0.4.4
v0.4.4
2021-10-13 12:22:24 -04:00
Ayush Chaurasia
63d752ee0b
W&B: Add usage in the docs ( #463 )
2021-10-13 23:28:25 +08:00
Jiayi Weng
926ec0b9b1
update save_fn in trainer ( #459 )
...
- collector.collect() now returns 4 extra keys: rew/rew_std/len/len_std (previously this work was done in the logger; see the sketch after this entry)
- save_fn() will be called at the beginning of the trainer
2021-10-13 21:25:24 +08:00
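A stand-in illustrating the four new keys; the values are fabricated for the example:

```python
# Shape of the dict returned by collector.collect(n_episode=...) after
# this change; the four new keys summarize the collected episodes.
result = {
    "n/ep": 10, "n/st": 2000,
    "rew": 195.3, "rew_std": 10.2,  # mean/std of episode returns
    "len": 200.0, "len_std": 0.0,   # mean/std of episode lengths
}
print(f"reward: {result['rew']:.1f} +/- {result['rew_std']:.1f}")
```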
Jiayi Weng
e45e2096d8
add multi-GPU support ( #461 )
...
add a new class, `DataParallelNet` (see the sketch after this entry)
2021-10-06 01:39:14 +08:00
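`DataParallelNet` builds on `torch.nn.DataParallel`; a minimal sketch of that underlying pattern (not tianshou's exact wrapper), which falls back to the plain module on CPU-only machines so it still runs:

```python
import torch
from torch import nn

# nn.DataParallel splits the input batch across visible GPUs.
net = nn.Linear(4, 2)
if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net.cuda())
out = net(torch.zeros(8, 4))
print(out.shape)  # torch.Size([8, 2])
```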
Jiayi Weng
5df64800f4
final fix for actor_critic shared head parameters ( #458 )
2021-10-04 23:19:07 +08:00
Ayush Chaurasia
22d7bf38c8
Improve W&B logger ( #441 )
...
- rename WandBLogger -> WandbLogger
- add save_data and restore_data
- allow more input arguments for wandb init
- integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py
- documentation update
2021-09-24 21:52:23 +08:00
Jiayi Weng
e8f8cdfa41
fix logger.write error in atari script ( #444 )
...
- fix a bug in #427 : logger.write should pass a dict (see the sketch after this entry)
- change SubprocVectorEnv to ShmemVectorEnv in the Atari examples
- increase the logger interval for eps
2021-09-09 00:51:39 +08:00
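A minimal sketch of the corrected call shape, assuming the 0.4.x `TensorboardLogger.write(step_type, step, data)` signature:

```python
from torch.utils.tensorboard import SummaryWriter
from tianshou.utils import TensorboardLogger

# write() takes (step_type, step, data) where data is a dict of
# tag -> scalar; passing a bare scalar was the bug fixed here.
logger = TensorboardLogger(SummaryWriter("log/demo"))
logger.write("train/env_step", 1000, {"train/eps": 0.1})
```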
n+e
fc251ab0b8
bump to v0.4.3 ( #432 )
...
* add makefile
* bump version
* add isort and yapf
* update contributing.md
* update PR template
* spelling check
v0.4.3
2021-09-03 05:05:04 +08:00
Ending Hsiao
a740496a51
fix dual clip implementation ( #435 )
...
close #433
2021-09-02 21:43:14 +08:00
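For reference, a sketch of the dual-clip PPO objective (Ye et al. 2020) with the fix's key property: the extra `c * A` bound applies only to negative advantages. Not tianshou's exact code:

```python
import torch

def dual_clip_loss(ratio: torch.Tensor, adv: torch.Tensor,
                   eps: float = 0.2, c: float = 3.0) -> torch.Tensor:
    # Standard PPO clipped surrogate.
    clip1 = torch.min(ratio * adv, ratio.clamp(1 - eps, 1 + eps) * adv)
    # Dual clip: lower-bound the surrogate by c * A, negative A only.
    clip2 = torch.max(clip1, c * adv)
    return -torch.where(adv < 0, clip2, clip1).mean()

loss = dual_clip_loss(torch.tensor([1.5]), torch.tensor([-1.0]))
print(loss)  # tensor(1.5000): -max(min(-1.5, -1.2), -3.0)
```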
Andriy Drozdyuk
8a5e2190f7
Add Weights and Biases Logger ( #427 )
...
- rename BasicLogger to TensorboardLogger
- refactor logger code
- add WandbLogger
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2021-08-30 22:35:02 +08:00
n+e
e4f4f0e144
fix docs build failure and a bug in a2c/ppo optimizer ( #428 )
...
* fix rtfd build
* list + list -> set.union
* change seed of test_qrdqn
* add py39 test
2021-08-30 02:07:03 +08:00
Yi Su
291be08d43
Add Rainbow DQN ( #386 )
...
- add RainbowPolicy
- add a `set_beta` method in the prioritized replay buffer (an annealing sketch follows this entry)
- add NoisyLinear in utils/network
2021-08-29 23:34:59 +08:00
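A sketch of the intended `set_beta` usage, annealing the importance-sampling exponent inside a `train_fn` hook; the schedule constants are illustrative:

```python
from tianshou.data import PrioritizedReplayBuffer

buffer = PrioritizedReplayBuffer(size=1000, alpha=0.5, beta=0.4)

def train_fn(epoch: int, env_step: int) -> None:
    # Linearly anneal beta from 0.4 to 1.0 over the first 1M steps.
    frac = min(env_step / 1_000_000, 1.0)
    buffer.set_beta(0.4 + frac * (1.0 - 0.4))

train_fn(1, 500_000)  # beta is now 0.7
```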
Andriy Drozdyuk
d161059c3d
Replaced `indice` with the plural `indices` ( #422 )
2021-08-20 21:58:44 +08:00
deeplook
728b88b92d
Fix conda install command ( #419 )
2021-08-16 18:56:01 +08:00
n+e
5b7732a29b
make ppo discrete test script more general ( #418 )
2021-08-15 21:37:37 +08:00
n+e
bba30f83d1
fix sb2's coverage ( #412 )
2021-08-10 17:43:27 +08:00
Miguel Morales
42538f8e58
Update README.md ( #410 )
2021-08-10 09:14:20 +08:00
ChenDRAG
0674ff628a
Cite Tianshou's latest paper ( #406 )
...
* Cite Tianshou's latest paper
* update the README for the new version
* change the order
Co-authored-by: Jiayi Weng <wengj@sea.com>
2021-08-10 08:35:01 +08:00
Andriy Drozdyuk
18d2f25eff
Remove warnings about the use of save_fn across trainers ( #408 )
2021-08-04 09:56:00 +08:00
n+e
c19876179a
add env_id in preprocess fn ( #391 )
2021-07-05 09:50:39 +08:00
n+e
ebaca6f8da
add vizdoom example, bump version to 0.4.2 ( #384 )
v0.4.2
2021-06-26 18:08:41 +08:00
Yi Su
c0bc8e00ca
Add Fully-parameterized Quantile Function ( #376 )
2021-06-15 11:59:02 +08:00
Yi Su
21b2b22cd7
update iqn results and reward plots ( #377 )
2021-06-10 09:05:25 +08:00
Yi Su
f3169b4c1f
Add Implicit Quantile Network ( #371 )
2021-05-29 09:44:23 +08:00
n+e
458028a326
fix docs ( #373 )
...
- fix css style error
- fix mujoco benchmark result
2021-05-23 12:43:03 +08:00
Ark
655d5fb14f
Allow researchers to choose whether to use Double DQN ( #368 )
2021-05-21 10:53:34 +08:00
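A minimal sketch of toggling this choice, assuming it is exposed as the `is_double` argument of `DQNPolicy`; the network and hyper-parameters are illustrative:

```python
import torch
from tianshou.policy import DQNPolicy
from tianshou.utils.net.common import Net

net = Net(state_shape=4, action_shape=2)
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
# is_double=False recovers vanilla DQN targets; True keeps Double DQN.
policy = DQNPolicy(net, optim, target_update_freq=320, is_double=True)
```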
Yi Su
8f7bc65ac7
Add discrete Critic Regularized Regression ( #367 )
2021-05-19 13:29:56 +08:00
Yi Su
b5c3ddabfa
Add discrete Conservative Q-Learning for offline RL ( #359 )
...
Co-authored-by: Yi Su <yi.su@antgroup.com>
Co-authored-by: Yi Su <yi.su@antfin.com>
2021-05-12 09:24:48 +08:00