471 Commits

Author SHA1 Message Date
Yifei Cheng
6fc6857812
Update Multi-agent RL docs, upgrade pettingzoo (#595)
* update multi-agent docs, upgrade pettingzoo

* avoid pettingzoo deprecation warning

* fix pistonball tests

* codestyle
2022-04-16 23:17:53 +08:00
Jiayi Weng
18277497ed
fix py39 ci venv test failure (#593) 2022-04-12 22:29:39 +08:00
ChenDRAG
75d7c9f1d9
Fix action scaling bug in SAC (#591)
close #588
2022-04-12 00:26:06 +08:00
Jiayi Weng
f13e415eb0
Add write_flush in two loggers, fix argument passing in WandbLogger (#581) 2022-03-30 08:04:23 +08:00
Jiayi Weng
6ab9860183
fix negative collector time (#578) 2022-03-26 10:44:08 +08:00
Jiayi Weng
2a9c9289e5
rename save_fn to save_best_fn to avoid ambiguity (#575)
This PR also introduces `tianshou.utils.deprecation` for a unified deprecation wrapper.
v0.4.7
2022-03-22 04:29:27 +08:00
Jose Antonio Martin H
10d919052b
Add Trainers as generators (#559)
This PR proposes a new feature: trainers that can be used as generators.
The usage pattern is:

```python
trainer = OnPolicyTrainer(...)
for epoch, epoch_stat, info in trainer:
    print(f"Epoch: {epoch}")
    print(epoch_stat)
    print(info)
    do_something_with_policy()
    query_something_about_policy()
    make_a_plot_with(epoch_stat)
    display(info)
```

- epoch int: the epoch number
- epoch_stat dict: a large collection of metrics of the current epoch, including stat
- info dict: the same dict that the non-generator version of the trainer returns

You can even iterate on several different trainers at the same time:

```python
trainer1 = OnPolicyTrainer(...)
trainer2 = OnPolicyTrainer(...)
# Pass as many trainers to zip() as needed; two are shown here.
for result1, result2 in zip(trainer1, trainer2):
    compare_results(result1, result2)
```
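
A minimal sketch of the iterator protocol behind this usage pattern. It makes no claim about the trainer's actual internals: `ToyTrainer`, its `max_epoch` argument, and the contents of the yielded dicts are hypothetical and only show how a class can yield `(epoch, epoch_stat, info)` once per epoch.

```python
from typing import Any, Dict, Iterator, Tuple

EpochResult = Tuple[int, Dict[str, Any], Dict[str, Any]]


class ToyTrainer:
    """Hypothetical stand-in for a trainer; not the library's implementation."""

    def __init__(self, max_epoch: int) -> None:
        self.max_epoch = max_epoch
        self.epoch = 0

    def __iter__(self) -> Iterator[EpochResult]:
        self.epoch = 0  # restart from the first epoch on every new iteration
        return self

    def __next__(self) -> EpochResult:
        if self.epoch >= self.max_epoch:
            raise StopIteration
        self.epoch += 1
        # Placeholder metrics; a real trainer would collect data and
        # update the policy here.
        epoch_stat = {"epoch": self.epoch, "loss": 1.0 / self.epoch}
        info = {"best_epoch": self.epoch}
        return self.epoch, epoch_stat, info


results = list(ToyTrainer(max_epoch=3))
# zip() stops as soon as the shortest trainer is exhausted.
pairs = list(zip(ToyTrainer(max_epoch=2), ToyTrainer(max_epoch=3)))
```

Because each step runs only when the caller asks for the next item, user code can inspect or plot intermediate results between epochs without callbacks.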

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-03-18 00:26:14 +08:00
Andrea Boscolo Camiletto
2336a7db1b
fixed typo in rainbow DQN paper reference (#569)
* fixed typo in rainbow DQN paper ref

* fix gym==0.23 ci failure

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-03-16 21:38:51 +08:00
Minhui Li
39f8391cfb
Add map_action_inverse for fixing error of storing random action (#568)
(Issue #512) During random start, the Collector samples actions directly from the action space, whereas policies output actions in a fixed range (typically [-1, 1]) that are then mapped to the action space. The buffer stores only unmapped actions, so randomly initialized actions are stored incorrectly when the action range is not [-1, 1]. This may affect policy learning, and model learning in model-based methods in particular.

This PR fixes it by applying the inverse mapping to random initial actions before adding them to the buffer.
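
The idea can be sketched as a pair of linear mappings. This is an illustration only, not the code from this PR: it assumes plain linear scaling between [-1, 1] and finite bounds with `high > low`, and the function signatures, `low`, `high`, and the sample values are hypothetical.

```python
from typing import List, Sequence


def map_action(
    act: Sequence[float], low: Sequence[float], high: Sequence[float]
) -> List[float]:
    """Scale a policy output in [-1, 1] to the environment range [low, high]."""
    return [lo + (hi - lo) * (a + 1.0) / 2.0 for a, lo, hi in zip(act, low, high)]


def map_action_inverse(
    act: Sequence[float], low: Sequence[float], high: Sequence[float]
) -> List[float]:
    """Undo map_action: bring an environment action back to [-1, 1]."""
    return [2.0 * (a - lo) / (hi - lo) - 1.0 for a, lo, hi in zip(act, low, high)]


low, high = [0.0, -2.0], [10.0, 2.0]
# An action sampled from the action space during random start.
random_env_action = [7.5, -1.0]
# The unmapped form that belongs in the buffer.
stored = map_action_inverse(random_env_action, low, high)
# Mapping the stored value again recovers the action the environment saw.
restored = map_action(stored, low, high)
```

Storing `stored` rather than `random_env_action` keeps every buffer entry in the same unmapped range as the actions produced by the policy.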
2022-03-12 22:26:00 +08:00
Yi Su
9cb74e60c9
Add imitation baselines for offline RL (#566)
Add imitation baselines for offline RL and make the choice of env/task and D4RL dataset explicit. On expert datasets, IL easily outperforms; after reading the D4RL paper, I'll rerun the experiments on medium data.
2022-03-12 21:33:54 +08:00
Alex Nikulkov
74f430ea36
Add a comment before SAC alpha loss (#565)
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-03-09 06:38:42 +08:00
Chengqi Duan
ad2e1eaea0
Fix WandbLogger import error in Atari examples (#562) 2022-03-08 08:38:56 -05:00
Costa Huang
df3d7f582b
Update WandbLogger implementation (#558)
* Use `global_step` as the x-axis for wandb
* Use TensorBoard SummaryWriter as the core, with `wandb.init(..., sync_tensorboard=True)`
* Update all atari examples with wandb

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-03-07 06:40:47 +08:00
Yi Su
2377f2f186
Implement Generative Adversarial Imitation Learning (GAIL) (#550)
Implement GAIL based on PPO and provide example script and sample (i.e., most likely not the best) results with Mujoco tasks. (#531, #173)
2022-03-06 23:57:15 +08:00
Anas BELFADIL
d976a5aa91
Fixed hardcoded reward_threshold (#548) 2022-03-04 10:35:39 +08:00
Jiayi Weng
c248b4f87e
fix conda support and keep API compatibility (#536)
* loosen constraints

* fix nni issue (#478)

* fix coverage
v0.4.6.post1
2022-02-26 00:05:02 +08:00
Yi Su
97df511a13
Add VizDoom PPO example and results (#533)
* update vizdoom ppo example

* update README with results
v0.4.6
2022-02-25 09:33:34 +08:00
Chengqi Duan
23fbc3b712
upgrade gym version to >=0.21, fix related CI and update examples/atari (#534)
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-02-25 07:40:33 +08:00
Mohammad Mahdi Rahimi
c7e2e56fac
Pettingzoo support (#494)
Co-authored-by: Rodrigo de Lazcano <r.l.p.v96@gmail.com>
Co-authored-by: J K Terry <justinkterry@gmail.com>
2022-02-15 22:56:45 +08:00
Chengqi Duan
d85bc19269
update dqn tutorial and add envpool to docs (#526)
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-02-15 06:39:47 +08:00
Yi Su
d29188ee77
update atari ppo slots (#529) 2022-02-13 04:04:21 +08:00
Yi Su
40289b8b0e
Add atari ppo example (#523)
I needed a policy gradient baseline myself and it has been requested several times (#497, #374, #440). I used https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py as a reference for hyper-parameters.

Note that using lr=2.5e-4 will result in "Invalid Value" error for 2 games. The fix is to reduce the learning rate. That's why I set the default lr to 1e-4. See discussion in https://github.com/DLR-RM/rl-baselines3-zoo/issues/156.
2022-02-11 06:45:06 +08:00
Jiayi Weng
3d697aa4c6
make unit test faster (#522)
* test cache expert data in offline training

* faster cql test

* faster tests

* use dummy

* test ray dependency
2022-02-09 00:24:52 +08:00
Chengqi Duan
9c100e0705
Enable venvs.reset() concurrent execution (#517)
- change the internal API name of worker: send_action -> send, get_result -> recv (align with envpool)
- add a timing test for venvs.reset() to verify that it executes concurrently
- change venvs.reset() logic

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-02-08 00:40:01 +08:00
Kenneth Schröder
cd7654bfd5
Fixing casts to int by to_torch_as(...) calls in policies when using discrete actions (#521) 2022-02-07 03:42:46 +08:00
ChenDRAG
c25926dd8f
Formalize variable names (#509)
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-01-30 00:53:56 +08:00
Bernard Tan
bc53ead273
Implement CQLPolicy and offline_cql example (#506) 2022-01-16 05:30:21 +08:00
Yi Su
a59d96d041
Add Intrinsic Curiosity Module (#503) 2022-01-15 02:43:48 +08:00
Markus28
a2d76d1276
Remove reset_buffer() from reset method (#501) 2022-01-12 16:46:28 -08:00
Yi Su
3592f45446
Fix critic network for Discrete CRR (#485)
- Fixes an inconsistency in the implementation of Discrete CRR. Now it uses `Critic` class for its critic, following conventions in other actor-critic policies;
- Updates several offline policies to use `ActorCritic` class for its optimizer to eliminate randomness caused by parameter sharing between actor and critic;
- Adds `writer.flush()` in TensorboardLogger so that results are written in real time;
- Enable `test_collector=None` in 3 trainers to turn off testing during training;
- Updates the Atari offline results in README.md;
- Moves Atari offline RL examples to `examples/offline`; tests to `test/offline` per review comments.
v0.4.5
2021-11-28 23:10:28 +08:00
Bernard Tan
5c5a3db94e
Implement BCQPolicy and offline_bcq example (#480)
This PR implements BCQPolicy, which can be used to train an offline agent in environments with continuous action spaces. An experimental result is provided for 'halfcheetah-expert-v1', a D4RL environment (for offline reinforcement learning).
Example usage is in examples/offline/offline_bcq.py.
2021-11-22 22:21:02 +08:00
Jiayi Weng
94d3b27db9
fix tqdm issue (#481) 2021-11-19 00:17:44 +08:00
Markus28
8f19a86966
Implements set_env_attr and get_env_attr for vector environments (#478)
close #473
2021-11-03 00:08:00 +08:00
Jiayi Weng
098d466467
fix atari wrapper to be deterministic (#467) 2021-10-19 22:26:11 +08:00
Jiayi Weng
b9eedc516e
bump to 0.4.4
v0.4.4
2021-10-13 12:22:24 -04:00
Ayush Chaurasia
63d752ee0b
W&B: Add usage in the docs (#463) 2021-10-13 23:28:25 +08:00
Jiayi Weng
926ec0b9b1
update save_fn in trainer (#459)
- collector.collect() now returns 4 extra keys: rew/rew_std/len/len_std (previously this work was done in the logger)
- save_fn() will be called at the beginning of training
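
A rough sketch of how the four extra keys could be derived from the episodes finished during one collect call. The helper name `summarize_episodes`, the use of the population standard deviation, and the zero fallback for an empty batch are assumptions made for illustration, not the collector's actual code.

```python
from statistics import fmean, pstdev
from typing import Dict, Sequence


def summarize_episodes(
    rewards: Sequence[float], lengths: Sequence[int]
) -> Dict[str, float]:
    """Hypothetical helper: aggregate episodes into rew/rew_std/len/len_std."""
    if not rewards:
        # No episode finished during this collect call; report zeros.
        return {"rew": 0.0, "rew_std": 0.0, "len": 0.0, "len_std": 0.0}
    return {
        "rew": fmean(rewards),
        "rew_std": pstdev(rewards),  # population std (an assumption)
        "len": fmean(lengths),
        "len_std": pstdev(lengths),
    }


stats = summarize_episodes(rewards=[1.0, 3.0], lengths=[10, 14])
empty = summarize_episodes(rewards=[], lengths=[])
```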
2021-10-13 21:25:24 +08:00
Jiayi Weng
e45e2096d8
add multi-GPU support (#461)
add a new class DataParallelNet
2021-10-06 01:39:14 +08:00
Jiayi Weng
5df64800f4
final fix for actor_critic shared head parameters (#458) 2021-10-04 23:19:07 +08:00
Ayush Chaurasia
22d7bf38c8
Improve W&B logger (#441)
- rename WandBLogger -> WandbLogger
- add save_data and restore_data
- allow more input arguments for wandb init
- integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py
- documentation update
2021-09-24 21:52:23 +08:00
Jiayi Weng
e8f8cdfa41
fix logger.write error in atari script (#444)
- fix a bug in #427: logger.write should pass a dict
- change SubprocVectorEnv to ShmemVectorEnv in atari
- increase logger interval for eps
2021-09-09 00:51:39 +08:00
n+e
fc251ab0b8
bump to v0.4.3 (#432)
* add makefile
* bump version
* add isort and yapf
* update contributing.md
* update PR template
* spelling check
v0.4.3
2021-09-03 05:05:04 +08:00
Ending Hsiao
a740496a51
fix dual clip implementation (#435)
close #433
2021-09-02 21:43:14 +08:00
Andriy Drozdyuk
8a5e2190f7
Add Weights and Biases Logger (#427)
- rename BasicLogger to TensorboardLogger
- refactor logger code
- add WandbLogger

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2021-08-30 22:35:02 +08:00
n+e
e4f4f0e144
fix docs build failure and a bug in a2c/ppo optimizer (#428)
* fix rtfd build

* list + list -> set.union

* change seed of test_qrdqn

* add py39 test
2021-08-30 02:07:03 +08:00
Yi Su
291be08d43
Add Rainbow DQN (#386)
- add RainbowPolicy
- add `set_beta` method in prio_buffer
- add NoisyLinear in utils/network
2021-08-29 23:34:59 +08:00
Andriy Drozdyuk
d161059c3d
Replaced indice by plural indices (#422) 2021-08-20 21:58:44 +08:00
deeplook
728b88b92d
Fix conda install command (#419) 2021-08-16 18:56:01 +08:00
n+e
5b7732a29b
make ppo discrete test script more general (#418) 2021-08-15 21:37:37 +08:00
n+e
bba30f83d1
fix sb2's coverage (#412) 2021-08-10 17:43:27 +08:00