Jiayi Weng
278c91a222
Update citation and contributor ( #721 )
...
* update citation
* update contributor
* pass lint
2022-08-10 20:06:51 -07:00
Wenhao Chen
f270e88461
Do not allow async simulation for test collector ( #705 )
2022-07-22 16:23:55 -07:00
Jiayi Weng
99c99bb09a
Fix 2 bugs and refactor RunningMeanStd to support dict obs norm ( #695 )
...
* fix #689
* fix #672
* refactor RMS class
* fix #688
2022-07-14 22:52:56 -07:00
Yi Su
df35718992
Implement TD3+BC for offline RL ( #660 )
...
- implement TD3+BC for offline RL;
- fix a trainer bug where the test reward was not logged because self.env_step is not set in the offline setting;
2022-06-07 00:39:37 +08:00
Jiayi Weng
5ecea2402e
Fix save_checkpoint_fn return value ( #659 )
...
- Fix save_checkpoint_fn return value to checkpoint_path;
- Fix wrong link in doc;
- Fix an off-by-one bug in trainer iterator.
2022-06-03 01:07:07 +08:00
Jiayi Weng
6ad5b520fa
Fix sphinx build error ( #655 )
2022-06-01 13:56:04 +08:00
Anas BELFADIL
53e6b0408d
Add BranchingDQN for large discrete action spaces ( #618 )
2022-05-15 21:40:32 +08:00
Jiayi Weng
bf8f63ffc3
use envpool in vizdoom example, update doc ( #634 )
2022-05-09 00:42:16 +08:00
Yi Su
dd16818ce4
implement REDQ based on original contribution by @Jimenius ( #623 )
...
Co-authored-by: Minhui Li <limh@lamda.nju.edu.cn>
2022-05-01 00:06:00 +08:00
ChenDRAG
7f23748347
Compare Atari results with dopamine and OpenAI Baselines ( #616 )
2022-04-27 21:10:45 +08:00
Jiayi Weng
876e6b186e
hot fix mujoco benchmark
2022-04-24 16:49:40 -04:00
Chengqi Duan
5eab7dc218
Add Atari Results ( #600 )
2022-04-24 20:44:54 +08:00
ChenDRAG
5c9afe72f3
Update Mujoco Benchmark's webpage ( #606 )
2022-04-24 01:11:33 +08:00
ChenDRAG
57ecebde38
Add jupyter notebook tutorials using Google Colaboratory ( #599 )
2022-04-19 20:58:52 +08:00
Alex Nikulkov
92456cdb68
Add learning rate scheduler to BasePolicy ( #598 )
2022-04-17 23:52:30 +08:00
Yifei Cheng
6fc6857812
Update Multi-agent RL docs, upgrade pettingzoo ( #595 )
...
* update multi-agent docs, upgrade pettingzoo
* avoid pettingzoo deprecation warning
* fix pistonball tests
* codestyle
2022-04-16 23:17:53 +08:00
Jiayi Weng
2a9c9289e5
rename save_fn to save_best_fn to avoid ambiguity ( #575 )
...
This PR also introduces `tianshou.utils.deprecation` for a unified deprecation wrapper.
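A minimal sketch of how a unified deprecation wrapper can support a keyword rename such as `save_fn` -> `save_best_fn`. Only the name `deprecation` comes from the message above; its signature and body, along with `toy_trainer` and `hook`, are assumptions made for illustration and may differ from the actual code.

```python
import warnings
from typing import Callable, Optional


def deprecation(msg: str) -> None:
    """Emit a DeprecationWarning (assumed signature, not the library's)."""
    # stacklevel=3 points the warning at the code that called the
    # deprecated API rather than at this helper or the API itself.
    warnings.warn(msg, category=DeprecationWarning, stacklevel=3)


def toy_trainer(
    save_best_fn: Optional[Callable[[str], None]] = None,
    save_fn: Optional[Callable[[str], None]] = None,
) -> Optional[Callable[[str], None]]:
    """Hypothetical function accepting both the old and the new keyword."""
    if save_fn is not None:
        deprecation("save_fn is deprecated; use save_best_fn instead.")
        # The new keyword wins if both are given.
        if save_best_fn is None:
            save_best_fn = save_fn
    return save_best_fn


def hook(path: str) -> None:
    print(f"saving best policy to {path}")


# Record warnings so the deprecation can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    chosen = toy_trainer(save_fn=hook)
```

Callers that still pass the old keyword keep working, and every deprecated path reports through the same helper.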
2022-03-22 04:29:27 +08:00
Jose Antonio Martin H
10d919052b
Add Trainers as generators ( #559 )
...
The new proposed feature is to have trainers as generators.
The usage pattern is:
```python
trainer = OnPolicyTrainer(...)
for epoch, epoch_stat, info in trainer:
    print(f"Epoch: {epoch}")
    print(epoch_stat)
    print(info)
    do_something_with_policy()
    query_something_about_policy()
    make_a_plot_with(epoch_stat)
    display(info)
```
- epoch int: the epoch number
- epoch_stat dict: a large collection of metrics of the current epoch, including stat
- info dict: the usual dict out of the non-generator version of the trainer
You can even iterate on several different trainers at the same time:
```python
trainer1 = OnPolicyTrainer(...)
trainer2 = OnPolicyTrainer(...)
# pass more trainers to zip() to compare more than two
for result1, result2 in zip(trainer1, trainer2):
    compare_results(result1, result2)
```
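The pattern above relies on the trainer implementing Python's iterator protocol. The following is a rough, self-contained sketch of that idea only. `ToyTrainer`, its constructor arguments, and the contents of the yielded dicts are hypothetical and are not the implementation in this PR.

```python
from typing import Dict, Tuple


class ToyTrainer:
    """Hypothetical stand-in for a trainer; not Tianshou's implementation."""

    def __init__(self, max_epoch: int, step_per_epoch: int) -> None:
        self.max_epoch = max_epoch
        self.step_per_epoch = step_per_epoch
        self.epoch = 0
        self.env_step = 0

    def __iter__(self) -> "ToyTrainer":
        # Restart from the first epoch whenever iteration begins.
        self.epoch = 0
        self.env_step = 0
        return self

    def __next__(self) -> Tuple[int, Dict[str, int], Dict[str, int]]:
        if self.epoch >= self.max_epoch:
            raise StopIteration
        # One "epoch" of placeholder work: only the counters advance.
        self.epoch += 1
        self.env_step += self.step_per_epoch
        epoch_stat = {"env_step": self.env_step}
        info = {"epochs_done": self.epoch}
        return self.epoch, epoch_stat, info

    def run(self) -> Dict[str, int]:
        """Non-generator entry point: exhaust the iterator, return the last info."""
        info: Dict[str, int] = {}
        for _, _, info in self:
            pass
        return info


# Iterate epoch by epoch, or step several trainers in lockstep with zip().
results = list(ToyTrainer(max_epoch=3, step_per_epoch=10))
paired = list(zip(ToyTrainer(2, 5), ToyTrainer(2, 7)))
```

Keeping a `run()` method that exhausts the iterator lets the non-generator usage share the same per-epoch code path.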
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-03-18 00:26:14 +08:00
Andrea Boscolo Camiletto
2336a7db1b
fixed typo in rainbow DQN paper reference ( #569 )
...
* fixed typo in rainbow DQN paper ref
* fix gym==0.23 ci failure
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-03-16 21:38:51 +08:00
Costa Huang
df3d7f582b
Update WandbLogger implementation ( #558 )
...
* Use `global_step` as the x-axis for wandb
* Use Tensorboard SummaryWriter as core with `wandb.init(..., sync_tensorboard=True)`
* Update all atari examples with wandb
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-03-07 06:40:47 +08:00
Yi Su
2377f2f186
Implement Generative Adversarial Imitation Learning (GAIL) ( #550 )
...
Implement GAIL based on PPO and provide example script and sample (i.e., most likely not the best) results with Mujoco tasks. (#531, #173)
2022-03-06 23:57:15 +08:00
Chengqi Duan
d85bc19269
update dqn tutorial and add envpool to docs ( #526 )
...
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-02-15 06:39:47 +08:00
Chengqi Duan
9c100e0705
Enable venvs.reset() concurrent execution ( #517 )
...
- change the internal API name of worker: send_action -> send, get_result -> recv (align with envpool)
- add a timing test for venvs.reset() to verify that it executes concurrently
- change venvs.reset() logic
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-02-08 00:40:01 +08:00
ChenDRAG
c25926dd8f
Formalize variable names ( #509 )
...
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-01-30 00:53:56 +08:00
Bernard Tan
bc53ead273
Implement CQLPolicy and offline_cql example ( #506 )
2022-01-16 05:30:21 +08:00
Yi Su
a59d96d041
Add Intrinsic Curiosity Module ( #503 )
2022-01-15 02:43:48 +08:00
Bernard Tan
5c5a3db94e
Implement BCQPolicy and offline_bcq example ( #480 )
...
This PR implements BCQPolicy, which can be used to train an offline agent in environments with continuous action spaces. An experimental result on 'halfcheetah-expert-v1', a d4rl environment (for offline reinforcement learning), is provided.
Example usage is in examples/offline/offline_bcq.py.
2021-11-22 22:21:02 +08:00
Ayush Chaurasia
63d752ee0b
W&B: Add usage in the docs ( #463 )
2021-10-13 23:28:25 +08:00
Jiayi Weng
e45e2096d8
add multi-GPU support ( #461 )
...
add a new class DataParallelNet
2021-10-06 01:39:14 +08:00
Ayush Chaurasia
22d7bf38c8
Improve W&B logger ( #441 )
...
- rename WandBLogger -> WandbLogger
- add save_data and restore_data
- allow more input arguments for wandb init
- integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py
- documentation update
2021-09-24 21:52:23 +08:00
n+e
fc251ab0b8
bump to v0.4.3 ( #432 )
...
* add makefile
* bump version
* add isort and yapf
* update contributing.md
* update PR template
* spelling check
2021-09-03 05:05:04 +08:00
Andriy Drozdyuk
8a5e2190f7
Add Weights and Biases Logger ( #427 )
...
- rename BasicLogger to TensorboardLogger
- refactor logger code
- add WandbLogger
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2021-08-30 22:35:02 +08:00
n+e
e4f4f0e144
fix docs build failure and a bug in a2c/ppo optimizer ( #428 )
...
* fix rtfd build
* list + list -> set.union
* change seed of test_qrdqn
* add py39 test
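One plausible reading of "list + list -> set.union": when two networks share parameters, concatenating their parameter lists repeats the shared ones, while a set union keeps each parameter object once. The sketch below uses a hypothetical `Param` class in place of real tensors, and the parameter names are made up for illustration.

```python
class Param:
    """Hypothetical stand-in for a trainable parameter (hashable by identity)."""

    def __init__(self, name: str) -> None:
        self.name = name


# The actor and critic share one encoder parameter in this toy setup.
shared = Param("shared_encoder.weight")
actor_params = [shared, Param("actor_head.weight")]
critic_params = [shared, Param("critic_head.weight")]

# Concatenation keeps the shared parameter twice.
concatenated = actor_params + critic_params

# A set union keeps each parameter object exactly once.
unique = set(actor_params).union(critic_params)
```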
2021-08-30 02:07:03 +08:00
Yi Su
291be08d43
Add Rainbow DQN ( #386 )
...
- add RainbowPolicy
- add `set_beta` method in prio_buffer
- add NoisyLinear in utils/network
2021-08-29 23:34:59 +08:00
Andriy Drozdyuk
d161059c3d
Replaced indice by plural indices ( #422 )
2021-08-20 21:58:44 +08:00
n+e
c19876179a
add env_id in preprocess fn ( #391 )
2021-07-05 09:50:39 +08:00
Yi Su
c0bc8e00ca
Add Fully-parameterized Quantile Function ( #376 )
2021-06-15 11:59:02 +08:00
Yi Su
f3169b4c1f
Add Implicit Quantile Network ( #371 )
2021-05-29 09:44:23 +08:00
n+e
458028a326
fix docs ( #373 )
...
- fix css style error
- fix mujoco benchmark result
2021-05-23 12:43:03 +08:00
Yi Su
8f7bc65ac7
Add discrete Critic Regularized Regression ( #367 )
2021-05-19 13:29:56 +08:00
Yi Su
b5c3ddabfa
Add discrete Conservative Q-Learning for offline RL ( #359 )
...
Co-authored-by: Yi Su <yi.su@antgroup.com>
Co-authored-by: Yi Su <yi.su@antfin.com>
2021-05-12 09:24:48 +08:00
Ark
84f58636eb
Make trainer resumable ( #350 )
...
- specify tensorboard >= 2.5.0
- add `save_checkpoint_fn` and `resume_from_log` in trainer
Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>
2021-05-06 08:53:53 +08:00
n+e
ff4d3cd714
Support different state size and fix exception in venv.__del__ ( #352 )
...
- Batch: do not raise error when it finds list of np.array with different shape[0].
- Venv's obs: add try...except block for np.stack(obs_list)
- remove venv.__del__ since it is buggy
2021-04-25 15:23:46 +08:00
ChenDRAG
bbc3c3e32d
Add numerical analysis tool and interactive plot ( #341 )
...
Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>
2021-04-22 12:49:54 +08:00
ChenDRAG
1dcf65fe21
Add NPG policy ( #344 )
2021-04-21 09:52:15 +08:00
ChenDRAG
5057b5c89e
Add TRPO policy ( #337 )
2021-04-16 20:37:12 +08:00
ChenDRAG
6426a39796
ppo benchmark ( #330 )
2021-03-30 11:50:35 +08:00
n+e
8963a14327
fix exception in tutorials/dqn.rst ( #327 )
2021-03-26 12:57:00 +08:00
n+e
0c7117dd55
fix concepts.rst with regard to new buffer behavior ( #316 )
...
fix #315
2021-03-20 21:46:36 +08:00
n+e
454c86c469
fix venv seed, add TOC in docs, and split buffer.py into several files ( #303 )
...
Things changed in this PR:
- various docs update, add TOC
- split buffer into several files
- fix venv action_space randomness
2021-03-02 12:28:28 +08:00