100 Commits

Author SHA1 Message Date
Markus Krimmel
b0c8d28a7d
Added pre-commit (#752)
- This PR adds the checks that are defined in the Makefile as pre-commit
hooks.
- Hopefully, the checks are equivalent to those from the Makefile, but I
can't guarantee it.
- CI remains as it is.
- As I pointed out on discord, I experienced some conflicts between
flake8 and yapf, so it might be better to transition to some other
combination (e.g. black).
2022-10-02 08:57:45 -07:00
Jiayi Weng
278c91a222
Update citation and contributor (#721)
* update citation

* update contributor

* pass lint
2022-08-10 20:06:51 -07:00
Jiayi Weng
65054847ef
bump version to 0.4.9 (#684) 2022-07-05 01:07:16 +08:00
Yi Su
df35718992
Implement TD3+BC for offline RL (#660)
- implement TD3+BC for offline RL;
- fix a trainer bug where the test reward was not logged because self.env_step was not set in the offline setting;
2022-06-07 00:39:37 +08:00
Anas BELFADIL
53e6b0408d
Add BranchingDQN for large discrete action spaces (#618) 2022-05-15 21:40:32 +08:00
Jiayi Weng
2a7c151738
Add vecenv wrappers for obs_norm to support running mujoco experiment with envpool (#628)
- add VectorEnvWrapper and VectorEnvNormObs
- obs_rms store in policy save/load
- align mujoco scripts with atari: obs_norm, envpool, wandb and README
2022-05-05 19:55:15 +08:00
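The observation normalization mentioned in #628 above (obs_norm, with obs_rms stored alongside the policy) is commonly built on running mean/variance statistics. The following is a minimal sketch of that general technique, not the code from the PR: the class name `RunningMeanStd` and its methods are assumptions made for illustration, and the batch-merge rule is the standard parallel mean/variance update.

```python
import numpy as np


class RunningMeanStd:
    """Hypothetical helper: tracks a running mean and variance over batches."""

    def __init__(self, shape=(), eps=1e-8):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = 0
        self.eps = eps  # avoids division by zero when normalizing

    def update(self, batch):
        """Merge the statistics of a new batch (first axis = samples)."""
        batch = np.asarray(batch, dtype=np.float64)
        b_mean = batch.mean(axis=0)
        b_var = batch.var(axis=0)
        b_count = batch.shape[0]

        delta = b_mean - self.mean
        total = self.count + b_count
        new_mean = self.mean + delta * b_count / total
        # Combine the two sums of squared deviations, plus a correction
        # term for the difference between the two means.
        m2 = (
            self.var * self.count
            + b_var * b_count
            + delta ** 2 * self.count * b_count / total
        )
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, obs):
        """Return observations shifted and scaled by the running statistics."""
        return (np.asarray(obs, dtype=np.float64) - self.mean) / np.sqrt(
            self.var + self.eps
        )


# Usage: feed observations batch by batch, then normalize new ones.
data = np.arange(12, dtype=np.float64).reshape(6, 2)
rms = RunningMeanStd(shape=(2,))
rms.update(data[:2])
rms.update(data[2:])
normalized = rms.normalize(data)
```

Saving `mean`, `var` and `count` together with the policy weights is what lets a restored policy see observations on the same scale it was trained on.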
Yi Su
dd16818ce4
implement REDQ based on original contribution by @Jimenius (#623)
Co-authored-by: Minhui Li <limh@lamda.nju.edu.cn>
2022-05-01 00:06:00 +08:00
Jiayi Weng
18277497ed
fix py39 ci venv test failure (#593) 2022-04-12 22:29:39 +08:00
Yi Su
2377f2f186
Implement Generative Adversarial Imitation Learning (GAIL) (#550)
Implement GAIL based on PPO and provide example script and sample (i.e., most likely not the best) results with Mujoco tasks. (#531, #173)
2022-03-06 23:57:15 +08:00
Chengqi Duan
d85bc19269
update dqn tutorial and add envpool to docs (#526)
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-02-15 06:39:47 +08:00
Bernard Tan
bc53ead273
Implement CQLPolicy and offline_cql example (#506) 2022-01-16 05:30:21 +08:00
Yi Su
a59d96d041
Add Intrinsic Curiosity Module (#503) 2022-01-15 02:43:48 +08:00
Bernard Tan
5c5a3db94e
Implement BCQPolicy and offline_bcq example (#480)
This PR implements BCQPolicy, which can be used to train an offline agent in environments with continuous action spaces. An experimental result on 'halfcheetah-expert-v1', a d4rl environment (for offline reinforcement learning), is provided.
Example usage is in examples/offline/offline_bcq.py.
2021-11-22 22:21:02 +08:00
Ayush Chaurasia
63d752ee0b
W&B: Add usage in the docs (#463) 2021-10-13 23:28:25 +08:00
Jiayi Weng
e45e2096d8
add multi-GPU support (#461)
add a new class DataParallelNet
2021-10-06 01:39:14 +08:00
Ayush Chaurasia
22d7bf38c8
Improve W&B logger (#441)
- rename WandBLogger -> WandbLogger
- add save_data and restore_data
- allow more input arguments for wandb init
- integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py
- documentation update
2021-09-24 21:52:23 +08:00
Andriy Drozdyuk
8a5e2190f7
Add Weights and Biases Logger (#427)
- rename BasicLogger to TensorboardLogger
- refactor logger code
- add WandbLogger

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2021-08-30 22:35:02 +08:00
Yi Su
291be08d43
Add Rainbow DQN (#386)
- add RainbowPolicy
- add `set_beta` method in prio_buffer
- add NoisyLinear in utils/network
2021-08-29 23:34:59 +08:00
deeplook
728b88b92d
Fix conda install command (#419) 2021-08-16 18:56:01 +08:00
n+e
5b7732a29b
make ppo discrete test script more general (#418) 2021-08-15 21:37:37 +08:00
n+e
bba30f83d1
fix sb2's coverage (#412) 2021-08-10 17:43:27 +08:00
Miguel Morales
42538f8e58
Update README.md (#410) 2021-08-10 09:14:20 +08:00
ChenDRAG
0674ff628a
Cite Tianshou's latest paper (#406)
* Cite Tianshou's latest paper

* update new version README

* change order

Co-authored-by: Jiayi Weng <wengj@sea.com>
2021-08-10 08:35:01 +08:00
n+e
ebaca6f8da
add vizdoom example, bump version to 0.4.2 (#384) 2021-06-26 18:08:41 +08:00
Yi Su
c0bc8e00ca
Add Fully-parameterized Quantile Function (#376) 2021-06-15 11:59:02 +08:00
Yi Su
f3169b4c1f
Add Implicit Quantile Network (#371) 2021-05-29 09:44:23 +08:00
Yi Su
8f7bc65ac7
Add discrete Critic Regularized Regression (#367) 2021-05-19 13:29:56 +08:00
Yi Su
b5c3ddabfa
Add discrete Conservative Q-Learning for offline RL (#359)
Co-authored-by: Yi Su <yi.su@antgroup.com>
Co-authored-by: Yi Su <yi.su@antfin.com>
2021-05-12 09:24:48 +08:00
ChenDRAG
1dcf65fe21
Add NPG policy (#344) 2021-04-21 09:52:15 +08:00
ChenDRAG
a57503c0aa
TRPO benchmark release (#340) 2021-04-19 17:05:06 +08:00
ChenDRAG
5057b5c89e
Add TRPO policy (#337) 2021-04-16 20:37:12 +08:00
ChenDRAG
6426a39796
ppo benchmark (#330) 2021-03-30 11:50:35 +08:00
n+e
8963a14327
fix exception in tutorials/dqn.rst (#327) 2021-03-26 12:57:00 +08:00
ChenDRAG
9b61bc620c
add logger (#295)
This PR focuses on refactoring the logging method to fix the NaN reward bug and the log interval bug. After these two PRs, the fundamental changes to tianshou/data are hopefully finished, and we can finally concentrate on building benchmarks for tianshou.

Things changed:

1. trainer now accepts logger (BasicLogger or LazyLogger) instead of writer;
2. remove utils.SummaryWriter;
2021-02-24 14:48:42 +08:00
ChenDRAG
7036073649
Trainer refactor : some definition change (#293)
This PR changes some trainer definitions to make the trainer friendlier to use and consistent with typical usage in research papers: it renames `collect-per-step` to `step-per-collect`, adds `update-per-step` / `episode-per-collect` accordingly, and updates the documentation.
2021-02-21 13:06:02 +08:00
ChenDRAG
150d0ec51b
Step collector implementation (#280)
This is the third of the 6 commits mentioned in #274. It refactors Collector to fix #245. See #274 for more detail.

Things changed in this PR:

1. refactor collector to be cleaner; split out AsyncCollector to support asyncvenv;
2. change buffer.add api to add(batch, buffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.)
3. add policy.exploration_noise(act, batch) -> act
4. small change in BasePolicy.compute_*_returns
5. move reward_metric from collector to trainer
6. fix np.asanyarray issue (different numpy versions produce different output)
7. flake8 maxlength=88
8. polish docs and fix test

Co-authored-by: n+e <trinkle23897@gmail.com>
2021-02-19 10:33:49 +08:00
wizardsheng
1eb6137645
Add QR-DQN algorithm (#276)
This is the PR for QR-DQN algorithm: https://arxiv.org/abs/1710.10044

1. add QR-DQN policy in tianshou/policy/modelfree/qrdqn.py.
2. add QR-DQN net in examples/atari/atari_network.py.
3. add QR-DQN atari example in examples/atari/atari_qrdqn.py.
4. add QR-DQN statement in tianshou/policy/__init__.py.
5. add QR-DQN unit test in test/discrete/test_qrdqn.py.
6. add QR-DQN atari results in examples/atari/results/qrdqn/.
7. add compute_q_value in DQNPolicy and C51Policy to simplify the forward function.
8. move `with torch.no_grad():` from `_target_q` to BasePolicy

By running "python3 atari_qrdqn.py --task "PongNoFrameskip-v4" --batch-size 64", we get 'best_result': '19.8 ± 0.40' in epoch 8.
2021-01-28 09:27:05 +08:00
Jialu Zhu
a511cb4779
Add offline trainer and discrete BCQ algorithm (#263)
The results need to be tuned after the `done` issue is fixed.

Co-authored-by: n+e <trinkle23897@gmail.com>
2021-01-20 18:13:04 +08:00
ChenDRAG
a633a6a028
update utils.network (#275)
This is the first of the 6 commits mentioned in #274. It features:

1. Refactor of `Class Net` to support any form of MLP.
2. Enable type check in utils.network.
3. Related changes in docs/test/examples.
4. Move atari-related network to examples/atari/atari_network.py

Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>
2021-01-20 16:54:13 +08:00
蔡舒起
866e35d550
fix readme (#273) 2021-01-16 19:27:35 +08:00
wizardsheng
c6f2648e87
Add C51 algorithm (#266)
This is the PR for C51 algorithm: https://arxiv.org/abs/1707.06887

1. add C51 policy in tianshou/policy/modelfree/c51.py.
2. add C51 net in tianshou/utils/net/discrete.py.
3. add C51 atari example in examples/atari/atari_c51.py.
4. add C51 statement in tianshou/policy/__init__.py.
5. add C51 test in test/discrete/test_c51.py.
6. add C51 atari results in examples/atari/results/c51/.

By running "python3 atari_c51.py --task "PongNoFrameskip-v4" --batch-size 64", we get 'best_result': '20.50 ± 0.50' in epoch 9.

By running "python3 atari_c51.py --task "BreakoutNoFrameskip-v4" --n-step 1 --epoch 40", get best_reward: 407.400000 ± 31.155096 in epoch 39.
2021-01-06 10:17:45 +08:00
n+e
710966eda7
change API of train_fn and test_fn (#229)
train_fn(epoch) -> train_fn(epoch, num_env_step)
test_fn(epoch) -> test_fn(epoch, num_env_step)
2020-09-26 16:35:37 +08:00
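The signature change in #229 above gives both hooks access to the environment step count, which makes step-based schedules straightforward. The following is a minimal sketch under stated assumptions: `DummyPolicy`, `linear_eps`, `my_train_fn` and `my_test_fn` are hypothetical names used only for illustration, and only the two-argument `(epoch, num_env_step)` shape comes from the commit message.

```python
def linear_eps(num_env_step: int, start: float = 1.0, end: float = 0.05,
               decay_steps: int = 10_000) -> float:
    """Linearly anneal epsilon from `start` to `end` over `decay_steps` steps."""
    frac = min(max(num_env_step / decay_steps, 0.0), 1.0)
    return start + frac * (end - start)


class DummyPolicy:
    """Stand-in for a policy object; it only stores the current epsilon."""

    def __init__(self) -> None:
        self.eps = 1.0

    def set_eps(self, eps: float) -> None:
        self.eps = eps


policy = DummyPolicy()


def my_train_fn(epoch: int, num_env_step: int) -> None:
    # New-style hook: the second argument is the number of env steps so far,
    # so exploration can decay with steps rather than with epochs.
    policy.set_eps(linear_eps(num_env_step))


def my_test_fn(epoch: int, num_env_step: int) -> None:
    # Evaluate greedily, regardless of training progress.
    policy.set_eps(0.0)


# Usage: a trainer would call the hooks with both arguments.
my_train_fn(1, 5_000)
eps_during_training = policy.eps
my_test_fn(1, 5_000)
eps_during_test = policy.eps
```

With the old one-argument form, a schedule like this had to be approximated from the epoch index alone.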
n+e
d87d31a705
Update Anaconda support (#228)
conda install -c conda-forge tianshou
Related PR: conda-forge/staged-recipes#12719
2020-09-25 15:07:36 +08:00
Yao Feng
dcfcbb37f4
add PSRL policy (#202)
Add PSRL policy in tianshou/policy/modelbase/psrl.py.

Co-authored-by: n+e <trinkle23897@cmu.edu>
2020-09-23 20:57:33 +08:00
rocknamx
bf39b9ef7d
clarify updating state (#224)
Adding an indicator of learning (i.e. `self.learning`) makes it convenient to distinguish the state of the policy.
Meanwhile, the state of `self.training` is unambiguous during the training stage.
Related issue: #211 

Others:
- fix a bug in DDQN: target_q could not be sampled from np.random.rand
- fix a bug in DQN atari net: it should add a ReLU before the last layer
- fix a bug in collector timing

Co-authored-by: n+e <463003665@qq.com>
2020-09-22 16:28:46 +08:00
n+e
b86d78766b
fix docs and add docstring check (#210)
- fix broken links and out-of-date content
- add pydocstyle and doc8 check
- remove collector.seed and collector.render
2020-09-11 07:55:37 +08:00
n+e
64af7ea839
fix critical bugs in MAPolicy and docs update (#207)
- fix a bug in MAPolicy: `buffer.rew = Batch()` doesn't change `buffer.rew` (thanks mypy)
- polish examples/box2d/bipedal_hardcore_sac.py
- several docs update
- format setup.py and bump version to 0.2.7
2020-09-08 21:10:48 +08:00
n+e
8bb8ecba6e
set policy.eval() before collector.collect (#204)
* fix #203

* no_grad argument in collector.collect
2020-09-06 16:20:16 +08:00
Trinkle23897
34f714a677
Numba acceleration (#193)
Training FPS improvement (base commit is 94bfb32):
test_pdqn: 1660 (without numba) -> 1930
discrete/test_ppo: 5100 -> 5170

Since nstep has little impact on overall performance, the unit test timings are:
GAE: 4.1s -> 0.057s
nstep: 0.3s -> 0.15s (little improvement)

Others:
- fix a bug in ttt set_eps
- keep only sumtree in segment tree implementation
- dirty fix for asyncVenv check_id test
2020-09-02 13:03:32 +08:00
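The GAE speedup reported in #193 above comes from a backward recursion over time steps, the kind of tight numeric loop that JIT compilation handles well. The following is a minimal NumPy sketch of that recursion, not the code from the commit: the function name `gae_advantage` and its argument names are assumptions, and the loop is left undecorated so it runs without Numba installed (wrapping it with Numba's `njit` would be the natural next step).

```python
import numpy as np


def gae_advantage(v_s, v_s_next, rew, done, gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation via a backward pass over time.

    All inputs are 1-D arrays of equal length; `done[i]` is 1.0 when the
    episode ends at step i, which cuts the bootstrap and the recursion.
    """
    v_s = np.asarray(v_s, dtype=np.float64)
    v_s_next = np.asarray(v_s_next, dtype=np.float64)
    rew = np.asarray(rew, dtype=np.float64)
    done = np.asarray(done, dtype=np.float64)

    # One-step TD residuals.
    delta = rew + gamma * v_s_next * (1.0 - done) - v_s
    adv = np.zeros_like(rew)
    running = 0.0
    # Accumulate discounted residuals from the last step to the first.
    for i in range(len(rew) - 1, -1, -1):
        running = delta[i] + gamma * gae_lambda * (1.0 - done[i]) * running
        adv[i] = running
    return adv


# Usage: with zero value estimates and gamma = lambda = 1, the advantage
# reduces to the reward-to-go within each episode.
zeros = np.zeros(3)
adv_no_done = gae_advantage(zeros, zeros, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0],
                            gamma=1.0, gae_lambda=1.0)
adv_with_done = gae_advantage(zeros, zeros, [1.0, 2.0, 3.0], [0.0, 1.0, 0.0],
                              gamma=1.0, gae_lambda=1.0)
```

Each iteration depends on the previous one, so the loop cannot be vectorized directly; that dependency is why compiling it pays off far more than it does for nstep.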
yingchengyang
5b49192a48
DQN Atari examples (#187)
This PR aims to provide the script of Atari DQN setting:
- A speedrun of PongNoFrameskip-v4 (finished, about half an hour on i7-8750 + GTX1060 with 1M environment steps)
- A general script for all Atari games
Since we use multiple envs for simulation, the result is slightly different from the original paper, but is considered acceptable.

It also adds another parameter, save_only_last_obs, to the replay buffer in order to save memory.

Co-authored-by: Trinkle23897 <463003665@qq.com>
2020-08-30 05:48:09 +08:00