Andriy Drozdyuk
8a5e2190f7
Add Weights and Biases Logger ( #427 )
...
- rename BasicLogger to TensorboardLogger
- refactor logger code
- add WandbLogger
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2021-08-30 22:35:02 +08:00
n+e
e4f4f0e144
fix docs build failure and a bug in a2c/ppo optimizer ( #428 )
...
* fix rtfd build
* list + list -> set.union
* change seed of test_qrdqn
* add py39 test
2021-08-30 02:07:03 +08:00
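The "list + list -> set.union" item above can be illustrated with a small sketch. The commit message does not say why the change was needed; the sketch assumes the case where an actor and a critic share some parameters, so concatenating their parameter lists would hand the same object to the optimizer twice. The `Param` class and the variable names are hypothetical stand-ins, not Tianshou or PyTorch code.

```python
class Param:
    """Hypothetical stand-in for a trainable tensor (hashed by identity)."""

    def __init__(self, name):
        self.name = name


# Assumption: the actor and the critic share one parameter object.
shared = [Param("shared.weight")]
actor_params = shared + [Param("actor.head")]
critic_params = shared + [Param("critic.head")]

# list + list keeps the shared parameter twice.
concatenated = actor_params + critic_params

# set.union keeps each parameter object exactly once.
unique = set(actor_params).union(critic_params)

print(len(concatenated), len(unique))
```

A set does not preserve order, so this approach fits only where the order of parameters does not matter.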
Yi Su
291be08d43
Add Rainbow DQN ( #386 )
...
- add RainbowPolicy
- add `set_beta` method in prio_buffer
- add NoisyLinear in utils/network
2021-08-29 23:34:59 +08:00
Andriy Drozdyuk
d161059c3d
Replaced indice by plural indices ( #422 )
2021-08-20 21:58:44 +08:00
deeplook
728b88b92d
Fix conda install command ( #419 )
2021-08-16 18:56:01 +08:00
n+e
5b7732a29b
make ppo discrete test script more general ( #418 )
2021-08-15 21:37:37 +08:00
n+e
bba30f83d1
fix sb2's coverage ( #412 )
2021-08-10 17:43:27 +08:00
Miguel Morales
42538f8e58
Update README.md ( #410 )
2021-08-10 09:14:20 +08:00
ChenDRAG
0674ff628a
Cite Tianshou's latest paper ( #406 )
...
* Cite Tianshou's latest paper
* update new version README
* change order
Co-authored-by: Jiayi Weng <wengj@sea.com>
2021-08-10 08:35:01 +08:00
Andriy Drozdyuk
18d2f25eff
Remove warnings about the use of save_fn across trainers ( #408 )
2021-08-04 09:56:00 +08:00
n+e
c19876179a
add env_id in preprocess fn ( #391 )
2021-07-05 09:50:39 +08:00
n+e
ebaca6f8da
add vizdoom example, bump version to 0.4.2 ( #384 )
v0.4.2
2021-06-26 18:08:41 +08:00
Yi Su
c0bc8e00ca
Add Fully-parameterized Quantile Function ( #376 )
2021-06-15 11:59:02 +08:00
Yi Su
21b2b22cd7
update iqn results and reward plots ( #377 )
2021-06-10 09:05:25 +08:00
Yi Su
f3169b4c1f
Add Implicit Quantile Network ( #371 )
2021-05-29 09:44:23 +08:00
n+e
458028a326
fix docs ( #373 )
...
- fix css style error
- fix mujoco benchmark result
2021-05-23 12:43:03 +08:00
Ark
655d5fb14f
Allow researchers to choose whether to use Double DQN ( #368 )
2021-05-21 10:53:34 +08:00
Yi Su
8f7bc65ac7
Add discrete Critic Regularized Regression ( #367 )
2021-05-19 13:29:56 +08:00
Yi Su
b5c3ddabfa
Add discrete Conservative Q-Learning for offline RL ( #359 )
...
Co-authored-by: Yi Su <yi.su@antgroup.com>
Co-authored-by: Yi Su <yi.su@antfin.com>
2021-05-12 09:24:48 +08:00
Ark
84f58636eb
Make trainer resumable ( #350 )
...
- specify tensorboard >= 2.5.0
- add `save_checkpoint_fn` and `resume_from_log` in trainer
Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>
2021-05-06 08:53:53 +08:00
Yuge Zhang
f4e05d585a
Support deterministic evaluation for onpolicy algorithms ( #354 )
2021-04-27 21:22:39 +08:00
n+e
ff4d3cd714
Support different state size and fix exception in venv.__del__ ( #352 )
...
- Batch: do not raise error when it finds list of np.array with different shape[0].
- Venv's obs: add try...except block for np.stack(obs_list)
- remove venv.__del__ since it is buggy
2021-04-25 15:23:46 +08:00
ChenDRAG
bbc3c3e32d
Add numerical analysis tool and interactive plot ( #341 )
...
Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>
2021-04-22 12:49:54 +08:00
ChenDRAG
844d7703c3
NPG Mujoco benchmark release ( #347 )
2021-04-21 16:31:20 +08:00
ChenDRAG
1dcf65fe21
Add NPG policy ( #344 )
2021-04-21 09:52:15 +08:00
n+e
c059f98abf
fix atari_bcq ( #345 )
2021-04-20 22:59:21 +08:00
ChenDRAG
a57503c0aa
TRPO benchmark release ( #340 )
2021-04-19 17:05:06 +08:00
n+e
f68cb78ed7
Add self-hosted runner for GPU checks ( #339 )
2021-04-18 16:57:37 +08:00
ChenDRAG
5057b5c89e
Add TRPO policy ( #337 )
2021-04-16 20:37:12 +08:00
ChenDRAG
333b8fbd66
add plotter ( #335 )
...
Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>
2021-04-14 14:06:36 +08:00
ChenDRAG
dd4a01132c
Fix SAC loss explode ( #333 )
...
* change SAC action_bound_method to "clip" (tanh is hardcoded in forward)
* docstring update
* modelbase -> modelbased
v0.4.1
2021-04-04 17:33:35 +08:00
n+e
825da9bc53
add cross-platform test and release 0.4.1 ( #331 )
...
* bump to 0.4.1
* add cross-platform test
2021-03-31 15:14:22 +08:00
n+e
09692c84fe
fix numpy>=1.20 typing check ( #323 )
...
Change the behavior of to_numpy and to_torch: from now on, a dict is automatically converted to Batch and a list is automatically converted to np.ndarray. If the list conversion fails, the exception is raised instead of converting each element in the list.
2021-03-30 16:06:03 +08:00
ChenDRAG
6426a39796
ppo benchmark ( #330 )
2021-03-30 11:50:35 +08:00
ChenDRAG
5d580c3662
refactor ppo ( #329 )
2021-03-28 18:28:36 +08:00
ChenDRAG
1730a9008a
A2C benchmark for mujoco ( #325 )
2021-03-28 13:12:43 +08:00
ChenDRAG
105b277b87
hotfix: keep statistics of buffer when resetting buffer in on-policy trainer ( #328 )
2021-03-27 16:58:48 +08:00
n+e
8963a14327
fix exception in tutorials/dqn.rst ( #327 )
2021-03-26 12:57:00 +08:00
Yuge Zhang
7db21f3df6
Test on finite vector env ( #324 )
...
add test/base/test_env_finite.py
2021-03-25 22:59:34 +08:00
ChenDRAG
3ac67d9974
refactor A2C/PPO, change behavior of value normalization ( #321 )
2021-03-25 10:12:39 +08:00
ChenDRAG
47c77899d5
Add REINFORCE benchmark for mujoco ( #320 )
2021-03-24 19:59:53 +08:00
ChenDRAG
e27b5a26f3
Refactor PG algorithm and change behavior of compute_episodic_return ( #319 )
...
- simplify code
- apply value normalization (global) and adv norm (per-batch) in on-policy algorithms
2021-03-23 22:05:48 +08:00
ChenDRAG
2c11b6e43b
Add lr_scheduler option for Onpolicy algorithm ( #318 )
...
add lr_scheduler option in PGPolicy/A2CPolicy/PPOPolicy
2021-03-22 16:57:24 +08:00
ChenDRAG
4d92952a7b
Remap action to fit gym's action space ( #313 )
...
Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>
2021-03-21 16:45:50 +08:00
n+e
0c7117dd55
fix concepts.rst with regard to new buffer behavior ( #316 )
...
fix #315
2021-03-20 21:46:36 +08:00
n+e
ec23c7efe9
fix qvalue mask_action error for obs_next ( #310 )
...
* fix #309
* remove for-loop in dqn expl_noise
2021-03-15 08:06:24 +08:00
ChenDRAG
243ab43b3c
support observation normalization in BaseVectorEnv ( #308 )
...
add RunningMeanStd
2021-03-11 20:50:20 +08:00
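As a rough illustration of the running-statistics idea behind observation normalization, here is a minimal scalar-only sketch. It is not the RunningMeanStd added in this commit: the class body, method names, and `eps` default are assumptions, and it uses the standard parallel mean/variance merge formula on plain Python floats.

```python
import math
from statistics import fmean, pvariance


class RunningMeanStd:
    """Hypothetical scalar-only sketch; not the library's implementation."""

    def __init__(self, eps=1e-8):
        self.mean = 0.0
        self.var = 1.0
        self.count = 0
        self.eps = eps  # avoids division by zero in normalize()

    def update(self, batch):
        """Merge the statistics of a new batch into the running ones."""
        batch_mean = fmean(batch)
        batch_var = pvariance(batch, mu=batch_mean)
        batch_count = len(batch)

        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        # Combine the two sums of squared deviations.
        m2 = (
            self.var * self.count
            + batch_var * batch_count
            + delta ** 2 * self.count * batch_count / total
        )
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, x):
        """Shift and scale an observation with the running statistics."""
        return (x - self.mean) / math.sqrt(self.var + self.eps)


rms = RunningMeanStd()
rms.update([1.0, 2.0, 3.0])
rms.update([4.0, 5.0, 6.0])
print(rms.mean, rms.var, rms.normalize(3.5))
```

Updating in batches gives the same mean and population variance as a single pass over all six values, which is what lets the statistics be maintained incrementally as observations arrive.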
ChenDRAG
5c53f8c1f8
fix reward_metric & n_episode bug in on policy algorithm ( #306 )
2021-03-08 14:35:30 +08:00
ChenDRAG
e605bdea94
MuJoCo Benchmark - DDPG, TD3, SAC ( #305 )
...
Releasing Tianshou's SOTA benchmark of 9 out of 13 environments from the MuJoCo Gym task suite.
2021-03-07 19:21:02 +08:00
n+e
389bdb7ed3
Merge pull request #302 from thu-ml/dev
...
v0.4.0
v0.4.0
2021-03-02 20:28:29 +08:00