Commit Graph

  • b5c3ddabfa
    Add discrete Conservative Q-Learning for offline RL (#359) Yi Su 2021-05-11 18:24:48 -07:00
  • 84f58636eb
    Make trainer resumable (#350) Ark 2021-05-06 08:53:53 +08:00
  • f4e05d585a
    Support deterministic evaluation for onpolicy algorithms (#354) Yuge Zhang 2021-04-27 21:22:39 +08:00
  • ff4d3cd714
    Support different state size and fix exception in venv.__del__ (#352) n+e 2021-04-25 15:23:46 +08:00
  • bbc3c3e32d
    Add numerical analysis tool and interactive plot (#341) ChenDRAG 2021-04-22 12:49:54 +08:00
  • 844d7703c3
    NPG Mujoco benchmark release (#347) ChenDRAG 2021-04-21 16:31:20 +08:00
  • 1dcf65fe21
    Add NPG policy (#344) ChenDRAG 2021-04-21 09:52:15 +08:00
  • c059f98abf
    fix atari_bcq (#345) n+e 2021-04-20 22:59:21 +08:00
  • a57503c0aa
    TRPO benchmark release (#340) ChenDRAG 2021-04-19 17:05:06 +08:00
  • f68cb78ed7
    Add self-hosted runner for GPU checks (#339) n+e 2021-04-18 16:57:37 +08:00
  • 5057b5c89e
    Add TRPO policy (#337) ChenDRAG 2021-04-16 20:37:12 +08:00
  • 333b8fbd66
    add plotter (#335) ChenDRAG 2021-04-14 14:06:36 +08:00
  • dd4a01132c
    Fix SAC loss explode (#333) v0.4.1 ChenDRAG 2021-04-04 17:33:35 +08:00
  • 825da9bc53
    add cross-platform test and release 0.4.1 (#331) n+e 2021-03-31 15:14:22 +08:00
  • 09692c84fe
    fix numpy>=1.20 typing check (#323) n+e 2021-03-30 16:06:03 +08:00
  • 6426a39796
    ppo benchmark (#330) ChenDRAG 2021-03-30 11:50:35 +08:00
  • 5d580c3662
    refactor ppo (#329) ChenDRAG 2021-03-28 18:28:36 +08:00
  • 1730a9008a
    A2C benchmark for mujoco (#325) ChenDRAG 2021-03-28 13:12:43 +08:00
  • 105b277b87
    hotfix: keep statistics of buffer when resetting buffer in on-policy trainer (#328) ChenDRAG 2021-03-27 16:58:48 +08:00
  • 8963a14327
    fix exception in tutorials/dqn.rst (#327) n+e 2021-03-26 12:57:00 +08:00
  • 7db21f3df6
    Test on finite vector env (#324) Yuge Zhang 2021-03-25 22:59:34 +08:00
  • 3ac67d9974
    refactor A2C/PPO, change behavior of value normalization (#321) ChenDRAG 2021-03-25 10:12:39 +08:00
  • 47c77899d5
    Add REINFORCE benchmark for mujoco (#320) ChenDRAG 2021-03-24 19:59:53 +08:00
  • e27b5a26f3
    Refactor PG algorithm and change behavior of compute_episodic_return (#319) ChenDRAG 2021-03-23 22:05:48 +08:00
  • 2c11b6e43b
    Add lr_scheduler option for Onpolicy algorithm (#318) ChenDRAG 2021-03-22 16:57:24 +08:00
  • 4d92952a7b
    Remap action to fit gym's action space (#313) ChenDRAG 2021-03-21 16:45:50 +08:00
  • 0c7117dd55
    fix concepts.rst with regard to new buffer behavior (#316) n+e 2021-03-20 21:46:36 +08:00
  • ec23c7efe9
    fix qvalue mask_action error for obs_next (#310) n+e 2021-03-15 08:06:24 +08:00
  • 243ab43b3c
    support observation normalization in BaseVectorEnv (#308) ChenDRAG 2021-03-11 20:50:20 +08:00
  • 5c53f8c1f8
    fix reward_metric & n_episode bug in on policy algorithm (#306) ChenDRAG 2021-03-08 14:35:30 +08:00
  • e605bdea94
    MuJoCo Benchmark - DDPG, TD3, SAC (#305) ChenDRAG 2021-03-07 19:21:02 +08:00
  • 389bdb7ed3
    Merge pull request #302 from thu-ml/dev v0.4.0 n+e 2021-03-02 20:28:29 +08:00
  • 454c86c469
    fix venv seed, add TOC in docs, and split buffer.py into several files (#303) n+e 2021-03-02 12:28:28 +08:00
  • 31e7f445d1
    fix vecenv action_space randomness (#300) n+e 2021-03-01 15:44:03 +08:00
  • f22b539761
    Remove reward_normalization option in offpolicy algorithm (#298) ChenDRAG 2021-02-27 11:20:43 +08:00
  • 3108b9db0d
    Add Timelimit trick to optimize policies (#296) ChenDRAG 2021-02-26 13:23:18 +08:00
  • 9b61bc620c
    add logger (#295) ChenDRAG 2021-02-24 14:48:42 +08:00
  • e99e1b0fdd
    Improve buffer.prev() & buffer.next() (#294) Trinkle23897 2021-02-22 19:19:22 +08:00
  • 7036073649
    Trainer refactor: some definition change (#293) ChenDRAG 2021-02-21 13:06:02 +08:00
  • 150d0ec51b
    Step collector implementation (#280) ChenDRAG 2021-02-19 10:33:49 +08:00
  • d918022ce9
    merge master into dev Trinkle23897 2021-02-18 12:46:55 +08:00
  • cb65b56b13
    v0.3.2 (#292) v0.3.2 n+e 2021-02-16 09:31:46 +08:00
  • d003c8e566
    fix 2 bugs of batch (#284) n+e 2021-02-16 09:01:54 +08:00
  • f528131da1
    hotfix: fix test failure in CUDA environment (#289) ChenDRAG 2021-02-09 17:13:40 +08:00
  • e3ee415b1a
    temporary fix numpy<1.20.0 (#281) Trinkle23897 2021-02-08 12:59:37 +08:00
  • c838f2f0e9
    fix 2 bugs of batch (#284) n+e 2021-02-02 19:28:05 +08:00
  • f0129f4ca7
    Add CachedReplayBuffer and ReplayBufferManager (#278) ChenDRAG 2021-01-29 12:23:18 +08:00
  • 1eb6137645
    Add QR-DQN algorithm (#276) wizardsheng 2021-01-28 09:27:05 +08:00
  • a511cb4779
    Add offline trainer and discrete BCQ algorithm (#263) v0.3.1 Jialu Zhu 2021-01-20 02:13:04 -08:00
  • a633a6a028
    update utils.network (#275) ChenDRAG 2021-01-20 16:54:13 +08:00
  • 866e35d550
    fix readme (#273) 蔡舒起 2021-01-16 19:27:35 +08:00
  • c6f2648e87
    Add C51 algorithm (#266) wizardsheng 2021-01-06 10:17:45 +08:00
  • 5d13d8a453
    Saving and loading replay buffer with HDF5 (#261) Nico Gürtler 2020-12-17 01:58:43 +01:00 (see the usage sketch after this list)
  • cd481423dc
    sac mujoco result (#246) Trinkle23897 2020-11-09 16:43:55 +08:00
  • c97aa4065e
    add singleton pattern version of summary_writer (#230) rocknamx 2020-10-31 16:38:54 +08:00
  • b364f1a26f
    specify the meaning of logits in documentation (#238) v0.3.0.post1 Trinkle23897 2020-10-08 23:16:15 +08:00
  • 5ed6c1c7aa
    change the step in trainer (#235) n+e 2020-10-04 21:55:43 +08:00
  • 710966eda7
    change API of train_fn and test_fn (#229) v0.3.0 n+e 2020-09-26 16:35:37 +08:00
  • d87d31a705
    Update Anaconda support (#228) n+e 2020-09-25 15:07:36 +08:00
  • 83bd1ec9e2
    Add MANIFEST.in to include license file in source distribution (#227) Joshua Adelman 2020-09-24 20:15:24 -04:00
  • dcfcbb37f4
    add PSRL policy (#202) v0.3.0rc0 Yao Feng 2020-09-23 20:57:33 +08:00
  • bf39b9ef7d
    clarify updating state (#224) rocknamx 2020-09-22 16:28:46 +08:00
  • eec0826fd3
    change PER update interface in BasePolicy (#217) n+e 2020-09-16 17:43:19 +08:00
  • 623bf24f0c
    fix unittest (#218) n+e 2020-09-14 15:59:32 +08:00
  • a6ee979609
    implement sac for discrete action settings (#216) danagi 2020-09-14 14:59:23 +08:00
  • b284ace102
    type check in unit test (#200) n+e 2020-09-13 19:31:50 +08:00
  • c91def6cbc
    code format and update function signatures (#213) n+e 2020-09-12 15:39:01 +08:00
  • 16d8e9b051
    SAC implementation update (#212) danagi 2020-09-12 08:44:50 +08:00
  • b86d78766b
    fix docs and add docstring check (#210) n+e 2020-09-11 07:55:37 +08:00
  • 64af7ea839
    fix critical bugs in MAPolicy and docs update (#207) v0.2.7 n+e 2020-09-08 21:10:48 +08:00
  • 380e9e911d
    fix atari examples (#206) n+e 2020-09-06 23:05:33 +08:00
  • 8bb8ecba6e
    set policy.eval() before collector.collect (#204) n+e 2020-09-06 16:20:16 +08:00
  • 34f714a677
    Numba acceleration (#193) Trinkle23897 2020-09-02 13:03:32 +08:00
  • 5b49192a48
    DQN Atari examples (#187) yingchengyang 2020-08-30 05:48:09 +08:00
  • 94bfb32cc1
    optimize training procedure and improve code coverage (#189) n+e 2020-08-27 12:15:18 +08:00
  • a9f9940d17
    code refactor for venv (#179) v0.2.6 youkaichao 2020-08-19 15:00:24 +08:00
  • 311a2beafb
    Pickle compatible for replay buffer and improve buffer.get (#182) n+e 2020-08-16 16:26:23 +08:00
  • 7f3b817b24
    add policy.update to enable post process and remove collector.sample (#180) youkaichao 2020-08-15 16:10:42 +08:00
  • 140b1c2cab
    Improve PER (#159) n+e 2020-08-06 10:26:24 +08:00
  • 312b7551cc
    Add BipedalWalkerHardcore-v3 SAC example (#177) Imone 2020-08-05 10:29:41 +08:00
  • f2bcc55a25
    ShmemVectorEnv Implementation (#174) ChenDRAG 2020-08-04 13:39:05 +08:00
  • 996e2f7c9b
    Add profile workflow (#143) ChenDRAG 2020-08-02 18:24:40 +08:00
  • 32df0567bb
    use nn.Sequential in DQN (#176) youkaichao 2020-08-02 15:14:44 +08:00
  • 99a1d40e85
    Dueling DQN (#170) yingchengyang 2020-07-29 19:44:42 +08:00
  • ad395b5235
    bugfix for test_async_env (#171) youkaichao 2020-07-28 20:06:01 +08:00
  • b7a4015db7
    doc update and do not force save 'policy' in np format (#168) Trinkle23897 2020-07-27 16:54:14 +08:00
  • e024afab8c
    Asynchronous sampling vector environment (#134) Alexis DUBURCQ 2020-07-26 12:01:21 +02:00
  • 30368c29a6
    Replay buffer allows stack_num = 1 (#165) Alexis DUBURCQ 2020-07-25 13:33:44 +02:00
  • 38a95c19da
    Yet another 3 fix (#160) n+e 2020-07-24 17:38:12 +08:00
  • bfeffe1f97
    unify single-env and multi-env in collector (#157) youkaichao 2020-07-23 16:40:53 +08:00
  • 352a518399
    3 fix (#158) n+e 2020-07-23 15:12:02 +08:00
  • bd9c3c7f8d
    docs fix and v0.2.5 (#156) v0.2.5 n+e 2020-07-22 14:42:08 +08:00
  • 089b85b6a2
    Fix shape inconsistency in A2CPolicy and PPOPolicy (#155) n+e 2020-07-21 22:24:06 +08:00
  • 865ef6c693
    Improve to_torch/to_numpy converters (#147) Alexis DUBURCQ 2020-07-21 10:47:56 +02:00
  • 8c32d99c65
    Add multi-agent example: tic-tac-toe (#122) youkaichao 2020-07-21 14:59:49 +08:00
  • d09b69e594
    buffer update bug fix (#154) ChenDRAG 2020-07-20 22:12:57 +08:00
  • fe5555d2a1
    write tutorials to specify the standard of Batch (#142) youkaichao 2020-07-19 15:20:35 +08:00
  • 3a08e27ed4
    Standardized behavior of Batch.cat and misc code refactor (#137) youkaichao 2020-07-16 19:36:32 +08:00
  • 09e10e384f
    Vector env enable select worker (#132) Alexis DUBURCQ 2020-07-13 16:38:42 +02:00
  • 26fb87433d
    Improve collector (#125) v0.2.4.post1 youkaichao 2020-07-13 00:24:31 +08:00
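
The HDF5 persistence added in #261 above is straightforward to exercise. The following is a minimal sketch, assuming a recent tianshou release (0.3.1 or later) with h5py available; the buffer size, the toy transition, and the file name buffer.h5 are illustrative choices, not part of the original change.

    import numpy as np
    from tianshou.data import Batch, ReplayBuffer

    # Build a tiny buffer and store one toy transition.
    # (tianshou >= 0.4 takes a Batch here; 0.3.x used keyword
    # arguments, e.g. buf.add(obs=..., act=..., rew=..., done=...).)
    buf = ReplayBuffer(size=16)
    buf.add(Batch(
        obs=np.zeros(4, dtype=np.float32),       # toy observation
        act=1,
        rew=0.5,
        done=False,
        obs_next=np.ones(4, dtype=np.float32),
        info={"env_id": 0},                      # illustrative metadata
    ))

    # Persist to disk and restore later (the feature added in #261).
    buf.save_hdf5("buffer.h5")                   # path is illustrative
    restored = ReplayBuffer.load_hdf5("buffer.h5")
    print(len(restored))                         # -> 1

A buffer serialized this way can be reloaded later, for example as the dataset consumed by the offline trainer and the discrete BCQ/CQL algorithms that appear earlier in this log (#263, #359).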