Tianshou

Author	SHA1	Message	Date
n+e	458028a326	fix docs (#373 ) - fix css style error - fix mujoco benchmark result	2021-05-23 12:43:03 +08:00
Ark	655d5fb14f	Allow researchers to choose whether to use Double DQN (#368 )	2021-05-21 10:53:34 +08:00
Yi Su	8f7bc65ac7	Add discrete Critic Regularized Regression (#367 )	2021-05-19 13:29:56 +08:00
Yi Su	b5c3ddabfa	Add discrete Conservative Q-Learning for offline RL (#359 ) Co-authored-by: Yi Su <yi.su@antgroup.com> Co-authored-by: Yi Su <yi.su@antfin.com>	2021-05-12 09:24:48 +08:00
Ark	84f58636eb	Make trainer resumable (#350 ) - specify tensorboard >= 2.5.0 - add `save_checkpoint_fn` and `resume_from_log` in trainer Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-05-06 08:53:53 +08:00
Yuge Zhang	f4e05d585a	Support deterministic evaluation for onpolicy algorithms (#354 )	2021-04-27 21:22:39 +08:00
n+e	ff4d3cd714	Support different state size and fix exception in venv.__del__ (#352 ) - Batch: do not raise error when it finds list of np.array with different shape[0]. - Venv's obs: add try...except block for np.stack(obs_list) - remove venv.__del__ since it is buggy	2021-04-25 15:23:46 +08:00
ChenDRAG	bbc3c3e32d	Add numerical analysis tool and interactive plot (#341 ) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-04-22 12:49:54 +08:00
ChenDRAG	844d7703c3	NPG Mujoco benchmark release (#347 )	2021-04-21 16:31:20 +08:00
ChenDRAG	1dcf65fe21	Add NPG policy (#344 )	2021-04-21 09:52:15 +08:00
n+e	c059f98abf	fix atari_bcq (#345 )	2021-04-20 22:59:21 +08:00
ChenDRAG	a57503c0aa	TRPO benchmark release (#340 )	2021-04-19 17:05:06 +08:00
n+e	f68cb78ed7	Add self-hosted runner for GPU checks (#339 )	2021-04-18 16:57:37 +08:00
ChenDRAG	5057b5c89e	Add TRPO policy (#337 )	2021-04-16 20:37:12 +08:00
ChenDRAG	333b8fbd66	add plotter (#335 ) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-04-14 14:06:36 +08:00
ChenDRAG	dd4a01132c	Fix SAC loss explode (#333 ) * change SAC action_bound_method to "clip" (tanh is hardcoded in forward) * docstring update * modelbase -> modelbased v0.4.1	2021-04-04 17:33:35 +08:00
n+e	825da9bc53	add cross-platform test and release 0.4.1 (#331 ) * bump to 0.4.1 * add cross-platform test	2021-03-31 15:14:22 +08:00
n+e	09692c84fe	fix numpy>=1.20 typing check (#323 ) Change the behavior of to_numpy and to_torch: from now on, dict is automatically converted to Batch and list is automatically converted to np.ndarray (if an error occurs, raise the exception instead of converting each element in the list).	2021-03-30 16:06:03 +08:00
ChenDRAG	6426a39796	ppo benchmark (#330 )	2021-03-30 11:50:35 +08:00
ChenDRAG	5d580c3662	refactor ppo (#329 )	2021-03-28 18:28:36 +08:00
ChenDRAG	1730a9008a	A2C benchmark for mujoco (#325 )	2021-03-28 13:12:43 +08:00
ChenDRAG	105b277b87	hotfix: keep statistics of buffer when resetting buffer in on-policy trainer (#328 )	2021-03-27 16:58:48 +08:00
n+e	8963a14327	fix exception in tutorials/dqn.rst (#327 )	2021-03-26 12:57:00 +08:00
Yuge Zhang	7db21f3df6	Test on finite vector env (#324 ) add test/base/test_env_finite.py	2021-03-25 22:59:34 +08:00
ChenDRAG	3ac67d9974	refactor A2C/PPO, change behavior of value normalization (#321 )	2021-03-25 10:12:39 +08:00
ChenDRAG	47c77899d5	Add REINFORCE benchmark for mujoco (#320 )	2021-03-24 19:59:53 +08:00
ChenDRAG	e27b5a26f3	Refactor PG algorithm and change behavior of `compute_episodic_return` (#319 ) - simplify code - apply value normalization (global) and adv norm (per-batch) in on-policy algorithms	2021-03-23 22:05:48 +08:00
ChenDRAG	2c11b6e43b	Add lr_scheduler option for Onpolicy algorithm (#318 ) add lr_scheduler option in PGPolicy/A2CPolicy/PPOPolicy	2021-03-22 16:57:24 +08:00
ChenDRAG	4d92952a7b	Remap action to fit gym's action space (#313 ) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-03-21 16:45:50 +08:00
n+e	0c7117dd55	fix concepts.rst with regard to new buffer behavior (#316 ) fix #315	2021-03-20 21:46:36 +08:00
n+e	ec23c7efe9	fix qvalue mask_action error for obs_next (#310 ) * fix #309 * remove for-loop in dqn expl_noise	2021-03-15 08:06:24 +08:00
ChenDRAG	243ab43b3c	support observation normalization in BaseVectorEnv (#308 ) add RunningMeanStd	2021-03-11 20:50:20 +08:00
ChenDRAG	5c53f8c1f8	fix reward_metric & n_episode bug in on policy algorithm (#306 )	2021-03-08 14:35:30 +08:00
ChenDRAG	e605bdea94	MuJoCo Benchmark - DDPG, TD3, SAC (#305 ) Releasing Tianshou's SOTA benchmark of 9 out of 13 environments from the MuJoCo Gym task suite.	2021-03-07 19:21:02 +08:00
n+e	389bdb7ed3	Merge pull request #302 from thu-ml/dev v0.4.0 v0.4.0	2021-03-02 20:28:29 +08:00
n+e	454c86c469	fix venv seed, add TOC in docs, and split buffer.py into several files (#303 ) Things changed in this PR: - various docs update, add TOC - split buffer into several files - fix venv action_space randomness	2021-03-02 12:28:28 +08:00
n+e	31e7f445d1	fix vecenv action_space randomness (#300 )	2021-03-01 15:44:03 +08:00
ChenDRAG	f22b539761	Remove reward_normalization option in offpolicy algorithm (#298 ) * remove rew_norm in nstep implementation * improve test * remove runnable/ * various doc fix Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-27 11:20:43 +08:00
ChenDRAG	3108b9db0d	Add Timelimit trick to optimize policies (#296 ) * consider timelimit.truncated in calculating returns by default * remove ignore_done	2021-02-26 13:23:18 +08:00
ChenDRAG	9b61bc620c	add logger (#295 ) This PR focuses on refactoring the logging method to fix the NaN-reward and log-interval bugs. After these two PRs, the fundamental changes to tianshou/data are hopefully finished, and we can finally concentrate on building Tianshou's benchmarks. Things changed: 1. trainer now accepts logger (BasicLogger or LazyLogger) instead of writer; 2. remove utils.SummaryWriter;	2021-02-24 14:48:42 +08:00
Trinkle23897	e99e1b0fdd	Improve buffer.prev() & buffer.next() (#294 )	2021-02-22 19:19:22 +08:00
ChenDRAG	7036073649	Trainer refactor: some definition changes (#293 ) This PR changes some trainer definitions to make the trainer friendlier to use and consistent with typical usage in research papers: it changes `collect-per-step` to `step-per-collect`, adds `update-per-step` / `episode-per-collect` accordingly, and modifies the documentation.	2021-02-21 13:06:02 +08:00
ChenDRAG	150d0ec51b	Step collector implementation (#280 ) This is the third PR of 6 commits mentioned in #274, which features a refactor of Collector to fix #245. You can check #274 for more detail. Things changed in this PR: 1. refactor collector to be cleaner, split AsyncCollector to support asyncvenv; 2. change buffer.add api to add(batch, buffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.) 3. add policy.exploration_noise(act, batch) -> act 4. small change in BasePolicy.compute_*_returns 5. move reward_metric from collector to trainer 6. fix np.asanyarray issue (different versions of numpy produce different output) 7. flake8 maxlength=88 8. polish docs and fix test Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-19 10:33:49 +08:00
Trinkle23897	d918022ce9	merge master into dev	2021-02-18 12:46:55 +08:00
n+e	cb65b56b13	v0.3.2 (#292 ) Throw a warning in ListReplayBuffer. This version update is needed because of #289, the previous v0.3.1 cannot work well under torch<=1.6.0 with cuda environment. v0.3.2	2021-02-16 09:31:46 +08:00
n+e	d003c8e566	fix 2 bugs of batch (#284 ) 1. `_create_value(Batch(a={}, b=[1, 2, 3]), 10, False)` before: ```python TypeError: cannot concatenate with Batch() which is scalar ``` after: ```python Batch( a: Batch(), b: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), ) ``` 2. creating keys in a batch's subkey, e.g. ```python a = Batch(info={"key1": [0, 1], "key2": [2, 3]}) a[0] = Batch(info={"key1": 2, "key3": 4}) print(a) ``` before: ```python Batch( info: Batch( key1: array([0, 1]), key2: array([0, 3]), ), ) ``` after: ```python ValueError: Creating keys is not supported by item assignment. ``` 3. small optimization for `Batch.stack_` and `Batch.cat_`	2021-02-16 09:01:54 +08:00
ChenDRAG	f528131da1	hotfix: fix test failure in cuda environment (#289 )	2021-02-09 17:13:40 +08:00
Trinkle23897	e3ee415b1a	temporary fix numpy<1.20.0 (#281 )	2021-02-08 12:59:37 +08:00
n+e	c838f2f0e9	fix 2 bugs of batch (#284 ) 1. `_create_value(Batch(a={}, b=[1, 2, 3]), 10, False)` before: ```python TypeError: cannot concatenate with Batch() which is scalar ``` after: ```python Batch( a: Batch(), b: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), ) ``` 2. creating keys in a batch's subkey, e.g. ```python a = Batch(info={"key1": [0, 1], "key2": [2, 3]}) a[0] = Batch(info={"key1": 2, "key3": 4}) print(a) ``` before: ```python Batch( info: Batch( key1: array([0, 1]), key2: array([0, 3]), ), ) ``` after: ```python ValueError: Creating keys is not supported by item assignment. ``` 3. small optimization for `Batch.stack_` and `Batch.cat_`, raise ValueError when receiving invalid data format.	2021-02-02 19:28:05 +08:00
ChenDRAG	f0129f4ca7	Add CachedReplayBuffer and ReplayBufferManager (#278 ) This is the second commit of 6 commits mentioned in #274, which features minor refactor of ReplayBuffer and adding two new ReplayBuffer classes called CachedReplayBuffer and ReplayBufferManager. You can check #274 for more detail. 1. Add ReplayBufferManager (handle a list of buffers) and CachedReplayBuffer; 2. Make sure the reserved keys cannot be edited by methods like `buffer.done = xxx`; 3. Add `set_batch` method for manually choosing the batch the ReplayBuffer wants to handle; 4. Add `sample_index` method, same as `sample` but only return index instead of both index and batch data; 5. Add `prev` (one-step previous transition index), `next` (one-step next transition index) and `unfinished_index` (the last modified index whose done==False); 6. Separate `alloc_fn` method for allocating new memory for `self._meta` when a new `(key, value)` pair comes in; 7. Move buffer's documentation to `docs/tutorials/concepts.rst`. Co-authored-by: n+e <trinkle23897@gmail.com>	2021-01-29 12:23:18 +08:00

563 Commits