Tianshou

Author	SHA1	Message	Date
Juno T	d42a5fb354	Hindsight Experience Replay as a replay buffer (#753 ) ## implementation I implemented HER solely as a replay buffer. It works by temporarily rewriting the transition storage (`self._meta`) in place during the `sample_indices()` call. The original transitions are cached and restored at the beginning of the next sampling call or when any other method is called. This ensures that, for example, n-step return calculation can be done without altering the policy. There is also a problem with the original index sampling: the sampled indices are not guaranteed to come from different episodes. So I decided to perform the rewriting per episode, which guarantees that sampled transitions from the same episode share the same rewritten goal. This also makes the rewriting ratio differ slightly from the paper, but the difference should be small when the buffer holds many episodes. In the current commit, the HER replay buffer only supports the 'future' strategy and online sampling. This is the best HER configuration in terms of performance and memory efficiency. I also added a few convenience replay buffers (`HERVectorReplayBuffer`, `HERReplayBufferManager`), a test env (`MyGoalEnv`), a gym wrapper (`TruncatedAsTerminated`), unit tests, and a simple example (examples/offline/fetch_her_ddpg.py). ## verification I have added unit tests for almost everything I have implemented. The HER replay buffer was also tested using DDPG on the [`FetchReach-v3` env](https://github.com/Farama-Foundation/Gymnasium-Robotics). I used the default DDPG parameters from the mujoco example and did not tune anything further to get this result (train script: examples/offline/fetch_her_ddpg.py). ![Screen Shot 2022-10-02 at 19 22 53](https://user-images.githubusercontent.com/42699114/193454066-0dd0c65c-fd5f-4587-8912-b441d39de88a.png)	2022-10-30 16:54:54 -07:00
Jiayi Weng	109875d43d	Fix num_envs=test_num (#653 ) * fix num_envs=test_num * fix mypy	2022-05-30 12:38:47 +08:00
Michal Gregor	c87b9f49bc	Add show_progress option for trainer (#641 ) - A DummyTqdm class added to utils: it replicates the interface used by trainers, but does not show the progress bar; - Added a show_progress argument to the base trainer: when show_progress == False, DummyTqdm is used in place of tqdm.	2022-05-17 23:41:59 +08:00
Jiayi Weng	2a7c151738	Add vecenv wrappers for obs_norm to support running mujoco experiment with envpool (#628 ) - add VectorEnvWrapper and VectorEnvNormObs - obs_rms store in policy save/load - align mujoco scripts with atari: obs_norm, envpool, wandb and README	2022-05-05 19:55:15 +08:00
Yi Su	dd16818ce4	implement REDQ based on original contribution by @Jimenius (#623 ) Co-authored-by: Minhui Li <limh@lamda.nju.edu.cn>	2022-05-01 00:06:00 +08:00
ChenDRAG	5c9afe72f3	Update Mujoco Benchmark's webpage (#606 )	2022-04-24 01:11:33 +08:00
Jiayi Weng	2a9c9289e5	rename save_fn to save_best_fn to avoid ambiguity (#575 ) This PR also introduces `tianshou.utils.deprecation` for a unified deprecation wrapper.	2022-03-22 04:29:27 +08:00
n+e	fc251ab0b8	bump to v0.4.3 (#432 ) * add makefile * bump version * add isort and yapf * update contributing.md * update PR template * spelling check	2021-09-03 05:05:04 +08:00
Andriy Drozdyuk	8a5e2190f7	Add Weights and Biases Logger (#427 ) - rename BasicLogger to TensorboardLogger - refactor logger code - add WandbLogger Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2021-08-30 22:35:02 +08:00
n+e	458028a326	fix docs (#373 ) - fix css style error - fix mujoco benchmark result	2021-05-23 12:43:03 +08:00
Ark	84f58636eb	Make trainer resumable (#350 ) - specify tensorboard >= 2.5.0 - add `save_checkpoint_fn` and `resume_from_log` in trainer Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-05-06 08:53:53 +08:00
ChenDRAG	bbc3c3e32d	Add numerical analysis tool and interactive plot (#341 ) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-04-22 12:49:54 +08:00
ChenDRAG	844d7703c3	NPG Mujoco benchmark release (#347 )	2021-04-21 16:31:20 +08:00
ChenDRAG	a57503c0aa	TRPO benchmark release (#340 )	2021-04-19 17:05:06 +08:00
ChenDRAG	333b8fbd66	add plotter (#335 ) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-04-14 14:06:36 +08:00
ChenDRAG	dd4a01132c	Fix SAC loss explode (#333 ) * change SAC action_bound_method to "clip" (tanh is hardcoded in forward) * docstring update * modelbase -> modelbased	2021-04-04 17:33:35 +08:00
ChenDRAG	6426a39796	ppo benchmark (#330 )	2021-03-30 11:50:35 +08:00
ChenDRAG	1730a9008a	A2C benchmark for mujoco (#325 )	2021-03-28 13:12:43 +08:00
ChenDRAG	3ac67d9974	refactor A2C/PPO, change behavior of value normalization (#321 )	2021-03-25 10:12:39 +08:00
ChenDRAG	47c77899d5	Add REINFORCE benchmark for mujoco (#320 )	2021-03-24 19:59:53 +08:00
ChenDRAG	4d92952a7b	Remap action to fit gym's action space (#313 ) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-03-21 16:45:50 +08:00
ChenDRAG	e605bdea94	MuJoCo Benchmark - DDPG, TD3, SAC (#305 ) Releasing Tianshou's SOTA benchmark of 9 out of 13 environments from the MuJoCo Gym task suite.	2021-03-07 19:21:02 +08:00
n+e	31e7f445d1	fix vecenv action_space randomness (#300 )	2021-03-01 15:44:03 +08:00
ChenDRAG	f22b539761	Remove reward_normalization option in offpolicy algorithm (#298 ) * remove rew_norm in nstep implementation * improve test * remove runnable/ * various doc fix Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-27 11:20:43 +08:00
ChenDRAG	3108b9db0d	Add Timelimit trick to optimize policies (#296 ) * consider timelimit.truncated in calculating returns by default * remove ignore_done	2021-02-26 13:23:18 +08:00
ChenDRAG	9b61bc620c	add logger (#295 ) This PR focuses on refactoring the logging method to fix the NaN reward and log interval bugs. After these two PRs, the fundamental changes to tianshou/data are hopefully finished, and we can finally concentrate on building Tianshou's benchmarks. Things changed: 1. trainer now accepts logger (BasicLogger or LazyLogger) instead of writer; 2. remove utils.SummaryWriter;	2021-02-24 14:48:42 +08:00
ChenDRAG	7036073649	Trainer refactor : some definition change (#293 ) This PR focuses on changing some trainer definitions to make the trainer friendlier to use and consistent with typical usage in research papers: it changes `collect-per-step` to `step-per-collect`, adds `update-per-step` / `episode-per-collect` accordingly, and modifies the documentation.	2021-02-21 13:06:02 +08:00
ChenDRAG	150d0ec51b	Step collector implementation (#280 ) This is the third PR of 6 commits mentioned in #274, which refactors Collector to fix #245. See #274 for more detail. Things changed in this PR: 1. refactor collector to be cleaner, split out AsyncCollector to support async venv; 2. change buffer.add api to add(batch, buffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.) 3. add policy.exploration_noise(act, batch) -> act 4. small change in BasePolicy.compute_*_returns 5. move reward_metric from collector to trainer 6. fix np.asanyarray issue (different numpy versions produce different output) 7. flake8 maxlength=88 8. polish docs and fix test Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-19 10:33:49 +08:00
ChenDRAG	a633a6a028	update utils.network (#275 ) This is the first commit of 6 commits mentioned in #274, which features 1. Refactor of class `Net` to support any form of MLP. 2. Enable type check in utils.network. 3. Related changes in docs/test/examples. 4. Move atari-related network to examples/atari/atari_network.py Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-01-20 16:54:13 +08:00
Trinkle23897	cd481423dc	sac mujoco result (#246 )	2020-11-09 16:43:55 +08:00
n+e	710966eda7	change API of train_fn and test_fn (#229 ) train_fn(epoch) -> train_fn(epoch, num_env_step) test_fn(epoch) -> test_fn(epoch, num_env_step)	2020-09-26 16:35:37 +08:00
n+e	c91def6cbc	code format and update function signatures (#213 ) Cherry-pick from #200 - update the function signature - format code-style - move _compile into separate functions - fix a bug in to_torch and to_numpy (Batch) - remove None in action_range In short, the code-format only contains function-signature style and `'` -> `"`. (pick up from [black](https://github.com/psf/black))	2020-09-12 15:39:01 +08:00
n+e	94bfb32cc1	optimize training procedure and improve code coverage (#189 ) 1. add policy.eval() in all test scripts' "watch performance" 2. remove dict return support for collector preprocess_fn 3. add `__contains__` and `pop` in batch: `key in batch`, `batch.pop(key, deft)` 4. exact n_episode for a list of n_episode limitation and save fake data in cache_buffer when self.buffer is None (#184) 5. fix tensorboard logging: h-axis stands for env step instead of gradient step; add test results into tensorboard 6. add test_returns (both GAE and nstep) 7. change the type-checking order in batch.py and converter.py in order to meet the most often case first 8. fix shape inconsistency for torch.Tensor in replay buffer 9. remove `**kwargs` in ReplayBuffer 10. remove default value in batch.split() and add merge_last argument (#185) 11. improve nstep efficiency 12. add max_batchsize in onpolicy algorithms 13. potential bugfix for subproc.wait 14. fix RecurrentActorProb 15. improve the code-coverage (from 90% to 95%) and remove the dead code 16. fix some incorrect type annotation The above improvement also increases the training FPS: on my computer, the previous version is only ~1800 FPS and after that, it can reach ~2050 (faster than v0.2.4.post1).	2020-08-27 12:15:18 +08:00
youkaichao	a9f9940d17	code refactor for venv (#179 ) - Refactor code to remove duplicate code - Enable async simulation for all vector envs - Remove `collector.close` and rename `VectorEnv` to `DummyVectorEnv` The abstraction of vector env changed. Prior to this PR, each vector env was almost independent. After this PR, each env is wrapped into a worker, and vector envs differ by their worker type. In fact, users can just use `BaseVectorEnv` with different workers; I keep `SubprocVectorEnv` and `ShmemVectorEnv` for backward compatibility. Co-authored-by: n+e <463003665@qq.com> Co-authored-by: magicly <magicly007@gmail.com>	2020-08-19 15:00:24 +08:00
Minghao Zhang	0b08a41610	move mujoco to examples (#12 ) * move mujoco to examples * fix the import mujoco bug * flake8 * flake8 * rm __init__.py	2020-04-02 08:49:19 +08:00

35 Commits