Tianshou

Author	SHA1	Message	Date
Markus Krimmel	ea36dc5195	Changes to support Gym 0.26.0 (#748 ) * Changes to support Gym 0.26.0 * Replace map by simpler list comprehension * Use syntax that is compatible with python 3.7 * Format code * Fix environment seeding in test environment, fix buffer_profile test * Remove self.seed() from __init__ * Fix random number generation * Fix throughput tests * Fix tests * Removed done field from Buffer, fixed throughput test, turned off wandb, fixed formatting, fixed type hints, allow preprocessing_fn with truncated and terminated arguments, updated docstrings * fix lint * fix * fix import * fix * fix mypy * pytest --ignore='test/3rd_party' * Use correct step API in _SetAttrWrapper * Format * Fix mypy * Format * Fix pydocstyle.	2022-09-26 09:31:23 -07:00
Yi Su	9ce0a554dc	Add Atari SAC examples (#657 ) - Add Atari (discrete) SAC examples; - Fix a bug in Discrete SAC evaluation; default to deterministic mode.	2022-06-04 13:26:08 +08:00
Jiayi Weng	2a9c9289e5	rename save_fn to save_best_fn to avoid ambiguity (#575 ) This PR also introduces `tianshou.utils.deprecation` for a unified deprecation wrapper.	2022-03-22 04:29:27 +08:00
Anas BELFADIL	d976a5aa91	Fixed hardcoded reward_treshold (#548 )	2022-03-04 10:35:39 +08:00
Jiayi Weng	3d697aa4c6	make unit test faster (#522 ) * test cache expert data in offline training * faster cql test * faster tests * use dummy * test ray dependency	2022-02-09 00:24:52 +08:00
n+e	fc251ab0b8	bump to v0.4.3 (#432 ) * add makefile * bump version * add isort and yapf * update contributing.md * update PR template * spelling check	2021-09-03 05:05:04 +08:00
Andriy Drozdyuk	8a5e2190f7	Add Weights and Biases Logger (#427 ) - rename BasicLogger to TensorboardLogger - refactor logger code - add WandbLogger Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2021-08-30 22:35:02 +08:00
Ark	84f58636eb	Make trainer resumable (#350 ) - specify tensorboard >= 2.5.0 - add `save_checkpoint_fn` and `resume_from_log` in trainer Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-05-06 08:53:53 +08:00
ChenDRAG	5057b5c89e	Add TRPO policy (#337 )	2021-04-16 20:37:12 +08:00
ChenDRAG	f22b539761	Remove reward_normaliztion option in offpolicy algorithm (#298 ) * remove rew_norm in nstep implementation * improve test * remove runnable/ * various doc fix Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-27 11:20:43 +08:00
ChenDRAG	3108b9db0d	Add Timelimit trick to optimize policies (#296 ) * consider timelimit.truncated in calculating returns by default * remove ignore_done	2021-02-26 13:23:18 +08:00
ChenDRAG	9b61bc620c	add logger (#295 ) This PR focus on refactor of logging method to solve bug of nan reward and log interval. After these two pr, hopefully fundamental change of tianshou/data is finished. We then can concentrate on building benchmarks of tianshou finally. Things changed: 1. trainer now accepts logger (BasicLogger or LazyLogger) instead of writer; 2. remove utils.SummaryWriter;	2021-02-24 14:48:42 +08:00
ChenDRAG	7036073649	Trainer refactor : some definition change (#293 ) This PR focus on some definition change of trainer to make it more friendly to use and be consistent with typical usage in research papers, typically change `collect-per-step` to `step-per-collect`, add `update-per-step` / `episode-per-collect` accordingly, and modify the documentation.	2021-02-21 13:06:02 +08:00
ChenDRAG	150d0ec51b	Step collector implementation (#280 ) This is the third PR of 6 commits mentioned in #274, which features refactor of Collector to fix #245. You can check #274 for more detail. Things changed in this PR: 1. refactor collector to be more cleaner, split AsyncCollector to support asyncvenv; 2. change buffer.add api to add(batch, bffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.) 3. add policy.exploration_noise(act, batch) -> act 4. small change in BasePolicy.compute_*_returns 5. move reward_metric from collector to trainer 6. fix np.asanyarray issue (different version's numpy will result in different output) 7. flake8 maxlength=88 8. polish docs and fix test Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-19 10:33:49 +08:00
ChenDRAG	f528131da1	hotfix：fix test failure in cuda environment (#289 )	2021-02-09 17:13:40 +08:00
ChenDRAG	a633a6a028	update utils.network (#275 ) This is the first commit of 6 commits mentioned in #274, which features 1. Refactor of `Class Net` to support any form of MLP. 2. Enable type check in utils.network. 3. Relative change in docs/test/examples. 4. Move atari-related network to examples/atari/atari_network.py Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-01-20 16:54:13 +08:00
Trinkle23897	cd481423dc	sac mujoco result (#246 )	2020-11-09 16:43:55 +08:00
n+e	710966eda7	change API of train_fn and test_fn (#229 ) train_fn(epoch) -> train_fn(epoch, num_env_step) test_fn(epoch) -> test_fn(epoch, num_env_step)	2020-09-26 16:35:37 +08:00
danagi	a6ee979609	implement sac for discrete action settings (#216 ) Co-authored-by: n+e <trinkle23897@cmu.edu>	2020-09-14 14:59:23 +08:00

19 Commits