Tianshou

Author	SHA1	Message	Date
Jiayi Weng	7f8fa241dd	making pettingzoo a core dep instead of optional req (#837 ) close #831	2023-03-25 22:01:09 -07:00
sunkafei	bc222e87a6	Fix #811 (#817 )	2023-03-03 16:57:04 -08:00
Markus Krimmel	6c6c872523	Gymnasium Integration (#789 ) Changes: - Disclaimer in README - Replaced all occurences of Gym with Gymnasium - Removed code that is now dead since we no longer need to support the old step API - Updated type hints to only allow new step API - Increased required version of envpool to support Gymnasium - Increased required version of PettingZoo to support Gymnasium - Updated `PettingZooEnv` to only use the new step API, removed hack to also support old API - I had to add some `# type: ignore` comments, due to new type hinting in Gymnasium. I'm not that familiar with type hinting but I believe that the issue is on the Gymnasium side and we are looking into it. - Had to update `MyTestEnv` to support `options` kwarg - Skip NNI tests because they still use OpenAI Gym - Also allow `PettingZooEnv` in vector environment - Updated doc page about ReplayBuffer to also talk about terminated and truncated flags. Still need to do: - Update the Jupyter notebooks in docs - Check the entire code base for more dead code (from compatibility stuff) - Check the reset functions of all environments/wrappers in code base to make sure they use the `options` kwarg - Someone might want to check test_env_finite.py - Is it okay to allow `PettingZooEnv` in vector environments? Might need to update docs?	2023-02-03 11:57:27 -08:00
Markus Krimmel	4c3791a459	Updated atari wrappers, fixed pre-commit (#781 ) This PR addresses #772 (updates Atari wrappers to work with new Gym API) and some additional issues: - Pre-commit was using gitlab for flake8, which as of recently requires authentication -> Replaced with GitHub - Yapf was quietly failing in pre-commit. Changed it such that it fixes formatting in-place - There is an incompatibility between flake8 and yapf where yapf puts binary operators after the line break and flake8 wants it before the break. I added an exception for flake8. - Also require `packaging` in setup.py My changes shouldn't change the behaviour of the wrappers for older versions, but please double check. Idk whether it's just me, but there are always some incompatibilities between yapf and flake8 that need to resolved manually. It might make sense to try black instead.	2022-12-04 13:00:53 -08:00
Yuge Zhang	65c4e3d4cd	Fix NNI tests upon v2.9 upgrade (#750 ) * Fix NNI tests upon v2.9 upgrade * Un-ignore * fix	2022-09-26 13:55:26 -07:00
Markus Krimmel	ea36dc5195	Changes to support Gym 0.26.0 (#748 ) * Changes to support Gym 0.26.0 * Replace map by simpler list comprehension * Use syntax that is compatible with python 3.7 * Format code * Fix environment seeding in test environment, fix buffer_profile test * Remove self.seed() from __init__ * Fix random number generation * Fix throughput tests * Fix tests * Removed done field from Buffer, fixed throughput test, turned off wandb, fixed formatting, fixed type hints, allow preprocessing_fn with truncated and terminated arguments, updated docstrings * fix lint * fix * fix import * fix * fix mypy * pytest --ignore='test/3rd_party' * Use correct step API in _SetAttrWrapper * Format * Fix mypy * Format * Fix pydocstyle.	2022-09-26 09:31:23 -07:00
Yifei Cheng	43792bf5ab	Upgrade gym (#613 ) fixes some deprecation warnings due to new changes in gym version 0.23: - use `env.np_random.integers` instead of `env.np_random.randint` - support `seed` and `return_info` arguments for reset (addresses https://github.com/thu-ml/tianshou/issues/605)	2022-06-28 06:52:21 +08:00
Jiayi Weng	6ad5b520fa	Fix sphinx build error (#655 )	2022-06-01 13:56:04 +08:00
Jiayi Weng	2a7c151738	Add vecenv wrappers for obs_norm to support running mujoco experiment with envpool (#628 ) - add VectorEnvWrapper and VectorEnvNormObs - obs_rms store in policy save/load - align mujoco scripts with atari: obs_norm, envpool, wandb and README	2022-05-05 19:55:15 +08:00
Yifei Cheng	6fc6857812	Update Multi-agent RL docs, upgrade pettingzoo (#595 ) * update multi-agent docs, upgrade pettingzoo * avoid pettingzoo deprecation warning * fix pistonball tests * codestyle	2022-04-16 23:17:53 +08:00
Jiayi Weng	6ab9860183	fix negative collector time (#578 )	2022-03-26 10:44:08 +08:00
Jiayi Weng	c248b4f87e	fix conda support and keep API compatibility (#536 ) * loose constrains * fix nni issue (#478) * fix coverage	2022-02-26 00:05:02 +08:00
Chengqi Duan	23fbc3b712	upgrade gym version to >=0.21, fix related CI and update examples/atari (#534 ) Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-02-25 07:40:33 +08:00
Mohammad Mahdi Rahimi	c7e2e56fac	Pettingzoo support (#494 ) Co-authored-by: Rodrigo de Lazcano <r.l.p.v96@gmail.com> Co-authored-by: J K Terry <justinkterry@gmail.com>	2022-02-15 22:56:45 +08:00
Jiayi Weng	3d697aa4c6	make unit test faster (#522 ) * test cache expert data in offline training * faster cql test * faster tests * use dummy * test ray dependency	2022-02-09 00:24:52 +08:00
Jiayi Weng	926ec0b9b1	update save_fn in trainer (#459 ) - collector.collect() now returns 4 extra keys: rew/rew_std/len/len_std (previously this work is done in logger) - save_fn() will be called at the beginning of trainer	2021-10-13 21:25:24 +08:00
Ayush Chaurasia	22d7bf38c8	Improve W&B logger (#441 ) - rename WandBLogger -> WandbLogger - add save_data and restore_data - allow more input arguments for wandb init - integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py - documentation update	2021-09-24 21:52:23 +08:00
n+e	fc251ab0b8	bump to v0.4.3 (#432 ) * add makefile * bump version * add isort and yapf * update contributing.md * update PR template * spelling check	2021-09-03 05:05:04 +08:00
Andriy Drozdyuk	8a5e2190f7	Add Weights and Biases Logger (#427 ) - rename BasicLogger to TensorboardLogger - refactor logger code - add WandbLogger Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2021-08-30 22:35:02 +08:00
Andriy Drozdyuk	d161059c3d	Replaced indice by plural indices (#422 )	2021-08-20 21:58:44 +08:00
Andriy Drozdyuk	18d2f25eff	Remove warnings about the use of save_fn across trainers (#408 )	2021-08-04 09:56:00 +08:00
n+e	ebaca6f8da	add vizdoom example, bump version to 0.4.2 (#384 )	2021-06-26 18:08:41 +08:00
Ark	84f58636eb	Make trainer resumable (#350 ) - specify tensorboard >= 2.5.0 - add `save_checkpoint_fn` and `resume_from_log` in trainer Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-05-06 08:53:53 +08:00
n+e	ff4d3cd714	Support different state size and fix exception in venv.__del__ (#352 ) - Batch: do not raise error when it finds list of np.array with different shape[0]. - Venv's obs: add try...except block for np.stack(obs_list) - remove venv.__del__ since it is buggy	2021-04-25 15:23:46 +08:00
n+e	09692c84fe	fix numpy>=1.20 typing check (#323 ) Change the behavior of to_numpy and to_torch: from now on, dict is automatically converted to Batch and list is automatically converted to np.ndarray (if an error occurs, raise the exception instead of converting each element in the list).	2021-03-30 16:06:03 +08:00
Trinkle23897	e3ee415b1a	temporary fix numpy<1.20.0 (#281 )	2021-02-08 12:59:37 +08:00
Jialu Zhu	a511cb4779	Add offline trainer and discrete BCQ algorithm (#263 ) The result needs to be tuned after `done` issue fixed. Co-authored-by: n+e <trinkle23897@gmail.com>	2021-01-20 18:13:04 +08:00
Nico Gürtler	5d13d8a453	Saving and loading replay buffer with HDF5 (#261 ) As mentioned in #260, this pull request is about an implementation of saving and loading the replay buffer with HDF5.	2020-12-17 08:58:43 +08:00
n+e	5ed6c1c7aa	change the step in trainer (#235 ) This PR separates the `global_step` into `env_step` and `gradient_step`. In the future, the data from the collecting state will be stored under `env_step`, and the data from the updating state will be stored under `gradient_step`. Others: - add `rew_std` and `best_result` into the monitor - fix network unbounded in `test/continuous/test_sac_with_il.py` and `examples/box2d/bipedal_hardcore_sac.py` - change the dependency of ray to 1.0.0 since ray-project/ray#10134 has been resolved	2020-10-04 21:55:43 +08:00
n+e	b86d78766b	fix docs and add docstring check (#210 ) - fix broken links and out-of-the-date content - add pydocstyle and doc8 check - remove collector.seed and collector.render	2020-09-11 07:55:37 +08:00
n+e	64af7ea839	fix critical bugs in MAPolicy and docs update (#207 ) - fix a bug in MAPolicy: `buffer.rew = Batch()` doesn't change `buffer.rew` (thanks mypy) - polish examples/box2d/bipedal_hardcore_sac.py - several docs update - format setup.py and bump version to 0.2.7	2020-09-08 21:10:48 +08:00
Trinkle23897	34f714a677	Numba acceleration (#193 ) Training FPS improvement (base commit is 94bfb32): test_pdqn: 1660 (without numba) -> 1930 discrete/test_ppo: 5100 -> 5170 since nstep has little impact on overall performance, the unit test result is: GAE: 4.1s -> 0.057s nstep: 0.3s -> 0.15s (little improvement) Others: - fix a bug in ttt set_eps - keep only sumtree in segment tree implementation - dirty fix for asyncVenv check_id test	2020-09-02 13:03:32 +08:00
n+e	94bfb32cc1	optimize training procedure and improve code coverage (#189 ) 1. add policy.eval() in all test scripts' "watch performance" 2. remove dict return support for collector preprocess_fn 3. add `__contains__` and `pop` in batch: `key in batch`, `batch.pop(key, deft)` 4. exact n_episode for a list of n_episode limitation and save fake data in cache_buffer when self.buffer is None (#184) 5. fix tensorboard logging: h-axis stands for env step instead of gradient step; add test results into tensorboard 6. add test_returns (both GAE and nstep) 7. change the type-checking order in batch.py and converter.py in order to meet the most often case first 8. fix shape inconsistency for torch.Tensor in replay buffer 9. remove `**kwargs` in ReplayBuffer 10. remove default value in batch.split() and add merge_last argument (#185) 11. improve nstep efficiency 12. add max_batchsize in onpolicy algorithms 13. potential bugfix for subproc.wait 14. fix RecurrentActorProb 15. improve the code-coverage (from 90% to 95%) and remove the dead code 16. fix some incorrect type annotation The above improvement also increases the training FPS: on my computer, the previous version is only ~1800 FPS and after that, it can reach ~2050 (faster than v0.2.4.post1).	2020-08-27 12:15:18 +08:00
youkaichao	a9f9940d17	code refactor for venv (#179 ) - Refacor code to remove duplicate code - Enable async simulation for all vector envs - Remove `collector.close` and rename `VectorEnv` to `DummyVectorEnv` The abstraction of vector env changed. Prior to this pr, each vector env is almost independent. After this pr, each env is wrapped into a worker, and vector envs differ with their worker type. In fact, users can just use `BaseVectorEnv` with different workers, I keep `SubprocVectorEnv`, `ShmemVectorEnv` for backward compatibility. Co-authored-by: n+e <463003665@qq.com> Co-authored-by: magicly <magicly007@gmail.com>	2020-08-19 15:00:24 +08:00
n+e	352a518399	3 fix (#158 ) - fix 2 warning in doctest - change the minimum version of gym (to be aligned with openai baselines) - change squeeze and reshape to flatten (related to #155). I think flatten is better.	2020-07-23 15:12:02 +08:00
Trinkle23897	69e4b3d301	fix setup err on building docs	2020-04-28 21:11:40 +08:00
Oblivion	9380368ca3	add an example of bullet env (experiment from jiqizhixin) (#15 ) * add_pybullet_ens_test test on pybullet envs modify some log config * delete DS_Store file * add pybullet_envs test add HalfCheetahBulletEnv-v0 test modify log config * fix pep 8 errors * add pybullet to dev * delete a line * by pass F401 * add log_interval to onpolicy_trainer * add comments * Update halfcheetahBullet_v0_sac.py	2020-04-04 11:46:18 +08:00
ShenDezhou	4da857d86e	Fix windows env setup bugs and other typo. (#11 )	2020-03-31 17:22:32 +08:00
Trinkle23897	57735ce1b5	add logo and sphinx setup	2020-03-28 22:01:23 +08:00
Trinkle23897	c42990c725	add rllib result and fix pep8	2020-03-28 09:43:35 +08:00
Minghao Zhang	77068af526	add examples, fix some bugs (#5 ) * update atari.py * fix setup.py pass the pytest * fix setup.py pass the pytest * add args "render" * change the tensorboard writter * change the tensorboard writter * change device, render, tensorboard log location * change device, render, tensorboard log location * remove some wrong local files * fix some tab mistakes and the envs name in continuous/test_xx.py * add examples and point robot maze environment * fix some bugs during testing examples * add dqn network and fix some args * change back the tensorboard writter's frequency to ensure ppo and a2c can write things normally * add a warning to collector * rm some unrelated files * reformat * fix a bug in test_dqn due to the model wrong selection	2020-03-28 07:27:18 +08:00
Trinkle23897	044aae4355	add baseline and rlpyt result	2020-03-27 16:24:07 +08:00
Minghao Zhang	3c0a09fefd	minor reformat (#2 ) * update atari.py * fix setup.py pass the pytest * fix setup.py pass the pytest	2020-03-26 09:01:20 +08:00
Trinkle23897	64bab0b6a0	ddpg	2020-03-18 21:45:41 +08:00
Trinkle23897	39de63592f	finish pg	2020-03-17 11:37:31 +08:00
Trinkle23897	8b0b970c9b	add speed stat	2020-03-16 15:04:58 +08:00
Trinkle23897	c804662457	add cache buf in collector	2020-03-14 21:48:31 +08:00
Trinkle23897	f16e05c0e7	maybe finished collector?	2020-03-13 17:49:22 +08:00
Trinkle23897	f58c1397c6	half of collector	2020-03-12 22:20:33 +08:00
Trinkle23897	6632e47b9d	add test_buffer	2020-03-11 17:28:51 +08:00

1 2

53 Commits