Addresses part of #1015
### Dependencies
- moved jsonargparse and docstring-parser to the main dependencies, so the
high-level examples can be run without the dev dependencies
- created a mujoco-py extra for legacy mujoco envs
- updated the atari extra
- removed the atari-py and gym dependencies
- added ale-py, autorom, and shimmy
- created a robotics extra for HER-DDPG
### Mac specific
- only install envpool when not on macOS
- mujoco-py does not work on macOS newer than Monterey
(https://github.com/openai/mujoco-py/issues/777)
- D4RL also fails due to its dependency on mujoco-py
(https://github.com/Farama-Foundation/D4RL/issues/232)
### Other
- reduced training-num/test-num in the example files to at most 20
(examples with 100 led to too many open files)
- rendering for Mujoco envs still needs to be fixed on the Gymnasium side
(https://github.com/Farama-Foundation/Gymnasium/issues/749)
---------
Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>
Closes #947
This removes `**kwargs` from all policy constructors. While doing that,
I also improved several names and added a whole lot of TODOs.
## Functional changes:
1. Added the possibility to pass `None` as `critic2` and `critic2_optim`; the
resulting default behavior should cover the vast majority of cases
2. Added a function called `clone_optimizer` as a temporary measure to
support passing `critic2_optim=None` (see the sketch after this list)
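As an illustration, such a helper can build a second optimizer of the same class
and with the same hyperparameters as an existing one. This is only a minimal
sketch of the idea, not necessarily the actual `clone_optimizer` implementation:

```python
import torch


def clone_optimizer(optim: torch.optim.Optimizer, new_params) -> torch.optim.Optimizer:
    """Create an optimizer of the same class and hyperparameters for new_params."""
    # `defaults` stores the constructor kwargs of the original optimizer
    # (lr, betas, weight_decay, ...), so we can rebuild it for other parameters.
    return type(optim)(new_params, **optim.defaults)
```

With something like this in place, `critic2_optim=None` can simply default to a
clone of the critic optimizer for the second critic.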
## Breaking changes:
1. `action_space` is no longer optional. In fact, it was already effectively
non-optional, since `BasePolicy.__init__` raised a `ValueError` without it;
several examples were fixed to reflect that
2. `reward_normalization` was removed from DDPG and its children. Passing it as
`True` there was never allowed and would have raised an error in
`compute_n_step_reward`, so it is now gone from the interface
3. Renamed `critic1` and similar to `critic`, in order to have uniform
interfaces. Note that the `critic` in DDPG was optional for the sole
reason that child classes used `critic1`. I removed this optionality
(DDPG can't do anything with `critic=None`)
4. Several renamings of fields (mostly private to public, so backwards
compatible)
## Additional changes:
1. Removed type and default declarations from docstrings; this kind of
duplication is really not necessary
2. Policy constructors are now called using named arguments only, instead of
the fragile mixture of positional and named arguments used before (see the
sketch after this list)
3. Minor beautifications in typing and code
4. Generally shortened docstrings and made them uniform across all
policies (hopefully)
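To illustrate the new calling convention, here is a sketch of constructing a DDPG
policy with named arguments only. The network helpers follow the existing mujoco
examples, and the exact argument names may differ slightly from the final interface:

```python
import gymnasium as gym
import torch

from tianshou.policy import DDPGPolicy
from tianshou.utils.net.common import Net
from tianshou.utils.net.continuous import Actor, Critic

env = gym.make("Pendulum-v1")
state_shape = env.observation_space.shape
action_shape = env.action_space.shape
max_action = env.action_space.high[0]

# actor and critic networks, built as in the mujoco examples
net_a = Net(state_shape, hidden_sizes=[64, 64])
actor = Actor(net_a, action_shape, max_action=max_action)
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)
net_c = Net(state_shape, action_shape, hidden_sizes=[64, 64], concat=True)
critic = Critic(net_c)
critic_optim = torch.optim.Adam(critic.parameters(), lr=1e-3)

policy = DDPGPolicy(
    actor=actor,
    actor_optim=actor_optim,
    critic=critic,                   # renamed from critic1 in child classes
    critic_optim=critic_optim,
    action_space=env.action_space,   # no longer optional
    tau=0.005,
    gamma=0.99,
)
```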
## Comment:
With these changes, several problems in tianshou's inheritance hierarchy
become more apparent. I tried highlighting them for future work.
---------
Co-authored-by: Dominik Jain <d.jain@appliedai.de>
Closes #941
rtfd build link:
https://readthedocs.org/projects/tianshou/builds/22019877/
Also fixes two small issues reported by users; see #928 and #930
Note: I created the branch in thu-ml:tianshou instead of
Trinkle23897:tianshou to quickly check the rtfd build. It's not a good
process, since every commit triggers the CI pipelines twice :(
Changes:
- Disclaimer in README
- Replaced all occurrences of Gym with Gymnasium
- Removed code that is now dead since we no longer need to support the
old step API
- Updated type hints to only allow the new step API (illustrated in the sketch
after this list)
- Increased required version of envpool to support Gymnasium
- Increased required version of PettingZoo to support Gymnasium
- Updated `PettingZooEnv` to only use the new step API, removed hack to
also support old API
- I had to add some `# type: ignore` comments due to the new type hinting in
Gymnasium. I'm not that familiar with type hinting, but I believe the issue is
on the Gymnasium side and we are looking into it.
- Had to update `MyTestEnv` to support `options` kwarg
- Skipped the NNI tests because they still use OpenAI Gym
- Also allow `PettingZooEnv` in vector environment
- Updated doc page about ReplayBuffer to also talk about terminated and
truncated flags.
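For reference, a minimal interaction loop under the new Gymnasium API that the
updated type hints assume: `reset` returns `(obs, info)` and accepts an `options`
kwarg, and `step` returns separate `terminated` and `truncated` flags:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
# reset returns (obs, info) and takes seed/options keyword arguments
obs, info = env.reset(seed=0, options={})
done = False
while not done:
    action = env.action_space.sample()
    # the new step API returns five values instead of four
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```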
Still need to do:
- Update the Jupyter notebooks in docs
- Check the entire code base for more dead code (from compatibility
stuff)
- Check the reset functions of all environments/wrappers in the code base to
make sure they use the `options` kwarg
- Someone might want to check test_env_finite.py
- Is it okay to allow `PettingZooEnv` in vector environments? The docs might
need updating
IMHO, a unit test, as opposed to an integration test, end-to-end test, or other
kind of test, usually means something that tests only a single
method/function/class/etc. and relies heavily on stubs and mocks, so it is far
from a typical/real usage scenario. Integration and e2e tests, on the other
hand, mock less and are closer to the real case.
Tianshou says:
> ... tests include the full agent training procedure for all of the
implemented algorithms
It seems that this is more than a unit test and falls into the category of
integration or even e2e tests.
## implementation
I implemented HER solely as a replay buffer. It works by temporarily rewriting
the transition storage (`self._meta`) in place during the `sample_indices()`
call. The original transitions are cached and restored at the beginning of the
next sampling or when other methods are called. This makes sure that, for
example, n-step return calculation can be done without altering the policy.
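A toy sketch of that cache-and-restore pattern (hypothetical class and attribute
names, not the actual buffer code): the affected goals are overwritten in place
for one sampling round and the originals are put back before the next one:

```python
import numpy as np


class TemporaryRewriteBuffer:
    """Minimal illustration of rewriting storage during sampling and undoing it later."""

    def __init__(self, goals: np.ndarray) -> None:
        self._goals = goals  # stands in for the transition storage (self._meta)
        self._cache: tuple[np.ndarray, np.ndarray] | None = None

    def sample_indices(self, batch_size: int) -> np.ndarray:
        self._restore()  # undo the rewrite performed by the previous call
        idx = np.random.choice(len(self._goals), batch_size, replace=False)
        # cache the original values, then rewrite them in place for this round
        self._cache = (idx, self._goals[idx].copy())
        self._goals[idx] = self._relabel(idx)
        return idx

    def _restore(self) -> None:
        if self._cache is not None:
            idx, original = self._cache
            self._goals[idx] = original
            self._cache = None

    def _relabel(self, idx: np.ndarray) -> np.ndarray:
        # placeholder; the real buffer substitutes achieved goals of future steps
        return self._goals[idx]
```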
There is also a problem with the original index sampling: the sampled indices
are not guaranteed to come from different episodes. So I decided to perform the
rewriting per episode, which guarantees that sampled transitions from the same
episode share the same rewritten goal. This also makes the rewrite-ratio
calculation differ slightly from the paper, but the difference should be small
when there are many episodes in the buffer.
In the current commit, the HER replay buffer only supports the 'future' strategy
and online sampling, which is the best HER variant in terms of performance and
memory efficiency.
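The relabeling itself can be pictured with this simplified numpy sketch
(illustrative only, with hypothetical names, and a simplification of the real
'future' strategy): with some probability, every transition of a sampled episode
gets its desired goal replaced by one achieved goal from that episode, and the
rewards are recomputed:

```python
import numpy as np


def relabel_episode(
    desired_goal: np.ndarray,   # (T, goal_dim), goals originally stored for the episode
    achieved_goal: np.ndarray,  # (T, goal_dim), goals actually reached at each step
    compute_reward,             # callable mapping (achieved, desired) -> rewards
    future_p: float = 0.8,      # probability of rewriting this episode's goal
    rng: np.random.Generator | None = None,
):
    rng = rng or np.random.default_rng()
    new_goal = desired_goal.copy()
    if rng.random() < future_p:
        # simplified 'future' strategy: pick one achieved goal from the episode
        # and use it for every transition, so all sampled transitions of the
        # episode share the same rewritten goal
        future_step = rng.integers(len(achieved_goal))
        new_goal[:] = achieved_goal[future_step]
    new_reward = compute_reward(achieved_goal, new_goal)
    return new_goal, new_reward
```

In goal-based envs such as the Fetch tasks, `compute_reward` would typically wrap
the environment's own `compute_reward` method.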
I also added a few convenience classes: replay buffers
(`HERVectorReplayBuffer`, `HERReplayBufferManager`), a test env
(`MyGoalEnv`), a gym wrapper (`TruncatedAsTerminated`), unit tests, and a
simple example (examples/offline/fetch_her_ddpg.py).
## verification
I have added unit tests for almost everything I implemented.
The HER replay buffer was also tested using DDPG on the [`FetchReach-v3`
env](https://github.com/Farama-Foundation/Gymnasium-Robotics). I used the
default DDPG parameters from the mujoco example and didn't tune anything
further to get this good result (train script:
examples/offline/fetch_her_ddpg.py).

- This PR adds the checks that are defined in the Makefile as pre-commit
hooks.
- Hopefully, the checks are equivalent to those from the Makefile, but I
can't guarantee it.
- CI remains as it is.
- As I pointed out on Discord, I experienced some conflicts between
flake8 and yapf, so it might be better to transition to some other
combination (e.g. black).
This PR implements `BCQPolicy`, which can be used to train an offline agent in environments with a continuous action space. An experimental result on 'halfcheetah-expert-v1', a d4rl environment (for offline reinforcement learning), is provided.
Example usage is in examples/offline/offline_bcq.py.
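For context, here is a heavily simplified sketch of BCQ's action selection for
continuous actions (generic PyTorch, not the tianshou code; `vae_decode`,
`perturb`, and `critic` are assumed, pre-built modules): candidate actions are
sampled from a generative model conditioned on the state, slightly perturbed by
a learned network, and the candidate with the highest Q-value is chosen:

```python
import torch


@torch.no_grad()
def bcq_select_action(obs, vae_decode, perturb, critic, num_candidates: int = 10):
    """Pick, per observation, the highest-Q candidate among perturbed VAE samples."""
    batch = obs.shape[0]
    # repeat each observation once per candidate action
    obs_rep = obs.repeat_interleave(num_candidates, dim=0)
    actions = vae_decode(obs_rep)                   # (batch * n, act_dim)
    actions = actions + perturb(obs_rep, actions)   # bounded perturbation xi(s, a)
    q = critic(obs_rep, actions).reshape(batch, num_candidates)
    best = q.argmax(dim=1)                          # index of the best candidate per state
    actions = actions.reshape(batch, num_candidates, -1)
    return actions[torch.arange(batch), best]
```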