Tianshou

Author	SHA1	Message	Date
Jiayi Weng	61182450b6	add py.typed, drop 3.6/3.7, support 3.11 (#910) closing #892 #901	2023-08-10 14:13:46 -07:00
Błażej Osiński	864ee3df2f	Make monitor_gym configurable in WandbLogger. (#896) At the moment, WandbLogger always calls wandb.init with monitor_gym = True. This fails when OpenAI's gym is not installed, which doesn't make sense after the transition to Gymnasium. I am using Tianshou with a non-standard RL environment that adheres to the Gymnasium API, and the current code throws exceptions. I suggest making it a controllable parameter. I left the default value as True (to keep it functionally the same for people using gym). It may also make sense to change the default to False.	2023-08-09 15:13:25 -07:00
Błażej Osiński	cd218dc12d	Add assert description. (#894) The assert was missing a description; I fixed it. Please note: there is an error in the documentation, but it does not seem to be related to my changes.	2023-08-09 15:12:42 -07:00
Anas BELFADIL	cb8551f315	Fix master branch test issues (#908)	2023-08-09 10:27:18 -07:00
Zhenjie Zhao	f8808d236f	fix a problem of the atari dqn example (#861)	2023-04-30 08:44:27 -07:00
Gen	7ce62a6ad4	actor critic share head bug for example code without sharing head - unify code style (#860)	2023-04-28 21:43:22 -07:00
ChenDRAG	1423eeb3b2	Add warnings for duplicate usage of action-bounded actor and action scaling method (#850) - Fix the current bug discussed in #844 in `test_ppo.py`. - Add warning for `ActorProb` if both `max_action` and `unbounded=True` are used for model initializations. - Add warning for PGPolicy and DDPGPolicy if they find duplicate usage of action-bounded actor and action scaling method.	2023-04-23 16:03:31 -07:00
wckwan	e7c2c3711e	Update gail.py (#849) Remove repeated description of lr_scheduler in the doc string.	2023-04-13 07:25:57 -07:00
Quoding	4ac407c78f	Remove test_fn and train_fn as they are not used in PPO PistonBall example for PettingZoo (#840) Specifically, BasePolicy.set_eps seems to be a remnant from using DQN in other examples. * Removed unused functions (test_fn and train_fn) from the pettingzoo example with PistonBall. These functions use set_eps which is not available for PPO and is not even called once in the file.	2023-03-31 10:43:21 -07:00
Jiayi Weng	7f8fa241dd	making pettingzoo a core dep instead of optional req (#837) close #831	2023-03-25 22:01:09 -07:00
Jiayi Weng	d5d521b329	fix conda installation command (#830) close #828	2023-03-19 17:40:47 -07:00
Jiayi Weng	efdf72cb31	fix sphinx itemlist render error	2023-03-12 22:27:39 -07:00
Jiayi Weng	f0afdeaf6a	update version to 0.5.0 (#826) v0.5.0	2023-03-12 22:07:16 -07:00
Oren Zeev-Ben-Mordehai	73600edc58	fix a bug in batch._is_batch_set (#825)
- [ ] I have marked all applicable categories:
    + [x] exception-raising fix
    + [ ] algorithm implementation fix
    + [ ] documentation modification
    + [ ] new feature
- [ ] I have reformatted the code using `make format` (required)
- [ ] I have checked the code using `make commit-checks` (required)
- [ ] If applicable, I have mentioned the relevant/related issue(s)
- [ ] If applicable, I have listed every item in this Pull Request below

I'm developing a new PettingZoo environment. It is a two-player, turn-based board game.

```
obs_space = dict(
    board = gym.spaces.MultiBinary([8, 8]),
    player = gym.spaces.Tuple([gym.spaces.Discrete(8)] * 2),
    other_player = gym.spaces.Tuple([gym.spaces.Discrete(8)] * 2)
)
self._observation_space = gym.spaces.Dict(spaces=obs_space)
self._action_space = gym.spaces.Tuple([gym.spaces.Discrete(8)] * 2)
...

# this cache ensures that same space object is returned for the same agent
# allows action space seeding to work as expected
@functools.lru_cache(maxsize=None)
def observation_space(self, agent):
    # gymnasium spaces are defined and documented here: https://gymnasium.farama.org/api/spaces/
    return self._observation_space

@functools.lru_cache(maxsize=None)
def action_space(self, agent):
    return self._action_space
```

My test is:

```
def test_with_tianshou():
    action = None

    # env = gym.make('qwertyenv/CollectCoins-v0', pieces=['rock', 'rock'])
    env = CollectCoinsEnv(pieces=['rock', 'rock'], with_mask=True)

    def another_action_taken(action_taken):
        nonlocal action
        action = action_taken

    # Wrapping the original environment as to make sure a valid action will be taken.
    env = EnsureValidAction(
        env,
        env.check_action_valid,
        env.provide_alternative_valid_action,
        another_action_taken
    )

    env = PettingZooEnv(env)
    policies = MultiAgentPolicyManager([RandomPolicy(), RandomPolicy()], env)
    env = DummyVectorEnv([lambda: env])
    collector = Collector(policies, env)
    result = collector.collect(n_step=200, render=0.1)
```

I also have a wrapper that may be redundant given Tianshou's action-mask support, yet it is still part of the code:

```
from typing import TypeVar, Callable
import gymnasium as gym
from pettingzoo.utils.wrappers import BaseWrapper

Action = TypeVar("Action")


class ActionWrapper(BaseWrapper):
    def __init__(self, env: gym.Env):
        super().__init__(env)

    def step(self, action):
        action = self.action(action)
        self.env.step(action)

    def action(self, action):
        pass

    def render(self, *args, **kwargs):
        self.env.render(*args, **kwargs)


class EnsureValidAction(ActionWrapper):
    """
    A gym environment wrapper to help with the case that the agent wants to take invalid actions.
    For example consider a Chess game, where you let the action_space be any piece moving to any
    square on the board, but then when a wrong move is taken, instead of returning a big negative
    reward, you just take another action, this time a valid one. To make sure the learning
    algorithm is aware of the action taken, a callback should be provided.
    """
    def __init__(self, env: gym.Env,
                 check_action_valid: Callable[[Action], bool],
                 provide_alternative_valid_action: Callable[[Action], Action],
                 alternative_action_cb: Callable[[Action], None]):
        super().__init__(env)
        self.check_action_valid = check_action_valid
        self.provide_alternative_valid_action = provide_alternative_valid_action
        self.alternative_action_cb = alternative_action_cb

    def action(self, action: Action) -> Action:
        if self.check_action_valid(action):
            return action
        alternative_action = self.provide_alternative_valid_action(action)
        self.alternative_action_cb(alternative_action)
        return alternative_action
```

To make the above work I had to patch PettingZoo a bit (opened a pull request there) and add a small patch here (this PR). Maybe I'm doing something wrong, yet I fail to see it. With both of my fixes, to PZ and to Tianshou, I have two tests: one of the environment by itself, and the other as above.	2023-03-12 17:58:09 -07:00
sunkafei	bc222e87a6	Fix #811 (#817)	2023-03-03 16:57:04 -08:00
Jiayi Weng	c8be85b240	fix readthedocs build error	2023-02-03 14:55:53 -08:00
Jiayi Weng	e8acf0dd46	Fix readthedocs build failure (#803)	2023-02-03 14:40:05 -08:00
Markus Krimmel	6c6c872523	Gymnasium Integration (#789) Changes: - Disclaimer in README - Replaced all occurrences of Gym with Gymnasium - Removed code that is now dead since we no longer need to support the old step API - Updated type hints to only allow new step API - Increased required version of envpool to support Gymnasium - Increased required version of PettingZoo to support Gymnasium - Updated `PettingZooEnv` to only use the new step API, removed hack to also support old API - I had to add some `# type: ignore` comments, due to new type hinting in Gymnasium. I'm not that familiar with type hinting but I believe that the issue is on the Gymnasium side and we are looking into it. - Had to update `MyTestEnv` to support `options` kwarg - Skip NNI tests because they still use OpenAI Gym - Also allow `PettingZooEnv` in vector environment - Updated doc page about ReplayBuffer to also talk about terminated and truncated flags. Still need to do: - Update the Jupyter notebooks in docs - Check the entire code base for more dead code (from compatibility stuff) - Check the reset functions of all environments/wrappers in code base to make sure they use the `options` kwarg - Someone might want to check test_env_finite.py - Is it okay to allow `PettingZooEnv` in vector environments? Might need to update docs?	2023-02-03 11:57:27 -08:00
Jose Antonio Martin H	6019406cff	Add "act" to preprocess_fn call in collector. (#801) This allows, for instance, changing the action registered into the buffer when the environment modifies the action. Useful in offline learning, for instance, since the true actions are in a dataset and the actions of the agent are ignored. - [ ] I have marked all applicable categories: + [ ] exception-raising fix + [ ] algorithm implementation fix + [ ] documentation modification + [X] new feature - [X] I have reformatted the code using `make format` (required) - [X] I have checked the code using `make commit-checks` (required) - [ ] If applicable, I have mentioned the relevant/related issue(s) - [X] If applicable, I have listed every item in this Pull Request below	2023-02-03 11:19:38 -08:00
janofsssun	774d3d8e83	Implement args/kwargs for init of norm_layers and activation (#788) As mentioned in #770, I have fixed the mismatch of args between the Net and MLP. Also, in order to initialize the norm_layers and activations, norm_args and act_args are added to the miniblock and related classes.	2022-12-26 19:58:03 -08:00
Jiayi Weng	1037627a5b	fix info not pass issue in PGPolicy (#787) close #775 v0.4.11	2022-12-24 13:06:54 -08:00
Markus Krimmel	4c3791a459	Updated atari wrappers, fixed pre-commit (#781) This PR addresses #772 (updates Atari wrappers to work with new Gym API) and some additional issues: - Pre-commit was using gitlab for flake8, which as of recently requires authentication -> Replaced with GitHub - Yapf was quietly failing in pre-commit. Changed it such that it fixes formatting in-place - There is an incompatibility between flake8 and yapf where yapf puts binary operators after the line break and flake8 wants it before the break. I added an exception for flake8. - Also require `packaging` in setup.py My changes shouldn't change the behaviour of the wrappers for older versions, but please double check. Idk whether it's just me, but there are always some incompatibilities between yapf and flake8 that need to be resolved manually. It might make sense to try black instead.	2022-12-04 13:00:53 -08:00
Yi Su	662af52820	Fix Atari PPO example (#780) - [x] I have marked all applicable categories: + [ ] exception-raising fix + [x] algorithm implementation fix + [ ] documentation modification + [ ] new feature - [x] I have reformatted the code using `make format` (required) - [x] I have checked the code using `make commit-checks` (required) - [x] If applicable, I have mentioned the relevant/related issue(s) - [x] If applicable, I have listed every item in this Pull Request below While trying to debug Atari PPO+LSTM, I found a significant gap between our Atari PPO example and [CleanRL's Atari PPO w/ EnvPool](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_envpoolpy). I tried to align our implementation with CleanRL's version, mostly in hyperparameter choices, and got significant gains in Breakout, Qbert, SpaceInvaders while staying on par in other games. After this fix, I would suggest updating our [Atari Benchmark](https://tianshou.readthedocs.io/en/master/tutorials/benchmark.html) PPO experiments. A few interesting findings: - Layer initialization helps stabilize the training and enables the use of larger learning rates; without it, larger learning rates will trigger NaN gradients very quickly; - ppo.py#L97-L101: this change helps training stability for reasons I do not understand; it also makes the GPU usage higher. Shoutout to [CleanRL](https://github.com/vwxyzjn/cleanrl) for a well-tuned Atari PPO reference implementation!	2022-12-04 12:23:18 -08:00
ChenDRAG	929508ba77	Update experiment details of MuJoCo benchmark (#779) Update the downloading url of the training logs and saved checkpoints for MuJoCo tasks.	2022-11-26 10:18:22 -08:00
Will Dudley	b9a6d8b5f0	bugfixes: gym->gymnasium; render() update (#769) Credits (names from the Farama Discord): - @nrwahl2 - @APN-Pucky - chattershuts	2022-11-11 12:25:35 -08:00
Yi Su	06aaad460e	Fix a bug in loading offline data (#768) This PR fixes #766. Co-authored-by: Yi Su <yi_su@apple.com>	2022-11-03 16:12:33 -07:00
fzyzcjy	7ff12b909d	Tiny change since the tests are more than unit tests (#765) IMHO, unit tests, compared with integration tests or end-to-end tests or other tests, often means something that only tests a single method/function/class/etc, and often has a lot of stubs and mocks so it is far from a typical/real usage scenario. On the other hand, integration tests or e2e tests mock less and are more like the real case. Tianshou says: > ... tests include the full agent training procedure for all of the implemented algorithms It seems that this is more than unit test, and falls into the category of integration or even e2e tests.	2022-11-01 07:20:20 -07:00
Juno T	d42a5fb354	Hindsight Experience Replay as a replay buffer (#753) ## implementation I implemented HER solely as a replay buffer. It is done by temporarily and directly re-writing the transition storage (`self._meta`) during the `sample_indices()` call. The original transitions are cached and will be restored at the beginning of the next sampling or when another method is called. This makes sure that, for example, n-step return calculation can be done without altering the policy. There is also a problem with the original indices sampling. The sampled indices are not guaranteed to be from different episodes. So I decided to perform re-writing based on the episode. This guarantees that the sampled transitions from the same episode will have the same re-written goal. This also makes the re-writing ratio calculation differ slightly from the paper, but it won't be too different if there are many episodes in the buffer. In the current commit, the HER replay buffer only supports the 'future' strategy and online sampling. This is the best of HER in terms of performance and memory efficiency. I also added a few more convenient replay buffers (`HERVectorReplayBuffer`, `HERReplayBufferManager`), test env (`MyGoalEnv`), gym wrapper (`TruncatedAsTerminated`), unit tests, and a simple example (examples/offline/fetch_her_ddpg.py). ## verification I have added unit tests for almost everything I have implemented. The HER replay buffer was also tested using DDPG on [`FetchReach-v3` env](https://github.com/Farama-Foundation/Gymnasium-Robotics). I used default DDPG parameters from the mujoco example and didn't tune anything further to get this good result! (train script: examples/offline/fetch_her_ddpg.py). ![Screen Shot 2022-10-02 at 19 22 53](https://user-images.githubusercontent.com/42699114/193454066-0dd0c65c-fd5f-4587-8912-b441d39de88a.png)	2022-10-30 16:54:54 -07:00
Jiayi Weng	41ae3461f6	bump version to 0.4.10 (#757) v0.4.10	2022-10-16 22:15:20 -07:00
Zodan Jodan	0181fe79a5	fix docs tictactoc dummy vector env #669 (#749) a fix for #669	2022-10-03 17:41:31 -07:00
Markus Krimmel	128feb677f	Added support for new PettingZoo API (#751)	2022-10-02 09:33:12 -07:00
Markus Krimmel	b0c8d28a7d	Added pre-commit (#752) - This PR adds the checks that are defined in the Makefile as pre-commit hooks. - Hopefully, the checks are equivalent to those from the Makefile, but I can't guarantee it. - CI remains as it is. - As I pointed out on discord, I experienced some conflicts between flake8 and yapf, so it might be better to transition to some other combination (e.g. black).	2022-10-02 08:57:45 -07:00
Yuge Zhang	65c4e3d4cd	Fix NNI tests upon v2.9 upgrade (#750) * Fix NNI tests upon v2.9 upgrade * Un-ignore * fix	2022-09-26 13:55:26 -07:00
Markus Krimmel	ea36dc5195	Changes to support Gym 0.26.0 (#748) * Changes to support Gym 0.26.0 * Replace map by simpler list comprehension * Use syntax that is compatible with python 3.7 * Format code * Fix environment seeding in test environment, fix buffer_profile test * Remove self.seed() from __init__ * Fix random number generation * Fix throughput tests * Fix tests * Removed done field from Buffer, fixed throughput test, turned off wandb, fixed formatting, fixed type hints, allow preprocessing_fn with truncated and terminated arguments, updated docstrings * fix lint * fix * fix import * fix * fix mypy * pytest --ignore='test/3rd_party' * Use correct step API in _SetAttrWrapper * Format * Fix mypy * Format * Fix pydocstyle.	2022-09-26 09:31:23 -07:00
Jiayi Weng	278c91a222	Update citation and contributor (#721) * update citation * update contributor * pass lint	2022-08-10 20:06:51 -07:00
Jiayi Weng	0f59e38b12	Fix venv wrapper reset retval error with gym env (#712) * Fix venv wrapper reset retval error with gym env * fix lint	2022-07-31 11:00:38 -07:00
Wenhao Chen	f270e88461	Do not allow async simulation for test collector (#705)	2022-07-22 16:23:55 -07:00
Jiayi Weng	99c99bb09a	Fix 2 bugs and refactor RunningMeanStd to support dict obs norm (#695) * fix #689 * fix #672 * refactor RMS class * fix #688	2022-07-14 22:52:56 -07:00
Jiayi Weng	65054847ef	bump version to 0.4.9 (#684) v0.4.9	2022-07-05 01:07:16 +08:00
Yifei Cheng	43792bf5ab	Upgrade gym (#613) fixes some deprecation warnings due to new changes in gym version 0.23: - use `env.np_random.integers` instead of `env.np_random.randint` - support `seed` and `return_info` arguments for reset (addresses https://github.com/thu-ml/tianshou/issues/605)	2022-06-28 06:52:21 +08:00
Anas BELFADIL	aba2d01d25	MultiDiscrete to discrete gym action space wrapper (#664) Has been tested to work with DQN and a custom MultiDiscrete gym env.	2022-06-13 06:18:22 +08:00
Yifei Cheng	21b15803ac	Fix exception with watching pistonball environments (#663)	2022-06-12 03:12:48 +08:00
Yi Su	df35718992	Implement TD3+BC for offline RL (#660) - implement TD3+BC for offline RL; - fix a bug in trainer about test reward not logged because self.env_step is not set for offline setting;	2022-06-07 00:39:37 +08:00
Yi Su	9ce0a554dc	Add Atari SAC examples (#657) - Add Atari (discrete) SAC examples; - Fix a bug in Discrete SAC evaluation; default to deterministic mode.	2022-06-04 13:26:08 +08:00
Jiayi Weng	5ecea2402e	Fix save_checkpoint_fn return value (#659) - Fix save_checkpoint_fn return value to checkpoint_path; - Fix wrong link in doc; - Fix an off-by-one bug in trainer iterator.	2022-06-03 01:07:07 +08:00
Jiayi Weng	6ad5b520fa	Fix sphinx build error (#655)	2022-06-01 13:56:04 +08:00
Jiayi Weng	109875d43d	Fix num_envs=test_num (#653) * fix num_envs=test_num * fix mypy	2022-05-30 12:38:47 +08:00
Michal Gregor	277138ca5b	Added support for clipping to DQNPolicy (#642) * When clip_loss_grad=True is passed, Huber loss is used instead of the MSE loss. * Made the argument's name more descriptive; * Replaced the smooth L1 loss with the Huber loss, which has an identical form to the default parametrization, but seems to be better known in this context; * Added a fuller description to the docstring;	2022-05-18 19:33:37 +08:00
Michal Gregor	c87b9f49bc	Add show_progress option for trainer (#641) - A DummyTqdm class added to utils: it replicates the interface used by trainers, but does not show the progress bar; - Added a show_progress argument to the base trainer: when show_progress == False, DummyTqdm is used in place of tqdm.	2022-05-17 23:41:59 +08:00
Anas BELFADIL	53e6b0408d	Add BranchingDQN for large discrete action spaces (#618)	2022-05-15 21:40:32 +08:00

484 Commits