Tianshou

Author	SHA1	Message	Date
Yi Su	06aaad460e	Fix a bug in loading offline data (#768 ) This PR fixes #766 . Co-authored-by: Yi Su <yi_su@apple.com>	2022-11-03 16:12:33 -07:00
fzyzcjy	7ff12b909d	Tiny change since the tests are more than unit tests (#765 ) IMHO, unit tests, compared with integration tests or end-to-end tests or other tests, often means something that only tests a single method/function/class/etc, and often has a lot of stubs and mocks so it is far from a typical/real usage scenario. On the other hand, integration tests or e2e tests mock less and are more like the real case. Tianshou says: > ... tests include the full agent training procedure for all of the implemented algorithms It seems that this is more than unit test, and falls into the category of integration or even e2e tests.	2022-11-01 07:20:20 -07:00
Juno T	d42a5fb354	Hindsight Experience Replay as a replay buffer (#753 ) ## implementation I implemented HER solely as a replay buffer. It is done by temporarily rewriting the transition storage (`self._meta`) directly during the `sample_indices()` call. The original transitions are cached and restored at the beginning of the next sampling or when other methods are called. This makes sure that, for example, n-step return calculation can be done without altering the policy. There is also a problem with the original indices sampling: the sampled indices are not guaranteed to be from different episodes. So I decided to perform the rewriting per episode. This guarantees that sampled transitions from the same episode have the same rewritten goal. It also makes the rewriting ratio calculation differ slightly from the paper, but it won't be too different if there are many episodes in the buffer. In the current commit, the HER replay buffer only supports the 'future' strategy and online sampling. This is the best of HER in terms of performance and memory efficiency. I also added a few more convenience replay buffers (`HERVectorReplayBuffer`, `HERReplayBufferManager`), a test env (`MyGoalEnv`), a gym wrapper (`TruncatedAsTerminated`), unit tests, and a simple example (examples/offline/fetch_her_ddpg.py). ## verification I have added unit tests for almost everything I have implemented. The HER replay buffer was also tested using DDPG on the [`FetchReach-v3` env](https://github.com/Farama-Foundation/Gymnasium-Robotics). I used the default DDPG parameters from the mujoco example and didn't tune anything further to get this good result! (train script: examples/offline/fetch_her_ddpg.py). ![Screen Shot 2022-10-02 at 19 22 53](https://user-images.githubusercontent.com/42699114/193454066-0dd0c65c-fd5f-4587-8912-b441d39de88a.png)	2022-10-30 16:54:54 -07:00
Jiayi Weng	41ae3461f6	bump version to 0.4.10 (#757 ) v0.4.10	2022-10-16 22:15:20 -07:00
Zodan Jodan	0181fe79a5	fix docs tictactoe dummy vector env #669 (#749 ) a fix for #669	2022-10-03 17:41:31 -07:00
Markus Krimmel	128feb677f	Added support for new PettingZoo API (#751 )	2022-10-02 09:33:12 -07:00
Markus Krimmel	b0c8d28a7d	Added pre-commit (#752 ) - This PR adds the checks that are defined in the Makefile as pre-commit hooks. - Hopefully, the checks are equivalent to those from the Makefile, but I can't guarantee it. - CI remains as it is. - As I pointed out on discord, I experienced some conflicts between flake8 and yapf, so it might be better to transition to some other combination (e.g. black).	2022-10-02 08:57:45 -07:00
Yuge Zhang	65c4e3d4cd	Fix NNI tests upon v2.9 upgrade (#750 ) * Fix NNI tests upon v2.9 upgrade * Un-ignore * fix	2022-09-26 13:55:26 -07:00
Markus Krimmel	ea36dc5195	Changes to support Gym 0.26.0 (#748 ) * Changes to support Gym 0.26.0 * Replace map by simpler list comprehension * Use syntax that is compatible with python 3.7 * Format code * Fix environment seeding in test environment, fix buffer_profile test * Remove self.seed() from __init__ * Fix random number generation * Fix throughput tests * Fix tests * Removed done field from Buffer, fixed throughput test, turned off wandb, fixed formatting, fixed type hints, allow preprocessing_fn with truncated and terminated arguments, updated docstrings * fix lint * fix * fix import * fix * fix mypy * pytest --ignore='test/3rd_party' * Use correct step API in _SetAttrWrapper * Format * Fix mypy * Format * Fix pydocstyle.	2022-09-26 09:31:23 -07:00
Jiayi Weng	278c91a222	Update citation and contributor (#721 ) * update citation * update contributor * pass lint	2022-08-10 20:06:51 -07:00
Jiayi Weng	0f59e38b12	Fix venv wrapper reset retval error with gym env (#712 ) * Fix venv wrapper reset retval error with gym env * fix lint	2022-07-31 11:00:38 -07:00
Wenhao Chen	f270e88461	Do not allow async simulation for test collector (#705 )	2022-07-22 16:23:55 -07:00
Jiayi Weng	99c99bb09a	Fix 2 bugs and refactor RunningMeanStd to support dict obs norm (#695 ) * fix #689 * fix #672 * refactor RMS class * fix #688	2022-07-14 22:52:56 -07:00
Jiayi Weng	65054847ef	bump version to 0.4.9 (#684 ) v0.4.9	2022-07-05 01:07:16 +08:00
Yifei Cheng	43792bf5ab	Upgrade gym (#613 ) fixes some deprecation warnings due to new changes in gym version 0.23: - use `env.np_random.integers` instead of `env.np_random.randint` - support `seed` and `return_info` arguments for reset (addresses https://github.com/thu-ml/tianshou/issues/605)	2022-06-28 06:52:21 +08:00
Anas BELFADIL	aba2d01d25	MultiDiscrete to discrete gym action space wrapper (#664 ) Has been tested to work with DQN and a custom MultiDiscrete gym env.	2022-06-13 06:18:22 +08:00
Yifei Cheng	21b15803ac	Fix exception with watching pistonball environments (#663 )	2022-06-12 03:12:48 +08:00
Yi Su	df35718992	Implement TD3+BC for offline RL (#660 ) - implement TD3+BC for offline RL; - fix a bug in trainer about test reward not logged because self.env_step is not set for offline setting;	2022-06-07 00:39:37 +08:00
Yi Su	9ce0a554dc	Add Atari SAC examples (#657 ) - Add Atari (discrete) SAC examples; - Fix a bug in Discrete SAC evaluation; default to deterministic mode.	2022-06-04 13:26:08 +08:00
Jiayi Weng	5ecea2402e	Fix save_checkpoint_fn return value (#659 ) - Fix save_checkpoint_fn return value to checkpoint_path; - Fix wrong link in doc; - Fix an off-by-one bug in trainer iterator.	2022-06-03 01:07:07 +08:00
Jiayi Weng	6ad5b520fa	Fix sphinx build error (#655 )	2022-06-01 13:56:04 +08:00
Jiayi Weng	109875d43d	Fix num_envs=test_num (#653 ) * fix num_envs=test_num * fix mypy	2022-05-30 12:38:47 +08:00
Michal Gregor	277138ca5b	Added support for clipping to DQNPolicy (#642 ) * When clip_loss_grad=True is passed, Huber loss is used instead of the MSE loss. * Made the argument's name more descriptive; * Replaced the smooth L1 loss with the Huber loss, which has an identical form to the default parametrization, but seems to be better known in this context; * Added a fuller description to the docstring;	2022-05-18 19:33:37 +08:00
Michal Gregor	c87b9f49bc	Add show_progress option for trainer (#641 ) - A DummyTqdm class added to utils: it replicates the interface used by trainers, but does not show the progress bar; - Added a show_progress argument to the base trainer: when show_progress == False, DummyTqdm is used in place of tqdm.	2022-05-17 23:41:59 +08:00
Anas BELFADIL	53e6b0408d	Add BranchingDQN for large discrete action spaces (#618 )	2022-05-15 21:40:32 +08:00
Jiayi Weng	a03f19af72	fix pytest error on non-linux system (#638 )	2022-05-12 20:52:55 +08:00
Jiayi Weng	bf8f63ffc3	use envpool in vizdoom example, update doc (#634 )	2022-05-09 00:42:16 +08:00
Jiayi Weng	2a7c151738	Add vecenv wrappers for obs_norm to support running mujoco experiment with envpool (#628 ) - add VectorEnvWrapper and VectorEnvNormObs - obs_rms store in policy save/load - align mujoco scripts with atari: obs_norm, envpool, wandb and README v0.4.8	2022-05-05 19:55:15 +08:00
Yi Su	a7c789f851	Improve data loading from D4RL and convert RL Unplugged to D4RL format (#624 )	2022-05-04 04:37:52 +08:00
Yi Su	dd16818ce4	implement REDQ based on original contribution by @Jimenius (#623 ) Co-authored-by: Minhui Li <limh@lamda.nju.edu.cn>	2022-05-01 00:06:00 +08:00
Yi Su	41afc2584a	Convert RL Unplugged Atari datasets to tianshou ReplayBuffer (#621 )	2022-04-29 19:33:28 +08:00
ChenDRAG	7f23748347	Compare Atari results with dopamine and OpenAI Baselines (#616 )	2022-04-27 21:10:45 +08:00
Jiayi Weng	876e6b186e	hot fix mujoco benchmark	2022-04-24 16:49:40 -04:00
Chengqi Duan	5eab7dc218	Add Atari Results (#600 )	2022-04-24 20:44:54 +08:00
ChenDRAG	5c9afe72f3	Update Mujoco Benchmark's webpage (#606 )	2022-04-24 01:11:33 +08:00
Squeemos	e01385ea30	Change action_dim to action_shape (#602 ) Noticed that in IQN and FQF there were some mismatches in the docstrings. Figured I would make a pull request to make it match.	2022-04-22 08:09:57 +08:00
ChenDRAG	57ecebde38	Add jupyter notebook tutorials using Google Colaboratory (#599 )	2022-04-19 20:58:52 +08:00
Alex Nikulkov	92456cdb68	Add learning rate scheduler to BasePolicy (#598 )	2022-04-17 23:52:30 +08:00
Yifei Cheng	6fc6857812	Update Multi-agent RL docs, upgrade pettingzoo (#595 ) * update multi-agent docs, upgrade pettingzoo * avoid pettingzoo deprecation warning * fix pistonball tests * codestyle	2022-04-16 23:17:53 +08:00
Jiayi Weng	18277497ed	fix py39 ci venv test failure (#593 )	2022-04-12 22:29:39 +08:00
ChenDRAG	75d7c9f1d9	Fix action scaling bug in SAC (#591 ) close #588	2022-04-12 00:26:06 +08:00
Jiayi Weng	f13e415eb0	Add write_flush in two loggers, fix argument passing in WandbLogger (#581 )	2022-03-30 08:04:23 +08:00
Jiayi Weng	6ab9860183	fix negative collector time (#578 )	2022-03-26 10:44:08 +08:00
Jiayi Weng	2a9c9289e5	rename save_fn to save_best_fn to avoid ambiguity (#575 ) This PR also introduces `tianshou.utils.deprecation` for a unified deprecation wrapper. v0.4.7	2022-03-22 04:29:27 +08:00
Jose Antonio Martin H	10d919052b	Add Trainers as generators (#559 ) The new proposed feature is to have trainers as generators. The usage pattern is: ```python trainer = OnPolicyTrainer(...) for epoch, epoch_stat, info in trainer: print(f"Epoch: {epoch}") print(epoch_stat) print(info) do_something_with_policy() query_something_about_policy() make_a_plot_with(epoch_stat) display(info) ``` - epoch int: the epoch number - epoch_stat dict: a large collection of metrics of the current epoch, including stat - info dict: the usual dict out of the non-generator version of the trainer You can even iterate on several different trainers at the same time: ```python trainer1 = OnPolicyTrainer(...) trainer2 = OnPolicyTrainer(...) for result1, result2, ... in zip(trainer1, trainer2, ...): compare_results(result1, result2, ...) ``` Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-03-18 00:26:14 +08:00
Andrea Boscolo Camiletto	2336a7db1b	fixed typo in rainbow DQN paper reference (#569 ) * fixed typo in rainbow DQN paper ref * fix gym==0.23 ci failure Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-03-16 21:38:51 +08:00
Minhui Li	39f8391cfb	Add map_action_inverse for fixing error of storing random action (#568 ) (Issue #512) Random start in Collector samples actions from the action space, while policies output actions in a fixed range (typically [-1, 1]) and then map them to the action space. The buffer only stores unmapped actions, so the randomly initialized actions are stored incorrectly when the action range is not [-1, 1]. This may influence policy learning and particularly model learning in model-based methods. This PR fixes it by applying an inverse mapping before adding random initial actions to the buffer.	2022-03-12 22:26:00 +08:00
Yi Su	9cb74e60c9	Add imitation baselines for offline RL (#566 ) add imitation baselines for offline RL; make the choice of env/task and D4RL dataset explicit; on expert datasets, IL easily outperforms; after reading the D4RL paper, I'll rerun the exps on medium data	2022-03-12 21:33:54 +08:00
Alex Nikulkov	74f430ea36	Add a comment before SAC alpha loss (#565 ) Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-03-09 06:38:42 +08:00
Chengqi Duan	ad2e1eaea0	Fix WandbLogger import error in Atari examples (#562 )	2022-03-08 08:38:56 -05:00
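
The inverse mapping described in commit 39f8391cfb can be illustrated with a minimal sketch. It assumes plain linear scaling between [-1, 1] and the environment bounds; the function names, bounds, and sample values below are hypothetical and are not taken from the Tianshou code base.

```python
from typing import List, Sequence


def map_action(
    act: Sequence[float], low: Sequence[float], high: Sequence[float]
) -> List[float]:
    """Scale a policy output in [-1, 1] to the environment range [low, high]."""
    return [lo + (hi - lo) * (a + 1.0) / 2.0 for a, lo, hi in zip(act, low, high)]


def map_action_inverse(
    act: Sequence[float], low: Sequence[float], high: Sequence[float]
) -> List[float]:
    """Undo map_action: bring an environment action back into [-1, 1]."""
    return [(a - lo) * 2.0 / (hi - lo) - 1.0 for a, lo, hi in zip(act, low, high)]


# Hypothetical action bounds for a 2-dimensional action space.
low, high = [0.0, -2.0], [10.0, 2.0]

# An action drawn directly from the action space, as in a random start.
random_env_action = [7.5, -1.0]

# Convert it to the unmapped form before storing it in the buffer.
stored = map_action_inverse(random_env_action, low, high)

# Mapping the stored action forward again recovers the environment action.
restored = map_action(stored, low, high)
```

Storing `stored` rather than `random_env_action` keeps random-start transitions in the same unmapped range as the actions the policy itself produces.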


509 Commits