Tianshou

Author	SHA1	Message	Date
Will Dudley	b9a6d8b5f0	bugfixes: gym->gymnasium; render() update (#769 ) Credits (names from the Farama Discord): - @nrwahl2 - @APN-Pucky - chattershuts	2022-11-11 12:25:35 -08:00
Juno T	d42a5fb354	Hindsight Experience Replay as a replay buffer (#753 ) ## implementation I implemented HER solely as a replay buffer. It works by temporarily rewriting the transition storage (`self._meta`) in place during the `sample_indices()` call. The original transitions are cached and restored at the beginning of the next sampling call or when any other method is called. This ensures that, for example, the n-step return calculation can be done without altering the policy. There is also a problem with the original index sampling: the sampled indices are not guaranteed to come from different episodes. So I decided to perform the rewriting per episode. This guarantees that sampled transitions from the same episode share the same rewritten goal. It also makes the rewriting ratio differ slightly from the paper, but the difference is small when there are many episodes in the buffer. In the current commit, the HER replay buffer only supports the 'future' strategy and online sampling. This is the best HER configuration in terms of performance and memory efficiency. I also added a few convenience replay buffers (`HERVectorReplayBuffer`, `HERReplayBufferManager`), a test env (`MyGoalEnv`), a gym wrapper (`TruncatedAsTerminated`), unit tests, and a simple example (examples/offline/fetch_her_ddpg.py). ## verification I have added unit tests for almost everything I implemented. The HER replay buffer was also tested using DDPG on the [`FetchReach-v3` env](https://github.com/Farama-Foundation/Gymnasium-Robotics). I used the default DDPG parameters from the mujoco example and did no further tuning to get this good result (train script: examples/offline/fetch_her_ddpg.py). ![Screen Shot 2022-10-02 at 19 22 53](https://user-images.githubusercontent.com/42699114/193454066-0dd0c65c-fd5f-4587-8912-b441d39de88a.png)	2022-10-30 16:54:54 -07:00
Zodan Jodan	0181fe79a5	fix docs tictactoe dummy vector env #669 (#749 ) a fix for #669	2022-10-03 17:41:31 -07:00
Markus Krimmel	b0c8d28a7d	Added pre-commit (#752 ) - This PR adds the checks that are defined in the Makefile as pre-commit hooks. - Hopefully, the checks are equivalent to those from the Makefile, but I can't guarantee it. - CI remains as it is. - As I pointed out on discord, I experienced some conflicts between flake8 and yapf, so it might be better to transition to some other combination (e.g. black).	2022-10-02 08:57:45 -07:00
Jiayi Weng	278c91a222	Update citation and contributor (#721 ) * update citation * update contributor * pass lint	2022-08-10 20:06:51 -07:00
Wenhao Chen	f270e88461	Do not allow async simulation for test collector (#705 )	2022-07-22 16:23:55 -07:00
Jiayi Weng	99c99bb09a	Fix 2 bugs and refactor RunningMeanStd to support dict obs norm (#695 ) * fix #689 * fix #672 * refactor RMS class * fix #688	2022-07-14 22:52:56 -07:00
Yi Su	df35718992	Implement TD3+BC for offline RL (#660 ) - implement TD3+BC for offline RL; - fix a bug in trainer about test reward not logged because self.env_step is not set for offline setting;	2022-06-07 00:39:37 +08:00
Jiayi Weng	5ecea2402e	Fix save_checkpoint_fn return value (#659 ) - Fix save_checkpoint_fn return value to checkpoint_path; - Fix wrong link in doc; - Fix an off-by-one bug in trainer iterator.	2022-06-03 01:07:07 +08:00
Jiayi Weng	6ad5b520fa	Fix sphinx build error (#655 )	2022-06-01 13:56:04 +08:00
Anas BELFADIL	53e6b0408d	Add BranchingDQN for large discrete action spaces (#618 )	2022-05-15 21:40:32 +08:00
Jiayi Weng	bf8f63ffc3	use envpool in vizdoom example, update doc (#634 )	2022-05-09 00:42:16 +08:00
Yi Su	dd16818ce4	implement REDQ based on original contribution by @Jimenius (#623 ) Co-authored-by: Minhui Li <limh@lamda.nju.edu.cn>	2022-05-01 00:06:00 +08:00
ChenDRAG	7f23748347	Compare Atari results with dopamine and OpenAI Baselines (#616 )	2022-04-27 21:10:45 +08:00
Jiayi Weng	876e6b186e	hot fix mujoco benchmark	2022-04-24 16:49:40 -04:00
Chengqi Duan	5eab7dc218	Add Atari Results (#600 )	2022-04-24 20:44:54 +08:00
ChenDRAG	5c9afe72f3	Update Mujoco Benchmark's webpage (#606 )	2022-04-24 01:11:33 +08:00
ChenDRAG	57ecebde38	Add jupyter notebook tutorials using Google Colaboratory (#599 )	2022-04-19 20:58:52 +08:00
Alex Nikulkov	92456cdb68	Add learning rate scheduler to BasePolicy (#598 )	2022-04-17 23:52:30 +08:00
Yifei Cheng	6fc6857812	Update Multi-agent RL docs, upgrade pettingzoo (#595 ) * update multi-agent docs, upgrade pettingzoo * avoid pettingzoo deprecation warning * fix pistonball tests * codestyle	2022-04-16 23:17:53 +08:00
Jiayi Weng	2a9c9289e5	rename save_fn to save_best_fn to avoid ambiguity (#575 ) This PR also introduces `tianshou.utils.deprecation` for a unified deprecation wrapper.	2022-03-22 04:29:27 +08:00
Jose Antonio Martin H	10d919052b	Add Trainers as generators (#559 ) The new proposed feature is to have trainers as generators. The usage pattern is: ```python trainer = OnPolicyTrainer(...) for epoch, epoch_stat, info in trainer: print(f"Epoch: {epoch}") print(epoch_stat) print(info) do_something_with_policy() query_something_about_policy() make_a_plot_with(epoch_stat) display(info) ``` - epoch int: the epoch number - epoch_stat dict: a large collection of metrics of the current epoch, including stat - info dict: the usual dict out of the non-generator version of the trainer You can even iterate on several different trainers at the same time: ```python trainer1 = OnPolicyTrainer(...) trainer2 = OnPolicyTrainer(...) for result1, result2, ... in zip(trainer1, trainer2, ...): compare_results(result1, result2, ...) ``` Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-03-18 00:26:14 +08:00
Andrea Boscolo Camiletto	2336a7db1b	fixed typo in rainbow DQN paper reference (#569 ) * fixed typo in rainbow DQN paper ref * fix gym==0.23 ci failure Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-03-16 21:38:51 +08:00
Costa Huang	df3d7f582b	Update WandbLogger implementation (#558 ) * Use `global_step` as the x-axis for wandb * Use Tensorboard SummaryWriter as core with `wandb.init(..., sync_tensorboard=True)` * Update all atari examples with wandb Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-03-07 06:40:47 +08:00
Yi Su	2377f2f186	Implement Generative Adversarial Imitation Learning (GAIL) (#550 ) Implement GAIL based on PPO and provide example script and sample (i.e., most likely not the best) results with Mujoco tasks. (#531, #173)	2022-03-06 23:57:15 +08:00
Chengqi Duan	d85bc19269	update dqn tutorial and add envpool to docs (#526 ) Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-02-15 06:39:47 +08:00
Chengqi Duan	9c100e0705	Enable venvs.reset() concurrent execution (#517 ) - change the internal API name of worker: send_action -> send, get_result -> recv (align with envpool) - add a timing test for venvs.reset() to verify that execution is concurrent - change venvs.reset() logic Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-02-08 00:40:01 +08:00
ChenDRAG	c25926dd8f	Formalize variable names (#509 ) Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-01-30 00:53:56 +08:00
Bernard Tan	bc53ead273	Implement CQLPolicy and offline_cql example (#506 )	2022-01-16 05:30:21 +08:00
Yi Su	a59d96d041	Add Intrinsic Curiosity Module (#503 )	2022-01-15 02:43:48 +08:00
Bernard Tan	5c5a3db94e	Implement BCQPolicy and offline_bcq example (#480 ) This PR implements BCQPolicy, which can be used to train an offline agent in environments with continuous action spaces. An experimental result on 'halfcheetah-expert-v1', a d4rl environment (for Offline Reinforcement Learning), is provided. Example usage is in examples/offline/offline_bcq.py.	2021-11-22 22:21:02 +08:00
Ayush Chaurasia	63d752ee0b	W&B: Add usage in the docs (#463 )	2021-10-13 23:28:25 +08:00
Jiayi Weng	e45e2096d8	add multi-GPU support (#461 ) add a new class DataParallelNet	2021-10-06 01:39:14 +08:00
Ayush Chaurasia	22d7bf38c8	Improve W&B logger (#441 ) - rename WandBLogger -> WandbLogger - add save_data and restore_data - allow more input arguments for wandb init - integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py - documentation update	2021-09-24 21:52:23 +08:00
n+e	fc251ab0b8	bump to v0.4.3 (#432 ) * add makefile * bump version * add isort and yapf * update contributing.md * update PR template * spelling check	2021-09-03 05:05:04 +08:00
Andriy Drozdyuk	8a5e2190f7	Add Weights and Biases Logger (#427 ) - rename BasicLogger to TensorboardLogger - refactor logger code - add WandbLogger Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2021-08-30 22:35:02 +08:00
n+e	e4f4f0e144	fix docs build failure and a bug in a2c/ppo optimizer (#428 ) * fix rtfd build * list + list -> set.union * change seed of test_qrdqn * add py39 test	2021-08-30 02:07:03 +08:00
Yi Su	291be08d43	Add Rainbow DQN (#386 ) - add RainbowPolicy - add `set_beta` method in prio_buffer - add NoisyLinear in utils/network	2021-08-29 23:34:59 +08:00
Andriy Drozdyuk	d161059c3d	Replaced indice by plural indices (#422 )	2021-08-20 21:58:44 +08:00
n+e	c19876179a	add env_id in preprocess fn (#391 )	2021-07-05 09:50:39 +08:00
Yi Su	c0bc8e00ca	Add Fully-parameterized Quantile Function (#376 )	2021-06-15 11:59:02 +08:00
Yi Su	f3169b4c1f	Add Implicit Quantile Network (#371 )	2021-05-29 09:44:23 +08:00
n+e	458028a326	fix docs (#373 ) - fix css style error - fix mujoco benchmark result	2021-05-23 12:43:03 +08:00
Yi Su	8f7bc65ac7	Add discrete Critic Regularized Regression (#367 )	2021-05-19 13:29:56 +08:00
Yi Su	b5c3ddabfa	Add discrete Conservative Q-Learning for offline RL (#359 ) Co-authored-by: Yi Su <yi.su@antgroup.com> Co-authored-by: Yi Su <yi.su@antfin.com>	2021-05-12 09:24:48 +08:00
Ark	84f58636eb	Make trainer resumable (#350 ) - specify tensorboard >= 2.5.0 - add `save_checkpoint_fn` and `resume_from_log` in trainer Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-05-06 08:53:53 +08:00
n+e	ff4d3cd714	Support different state size and fix exception in venv.__del__ (#352 ) - Batch: do not raise error when it finds list of np.array with different shape[0]. - Venv's obs: add try...except block for np.stack(obs_list) - remove venv.__del__ since it is buggy	2021-04-25 15:23:46 +08:00
ChenDRAG	bbc3c3e32d	Add numerical analysis tool and interactive plot (#341 ) Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-04-22 12:49:54 +08:00
ChenDRAG	1dcf65fe21	Add NPG policy (#344 )	2021-04-21 09:52:15 +08:00
ChenDRAG	5057b5c89e	Add TRPO policy (#337 )	2021-04-16 20:37:12 +08:00

128 Commits