Tianshou

Author	SHA1	Message	Date
Markus Krimmel	b0c8d28a7d	Added pre-commit (#752 ) - This PR adds the checks that are defined in the Makefile as pre-commit hooks. - Hopefully, the checks are equivalent to those from the Makefile, but I can't guarantee it. - CI remains as it is. - As I pointed out on Discord, I experienced some conflicts between flake8 and yapf, so it might be better to transition to some other combination (e.g. black).	2022-10-02 08:57:45 -07:00
Jiayi Weng	278c91a222	Update citation and contributor (#721 ) * update citation * update contributor * pass lint	2022-08-10 20:06:51 -07:00
Jiayi Weng	65054847ef	bump version to 0.4.9 (#684 )	2022-07-05 01:07:16 +08:00
Yi Su	df35718992	Implement TD3+BC for offline RL (#660 ) - implement TD3+BC for offline RL; - fix a bug in trainer about test reward not logged because self.env_step is not set for offline setting;	2022-06-07 00:39:37 +08:00
Anas BELFADIL	53e6b0408d	Add BranchingDQN for large discrete action spaces (#618 )	2022-05-15 21:40:32 +08:00
Jiayi Weng	2a7c151738	Add vecenv wrappers for obs_norm to support running mujoco experiment with envpool (#628 ) - add VectorEnvWrapper and VectorEnvNormObs - obs_rms store in policy save/load - align mujoco scripts with atari: obs_norm, envpool, wandb and README	2022-05-05 19:55:15 +08:00
Yi Su	dd16818ce4	implement REDQ based on original contribution by @Jimenius (#623 ) Co-authored-by: Minhui Li <limh@lamda.nju.edu.cn>	2022-05-01 00:06:00 +08:00
Jiayi Weng	18277497ed	fix py39 ci venv test failure (#593 )	2022-04-12 22:29:39 +08:00
Yi Su	2377f2f186	Implement Generative Adversarial Imitation Learning (GAIL) (#550 ) Implement GAIL based on PPO and provide example script and sample (i.e., most likely not the best) results with Mujoco tasks. (#531, #173)	2022-03-06 23:57:15 +08:00
Chengqi Duan	d85bc19269	update dqn tutorial and add envpool to docs (#526 ) Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2022-02-15 06:39:47 +08:00
Bernard Tan	bc53ead273	Implement CQLPolicy and offline_cql example (#506 )	2022-01-16 05:30:21 +08:00
Yi Su	a59d96d041	Add Intrinsic Curiosity Module (#503 )	2022-01-15 02:43:48 +08:00
Bernard Tan	5c5a3db94e	Implement BCQPolicy and offline_bcq example (#480 ) This PR implements BCQPolicy, which can be used to train an offline agent in environments with continuous action spaces. An experimental result on 'halfcheetah-expert-v1', a d4rl environment (for offline reinforcement learning), is provided. Example usage is in examples/offline/offline_bcq.py.	2021-11-22 22:21:02 +08:00
Ayush Chaurasia	63d752ee0b	W&B: Add usage in the docs (#463 )	2021-10-13 23:28:25 +08:00
Jiayi Weng	e45e2096d8	add multi-GPU support (#461 ) add a new class DataParallelNet	2021-10-06 01:39:14 +08:00
Ayush Chaurasia	22d7bf38c8	Improve W&B logger (#441 ) - rename WandBLogger -> WandbLogger - add save_data and restore_data - allow more input arguments for wandb init - integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py - documentation update	2021-09-24 21:52:23 +08:00
Andriy Drozdyuk	8a5e2190f7	Add Weights and Biases Logger (#427 ) - rename BasicLogger to TensorboardLogger - refactor logger code - add WandbLogger Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>	2021-08-30 22:35:02 +08:00
Yi Su	291be08d43	Add Rainbow DQN (#386 ) - add RainbowPolicy - add `set_beta` method in prio_buffer - add NoisyLinear in utils/network	2021-08-29 23:34:59 +08:00
deeplook	728b88b92d	Fix conda install command (#419 )	2021-08-16 18:56:01 +08:00
n+e	5b7732a29b	make ppo discrete test script more general (#418 )	2021-08-15 21:37:37 +08:00
n+e	bba30f83d1	fix sb2's coverage (#412 )	2021-08-10 17:43:27 +08:00
Miguel Morales	42538f8e58	Update README.md (#410 )	2021-08-10 09:14:20 +08:00
ChenDRAG	0674ff628a	Cite Tianshou's latest paper (#406 ) * Cite Tianshou's latest paper * update new version README * change order Co-authored-by: Jiayi Weng <wengj@sea.com>	2021-08-10 08:35:01 +08:00
n+e	ebaca6f8da	add vizdoom example, bump version to 0.4.2 (#384 )	2021-06-26 18:08:41 +08:00
Yi Su	c0bc8e00ca	Add Fully-parameterized Quantile Function (#376 )	2021-06-15 11:59:02 +08:00
Yi Su	f3169b4c1f	Add Implicit Quantile Network (#371 )	2021-05-29 09:44:23 +08:00
Yi Su	8f7bc65ac7	Add discrete Critic Regularized Regression (#367 )	2021-05-19 13:29:56 +08:00
Yi Su	b5c3ddabfa	Add discrete Conservative Q-Learning for offline RL (#359 ) Co-authored-by: Yi Su <yi.su@antgroup.com> Co-authored-by: Yi Su <yi.su@antfin.com>	2021-05-12 09:24:48 +08:00
ChenDRAG	1dcf65fe21	Add NPG policy (#344 )	2021-04-21 09:52:15 +08:00
ChenDRAG	a57503c0aa	TRPO benchmark release (#340 )	2021-04-19 17:05:06 +08:00
ChenDRAG	5057b5c89e	Add TRPO policy (#337 )	2021-04-16 20:37:12 +08:00
ChenDRAG	6426a39796	ppo benchmark (#330 )	2021-03-30 11:50:35 +08:00
n+e	8963a14327	fix exception in tutorials/dqn.rst (#327 )	2021-03-26 12:57:00 +08:00
ChenDRAG	9b61bc620c	add logger (#295 ) This PR refactors the logging method to fix the NaN reward and log interval bugs. After these two PRs, the fundamental changes to tianshou/data are hopefully finished, so we can finally concentrate on building benchmarks for Tianshou. Things changed: 1. trainer now accepts logger (BasicLogger or LazyLogger) instead of writer; 2. remove utils.SummaryWriter;	2021-02-24 14:48:42 +08:00
ChenDRAG	7036073649	Trainer refactor : some definition change (#293 ) This PR changes some trainer definitions to make the trainer friendlier to use and consistent with typical usage in research papers: it changes `collect-per-step` to `step-per-collect`, adds `update-per-step` / `episode-per-collect` accordingly, and modifies the documentation.	2021-02-21 13:06:02 +08:00
ChenDRAG	150d0ec51b	Step collector implementation (#280 ) This is the third PR of 6 commits mentioned in #274, which refactors Collector to fix #245. You can check #274 for more detail. Things changed in this PR: 1. refactor collector to be cleaner, split out AsyncCollector to support asyncvenv; 2. change buffer.add api to add(batch, buffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.) 3. add policy.exploration_noise(act, batch) -> act 4. small change in BasePolicy.compute_*_returns 5. move reward_metric from collector to trainer 6. fix np.asanyarray issue (different numpy versions produce different output) 7. flake8 maxlength=88 8. polish docs and fix test Co-authored-by: n+e <trinkle23897@gmail.com>	2021-02-19 10:33:49 +08:00
wizardsheng	1eb6137645	Add QR-DQN algorithm (#276 ) This is the PR for QR-DQN algorithm: https://arxiv.org/abs/1710.10044 1. add QR-DQN policy in tianshou/policy/modelfree/qrdqn.py. 2. add QR-DQN net in examples/atari/atari_network.py. 3. add QR-DQN atari example in examples/atari/atari_qrdqn.py. 4. add QR-DQN statement in tianshou/policy/__init__.py. 5. add QR-DQN unit test in test/discrete/test_qrdqn.py. 6. add QR-DQN atari results in examples/atari/results/qrdqn/. 7. add compute_q_value in DQNPolicy and C51Policy to simplify the forward function. 8. move `with torch.no_grad():` from `_target_q` to BasePolicy By running `python3 atari_qrdqn.py --task "PongNoFrameskip-v4" --batch-size 64`, get best_result: 19.8 ± 0.40 in epoch 8.	2021-01-28 09:27:05 +08:00
Jialu Zhu	a511cb4779	Add offline trainer and discrete BCQ algorithm (#263 ) The result needs to be tuned after `done` issue fixed. Co-authored-by: n+e <trinkle23897@gmail.com>	2021-01-20 18:13:04 +08:00
ChenDRAG	a633a6a028	update utils.network (#275 ) This is the first commit of 6 commits mentioned in #274, which features 1. Refactor of `Class Net` to support any form of MLP. 2. Enable type check in utils.network. 3. Related changes in docs/test/examples. 4. Move atari-related network to examples/atari/atari_network.py Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>	2021-01-20 16:54:13 +08:00
蔡舒起	866e35d550	fix readme (#273 )	2021-01-16 19:27:35 +08:00
wizardsheng	c6f2648e87	Add C51 algorithm (#266 ) This is the PR for C51 algorithm: https://arxiv.org/abs/1707.06887 1. add C51 policy in tianshou/policy/modelfree/c51.py. 2. add C51 net in tianshou/utils/net/discrete.py. 3. add C51 atari example in examples/atari/atari_c51.py. 4. add C51 statement in tianshou/policy/__init__.py. 5. add C51 test in test/discrete/test_c51.py. 6. add C51 atari results in examples/atari/results/c51/. By running `python3 atari_c51.py --task "PongNoFrameskip-v4" --batch-size 64`, get best_result: 20.50 ± 0.50 in epoch 9. By running `python3 atari_c51.py --task "BreakoutNoFrameskip-v4" --n-step 1 --epoch 40`, get best_reward: 407.400000 ± 31.155096 in epoch 39.	2021-01-06 10:17:45 +08:00
n+e	710966eda7	change API of train_fn and test_fn (#229 ) train_fn(epoch) -> train_fn(epoch, num_env_step) test_fn(epoch) -> test_fn(epoch, num_env_step)	2020-09-26 16:35:37 +08:00
n+e	d87d31a705	Update Anaconda support (#228 ) conda install -c conda-forge tianshou Related PR: conda-forge/staged-recipes#12719	2020-09-25 15:07:36 +08:00
Yao Feng	dcfcbb37f4	add PSRL policy (#202 ) Add PSRL policy in tianshou/policy/modelbase/psrl.py. Co-authored-by: n+e <trinkle23897@cmu.edu>	2020-09-23 20:57:33 +08:00
rocknamx	bf39b9ef7d	clarify updating state (#224 ) Adding an indicator of learning (i.e. `self.learning`) makes it convenient to distinguish the state of the policy. Meanwhile, the state of `self.training` is unambiguous in the training stage. Related issue: #211 Others: - fix a bug in DDQN: target_q could not be sampled from np.random.rand - fix a bug in DQN atari net: it should add a ReLU before the last layer - fix a bug in collector timing Co-authored-by: n+e <463003665@qq.com>	2020-09-22 16:28:46 +08:00
n+e	b86d78766b	fix docs and add docstring check (#210 ) - fix broken links and out-of-date content - add pydocstyle and doc8 check - remove collector.seed and collector.render	2020-09-11 07:55:37 +08:00
n+e	64af7ea839	fix critical bugs in MAPolicy and docs update (#207 ) - fix a bug in MAPolicy: `buffer.rew = Batch()` doesn't change `buffer.rew` (thanks mypy) - polish examples/box2d/bipedal_hardcore_sac.py - several docs update - format setup.py and bump version to 0.2.7	2020-09-08 21:10:48 +08:00
n+e	8bb8ecba6e	set policy.eval() before collector.collect (#204 ) * fix #203 * no_grad argument in collector.collect	2020-09-06 16:20:16 +08:00
Trinkle23897	34f714a677	Numba acceleration (#193 ) Training FPS improvement (base commit is 94bfb32): test_pdqn: 1660 (without numba) -> 1930; discrete/test_ppo: 5100 -> 5170. Since nstep has little impact on overall performance, the unit test result is: GAE: 4.1s -> 0.057s; nstep: 0.3s -> 0.15s (little improvement). Others: - fix a bug in ttt set_eps - keep only sumtree in segment tree implementation - dirty fix for asyncVenv check_id test	2020-09-02 13:03:32 +08:00
yingchengyang	5b49192a48	DQN Atari examples (#187 ) This PR aims to provide the script of Atari DQN setting: - A speedrun of PongNoFrameskip-v4 (finished, about half an hour on i7-8750 + GTX1060 with 1M environment steps) - A general script for all Atari games. Since we use multiple envs for simulation, the result is slightly different from the original paper, but is considered acceptable. It also adds another parameter, save_only_last_obs, to the replay buffer in order to save memory. Co-authored-by: Trinkle23897 <463003665@qq.com>	2020-08-30 05:48:09 +08:00

100 Commits