.. Tianshou documentation master file, created by
   sphinx-quickstart on Sat Mar 28 15:58:19 2020.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
Welcome to Tianshou!
====================
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
**Tianshou** (`天授 <https://baike.baidu.com/item/%E5%A4%A9%E6%8E%88>`_) is a reinforcement learning platform based on pure PyTorch. Unlike existing reinforcement learning libraries, which are mainly based on TensorFlow and tend to have many nested classes, unfriendly APIs, or slow speed, Tianshou provides a fast framework and a Pythonic API for building deep reinforcement learning agents. The supported algorithms include:
 
							 
						 
					
						
							
								
									
										
										
										
* :class:`~tianshou.policy.PGPolicy` `Policy Gradient <https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf>`_
* :class:`~tianshou.policy.DQNPolicy` `Deep Q-Network <https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf>`_
* :class:`~tianshou.policy.DQNPolicy` `Double DQN <https://arxiv.org/pdf/1509.06461.pdf>`_
* :class:`~tianshou.policy.DQNPolicy` `Dueling DQN <https://arxiv.org/pdf/1511.06581.pdf>`_
* :class:`~tianshou.policy.C51Policy` `C51 <https://arxiv.org/pdf/1707.06887.pdf>`_
* :class:`~tianshou.policy.QRDQNPolicy` `Quantile Regression DQN <https://arxiv.org/pdf/1710.10044.pdf>`_
* :class:`~tianshou.policy.A2CPolicy` `Advantage Actor-Critic <https://openai.com/blog/baselines-acktr-a2c/>`_
* :class:`~tianshou.policy.DDPGPolicy` `Deep Deterministic Policy Gradient <https://arxiv.org/pdf/1509.02971.pdf>`_
* :class:`~tianshou.policy.PPOPolicy` `Proximal Policy Optimization <https://arxiv.org/pdf/1707.06347.pdf>`_
* :class:`~tianshou.policy.TD3Policy` `Twin Delayed DDPG <https://arxiv.org/pdf/1802.09477.pdf>`_
* :class:`~tianshou.policy.SACPolicy` `Soft Actor-Critic <https://arxiv.org/pdf/1812.05905.pdf>`_
* :class:`~tianshou.policy.DiscreteSACPolicy` `Discrete Soft Actor-Critic <https://arxiv.org/pdf/1910.07207.pdf>`_
* :class:`~tianshou.policy.ImitationPolicy` Imitation Learning
* :class:`~tianshou.policy.DiscreteBCQPolicy` `Discrete Batch-Constrained deep Q-Learning <https://arxiv.org/pdf/1910.01708.pdf>`_
* :class:`~tianshou.policy.PSRLPolicy` `Posterior Sampling Reinforcement Learning <https://www.ece.uvic.ca/~bctill/papers/learning/Strens_2000.pdf>`_
* :class:`~tianshou.data.PrioritizedReplayBuffer` `Prioritized Experience Replay <https://arxiv.org/pdf/1511.05952.pdf>`_
* :meth:`~tianshou.policy.BasePolicy.compute_episodic_return` `Generalized Advantage Estimator <https://arxiv.org/pdf/1506.02438.pdf>`_
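
As a rough illustration of the last item in the list, the following minimal sketch computes Generalized Advantage Estimation in plain Python. It is not Tianshou's implementation of ``compute_episodic_return``: the function name ``gae_advantages`` and all of its parameters are hypothetical and chosen only for this example, and the sketch assumes a single trajectory stored as Python lists.

.. code-block:: python

    from typing import List, Sequence


    def gae_advantages(
        rewards: Sequence[float],
        values: Sequence[float],
        last_value: float = 0.0,
        gamma: float = 0.99,
        gae_lambda: float = 0.95,
    ) -> List[float]:
        """Return GAE advantages for one trajectory (illustrative only).

        rewards[t] is the reward received after step t, values[t] is the
        critic's estimate V(s_t), and last_value is the estimate for the
        state after the final step (0.0 if that state is terminal).
        """
        if len(rewards) != len(values):
            raise ValueError("rewards and values must have the same length")
        advantages = [0.0] * len(rewards)
        next_value = last_value
        running = 0.0
        # Walk backwards so each step reuses the accumulated future advantage.
        for t in reversed(range(len(rewards))):
            delta = rewards[t] + gamma * next_value - values[t]  # TD residual
            running = delta + gamma * gae_lambda * running
            advantages[t] = running
            next_value = values[t]
        return advantages

With ``gae_lambda=0.0`` the result reduces to the one-step TD residuals, and with ``gamma=1.0, gae_lambda=1.0`` it reduces to the Monte Carlo return minus the value estimate, which is a convenient way to sanity-check the sketch.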
 
							 
						 
					
						
							
								
									
										
										
										
Tianshou also provides the following features:
 
							 
						 
					
						
							
								
									
										
										
										
* Elegant framework, using only ~2000 lines of code
* Supports parallel environment simulation (synchronous or asynchronous) for all algorithms: :ref:`parallel_sampling`
* Supports recurrent state representation in the actor and critic networks (RNN-style training for POMDPs): :ref:`rnn_training`
* Supports any type of environment state/action (e.g. a dict, a self-defined class, ...): :ref:`self_defined_env`
* Supports :ref:`customize_training`
* Supports n-step return estimation (:meth:`~tianshou.policy.BasePolicy.compute_nstep_return`) and prioritized experience replay (:class:`~tianshou.data.PrioritizedReplayBuffer`) for all Q-learning based algorithms; GAE, n-step returns, and PER are fast thanks to numba-jitted functions and vectorized NumPy operations
* Supports :doc:`/tutorials/tictactoe`
* Comprehensive `unit tests <https://github.com/thu-ml/tianshou/actions>`_, including functional checking, RL pipeline checking, documentation checking, PEP8 code-style checking, and type checking
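
The n-step return mentioned in the list above can be sketched in plain Python as follows. This only illustrates the idea and is not the code behind ``compute_nstep_return``: ``nstep_return`` and its parameters are hypothetical names, and the sketch assumes one finished episode stored as Python lists.

.. code-block:: python

    from typing import List, Sequence


    def nstep_return(
        rewards: Sequence[float],
        state_values: Sequence[float],
        n_step: int = 3,
        gamma: float = 0.99,
    ) -> List[float]:
        """Return the n-step target for every step of one finished episode.

        state_values[t] is an estimate of the value of the state observed
        at step t (for example max_a Q(s_t, a)). The episode is assumed to
        terminate after the last reward, so targets near the end use fewer
        than n_step rewards and no bootstrap term.
        """
        if n_step < 1:
            raise ValueError("n_step must be at least 1")
        if len(rewards) != len(state_values):
            raise ValueError("rewards and state_values must have the same length")
        horizon = len(rewards)
        targets = []
        for t in range(horizon):
            end = min(t + n_step, horizon)
            # Discounted sum of up to n_step rewards starting at step t.
            target = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
            if end < horizon:
                # Episode continues after n steps: bootstrap from the estimate.
                target += gamma ** n_step * state_values[end]
            targets.append(target)
        return targets

Setting ``n_step=1`` gives the usual one-step TD target; larger values trade more variance for less reliance on the value estimate.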
 
							 
						 
					
						
							
								
									
										
										
										
The Chinese documentation is available at `https://tianshou.readthedocs.io/zh/latest/ <https://tianshou.readthedocs.io/zh/latest/>`_.
 
							 
						 
					
						
							
								
									
										
										
										
Installation
------------
 
							 
						 
					
						
							
								
									
										
										
										
Tianshou is currently hosted on `PyPI <https://pypi.org/project/tianshou/>`_ and `conda-forge <https://github.com/conda-forge/tianshou-feedstock>`_. It requires Python >= 3.6.

You can simply install Tianshou from PyPI with the following command:
 
							 
						 
					
						
							
								
									
										
										
										
.. code-block:: bash

    $ pip install tianshou
 
							 
						 
					
						
							
								
									
										
										
										
If you use Anaconda or Miniconda, you can install Tianshou from conda-forge with the following command:
 
							 
						 
					
						
							
								
									
										
										
										
.. code-block:: bash

    $ conda install -c conda-forge tianshou
 
							 
						 
					
						
							
								
									
										
										
										
You can also install the newest version from GitHub:
 
							 
						 
					
						
							
								
									
										
										
										
.. code-block:: bash

    $ pip install git+https://github.com/thu-ml/tianshou.git@master --upgrade
 
							 
						 
					
						
							
								
									
										
										
										
After installation, open your Python console and type:

::

    import tianshou
    print(tianshou.__version__)
 
							 
						 
					
						
							
								
									
										
										
										
If no error occurs, you have successfully installed Tianshou.
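
The same check can also be scripted. The sketch below is an example rather than an official Tianshou tool: ``installed_version`` is a hypothetical helper, and it relies on ``importlib.metadata`` from the standard library, which is assumed to be available (Python 3.8 or newer). It looks up the installed distribution version without importing the package itself.

.. code-block:: python

    from importlib import metadata
    from typing import Optional


    def installed_version(distribution: str) -> Optional[str]:
        """Return the installed version of a distribution, or None if absent."""
        try:
            return metadata.version(distribution)
        except metadata.PackageNotFoundError:
            return None


    if __name__ == "__main__":
        version = installed_version("tianshou")
        if version is None:
            print("tianshou is not installed in this environment")
        else:
            print("tianshou", version, "is installed")

Returning ``None`` instead of raising keeps the helper usable in setup scripts that need to decide whether to run one of the install commands above.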
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
Tianshou is still under development. You can also check out the documentation for the stable version at `tianshou.readthedocs.io/en/stable/ <https://tianshou.readthedocs.io/en/stable/>`_.
 
							 
						 
					
						
							
								
									
										
										
										
.. toctree::
   :maxdepth: 1
   :caption: Tutorials

   tutorials/dqn
   tutorials/concepts
   tutorials/batch
   tutorials/tictactoe
   tutorials/trick
   tutorials/cheatsheet
 
							 
						 
					
						
							
								
									
										
										
										
.. toctree::
   :maxdepth: 1
   :caption: API Docs

   api/tianshou.data
   api/tianshou.env
   api/tianshou.policy
   api/tianshou.trainer
   api/tianshou.exploration
   api/tianshou.utils
 
							 
						 
					
						
							
								
									
										
										
										
.. toctree::
   :maxdepth: 1
   :caption: Community

   contributing
   contributor
 
							 
						 
					
						
							
								
									
										
										
										
Indices and tables
------------------
 
							 
						 
					
						
							
								
									
										
										
										
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`