Docs: added sorting order for autogenerated toc
parent 5af29475e8
commit b12983622b

docs/.gitignore
@@ -1,3 +1,3 @@
-/api/*
+/03_api/*
 jupyter_execute
 _toc.yml
@@ -308,7 +308,7 @@ Tianshou supports user-defined training code. Here is the code snippet:
     # train policy with a sampled batch data from buffer
     losses = policy.update(64, train_collector.buffer)

-For further usage, you can refer to the :doc:`/tutorials/07_cheatsheet`.
+For further usage, you can refer to the :doc:`/01_tutorials/07_cheatsheet`.

 .. rubric:: References

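The snippet above is only partially visible in this hunk. A minimal sketch of the kind of user-defined training loop it belongs to (the warm-up size, collect size, and step count below are illustrative assumptions, not taken from the patched file)::

    # warm up the replay buffer with random actions before any update
    train_collector.collect(n_step=5000, random=True)

    for step in range(100_000):
        # gather a few fresh transitions from the training environments
        train_collector.collect(n_step=10)
        # train policy with a sampled batch data from buffer
        losses = policy.update(64, train_collector.buffer)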
@@ -339,7 +339,7 @@ Thus, we need a time-related interface for calculating the 2-step return. :meth:

 This code does not consider the done flag, so it may not work very well. It shows two ways to get :math:`s_{t + 2}` from the replay buffer easily in :meth:`~tianshou.policy.BasePolicy.process_fn`.

-For other method, you can check out :doc:`/api/policy/index`. We give the usage of policy class a high-level explanation in :ref:`pseudocode`.
+For other method, you can check out :doc:`/03_api/policy/index`. We give the usage of policy class a high-level explanation in :ref:`pseudocode`.


 Collector
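The "two ways to get :math:`s_{t + 2}`" mentioned above are not shown in this hunk. A hedged sketch of what such a :meth:`~tianshou.policy.BasePolicy.process_fn` override could look like (the class name and the modulo indexing are illustrative, and the actual return computation is elided)::

    from tianshou.policy import DQNPolicy

    class DQNWith2StepReturn(DQNPolicy):
        def process_fn(self, batch, buffer, indices):
            buffer_len = len(buffer)
            # way 1: index the buffer to get the whole transition at t + 2
            batch_2 = buffer[(indices + 2) % buffer_len]   # batch_2.obs is s_{t+2}
            # way 2: index the stored observation array directly
            obs_t2 = buffer.obs[(indices + 2) % buffer_len]
            # ... compute the 2-step return from batch.rew and the data above ...
            return batch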
@@ -382,7 +382,7 @@ Trainer

 Once you have a collector and a policy, you can start writing the training method for your RL agent. Trainer, to be honest, is a simple wrapper. It helps you save energy for writing the training loop. You can also construct your own trainer: :ref:`customized_trainer`.

-Tianshou has three types of trainer: :func:`~tianshou.trainer.onpolicy_trainer` for on-policy algorithms such as Policy Gradient, :func:`~tianshou.trainer.offpolicy_trainer` for off-policy algorithms such as DQN, and :func:`~tianshou.trainer.offline_trainer` for offline algorithms such as BCQ. Please check out :doc:`/api/trainer/index` for the usage.
+Tianshou has three types of trainer: :func:`~tianshou.trainer.onpolicy_trainer` for on-policy algorithms such as Policy Gradient, :func:`~tianshou.trainer.offpolicy_trainer` for off-policy algorithms such as DQN, and :func:`~tianshou.trainer.offline_trainer` for offline algorithms such as BCQ. Please check out :doc:`/03_api/trainer/index` for the usage.

 We also provide the corresponding iterator-based trainer classes :class:`~tianshou.trainer.OnpolicyTrainer`, :class:`~tianshou.trainer.OffpolicyTrainer`, :class:`~tianshou.trainer.OfflineTrainer` to facilitate users writing more flexible training logic:
 ::
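The ``::`` block that follows in the source file is not shown by this hunk. A plausible sketch of the iterator-based usage it introduces (keyword arguments and values are assumptions based on recent Tianshou versions, not part of this commit)::

    from tianshou.trainer import OffpolicyTrainer

    trainer = OffpolicyTrainer(
        policy=policy,
        train_collector=train_collector,
        test_collector=test_collector,
        max_epoch=10,
        step_per_epoch=10000,
        step_per_collect=10,
        episode_per_test=100,
        batch_size=64,
        update_per_step=0.1,
    )
    for epoch, epoch_stat, info in trainer:
        # inspect or log intermediate results between epochs
        print(f"epoch {epoch}: {epoch_stat}")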
@@ -126,7 +126,7 @@ The figure in the right gives an intuitive comparison among synchronous/asynchro
 .. note::

     The async simulation collector would cause some exceptions when used as
-    ``test_collector`` in :doc:`/api/trainer/index` (related to
+    ``test_collector`` in :doc:`/03_api/trainer/index` (related to
     `Issue 700 <https://github.com/thu-ml/tianshou/issues/700>`_). Please use
     sync version for ``test_collector`` instead.

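A minimal sketch of the setup the note recommends, assuming the usual constructor signatures of :class:`~tianshou.data.AsyncCollector` and :class:`~tianshou.data.Collector` (the environment objects are placeholders)::

    from tianshou.data import AsyncCollector, Collector, VectorReplayBuffer

    # async collection only for training rollouts ...
    train_collector = AsyncCollector(
        policy, train_envs, VectorReplayBuffer(20000, len(train_envs))
    )
    # ... and a plain synchronous Collector for evaluation, as advised above
    test_collector = Collector(policy, test_envs)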
@@ -478,4 +478,4 @@ By constructing a new state ``state_ = (state, agent_id, mask)``, essentially we
     act = policy(state_)
     next_state_, reward = env.step(act)

-Following this idea, we write a tiny example of playing `Tic Tac Toe <https://en.wikipedia.org/wiki/Tic-tac-toe>`_ against a random player by using a Q-learning algorithm. The tutorial is at :doc:`/tutorials/04_tictactoe`.
+Following this idea, we write a tiny example of playing `Tic Tac Toe <https://en.wikipedia.org/wiki/Tic-tac-toe>`_ against a random player by using a Q-learning algorithm. The tutorial is at :doc:`/01_tutorials/04_tictactoe`.
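A hedged expansion of the two pseudocode lines above into a full decision loop (the termination flag and the unpacking of ``state_`` are additions for illustration; ``env`` and ``policy`` remain abstract placeholders)::

    state, agent_id, mask = env.reset()
    done = False
    while not done:
        state_ = (state, agent_id, mask)     # wrap per-agent info into one "state"
        act = policy(state_)                 # a single-agent policy sees the wrapped state
        next_state_, reward = env.step(act)  # as in the snippet above
        state, agent_id, mask = next_state_
        done = mask.sum() == 0               # illustrative stopping condition only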
@@ -368,7 +368,7 @@
 "id": "8Oc1p8ud9kcu"
 },
 "source": [
-"Would like to learn more advanced usages of Batch? Feel curious about how data is organized inside the Batch? Check the [documentation](https://tianshou.readthedocs.io/en/master/api/tianshou.data.html) and other [tutorials](https://tianshou.readthedocs.io/en/master/tutorials/batch.html#) for more details."
+"Would like to learn more advanced usages of Batch? Feel curious about how data is organized inside the Batch? Check the [documentation](https://tianshou.readthedocs.io/en/master/03_api/tianshou.data.html) and other [tutorials](https://tianshou.readthedocs.io/en/master/tutorials/batch.html#) for more details."
 ]
 }
 ],
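The notebook cell patched here points readers to the Batch documentation. A small, self-contained illustration of what "how data is organized inside the Batch" means (the keys and shapes are examples, not taken from the notebook)::

    import numpy as np
    from tianshou.data import Batch

    b = Batch(obs=np.zeros((4, 3)), info=Batch(agent_id=np.arange(4)))
    print(b.obs.shape)      # (4, 3)
    print(b.info.agent_id)  # nested Batch, accessed with attribute syntax
    print(len(b))           # 4 -- a Batch behaves like a struct of arrays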
@@ -61,19 +61,19 @@ Test by GitHub Actions

 1. Click the ``Actions`` button in your own repo:

-.. image:: _static/images/action1.jpg
+.. image:: ../_static/images/action1.jpg
     :align: center

 2. Click the green button:

-.. image:: _static/images/action2.jpg
+.. image:: ../_static/images/action2.jpg
     :align: center

 3. You will see ``Actions Enabled.`` on the top of html page.

 4. When you push a new commit to your own repo (e.g. ``git push``), it will automatically run the test in this page:

-.. image:: _static/images/action3.png
+.. image:: ../_static/images/action3.png
     :align: center


@@ -52,7 +52,7 @@ Here is Tianshou's other features:
 * Support any type of environment state/action (e.g. a dict, a self-defined class, ...): :ref:`self_defined_env`
 * Support :ref:`customize_training`
 * Support n-step returns estimation :meth:`~tianshou.policy.BasePolicy.compute_nstep_return` and prioritized experience replay :class:`~tianshou.data.PrioritizedReplayBuffer` for all Q-learning based algorithms; GAE, nstep and PER are very fast thanks to numba jit function and vectorized numpy operation
-* Support :doc:`/tutorials/04_tictactoe`
+* Support :doc:`/01_tutorials/04_tictactoe`
 * Support both `TensorBoard <https://www.tensorflow.org/tensorboard>`_ and `W&B <https://wandb.ai/>`_ log tools
 * Support multi-GPU training :ref:`multi_gpu`
 * Comprehensive `unit tests <https://github.com/thu-ml/tianshou/actions>`_, including functional checking, RL pipeline checking, documentation checking, PEP8 code-style checking, and type checking
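The prioritized-replay bullet above names :class:`~tianshou.data.PrioritizedReplayBuffer`; a short usage sketch (the ``alpha``/``beta`` values are common defaults, chosen here only for illustration, and ``env`` and ``policy`` are placeholders)::

    from tianshou.data import Collector, PrioritizedReplayBuffer

    buf = PrioritizedReplayBuffer(size=20000, alpha=0.6, beta=0.4)
    train_collector = Collector(policy, env, buf)
    # Q-learning style policies then sample with importance weights during update
    losses = policy.update(64, buf)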