diff --git a/README.md b/README.md
index d808dac..542c8ad 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@
 [![GitHub license](https://img.shields.io/github/license/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/blob/master/LICENSE)
 [![Join the chat at https://gitter.im/thu-ml/tianshou](https://badges.gitter.im/thu-ml/tianshou.svg)](https://gitter.im/thu-ml/tianshou?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
 
-**Tianshou** (天授) is a reinforcement learning platform based on pure PyTorch. Unlike existing reinforcement learning libraries, which are mainly based on TensorFlow, have many nested classes, unfriendly API, or slow-speed, Tianshou provides a fast-speed framework and pythonic API for building the deep reinforcement learning agent. The supported interface algorithms include:
+**Tianshou** ([天授](https://baike.baidu.com/item/%E5%A4%A9%E6%8E%88/9342)) is a reinforcement learning platform based on pure PyTorch. Unlike existing reinforcement learning libraries, which are mainly based on TensorFlow, have many nested classes, unfriendly API, or slow-speed, Tianshou provides a fast-speed framework and pythonic API for building the deep reinforcement learning agent. The supported interface algorithms include:
 
 - [Policy Gradient (PG)](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)
@@ -242,21 +242,6 @@ You can check out the [documentation](https://tianshou.readthedocs.io) for advan
 
 Tianshou is still under development. More algorithms and features are going to be added and we always welcome contributions to help make Tianshou better. If you would like to contribute, please check out [CONTRIBUTING.md](https://github.com/thu-ml/tianshou/blob/master/CONTRIBUTING.md).
 
-## Citing Tianshou
-
-If you find Tianshou useful, please cite it in your publications.
-
-```latex
-@misc{tianshou,
-  author = {Jiayi Weng, Minghao Zhang},
-  title = {Tianshou},
-  year = {2020},
-  publisher = {GitHub},
-  journal = {GitHub repository},
-  howpublished = {\url{https://github.com/thu-ml/tianshou}},
-}
-```
-
 ## TODO
 
 - [x] More examples on [mujoco, atari] benchmark
@@ -267,6 +252,23 @@ If you find Tianshou useful, please cite it in your publications.
 - [ ] Multi-agent
 - [ ] Distributed training
 
+## Citing Tianshou
+
+If you find Tianshou useful, please cite it in your publications.
+
+```latex
+@misc{tianshou,
+  author = {Jiayi Weng, Minghao Zhang, Dong Yan, Hang Su, Jun Zhu},
+  title = {Tianshou},
+  year = {2020},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  howpublished = {\url{https://github.com/thu-ml/tianshou}},
+}
+```
+
+We would like to thank [TSAIL](http://ml.cs.tsinghua.edu.cn/) and [Institute for Artificial Intelligence, Tsinghua University](http://ai.tsinghua.edu.cn/) for providing such an excellent AI research platform.
+
 ## Miscellaneous
 
 Tianshou was previously a reinforcement learning platform based on TensorFlow. You can checkout the branch [`priv`](https://github.com/thu-ml/tianshou/tree/priv) for more detail.
diff --git a/docs/_static/images/concepts_arch.png b/docs/_static/images/concepts_arch.png
index 23b0d6a..baaa7f8 100644
Binary files a/docs/_static/images/concepts_arch.png and b/docs/_static/images/concepts_arch.png differ
diff --git a/docs/tutorials/concepts.rst b/docs/tutorials/concepts.rst
index a6062f2..ac81e5f 100644
--- a/docs/tutorials/concepts.rst
+++ b/docs/tutorials/concepts.rst
@@ -89,7 +89,7 @@ Data Buffer
 >>> batch_data.obs == buf[indice].obs
 array([ True, True, True, True])
 
-The :class:`~tianshou.data.ReplayBuffer` is based on ``numpy.ndarray``. Tianshou provides other type of data buffer such as :class:`~tianshou.data.ListReplayBuffer` (based on list), :class:`tianshou.data.PrioritizedReplayBuffer` (based on Segment Tree and ``numpy.ndarray``). Check out the API documentation for more detail.
+The :class:`~tianshou.data.ReplayBuffer` is based on ``numpy.ndarray``. Tianshou provides other type of data buffer such as :class:`~tianshou.data.ListReplayBuffer` (based on list), :class:`~tianshou.data.PrioritizedReplayBuffer` (based on Segment Tree and ``numpy.ndarray``). Check out the API documentation for more detail.
 
 Policy
------
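
The `concepts.rst` hunk above only repairs a cross-reference, but the paragraph it touches names Tianshou's three buffer types. For readers who want them side by side, here is a minimal usage sketch assuming the 0.2-era API implied by the surrounding doctest (`buf`, `indice`, `batch_data`); the constructor arguments shown (e.g. `alpha`/`beta`) are assumptions for illustration, not taken from this diff.

```python
# Hedged sketch of the buffer classes named in the hunk above; argument
# names follow the tianshou 0.2-era API and are assumptions, not a spec.
from tianshou.data import ListReplayBuffer, PrioritizedReplayBuffer, ReplayBuffer

buf = ReplayBuffer(size=20)  # ring buffer over preallocated numpy.ndarray
for i in range(5):
    buf.add(obs=i, act=i, rew=i, done=(i == 4), obs_next=i + 1, info={})

batch_data, indice = buf.sample(batch_size=4)  # uniform random sampling
assert (batch_data.obs == buf[indice].obs).all()

# ListReplayBuffer stores transitions in plain python lists instead of
# preallocated arrays, trading sampling speed for shape flexibility.
list_buf = ListReplayBuffer()

# PrioritizedReplayBuffer keeps per-transition priorities in a segment
# tree; alpha/beta shape the sampling distribution and importance weights.
prio_buf = PrioritizedReplayBuffer(size=20, alpha=0.6, beta=0.4)
```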