add baseline and rlpyt result

This commit is contained in:
Trinkle23897 2020-03-27 16:24:07 +08:00
parent 44f911bc31
commit 044aae4355
6 changed files with 120 additions and 24 deletions

.github/ISSUE_TEMPLATE.md (new file, 16 lines)

@@ -0,0 +1,16 @@
- [ ] I have marked all applicable categories:
    + [ ] exception-raising bug
    + [ ] RL algorithm bug
    + [ ] documentation request (i.e. "X is missing from the documentation.")
    + [ ] new feature request
- [ ] I have visited the [source website], and in particular read the [known issues]
- [ ] I have searched through the [issue tracker] for duplicates
- [ ] I have mentioned version numbers, operating system and environment, where applicable:
```python
import tianshou, sys
print(tianshou.__version__, sys.version, sys.platform)
```
[source website]: https://github.com/thu-ml/tianshou/
[known issues]: https://github.com/thu-ml/tianshou/#faq-and-known-issues
[issue tracker]: https://github.com/thu-ml/tianshou/issues?q=

.github/PULL_REQUEST_TEMPLATE.md (new file, 20 lines)

@@ -0,0 +1,20 @@
- [ ] I have marked all applicable categories:
    + [ ] exception-raising fix
    + [ ] algorithm implementation fix
    + [ ] documentation modification
    + [ ] new feature
- [ ] If applicable, I have mentioned the relevant/related issue(s)
Less important but also useful:
- [ ] I have visited the [source website], and in particular read the [known issues]
- [ ] I have searched through the [issue tracker] for duplicates
- [ ] I have mentioned version numbers, operating system and environment, where applicable:
```python
import tianshou, sys
print(tianshou.__version__, sys.version, sys.platform)
```
[source website]: https://github.com/thu-ml/tianshou
[known issues]: https://github.com/thu-ml/tianshou/#faq-and-known-issues
[issue tracker]: https://github.com/thu-ml/tianshou/issues?q=

CONTRIBUTING.md (new file, 43 lines)

@@ -0,0 +1,43 @@
# Contributing
To install Tianshou in an "editable" mode, run
```bash
pip install -e .
```
in the main directory. This installation can be removed later with
```bash
python setup.py develop --uninstall
```
Additional dependencies for development can be installed by
```bash
pip install ".[dev]"
```
#### Tests
Run the automated tests from the main directory with:
```bash
pytest test --cov tianshou -s
```
#### PEP8 Code Style Check
We follow the PEP8 Python code style. To check it, run the following in the main directory:
```bash
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
flake8 . --count --exit-zero --max-complexity=30 --max-line-length=79 --statistics
```
#### Documentation
Documentation is written under the `docs/` directory as reStructuredText (`.rst`) files, with `index.rst` as the main page. A tutorial on reStructuredText can be found [here](https://pythonhosted.org/an_example_pypi_project/sphinx.html).
API references are automatically generated by [Sphinx](http://www.sphinx-doc.org/en/stable/) according to the outlines under `docs/api/` and should be updated whenever the code changes.
To compile the documentation into web pages, run
```bash
make html
```
under the `docs/` directory. The generated web pages are placed in `docs/_build` and can be viewed in a browser.

README.md

@@ -2,7 +2,7 @@
<h1 align="center">Tianshou</h1>
![PyPI](https://img.shields.io/pypi/v/tianshou)
![Unittest](https://github.com/thu-ml/tianshou/workflows/Unittest/badge.svg)
![Unittest](https://github.com/thu-ml/tianshou/workflows/Unittest/badge.svg?branch=master)
[![Documentation Status](https://readthedocs.org/projects/tianshou/badge/?version=latest)](https://tianshou.readthedocs.io/en/latest/?badge=latest)
[![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers)
[![GitHub forks](https://img.shields.io/github/forks/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/network)
@@ -21,9 +21,9 @@
- [Twin Delayed DDPG (TD3)](https://arxiv.org/pdf/1802.09477.pdf)
- [Soft Actor-Critic (SAC)](https://arxiv.org/pdf/1812.05905.pdf)
Tianshou supports parallel environment training for all algorithms as well.
Tianshou also supports parallel workers for all algorithms. All of these algorithms are reformulated as replay-buffer-based algorithms.
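As a rough illustration of what parallel environment workers look like in practice, here is a minimal sketch; the `SubprocVectorEnv` class from `tianshou.env` and the worker count are assumptions chosen for illustration:
```python
import gym
from tianshou.env import SubprocVectorEnv

# a sketch: run 8 CartPole workers in parallel subprocesses
# (class choice and worker count are illustrative assumptions)
train_envs = SubprocVectorEnv(
    [lambda: gym.make('CartPole-v0') for _ in range(8)])
```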
Tianshou is still under development. More algorithms are going to be added and we always welcome contributions to help make Tianshou better. If you would like to contribute, please check out the [guidelines](/CONTRIBUTING.md).
Tianshou is still under development. More algorithms are going to be added and we always welcome contributions to help make Tianshou better. If you would like to contribute, please check out [CONTRIBUTING.md](/CONTRIBUTING.md).
## Installation
@@ -35,7 +35,7 @@ pip3 install tianshou
## Documentation
The tutorials and API documentation are hosted on [https://tianshou.readthedocs.io](https://tianshou.readthedocs.io).
The tutorials and API documentation are hosted on [https://tianshou.readthedocs.io](https://tianshou.readthedocs.io). It is currently under construction.
The example scripts are under [test/discrete](/test/discrete) (CartPole) and [test/continuous](/test/continuous) (Pendulum).
@@ -47,21 +47,23 @@ Tianshou is a lightweight but high-speed reinforcement learning platform. For ex
![testpg](docs/_static/images/testpg.gif)
We select some of famous (>1k stars) reinforcement learning platform. Here is the benchmark result for other algorithms and platforms on toy scenarios:
We select some well-known (>1k stars) reinforcement learning platforms. Here are the benchmark results for these platforms and algorithms on toy scenarios:
| Platform | [Tianshou](https://github.com/thu-ml/tianshou) | [Baselines](https://github.com/openai/baselines) | [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | [PyTorch DRL](https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch) | [rlpyt](https://github.com/astooke/rlpyt) |
| ------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| GitHub Stars | [![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/openai/baselines)](https://github.com/openai/baselines/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/ray-project/ray)](https://github.com/ray-project/ray/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch)](https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/astooke/rlpyt)](https://github.com/astooke/rlpyt/stargazers) |
| Algo \ ML platform | PyTorch | TensorFlow | TF/PyTorch | PyTorch | PyTorch |
| PG - CartPole | 9.03±4.18s | | | None | |
| DQN - CartPole | 20.94±11.38s | | | 175.55±53.81s | |
| A2C - CartPole | 11.72±3.85s | | | Error | |
| PPO - CartPole | 35.25±16.47s | | | 29.16±15.46s | |
| DDPG - Pendulum | 46.95±24.31s | | | 652.83±471.28s | |
| SAC - Pendulum | 38.92±2.09s | None | | 808.21±405.70s | |
| TD3 - Pendulum | 48.39±7.22s | None | | 619.33±324.97s | |
| RL Platform | [Tianshou](https://github.com/thu-ml/tianshou) | [Baselines](https://github.com/openai/baselines) | [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | [PyTorch DRL](https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch) | [rlpyt](https://github.com/astooke/rlpyt) |
| ---------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| GitHub Stars | [![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/openai/baselines)](https://github.com/openai/baselines/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/ray-project/ray)](https://github.com/ray-project/ray/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch)](https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/astooke/rlpyt)](https://github.com/astooke/rlpyt/stargazers) |
| Algo \ Framework | PyTorch | TensorFlow | TF/PyTorch | PyTorch | PyTorch |
| PG - CartPole | 9.03±4.18s | None | | None | |
| DQN - CartPole | 20.94±11.38s | 1046.34±291.27s | | 175.55±53.81s | |
| A2C - CartPole | 11.72±3.85s | *(~1612s) | | Runtime Error | |
| PPO - CartPole | 35.25±16.47s | *(~1179s) | | 29.16±15.46s | |
| DDPG - Pendulum | 46.95±24.31s | *(>1h) | | 652.83±471.28s | 172.18±62.48s |
| TD3 - Pendulum | 48.39±7.22s | None | | 619.33±324.97s | 210.31±76.30s |
| SAC - Pendulum | 38.92±2.09s | None | | 808.21±405.70s | 295.92±140.85s |
All of the platforms use at most 10 different seeds for testing. We erase those trials which failed for training. The reward threshold is 195.0 in CartPole and -250.0 in Pendulum over consecutive 100 episodes.
*: Could not reach the target reward threshold within 1e6 steps in any of 10 runs; the total runtime is shown in parentheses.
All of the platforms use 10 different seeds for testing. Trials that failed to train are excluded. The reward threshold is 195.0 in CartPole and -250.0 in Pendulum, computed over the mean returns of 100 consecutive episodes.
### Reproducible
@@ -71,10 +73,16 @@ Check out the [GitHub Actions](https://github.com/thu-ml/tianshou/actions) page
### Elegant and Flexible
Currently, the overall code of Tianshou platform is less than 1500 lines. It is quite easy to go through the framework and understand how it works. We provide many flexible API as you wish, for instance, if you want to use your policy to interact with environment with `n` episodes:
Currently, the overall code of the Tianshou platform is less than 1500 lines, and most of the implemented algorithms are less than 100 lines of Python code. It is quite easy to go through the framework and understand how it works. We provide many flexible APIs; for instance, if you want to use your policy to interact with the environment for `n` steps:
```python
result = collector.collect(n_episode=n)
result = collector.collect(n_step=n)
```
If you have 3 environments in total and want to collect 1 episode in the first environment and 3 in the third environment:
```python
result = collector.collect(n_episode=[1, 0, 3])
```
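Both collect calls above assume a collector has already been wired up; a minimal sketch of that setup (the buffer size and the `policy`/`envs` variable names here are illustrative assumptions):
```python
from tianshou.data import Collector, ReplayBuffer

# a sketch: connect a policy and (vectorized) environments to a collector
collector = Collector(policy, envs, ReplayBuffer(size=20000))
```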
If you want to train the given policy with a sampled batch:
@@ -116,7 +124,7 @@ batch_size = 64
train_num = 8
test_num = 100
device = 'cuda' if torch.cuda.is_available() else 'cpu'
writer = SummaryWriter('log') # tensorboard is also supported!
writer = SummaryWriter('log/pg') # tensorboard is also supported!
```
Define the network:
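The README defines the model at this point (outside this diff hunk); the following is a hypothetical minimal version for illustration only, where the class name `Net` and the layer sizes are assumptions rather than the repository's exact code:
```python
import numpy as np
import torch
from torch import nn

class Net(nn.Module):
    # hypothetical minimal MLP mapping flattened observations to action logits
    def __init__(self, state_shape, action_shape):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(int(np.prod(state_shape)), 128), nn.ReLU(),
            nn.Linear(128, int(np.prod(action_shape))))

    def forward(self, obs, state=None, info={}):
        obs = torch.as_tensor(obs, dtype=torch.float32)
        return self.model(obs.view(obs.shape[0], -1)), state
```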
@@ -191,7 +199,7 @@ collecter.collect(n_episode=1, render=1/35)
To view the results saved in TensorBoard, run (in a shell):
```bash
tensorboard --logdir log
tensorboard --logdir log/pg
```
## Citing Tianshou
@@ -214,6 +222,7 @@ If you find Tianshou useful, please cite it in your publications.
- [ ] More examples on [mujoco, atari] benchmark
- [ ] Prioritized replay buffer
- [ ] RNN support
- [ ] Imitation Learning
- [ ] Multi-agent
- [ ] Distributed training

setup.py

@@ -5,7 +5,7 @@ from setuptools import setup, find_packages
setup(
name='tianshou',
version='0.2.0',
version='0.2.0post1',
description='A Library for Deep Reinforcement Learning',
long_description=open('README.md').read(),
long_description_content_type='text/markdown',
@@ -45,11 +45,19 @@ setup(
'torch>=1.4.0',
],
extras_require={
'dev': [
'Sphinx>=1.7.1',
'sphinx_rtd_theme',
'sphinxcontrib-bibtex>=0.3.6',
'flake8',
'pytest',
'pytest-cov',
],
'atari': [
'atari_py',
],
'mujoco': [
'mujoco_py',
]
],
},
)

tianshou/__init__.py

@@ -1,7 +1,7 @@
from tianshou import data, env, utils, policy, trainer, \
exploration
__version__ = '0.2.0'
__version__ = '0.2.0post1'
__all__ = [
'env',
'data',