# tianshou
Tianshou (天授) is a reinforcement learning platform. The following image illustrates its architecture.

<img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/tianshou_architecture.png" height="200"/>

## agent
&nbsp;&nbsp;&nbsp;&nbsp;Examples

&nbsp;&nbsp;&nbsp;&nbsp;Self-play Framework

## core

### Model
&nbsp;&nbsp;&nbsp;&nbsp;DQN, Policy-Value Network of AlphaGo Zero, PPO-specific, TRPO-specific

### Algorithm

#### Loss design
&nbsp;&nbsp;&nbsp;&nbsp;Actor-Critic (Variations), DQN (Variations), DDPG, TRPO, PPO

#### Optimization method
&nbsp;&nbsp;&nbsp;&nbsp;SGD, Adam, TRPO, natural gradient, etc.

### Planning
&nbsp;&nbsp;&nbsp;&nbsp;MCTS
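
As a rough illustration of the selection step in MCTS, the sketch below scores children with the classic UCT rule (mean value plus an exploration bonus). The names `uct_score` and `select_child`, the `(value_sum, visits)` layout, and the constant `c` are assumptions made for this example; they are not Tianshou's API, and AlphaGo Zero style search uses a prior-weighted variant instead.

```python
import math


def uct_score(value_sum, visits, parent_visits, c=1.4):
    """UCT score of one child: mean value plus an exploration bonus.

    Unvisited children get an infinite score so each is tried at least once.
    """
    if visits == 0:
        return math.inf
    mean_value = value_sum / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return mean_value + exploration


def select_child(children, parent_visits, c=1.4):
    """Return the action whose child has the highest UCT score.

    `children` maps an action to a (value_sum, visits) pair (hypothetical layout).
    """
    return max(
        children,
        key=lambda action: uct_score(*children[action], parent_visits, c),
    )
```

A full search would repeat selection down the tree, expand a leaf, evaluate it (by rollout or a value network), and back the result up along the visited path.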

## data
&nbsp;&nbsp;&nbsp;&nbsp;Training style - Monte Carlo or Temporal Difference

&nbsp;&nbsp;&nbsp;&nbsp;Reward shaping / advantage estimation function

&nbsp;&nbsp;&nbsp;&nbsp;Importance weight

&nbsp;&nbsp;&nbsp;&nbsp;Multithread Read/Write
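
To make the two training styles concrete, here is a minimal sketch of how the regression targets differ: Monte Carlo uses the full discounted return of an episode, while one-step Temporal Difference bootstraps from a value estimate of the next state. The function names and argument layout are assumptions for illustration only, not the interface of the data module.

```python
from typing import List, Sequence


def monte_carlo_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_targets(
    rewards: Sequence[float],
    next_values: Sequence[float],
    dones: Sequence[bool],
    gamma: float,
) -> List[float]:
    """One-step TD target r_t + gamma * V(s_{t+1}); no bootstrapping past terminal steps."""
    return [
        r + (0.0 if done else gamma * v)
        for r, v, done in zip(rewards, next_values, dones)
    ]
```

Subtracting a value baseline `V(s_t)` from either target gives a simple advantage estimate; more elaborate estimators interpolate between the two styles.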

## environment
&nbsp;&nbsp;&nbsp;&nbsp;Environment wrappers, e.g. frame repeating as used by DQN
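
Frame repeating can be sketched as a thin wrapper that applies each chosen action several times and sums the rewards. The `reset()` / `step(action) -> (obs, reward, done)` interface, the class name `RepeatAction`, and the toy `CountingEnv` below are assumptions for this example rather than the actual environment module.

```python
class RepeatAction:
    """Wrap an environment so each chosen action is applied `repeat` times."""

    def __init__(self, env, repeat=4):
        if repeat < 1:
            raise ValueError("repeat must be at least 1")
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:  # stop early so we never step a finished episode
                break
        return obs, total_reward, done


class CountingEnv:
    """Toy environment (hypothetical): the observation is a step counter."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon
```

Wrapping `CountingEnv(horizon=6)` with `repeat=4` yields two agent-level steps per episode: the first covers four underlying steps, the second stops after two because the episode ends.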

## simulator
&nbsp;&nbsp;&nbsp;&nbsp;Go, Othello/Reversi, Warzone

<img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/go.png" height="150"/> <img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/reversi.jpg" height="150"/> <img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/warzone.jpg" height="150"/>

## TODO
Parallelize search-based methods.