# tianshou

Tianshou (天授) is a reinforcement learning platform. The following image illustrates its architecture.

<img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/tianshou_architecture.png" height="200"/>

## agent
Examples

Self-play Framework
## core
### Model
DQN, Policy-Value Network of AlphaGo Zero, PPO-specific, TRPO-specific
### Algorithm
#### Loss design
Actor-Critic (Variations), DQN (Variations), DDPG, TRPO, PPO
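
As an illustration of one loss from this list, here is a minimal sketch of the PPO clipped surrogate loss in plain Python. The function name `ppo_clip_loss` and its parameters are assumptions made for this example; they are not taken from Tianshou's code.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over samples.

    All names here are hypothetical and chosen for this sketch only.
    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (lower) bound of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Optimizers minimize, so return the negated objective.
    return -total / len(advantages)


if __name__ == "__main__":
    print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))
```

When the two policies agree, the ratio is 1 and the loss reduces to the negative mean advantage; the clipping only takes effect once the new policy moves away from the old one.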
#### Optimization method
SGD, Adam, TRPO, natural gradient, etc.
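
To make the difference between the first two optimizers concrete, the following is a rough sketch of a single SGD step and a single Adam step on one scalar parameter. `sgd_step` and the `Adam` class are hypothetical helpers written for this example and do not reflect how Tianshou wires up its optimizers.

```python
import math


def sgd_step(param, grad, lr=0.01):
    """Plain gradient descent: move against the gradient."""
    return param - lr * grad


class Adam:
    """Scalar Adam update (hypothetical helper for illustration)."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running mean of gradients
        self.v = 0.0  # running mean of squared gradients
        self.t = 0    # step counter used for bias correction

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        # Bias-corrected moment estimates.
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


if __name__ == "__main__":
    # Minimize f(x) = x^2, whose gradient is 2x, with plain SGD.
    x = 1.0
    for _ in range(50):
        x = sgd_step(x, 2.0 * x, lr=0.1)
    print(x)
```

On its first step Adam moves by roughly `lr` regardless of the gradient's magnitude, whereas the SGD step scales with the gradient.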
### Planning
MCTS
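
For readers unfamiliar with MCTS, here is a small self-contained sketch of the UCT variant (selection, expansion, random rollout, backpropagation). It is not Tianshou's implementation. The toy game (a Nim variant where players take 1 or 2 stones and whoever takes the last stone wins) and every name below are assumptions chosen to keep the example short.

```python
import math
import random


def legal_moves(state):
    stones, _ = state
    return [m for m in (1, 2) if m <= stones]


def apply_move(state, move):
    stones, player = state
    return (stones - move, 1 - player)


class Node:
    def __init__(self, state, parent=None, move=None):
        self.state = state            # (stones_left, player_to_move)
        self.parent = parent
        self.move = move              # move that led from parent to here
        self.children = []
        self.untried = legal_moves(state)
        self.visits = 0
        self.wins = 0.0               # wins for the player who moved into this node


def uct_child(node, c=1.4):
    """Pick the child maximizing the UCB1 score."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(state, rng):
    """Play uniformly random moves to the end and return the winner."""
    while state[0] > 0:
        state = apply_move(state, rng.choice(legal_moves(state)))
    return 1 - state[1]  # the player who just moved took the last stone


def mcts(root_state, n_iter=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one unexplored child, if any.
        if node.untried:
            move = node.untried.pop()
            child = Node(apply_move(node.state, move), parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node.
        winner = rollout(node.state, rng)
        # 4. Backpropagation: credit nodes whose mover won.
        while node is not None:
            node.visits += 1
            if winner == 1 - node.state[1]:
                node.wins += 1.0
            node = node.parent
    # Recommend the most visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print(mcts((4, 0)))
```

In this game, positions with a multiple of 3 stones are lost for the player to move, so a good search from 4 stones should prefer taking one stone.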
## data
Training style: Monte Carlo or Temporal Difference

Reward Reshaping / Advantage Estimation Function

Importance weight
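
The three items above can be sketched on a single trajectory as follows. This is an illustrative sketch in plain Python, not Tianshou's data module: the function names are hypothetical, and generalized advantage estimation (GAE) is used as one common choice of advantage estimator, which the source does not specify.

```python
def monte_carlo_returns(rewards, gamma=0.99):
    """Discounted return from each step to the end of the episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def td_targets(rewards, next_values, gamma=0.99):
    """One-step temporal-difference targets r + gamma * V(s')."""
    return [r + gamma * v for r, v in zip(rewards, next_values)]


def gae_advantages(rewards, values, last_value=0.0, gamma=0.99, lam=0.95):
    """Generalized advantage estimation (assumed estimator, see lead-in)."""
    advantages, gae = [], 0.0
    next_value = last_value
    for r, v in zip(reversed(rewards), reversed(values)):
        delta = r + gamma * next_value - v  # one-step TD error
        gae = delta + gamma * lam * gae     # exponentially weighted sum
        advantages.append(gae)
        next_value = v
    return advantages[::-1]


def importance_weights(target_probs, behavior_probs):
    """Per-sample ratio pi(a|s) / mu(a|s) for off-policy correction."""
    return [p / q for p, q in zip(target_probs, behavior_probs)]


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(monte_carlo_returns(rewards, gamma=0.5))
    print(td_targets(rewards, [2.0, 0.0, 0.0], gamma=0.5))
    print(gae_advantages(rewards, [0.0, 0.0, 0.0], gamma=0.5, lam=1.0))
    print(importance_weights([0.5], [0.25]))
```

With `lam=1.0` and a zero value function, the GAE advantages coincide with the Monte Carlo returns, which shows how the estimator interpolates between the two training styles.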

Multithreaded Read/Write
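
One simple way to support concurrent reads and writes is to guard a bounded buffer with a lock. The sketch below assumes a `ReplayBuffer` class invented for this example; the actual design of Tianshou's data storage is not described in this document.

```python
import random
import threading
from collections import deque


class ReplayBuffer:
    """Bounded buffer safe for concurrent writers and readers (hypothetical)."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)  # oldest items are dropped first
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            self._data.append(transition)

    def sample(self, batch_size, rng=random):
        with self._lock:
            k = min(batch_size, len(self._data))
            return rng.sample(list(self._data), k)

    def __len__(self):
        with self._lock:
            return len(self._data)


def fill(buffer, worker_id, n):
    """Writer thread body: push n dummy transitions."""
    for i in range(n):
        buffer.add((worker_id, i))


def run_demo(n_workers=4, per_worker=250, capacity=1000):
    buffer = ReplayBuffer(capacity)
    threads = [
        threading.Thread(target=fill, args=(buffer, w, per_worker))
        for w in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buffer


if __name__ == "__main__":
    buf = run_demo()
    print(len(buf), len(buf.sample(32)))
```

A single lock is the simplest choice; a design with heavy read traffic might prefer finer-grained locking, which is outside the scope of this sketch.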
## environment
DQN frame repeating, etc.
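
Frame repeating, as used in DQN, applies the same action for several consecutive frames and sums the rewards. A minimal sketch of such a wrapper follows. It assumes an environment exposing `reset()` and `step(action)` returning `(obs, reward, done, info)`; that interface, the `RepeatAction` wrapper, and the toy `CountingEnv` are all assumptions made for this example.

```python
class RepeatAction:
    """Repeat each action `repeat` times and accumulate the reward."""

    def __init__(self, env, repeat=4):
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break  # do not step past the end of the episode
        return obs, total_reward, done, info


class CountingEnv:
    """Toy environment: reward 1 per frame, ends after `horizon` frames."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, {}


if __name__ == "__main__":
    env = RepeatAction(CountingEnv(horizon=10), repeat=4)
    env.reset()
    done = False
    while not done:
        obs, reward, done, _ = env.step(0)
        print(obs, reward, done)
```

Breaking out of the loop on `done` keeps the wrapper from stepping a finished episode, so the last wrapped step may cover fewer than `repeat` frames.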
## simulator
Go, Othello/Reversi, Warzone

<img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/go.png" height="150"/> <img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/reversi.jpg" height="150"/> <img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/warzone.jpg" height="150"/>
## TODO
Parallelize search-based methods.