# tianshou

Tianshou (天授) is a reinforcement learning platform. The figure below illustrates its architecture.

<img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/tianshou_architecture.png" height="200"/>

## agent
Examples

Self-play framework.
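As a rough illustration of the idea (the `agent` and `env` objects and their methods below are placeholders, not Tianshou's API), a self-play rollout lets a single policy play both sides and hands the resulting trajectory back for training:

```python
def self_play_episode(agent, env):
    """Hypothetical self-play rollout: one agent plays both players in turn
    and the whole trajectory is returned as training data."""
    trajectory = []
    obs, player, done = env.reset(), 0, False
    while not done:
        action = agent.act(obs, player)            # same policy for both sides
        next_obs, reward, done = env.step(action)  # assumed env interface
        trajectory.append((obs, action, reward, player))
        obs, player = next_obs, 1 - player         # alternate turns
    return trajectory
```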
## core
### Model
DQN, the policy-value network of AlphaGo Zero, and PPO-specific and TRPO-specific networks.
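As a sketch of what such a model component can look like (illustrative PyTorch only; the class name and layer sizes are not part of Tianshou), a DQN-style Q-network maps a state vector to one Q-value per action:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Illustrative Q-network: state vector in, one Q-value per action out."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # shape: (batch, n_actions)
```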
### Algorithm
#### Loss design
Actor-critic (and its variations), DQN (and its variations), DDPG, TRPO, PPO.
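For example, the clipped PPO surrogate and the one-step DQN TD loss can be written as below (a PyTorch sketch with assumed tensor shapes, not the code in this repository):

```python
import torch
import torch.nn.functional as F

def ppo_clip_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped PPO surrogate objective, returned as a loss to minimize."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantage, clipped * advantage).mean()

def dqn_loss(q_values, actions, rewards, next_q_target, done, gamma=0.99):
    """One-step TD loss for DQN; `next_q_target` comes from a target network
    and `actions` is a LongTensor of action indices."""
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    target = rewards + gamma * (1.0 - done) * next_q_target.max(dim=1).values
    return F.mse_loss(q_taken, target.detach())
```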
#### Optimization method
SGD, Adam, TRPO, natural gradient, etc.
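For the first-order methods this amounts to choosing the optimizer that takes the gradient step (PyTorch used purely for illustration); TRPO and natural gradient instead replace the plain step below with a trust-region or Fisher-preconditioned update:

```python
import torch

policy = torch.nn.Linear(4, 2)                      # placeholder model
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
# optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)  # drop-in alternative

loss = policy(torch.randn(8, 4)).pow(2).mean()      # placeholder loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```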
### Planning
Monte Carlo tree search (MCTS).
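A minimal sketch of the UCT selection rule at the core of MCTS (generic Python, not the planner implemented here):

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    total_value: float = 0.0
    children: list = field(default_factory=list)

def uct_score(parent: Node, child: Node, c: float = 1.4) -> float:
    """UCB1 score: average value plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")           # always try unvisited children first
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore

def select_child(parent: Node) -> Node:
    """Selection step of MCTS: descend into the child with the highest score."""
    return max(parent.children, key=lambda ch: uct_score(parent, ch))
```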
## data
Training style: Monte Carlo or temporal difference.
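The two styles differ in how training targets are built from a trajectory, roughly as in this numpy sketch (array names are illustrative):

```python
import numpy as np

def monte_carlo_returns(rewards, gamma=0.99):
    """Full-episode discounted returns: G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def td_targets(rewards, next_values, gamma=0.99):
    """One-step TD targets: r_t + gamma * V(s_{t+1}), bootstrapped from a critic."""
    return np.asarray(rewards) + gamma * np.asarray(next_values)
```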
Reward reshaping / advantage estimation function.
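One common estimator is generalized advantage estimation (GAE), sketched below in numpy (episode boundaries are ignored for brevity, and this is not necessarily the estimator used here):

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """GAE(gamma, lambda): discounted sum of TD residuals.
    `values` holds V(s_0), ..., V(s_T), one entry more than `rewards`."""
    rewards, values = np.asarray(rewards), np.asarray(values)
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros(len(deltas))
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```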
Importance weights.
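For off-policy corrections this is the likelihood ratio between the target and behaviour policies, e.g.:

```python
import numpy as np

def importance_weights(logp_target, logp_behavior, clip=10.0):
    """pi_target(a|s) / pi_behavior(a|s) from log-probabilities, clipped to
    keep the variance of the off-policy estimate bounded."""
    ratio = np.exp(np.asarray(logp_target) - np.asarray(logp_behavior))
    return np.minimum(ratio, clip)
```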
Multithreaded read/write.
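A lock-protected buffer is the simplest way to let collector threads write transitions while a training thread samples them (a sketch only, not Tianshou's data module):

```python
import random
import threading

class SharedBuffer:
    """Minimal thread-safe experience buffer: writers append, the trainer samples."""

    def __init__(self, capacity=100_000):
        self._data = []
        self._capacity = capacity
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            if len(self._data) >= self._capacity:
                self._data.pop(0)          # drop the oldest transition
            self._data.append(transition)

    def sample(self, batch_size):
        with self._lock:
            return random.sample(self._data, min(batch_size, len(self._data)))
```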
## environment
DQN-style repeated frames, etc.
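For example, an action-repeat wrapper around any environment whose `step(action)` returns `(obs, reward, done, info)` (that interface is an assumption here, not a Tianshou requirement):

```python
class RepeatAction:
    """Repeat each agent action for `k` environment frames and sum the rewards,
    as in the DQN setup. Wraps any object with reset() and step(action)."""

    def __init__(self, env, k=4):
        self.env, self.k = env, k

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        obs = None
        for _ in range(self.k):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:                      # stop early at episode end
                break
        return obs, total_reward, done, info
```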
## simulator
Go, Othello/Reversi, Warzone

<img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/go.png" height="150"/> <img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/reversi.jpg" height="150"/> <img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/warzone.jpg" height="150"/>

## TODO
Parallelize search-based methods.