# tianshou
Tianshou (天授) is a reinforcement learning platform.
## agent
Examples

Self-play framework
## core
### Model
DQN, Policy-Value Network of AlphaGo Zero, PPO-specific, TRPO-specific
### Algorithm
#### Loss design
Actor-Critic (and variants), DQN (and variants), DDPG, TRPO, PPO
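As a rough illustration of the DQN entry, the one-step Q-learning loss can be written in plain Python. This is a minimal sketch, not Tianshou's API: `dqn_targets` and `dqn_loss` are hypothetical helper names, Q-values are plain lists indexed by action, and the separate target network and Huber loss often used in practice are left out.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """One-step Q-learning targets r + gamma * max_a' Q(s', a'), with no bootstrap at terminal steps."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def dqn_loss(
    q_values: Sequence[Sequence[float]],
    actions: Sequence[int],
    targets: Sequence[float],
) -> float:
    """Mean squared TD error between Q(s, a) for the actions taken and the fixed targets."""
    errors = [(q[a] - t) ** 2 for q, a, t in zip(q_values, actions, targets)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Two transitions; the second ends its episode, so it does not bootstrap.
    targets = dqn_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.5)
    print(targets)  # [2.0, 0.0]
    print(dqn_loss([[1.0, 0.0], [0.0, 0.5]], [0, 1], targets))  # 0.625
```

In a full DQN setup, `next_q_values` would typically come from a periodically synced target network, and only the online network's `q_values` would receive gradients.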
#### Optimization method
SGD, Adam, TRPO, natural gradient, etc.
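For reference, a single Adam update on a list of scalar parameters looks roughly like the following. It is a sketch of the textbook update rule with its usual default hyperparameters; `adam_step` is a hypothetical name, and nothing here reflects how the platform actually wires optimizers into training.

```python
import math
from typing import List, Tuple


def adam_step(
    params: List[float],
    grads: List[float],
    m: List[float],
    v: List[float],
    t: int,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update; `t` is the 1-based step count used for bias correction."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g      # running mean of gradients
        v_i = beta2 * v_i + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m_i / (1 - beta1 ** t)           # correct the bias from zero initialization
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


if __name__ == "__main__":
    # Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    x, m, v = [0.0], [0.0], [0.0]
    for t in range(1, 2001):
        x, m, v = adam_step(x, [2.0 * (x[0] - 3.0)], m, v, t, lr=0.01)
    print(x[0])  # close to 3.0
```

With a constant learning rate, Adam tends to settle into a small neighborhood of the minimum rather than land on it exactly, so the example only expects `x` to end up near 3.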
### Planning
Monte Carlo tree search (MCTS)
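Each MCTS iteration selects a path down the tree, expands and evaluates a leaf, then backs the value up the path. The sketch below covers only selection with the UCT rule (UCB1 applied at each node) and a single-agent backup; the `Node` class and helper names are hypothetical. AlphaGo Zero instead uses a PUCT rule weighted by the policy-value network's priors, and two-player games flip the value's sign at alternate plies; both are omitted here.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    visits: int = 0
    value_sum: float = 0.0
    children: Dict[int, "Node"] = field(default_factory=dict)  # action -> child node

    @property
    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def uct_score(parent: Node, child: Node, c: float = 1.41) -> float:
    """Mean value plus an exploration bonus that shrinks as the child is visited more."""
    if child.visits == 0:
        return math.inf  # try every unvisited action before revisiting any
    return child.mean_value + c * math.sqrt(math.log(parent.visits) / child.visits)


def select_action(node: Node, c: float = 1.41) -> int:
    """Pick the child action with the highest UCT score."""
    return max(node.children, key=lambda a: uct_score(node, node.children[a], c))


def backup(path: List[Node], value: float) -> None:
    """Add one visit and the leaf value to every node on the selected path."""
    for n in path:
        n.visits += 1
        n.value_sum += value


if __name__ == "__main__":
    root = Node(visits=10, children={0: Node(5, 3.0), 1: Node(5, 2.0), 2: Node()})
    print(select_action(root))  # 2: the unvisited action is tried first
```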
## data
Training style: Monte Carlo or temporal-difference (TD) learning
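The two training styles differ mainly in their regression targets: Monte Carlo waits for the full discounted return of an episode, while one-step TD bootstraps from the current value estimate of the next state. A minimal sketch, assuming rewards and value estimates arrive as plain Python lists and that the rewards given to the Monte Carlo helper form a complete episode; both function names are hypothetical.

```python
from typing import List, Sequence


def monte_carlo_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one complete episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def td0_targets(
    rewards: Sequence[float],
    next_values: Sequence[float],
    dones: Sequence[bool],
    gamma: float,
) -> List[float]:
    """One-step TD targets r_t + gamma * V(s_{t+1}); terminal steps do not bootstrap."""
    return [r + (0.0 if d else gamma * v) for r, v, d in zip(rewards, next_values, dones)]


if __name__ == "__main__":
    print(monte_carlo_returns([1.0, 1.0, 1.0], 0.5))  # [1.75, 1.5, 1.0]
    print(td0_targets([1.0, 1.0, 1.0], [2.0, 2.0, 0.0], [False, False, True], 0.5))  # [2.0, 2.0, 1.0]
```

The Monte Carlo target is unbiased but high-variance and needs whole episodes; the TD target can be formed after a single step but inherits any error in `V`.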
Reward shaping / advantage estimation functions
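Generalized advantage estimation (GAE) is one common advantage estimator; it interpolates between the two training styles through a parameter λ, where λ = 0 yields the one-step TD error and λ = 1 yields the Monte Carlo return minus the value baseline. Picking GAE here is an illustrative assumption rather than something the outline specifies, and `gae_advantages` is a hypothetical name.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    next_values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """A_t = sum_k (gamma * lam)^k * delta_{t+k}, with delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0  # stop bootstrapping and accumulation at episode ends
        delta = rewards[t] + gamma * next_values[t] * nonterminal - values[t]
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    # With zero value estimates and lam = 1, GAE reduces to the Monte Carlo return.
    print(gae_advantages([1.0, 1.0, 1.0], [0.0] * 3, [0.0] * 3, [False, False, True], gamma=0.5, lam=1.0))
```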
Importance sampling weights
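Importance weights correct for data collected by a behavior policy μ while optimizing a target policy π: each sample is weighted by π(a|s) / μ(a|s). The sketch computes these ratios from log-probabilities, optionally truncates them, and shows PPO's clipped surrogate objective as one place such ratios enter a loss. Function names and the optional `clip` argument are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def importance_weights(
    target_logp: Sequence[float],
    behavior_logp: Sequence[float],
    clip: Optional[float] = None,
) -> List[float]:
    """Per-sample ratios pi(a|s) / mu(a|s), computed in log space and optionally truncated."""
    weights = []
    for lp_pi, lp_mu in zip(target_logp, behavior_logp):
        w = math.exp(lp_pi - lp_mu)
        if clip is not None:
            w = min(w, clip)  # truncation bounds the variance of large ratios
        weights.append(w)
    return weights


def ppo_clipped_objective(
    ratios: Sequence[float], advantages: Sequence[float], epsilon: float = 0.2
) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A), the PPO surrogate to be maximized."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        terms.append(min(r * a, clipped_r * a))
    return sum(terms) / len(terms)
```

For example, a ratio of 2.0 with a positive advantage contributes only `1 + epsilon` times that advantage, which keeps a single update from moving the policy too far from the one that collected the data.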
Multithreaded read/write
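One simple way to let several actor threads write transitions while a learner thread samples them is to guard a bounded buffer with a single lock. This is a minimal sketch under that assumption (one coarse-grained `threading.Lock`, uniform sampling); `ThreadSafeReplayBuffer` is a hypothetical class, not part of the platform.

```python
import random
import threading
from collections import deque
from typing import Any, List


class ThreadSafeReplayBuffer:
    """FIFO replay buffer whose reads and writes are serialized by one lock."""

    def __init__(self, capacity: int) -> None:
        self._data: deque = deque(maxlen=capacity)  # oldest transitions are evicted automatically
        self._lock = threading.Lock()

    def add(self, transition: Any) -> None:
        with self._lock:
            self._data.append(transition)

    def sample(self, batch_size: int) -> List[Any]:
        with self._lock:
            return random.sample(list(self._data), min(batch_size, len(self._data)))

    def __len__(self) -> int:
        with self._lock:
            return len(self._data)


if __name__ == "__main__":
    buffer = ThreadSafeReplayBuffer(capacity=1000)

    def actor(actor_id: int) -> None:
        for step in range(500):
            buffer.add((actor_id, step))

    threads = [threading.Thread(target=actor, args=(i,)) for i in range(4)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    print(len(buffer))  # 1000: 2000 writes, capped by the capacity
```

A single lock is easy to reason about; if it becomes a bottleneck, finer-grained locking or a process-based design would be the next things to consider.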
## environment
Frame repetition as in DQN, etc.
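In the DQN Atari setup, each chosen action is typically repeated for four consecutive frames and the rewards are summed, which cuts the number of decisions per episode. Below is a minimal wrapper sketch, assuming a hypothetical environment whose `step` returns `(observation, reward, done)`; `ToyEnv` and `ActionRepeat` are illustrative names, and frame stacking (the other common DQN preprocessing step) is not shown.

```python
from typing import Any, Tuple


class ToyEnv:
    """Hypothetical environment: reward 1 per step, episode ends after `horizon` steps."""

    def __init__(self, horizon: int = 10) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: Any) -> Tuple[int, float, bool]:
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon


class ActionRepeat:
    """Repeats each action `k` times, summing rewards and stopping early at episode end."""

    def __init__(self, env: ToyEnv, k: int = 4) -> None:
        self.env = env
        self.k = k

    def reset(self) -> int:
        return self.env.reset()

    def step(self, action: Any) -> Tuple[int, float, bool]:
        total_reward, obs, done = 0.0, None, False
        for _ in range(self.k):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break  # never step past the end of an episode
        return obs, total_reward, done


if __name__ == "__main__":
    env = ActionRepeat(ToyEnv(horizon=10), k=4)
    env.reset()
    print(env.step(0))  # (4, 4.0, False)
```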
## simulator
Go, Othello/Reversi, Warzone
## TODO
Parallelize search-based methods.