# tianshou
Tianshou (天授) is a reinforcement learning platform.
![Architecture of tianshou](https://github.com/sproblvem/tianshou/blob/master/docs/figures/tianshou_architecture.png "Architecture of tianshou")
## agent
    Examples
    Self-play Framework
## core
### Model
    DQN, Policy-Value Network of AlphaGo Zero, PPO-specific, TRPO-specific
### Algorithm
#### Loss design
    Actor-Critic (Variations), DQN (Variations), DDPG, TRPO, PPO
#### Optimization method
    SGD, Adam, TRPO, natural gradient, etc.
### Planning
    MCTS
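As a rough illustration of the selection step in MCTS, the sketch below picks a child node with the UCB1 rule used by UCT. This is one common variant, not necessarily the rule tianshou implements. The function names, the `(value_sum, visits)` tuple layout, and the exploration constant `c` are all assumptions made for this example.

```python
import math


def uct_score(value_sum, visits, parent_visits, c=1.4):
    """UCB1 score of a child: mean value plus an exploration bonus.

    Unvisited children get an infinite score so they are tried first.
    """
    if visits == 0:
        return math.inf
    mean_value = value_sum / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return mean_value + exploration


def select_child(children, parent_visits, c=1.4):
    """Return the action whose child has the highest UCB1 score.

    `children` maps an action to a hypothetical (value_sum, visits) tuple.
    """
    return max(
        children,
        key=lambda a: uct_score(children[a][0], children[a][1], parent_visits, c),
    )


# Toy statistics for three actions; "c" has never been visited.
stats = {"a": (3.0, 4), "b": (1.0, 1), "c": (0.0, 0)}
chosen = select_child(stats, parent_visits=5)
```

Setting `c=0` turns the rule into pure exploitation of the mean value, which makes the effect of the exploration term easy to see.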
## data
    Training style - Monte Carlo or Temporal Difference
    Reward Reshaping / Advantage Estimation Function
    Importance weight
    Multithread Read/Write
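To make the two training styles listed above concrete, the following minimal sketch computes Monte Carlo returns and one-step Temporal Difference targets for a single episode. The function names, the discount factor `gamma`, and the toy rewards and value estimates are assumptions for illustration only. They are not part of tianshou's API.

```python
def monte_carlo_returns(rewards, gamma=0.99):
    """Discounted return from each step to the end of the episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_targets(rewards, next_values, dones, gamma=0.99):
    """One-step TD targets: r_t + gamma * V(s_{t+1}), with no bootstrap at terminal steps."""
    return [
        r + (0.0 if done else gamma * v)
        for r, v, done in zip(rewards, next_values, dones)
    ]


# Toy episode of three steps (made-up numbers).
rewards = [1.0, 0.0, 2.0]
next_values = [0.5, 1.5, 0.0]  # hypothetical critic estimates V(s_{t+1})
dones = [False, False, True]

mc = monte_carlo_returns(rewards, gamma=0.5)
td = td_targets(rewards, next_values, dones, gamma=0.5)
```

The Monte Carlo version needs the whole episode before it can produce a target, while the TD version only needs the next state's value estimate, at the cost of relying on that estimate being accurate.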
## environment
    DQN-style frame repetition (frame skip), etc.
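A minimal sketch of frame repetition, assuming an environment whose `step(action)` returns an `(observation, reward, done)` tuple. Both classes below are hypothetical and exist only for this example. The wrapper applies the same action for several consecutive steps and sums the rewards, stopping early if the episode ends.

```python
class CountingEnv:
    """Toy environment: reward 1.0 per step, episode ends after `horizon` steps."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done


class RepeatFrames:
    """Repeat each action `repeat` times (repeat >= 1) and sum the rewards."""

    def __init__(self, env, repeat=4):
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break  # do not step past the end of the episode
        return obs, total_reward, done


env = RepeatFrames(CountingEnv(horizon=10), repeat=4)
env.reset()
first = env.step(0)
second = env.step(0)
third = env.step(0)  # the episode ends partway through this repeat
```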
## simulator
    Go, Othello/Reversi, Warzone
## TODO
Parallelize search-based methods.