# tianshou

Tianshou (天授) is a reinforcement learning platform.

![alt text](https://github.com/sproblvem/tianshou/blob/master/docs/figures/tianshou_architecture.png "Architecture of tianshou")

## agent

    Examples
    Self-play Framework

## core

### Model

    DQN, Policy-Value Network of AlphaGo Zero, PPO-specific, TRPO-specific

### Algorithm

#### Loss design

    Actor-Critic (Variations), DQN (Variations), DDPG, TRPO, PPO

#### Optimization method

    SGD, Adam, TRPO, natural gradient, etc.

### Planning

    MCTS

## data

    Training style - Monte Carlo or Temporal Difference
    Reward Reshaping / Advantage Estimation Function
    Importance weight
    Multithread Read/Write

## environment

    DQN repeat frames etc.

## simulator

    Go, Othello/Reversi, Warzone

## TODO

Parallelize search-based methods.
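
As a rough illustration of the "Training style - Monte Carlo or Temporal Difference" item in the data section above, the sketch below contrasts the two ways of building value targets from one trajectory. It is not taken from the tianshou code base: the function names `monte_carlo_returns` and `td_targets`, their parameters, and the default discount `gamma` are all hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def monte_carlo_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Discounted return from each step to the end of the episode.

    Hypothetical helper: the target at step t uses every later reward.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_targets(
    rewards: Sequence[float],
    next_values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """One-step temporal-difference targets: r + gamma * V(s').

    Hypothetical helper: next_values[t] is an estimate of V(s_{t+1});
    the bootstrap term is dropped when the episode ended at step t.
    """
    return [
        r + (0.0 if done else gamma * v)
        for r, v, done in zip(rewards, next_values, dones)
    ]


if __name__ == "__main__":
    print(monte_carlo_returns([1.0, 1.0, 1.0], gamma=0.5))
    print(td_targets([1.0, 1.0], [2.0, 3.0], [False, True], gamma=0.5))
```

The Monte Carlo version needs the full episode before any target is available, while the temporal-difference version only needs one transition plus a value estimate, which is the trade-off the "Training style" entry refers to.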