Update README.md

Sub-module function of tianshou.
sproblvem 2017-12-04 16:20:45 +08:00 committed by GitHub
parent 543d876f12
commit 674ba4656b


@@ -3,59 +3,40 @@ Tianshou(天授) is a reinforcement learning platform.
![alt text](https://github.com/sproblvem/tianshou/blob/master/docs/figures/tianshou_architecture.png "Architecture of tianshou")

Removed: `## data` (TODO: Replay Memory, Multiple writer/reader, Importance sampling), `## simulator` (Go, for AlphaGo), and `## environment` (gym).

## agent
    Examples
    Self-play Framework
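In AlphaGo Zero-style self-play, every position stored from a finished game is labelled with a value target z: the final outcome seen from the side of the player to move at that position. A minimal sketch, assuming a two-player zero-sum game with players encoded as +1/-1 and a draw encoded as winner 0; `Player` and `value_targets` are hypothetical names, not Tianshou API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical player encoding: +1 for the first player, -1 for the second.
using Player = int;

// Value targets for one finished self-play game. to_move[t] is the player to
// move at position t; winner is +1, -1, or 0 for a draw. The target z_t is the
// outcome from to_move[t]'s side: +1 for a win, -1 for a loss, 0 for a draw.
std::vector<float> value_targets(const std::vector<Player>& to_move, Player winner) {
    assert(winner == 1 || winner == -1 || winner == 0);
    std::vector<float> z;
    z.reserve(to_move.size());
    for (Player p : to_move) {
        assert(p == 1 || p == -1);
        // With both values in {-1, 0, +1}, the product is the outcome for p.
        z.push_back(static_cast<float>(p * winner));
    }
    return z;
}
```

For a three-position game won by the second player, `value_targets({1, -1, 1}, -1)` yields `{-1, 1, -1}`: the same result is a loss for one side and a win for the other.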
## core
Removed: `TODO: Optimizer` (from `## core`).

### Model
    DQN, Policy-Value Network of AlphaGo Zero, PPO-specific, TRPO-specific
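The AlphaGo Zero policy-value network ends in two heads: a policy head giving move probabilities and a value head giving a scalar in [-1, 1] through tanh. The sketch below only shows how raw head outputs could be turned into those two quantities; masking illegal moves before the softmax is an assumption added for illustration, and `PolicyValue` / `policy_value_output` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical output of a policy-value network for one position.
struct PolicyValue {
    std::vector<double> policy;  // probability per move, 0 for illegal moves
    double value;                // evaluation in [-1, 1] for the player to move
};

// Convert raw policy logits and a raw value-head output into a distribution
// over legal moves (masked softmax) and a tanh-squashed value.
PolicyValue policy_value_output(const std::vector<double>& logits,
                                const std::vector<bool>& legal,
                                double raw_value) {
    assert(logits.size() == legal.size());
    // Subtract the largest legal logit so std::exp cannot overflow.
    double max_logit = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < logits.size(); ++i)
        if (legal[i]) max_logit = std::max(max_logit, logits[i]);
    assert(std::isfinite(max_logit) && "need at least one legal move");

    std::vector<double> probs(logits.size(), 0.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        if (legal[i]) {
            probs[i] = std::exp(logits[i] - max_logit);
            sum += probs[i];
        }
    }
    for (double& p : probs) p /= sum;  // illegal moves stay at 0
    return PolicyValue{probs, std::tanh(raw_value)};
}
```

With logits `{0, 0, 5}` and the third move illegal, the large logit is ignored and the two legal moves each get probability 0.5.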
Removed: `MCTS` (from the `## core` TODO list) and the `## agent (optional)` heading.

### Algorithm
#### Loss design
    Actor-Critic (variations), DQN (variations), DDPG, TRPO, PPO
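Of the listed losses, PPO's clipped surrogate is compact enough to show in full. With probability ratio r = π_new(a|s) / π_old(a|s) and advantage estimate A, PPO maximizes the batch mean of min(r·A, clip(r, 1-ε, 1+ε)·A), so the loss to minimize is its negative. A sketch over precomputed ratios and advantages; the function name `ppo_clip_loss` is hypothetical and ε = 0.2 is only a commonly used default.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// PPO clipped surrogate loss (to be minimized), averaged over a batch.
// ratio[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i); adv[i] is an advantage estimate.
double ppo_clip_loss(const std::vector<double>& ratio,
                     const std::vector<double>& adv,
                     double epsilon = 0.2) {
    assert(ratio.size() == adv.size() && !ratio.empty());
    double total = 0.0;
    for (std::size_t i = 0; i < ratio.size(); ++i) {
        const double clipped_ratio =
            std::min(std::max(ratio[i], 1.0 - epsilon), 1.0 + epsilon);
        // Take the pessimistic (smaller) of the unclipped and clipped objectives.
        total += std::min(ratio[i] * adv[i], clipped_ratio * adv[i]);
    }
    // The surrogate objective is maximized, so the loss is its negative mean.
    return -total / static_cast<double>(ratio.size());
}
```

With ε = 0.2, a sample whose ratio has drifted to 1.5 on a positive advantage contributes only 1.2·A, so the gradient gives no incentive to push the ratio further; on a negative advantage the unclipped term is kept and the penalty is not capped.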
Removed: `DQNAgent etc.` (from `## agent (optional)`).

#### Optimization method
    SGD, Adam, TRPO, natural gradient, etc.
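Adam, one of the optimization methods listed, keeps exponential moving averages of the gradient (m) and the squared gradient (v), bias-corrects them, and moves each parameter by -α·m̂ / (√v̂ + ε). The sketch uses the default hyperparameters from the Adam paper (α = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8); the `Adam` struct is illustrative, not a Tianshou or TensorFlow API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative Adam optimizer over a flat parameter vector.
struct Adam {
    double lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8;
    std::vector<double> m, v;  // first and second moment estimates
    long t = 0;                // number of steps taken so far

    // Apply one Adam update to `params` in place, given their gradients.
    void step(std::vector<double>& params, const std::vector<double>& grads) {
        assert(params.size() == grads.size());
        if (m.empty()) {
            m.assign(params.size(), 0.0);
            v.assign(params.size(), 0.0);
        }
        ++t;
        const double bc1 = 1.0 - std::pow(beta1, static_cast<double>(t));
        const double bc2 = 1.0 - std::pow(beta2, static_cast<double>(t));
        for (std::size_t i = 0; i < params.size(); ++i) {
            m[i] = beta1 * m[i] + (1.0 - beta1) * grads[i];
            v[i] = beta2 * v[i] + (1.0 - beta2) * grads[i] * grads[i];
            const double m_hat = m[i] / bc1;  // bias-corrected first moment
            const double v_hat = v[i] / bc2;  // bias-corrected second moment
            params[i] -= lr * m_hat / (std::sqrt(v_hat) + eps);
        }
    }
};
```

On the first step the bias correction makes the update size close to α whatever the gradient's scale: a gradient of 0.5 and a gradient of -3 both move the parameter by about 0.001, in opposite directions.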
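Removed: the `## Potential Bugs` section:

0. Wrong calculation of the eval value (UCTNode.cpp):
```
// UCTNode.cpp, lines 106-108: flip the network evaluation when White is to move
if (to_move == FastBoard::WHITE) {
    net_eval = 1.0f - net_eval;
}
// UCTNode.cpp, lines 309-311: flip the score when White is to move
if (tomove == FastBoard::WHITE) {
    score = 1.0f - score;
}
```
1. Create children only on leaf nodes (UCTSearch.cpp):
```
// UCTSearch.cpp, lines 60-66: expand a node only if it has no children yet
// and the tree is still below MAX_TREE_SIZE nodes
if (!node->has_children() && m_nodes < MAX_TREE_SIZE) {
    float eval;
    auto success = node->create_children(m_nodes, currstate, eval);
    if (success) {
        // On success, return the evaluation written into eval by create_children
        result = SearchResult(eval);
    }
}
```

### Planning
    MCTS
## data
    Training style - Monte Carlo or Temporal Difference
    Reward Shaping / Advantage Estimation Function
    Importance weight
    Multithread Read/Write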
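The Monte Carlo vs. temporal-difference choice and the advantage estimation function can both be expressed through generalized advantage estimation (GAE). With TD residuals δ_t = r_t + γ·V(s_{t+1}) - V(s_t), the estimate is Â_t = Σ_l (γλ)^l δ_{t+l}: λ = 0 gives the one-step TD advantage, and λ = 1 gives the Monte Carlo return minus the baseline. GAE is a swapped-in illustration here, not necessarily what Tianshou implements; the sketch assumes a single episode that terminates after its last reward (terminal value 0), and `gae` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generalized advantage estimation over one episode that terminates after the
// last reward. values[t] is the baseline V(s_t); the terminal state's value is 0.
std::vector<double> gae(const std::vector<double>& rewards,
                        const std::vector<double>& values,
                        double gamma, double lambda) {
    assert(rewards.size() == values.size());
    std::vector<double> adv(rewards.size(), 0.0);
    double running = 0.0;
    // Walk backwards using A_t = delta_t + gamma * lambda * A_{t+1}.
    for (std::size_t i = rewards.size(); i-- > 0;) {
        const double next_value = (i + 1 < values.size()) ? values[i + 1] : 0.0;
        const double delta = rewards[i] + gamma * next_value - values[i];
        running = delta + gamma * lambda * running;
        adv[i] = running;
    }
    return adv;
}
```

For rewards `{1, 1}` and baselines `{0.5, 0.5}` with γ = 1, λ = 1 reproduces the Monte Carlo advantages `{1.5, 0.5}` (returns 2 and 1 minus 0.5), while λ = 0 gives the TD residuals `{1.0, 0.5}`.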
## environment
&nbsp;&nbsp;&nbsp;&nbsp;DQN-style frame repeating, etc.
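Reading "DQN repeat frames" as DQN's frame skipping (an interpretation; it could also refer to frame stacking): the agent chooses an action only every k-th frame and repeats it on the frames in between, summing the rewards (k = 4 is the usual choice). A sketch against a hypothetical environment interface, any type with `StepResult step(int action)`; `StepResult`, `repeat_action`, and the toy `CountdownEnv` are illustrative names, not a gym API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result of one environment step.
struct StepResult {
    std::vector<float> observation;
    double reward;
    bool done;
};

// Repeat `action` for up to k frames, summing rewards and stopping early if
// the episode ends. Returns the last observation with the accumulated reward.
template <typename Env>
StepResult repeat_action(Env& env, int action, int k) {
    assert(k >= 1);
    StepResult result{{}, 0.0, false};
    double total_reward = 0.0;
    for (int i = 0; i < k; ++i) {
        result = env.step(action);
        total_reward += result.reward;
        if (result.done) break;
    }
    result.reward = total_reward;
    return result;
}

// Toy environment for illustration: reward 1 per step, ends after `remaining` steps.
struct CountdownEnv {
    int remaining;
    StepResult step(int /*action*/) {
        --remaining;
        return StepResult{{static_cast<float>(remaining)}, 1.0, remaining <= 0};
    }
};
```

With `CountdownEnv env{10}`, one call to `repeat_action(env, 0, 4)` advances four frames and returns a reward of 4; an episode with only two frames left stops after two and reports `done`.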
## simulator
&nbsp;&nbsp;&nbsp;&nbsp;Go, Othello/Reversi, Warzone
## TODO
Parallelize search-based methods.
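One common way to parallelize a search-based method such as MCTS is virtual loss: while a simulation is in flight through a child, that child is temporarily counted as visited and lost, so other threads' selection scores steer them elsewhere. The sketch shows AlphaGo Zero-style PUCT selection, Q + c_puct·P·√N_parent / (1 + N), with a virtual-loss counter. The `Child` fields, the one-loss-per-simulation scheme, and the function names are assumptions, and it is a single-threaded view of the bookkeeping: a real multi-threaded search must guard these updates with a lock or atomics.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-child statistics for PUCT selection with virtual loss.
struct Child {
    double prior = 0.0;      // P(s, a) from the policy network
    int visits = 0;          // N(s, a), completed simulations
    double value_sum = 0.0;  // sum of backed-up values, parent's perspective
    int virtual_loss = 0;    // simulations currently in flight through this child
};

// Each in-flight simulation counts as one extra visit with value -1 (a loss),
// which lowers both Q and the exploration bonus.
double puct_score(const Child& c, int parent_visits, double c_puct) {
    const int n = c.visits + c.virtual_loss;
    const double q = n > 0 ? (c.value_sum - c.virtual_loss) / n : 0.0;
    const double u = c_puct * c.prior *
                     std::sqrt(static_cast<double>(parent_visits)) / (1 + n);
    return q + u;
}

// Pick the highest-scoring child and mark it with a virtual loss.
std::size_t select_child(std::vector<Child>& children, int parent_visits, double c_puct) {
    assert(!children.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < children.size(); ++i)
        if (puct_score(children[i], parent_visits, c_puct) >
            puct_score(children[best], parent_visits, c_puct))
            best = i;
    ++children[best].virtual_loss;
    return best;
}

// When the simulation finishes, undo the virtual loss and back up the real value.
void backup(Child& c, double value) {
    assert(c.virtual_loss > 0);
    --c.virtual_loss;
    ++c.visits;
    c.value_sum += value;
}
```

Two back-to-back selections from a node with two equally likely children pick different children, which is the point of the virtual loss: concurrent simulations spread across the tree instead of all descending the same path.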