Update README.md

Sub-module function of tianshou.
2017-12-04 16:20:45 +08:00
commit 674ba4656b
parent 543d876f12
1 changed file with 23 additions and 42 deletions
--- a/README.md
+++ b/README.md
@@ -3,59 +3,40 @@ Tianshou(天授) is a reinforcement learning platform.
 ![alt text](https://github.com/sproblvem/tianshou/blob/master/docs/figures/tianshou_architecture.png "Architecture of tianshou")
-## data
+## agent
-TODO:
+&nbsp;&nbsp;&nbsp;&nbsp;Examples
-
+&nbsp;&nbsp;&nbsp;&nbsp;Self-play Framework
 Replay Memory
 Multiple writer/reader
 Importance sampling
 ## simulator
 go(for AlphaGo)
 ## environment
 gym
 ## core
 TODO:
-Optimizer
+### Model
 &nbsp;&nbsp;&nbsp;&nbsp;DQN, Policy-Value Network of AlphaGo Zero, PPO-specific, TRPO-specific
-MCTS
+### Algorithm
-## agent (optional)
+#### Loss design
 &nbsp;&nbsp;&nbsp;&nbsp;Actor-Critic (Variations), DQN (Variations), DDPG, TRPO, PPO
-DQNAgent etc.
+#### Optimization method
 &nbsp;&nbsp;&nbsp;&nbsp;SGD, ADAM, TRPO, natural gradient, etc.
-## Potential Bugs:
+### Planning
 &nbsp;&nbsp;&nbsp;&nbsp;MCTS
-0. Wrong calculation of eval value
+## data
 &nbsp;&nbsp;&nbsp;&nbsp;Training style - Monte Carlo or Temporal Difference
-UCTNode.cpp
+&nbsp;&nbsp;&nbsp;&nbsp;Reward Reshaping/ Advantage Estimation Function
 ```
 106     if (to_move == FastBoard::WHITE) {
 107         net_eval = 1.0f - net_eval;
 108     }
-309         if (tomove == FastBoard::WHITE) {
+&nbsp;&nbsp;&nbsp;&nbsp;Importance weight
 310             score = 1.0f - score;
 311         }
 ```
-1. create children only on leaf node
+&nbsp;&nbsp;&nbsp;&nbsp;Multithread Read/Write
 UCTSearch.cpp
 ```
 60     if (!node->has_children() && m_nodes < MAX_TREE_SIZE) {
 61         float eval;
 62         auto success = node->create_children(m_nodes, currstate, eval);
 63         if (success) {
 64             result = SearchResult(eval);
 65         }
 66     }
 ```
 ## environment
 &nbsp;&nbsp;&nbsp;&nbsp;DQN repeat frames etc.
 ## simulator
 &nbsp;&nbsp;&nbsp;&nbsp;Go, Othello/Reversi, Warzone
 ## TODO
 Search based method parallel.
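
Note on the "Training style - Monte Carlo or Temporal Difference" item listed under `## data`: the two styles differ in how the training target for a state value is built. The following is a minimal Python sketch of that difference, not code from this repository. The function names `monte_carlo_returns` and `td_targets`, their parameters, and the discount factor `gamma` are all assumptions introduced for illustration.

```python
from typing import List


def monte_carlo_returns(rewards: List[float], gamma: float) -> List[float]:
    """Monte Carlo targets (hypothetical helper).

    Computes the discounted return G_t = r_t + gamma * G_{t+1} for every
    step of one finished episode, walking backwards from the last step.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_targets(
    rewards: List[float],
    next_values: List[float],
    dones: List[bool],
    gamma: float,
) -> List[float]:
    """One-step Temporal Difference targets (hypothetical helper).

    Computes r_t + gamma * V(s_{t+1}), where next_values[t] is the current
    estimate of V(s_{t+1}). No bootstrapping is done past a terminal step.
    """
    return [
        r + (0.0 if done else gamma * v)
        for r, v, done in zip(rewards, next_values, dones)
    ]
```

Monte Carlo targets need the whole episode before any target can be computed, while the one-step TD target only needs the next state's value estimate, so the choice affects what the data module would have to store per transition.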