Update README.md

Sub-module function of tianshou.
Committed by sproblvem on 2017-12-04 16:20:45 +08:00 via GitHub
parent 543d876f12
commit 674ba4656b


Tianshou (天授) is a reinforcement learning platform.
![alt text](https://github.com/sproblvem/tianshou/blob/master/docs/figures/tianshou_architecture.png "Architecture of tianshou")
## data
TODO:
    Replay Memory
    Training style - Monte Carlo or Temporal Difference
    Reward Reshaping / Advantage Estimation Function
    Importance sampling / importance weighting (sketched below)
    Multithread read/write (multiple writers and readers)
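
A minimal sketch of what such a replay memory with importance weighting might look like; `ReplayMemory` and its parameters are illustrative names, not tianshou's actual API.

```
import random
from collections import deque

class ReplayMemory:
    """Illustrative replay buffer with proportional prioritization.

    Transitions with larger priority are sampled more often; the returned
    importance-sampling weights correct for that bias in the loss.
    """

    def __init__(self, capacity, beta=0.4):
        self.beta = beta                          # strength of the IS correction
        self.data = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)  # stays in sync with self.data

    def add(self, transition, priority=1.0):
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idx = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # w_i = (N * P(i)) ** -beta, normalized by the max weight for stability
        n = len(self.data)
        weights = [(n * probs[i]) ** (-self.beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return [self.data[i] for i in idx], weights
```

Sampling proportionally to priority biases the data distribution; the returned weights undo that bias when the sampled transitions are used for training.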
## simulator
    Go (for AlphaGo), Othello/Reversi, Warzone
## environment
    gym
    DQN repeat frames etc. (see the wrapper sketch below)
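
The "DQN repeat frames" item refers to the frame-skip trick; here is a sketch of an action-repeat wrapper, assuming the classic gym-style `step()` signature returning `(obs, reward, done, info)`. The class itself is illustrative, not part of tianshou.

```
class ActionRepeat:
    """Repeat each chosen action k times and sum the rewards (DQN frame skip)."""

    def __init__(self, env, repeat=4):
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:          # stop repeating once the episode ends
                break
        return obs, total_reward, done, info
```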
## agent
    Examples
    Self-play Framework (sketched below)
    DQNAgent etc.
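
At its core, a self-play framework is one agent playing both sides of a game and labeling each move with the final outcome. A sketch with hypothetical `game`/`agent` interfaces (not tianshou's API):

```
def self_play_episode(game, agent):
    """Play one game where the same agent controls both players, returning
    (state, action, outcome-from-mover's-perspective) training tuples."""
    history = []
    state = game.reset()
    while not game.terminal(state):
        player = game.to_move(state)          # +1 or -1
        action = agent.act(state)
        history.append((state, action, player))
        state = game.step(state, action)
    outcome = game.winner(state)              # +1 / -1 / 0, from player +1's view
    # label each move with the result from the mover's perspective
    return [(s, a, outcome if p == +1 else -outcome) for s, a, p in history]
```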
## core
### Model
    DQN, Policy-Value Network of AlphaGo Zero, PPO-specific, TRPO-specific (toy sketch below)
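
The AlphaGo Zero policy-value network is a single model with two heads: move probabilities and a scalar value. A toy numpy illustration of that interface; real models are deep convolutional networks, and all names here are illustrative.

```
import numpy as np

class PolicyValueNet:
    """Toy two-headed net: shared body, policy head, value head in [-1, 1]."""

    def __init__(self, n_features, n_actions, n_hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_features, n_hidden))
        self.Wp = rng.normal(0, 0.1, (n_hidden, n_actions))  # policy head
        self.Wv = rng.normal(0, 0.1, (n_hidden, 1))          # value head

    def forward(self, x):
        h = np.tanh(x @ self.W1)              # shared body
        logits = h @ self.Wp
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                  # softmax over moves
        value = float(np.tanh(h @ self.Wv))   # predicted game outcome
        return probs, value
```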
### Algorithm
#### Loss design
    Actor-Critic (Variations), DQN (Variations), DDPG, TRPO, PPO
#### Optimization method
    SGD, ADAM, TRPO, natural gradient, etc.
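
To make the loss-design item above concrete, here are the two most common losses from that list written as plain numpy computations; the array names are illustrative.

```
import numpy as np

def actor_critic_loss(log_probs, advantages):
    """Policy-gradient loss: -E[log pi(a|s) * advantage]."""
    return -np.mean(log_probs * advantages)

def dqn_loss(q_pred, rewards, q_next_max, gamma=0.99, done=None):
    """One-step TD loss: (r + gamma * max_a' Q(s', a') - Q(s, a))^2,
    with bootstrapping switched off on terminal transitions."""
    mask = 1.0 - done if done is not None else 1.0
    target = rewards + gamma * mask * q_next_max
    return np.mean((target - q_pred) ** 2)
```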
### Planning
    MCTS (selection rule sketched below)
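
The heart of MCTS is the UCT child-selection rule, the same rule the C++ snippets under Potential Bugs come from. A sketch; the node attributes are illustrative.

```
import math

def uct_select(children, c_uct=1.4):
    """Pick the child maximizing mean value plus an exploration bonus.
    Each child is assumed to expose .visits and .total_value."""
    parent_visits = sum(ch.visits for ch in children)

    def score(ch):
        if ch.visits == 0:
            return float("inf")   # always try unvisited children first
        exploit = ch.total_value / ch.visits
        explore = c_uct * math.sqrt(math.log(parent_visits) / ch.visits)
        return exploit + explore

    return max(children, key=score)
```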
## Potential Bugs:
0. Wrong calculation of eval value
UCTNode.cpp
```
106 if (to_move == FastBoard::WHITE) {
107     net_eval = 1.0f - net_eval;   // the net's eval is flipped whenever White is to move
108 }
// ...
309 if (tomove == FastBoard::WHITE) {
310     score = 1.0f - score;         // the same flip, applied to the playout score
311 }
```
1. Create children only on leaf nodes
UCTSearch.cpp
```
60 if (!node->has_children() && m_nodes < MAX_TREE_SIZE) {
61     float eval;
62     auto success = node->create_children(m_nodes, currstate, eval);  // expand only at a leaf
63     if (success) {
64         result = SearchResult(eval);   // wrap the fresh eval in the search result
65     }
66 }
```
## TODO
Parallelize search-based methods (root-parallelism sketch below).
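
One simple way to parallelize a search-based method is root parallelism: run independent searches in worker processes and merge their root visit counts. A sketch with a stubbed-out `run_search` worker; a real worker would run MCTS from the given state.

```
from collections import Counter
from multiprocessing import Pool

def run_search(args):
    """Stub standing in for an independent MCTS run; a real worker would
    search from `state` with its own seed and return the root's
    {action: visit_count} statistics."""
    state, seed, n_sims = args
    return Counter({"pass": n_sims})   # placeholder so the plumbing runs

def parallel_root_search(state, n_workers=4, n_simulations=1000):
    """Root parallelism: merge visit counts from independent searches
    and play the action with the most combined visits."""
    jobs = [(state, seed, n_simulations // n_workers) for seed in range(n_workers)]
    with Pool(n_workers) as pool:
        results = pool.map(run_search, jobs)
    merged = Counter()
    for counts in results:
        merged.update(counts)
    return merged.most_common(1)[0][0]
```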