diff --git a/README.md b/README.md
index 3bf198f..da14a95 100644
--- a/README.md
+++ b/README.md
@@ -1,61 +1,45 @@
 # tianshou
-Tianshou(天授) is a reinforcement learning platform.
+Tianshou(天授) is a reinforcement learning platform. The following image illustrate its architecture.
-![alt text](https://github.com/sproblvem/tianshou/blob/master/docs/figures/tianshou_architecture.png "Architecture of tianshou")
+
-## data
-TODO:
+## agent
+    Examples
-Replay Memory
-
-Multiple wirter/reader
-
-Importance sampling
-
-## simulator
-go(for AlphaGo)
-
-## environment
-gym
+    Self-play Framework
 ## core
-TODO:
-Optimizer
+### Model
+    DQN, Policy-Value Network of AlphaGo Zero, PPO-specific, TROP-specific
-MCTS
+### Algorithm
-## agent (optional)
+#### Loss design
+    Actor-Critic (Variations), DQN (Variations), DDPG, TRPO, PPO
-DQNAgent etc.
+#### Optimization method
+    SGD, ADAM, TRPO, natural gradient, etc.
-## Pontential Bugs:
+### Planning
+    MCTS
-0. Wrong calculation of eval value
+## data
+    Training style - Monte Carlo or Temporal Difference
-UCTNode.cpp
-```
-106 if (to_move == FastBoard::WHITE) {
-107 net_eval = 1.0f - net_eval;
-108 }
+    Reward Reshaping/ Advantage Estimation Function
-309 if (tomove == FastBoard::WHITE) {
-310 score = 1.0f - score;
-311 }
-```
+    Importance weight
-1. create children only on leaf node
+    Multithread Read/Write
-UCTSearch.cpp
-```
- 60 if (!node->has_children() && m_nodes < MAX_TREE_SIZE) {
- 61 float eval;
- 62 auto success = node->create_children(m_nodes, currstate, eval);
- 63 if (success) {
- 64 result = SearchResult(eval);
- 65 }
- 66 }
-```
+## environment
+    DQN repeat frames etc.
+## simulator
+    Go, Othello/Reversi, Warzone
+
+## TODO
+Search based method parallel.
diff --git a/docs/figures/go.png b/docs/figures/go.png
new file mode 100644
index 0000000..4f5e56f
Binary files /dev/null and b/docs/figures/go.png differ
diff --git a/docs/figures/reversi.jpg b/docs/figures/reversi.jpg
new file mode 100644
index 0000000..5939d50
Binary files /dev/null and b/docs/figures/reversi.jpg differ
diff --git a/docs/figures/warzone.jpg b/docs/figures/warzone.jpg
new file mode 100644
index 0000000..8ee120b
Binary files /dev/null and b/docs/figures/warzone.jpg differ