diff --git a/README.md b/README.md
index 3bf198f..68d4d70 100644
--- a/README.md
+++ b/README.md
@@ -3,59 +3,40 @@
 Tianshou (天授) is a reinforcement learning platform.
 
 ![alt text](https://github.com/sproblvem/tianshou/blob/master/docs/figures/tianshou_architecture.png "Architecture of tianshou")
 
-## data
-TODO:
-
-Replay Memory
-
-Multiple wirter/reader
-
-Importance sampling
-
-## simulator
-go(for AlphaGo)
-
-## environment
-gym
+## agent
+- Examples
+- Self-play framework
 
 ## core
-TODO:
-Optimizer
+### Model
+- DQN, policy-value network of AlphaGo Zero, PPO-specific, TRPO-specific
 
-MCTS
+### Algorithm
 
-## agent (optional)
+#### Loss design
+- Actor-critic (and variations), DQN (and variations), DDPG, TRPO, PPO
 
-DQNAgent etc.
+#### Optimization method
+- SGD, Adam, TRPO, natural gradient, etc.
 
-## Pontential Bugs:
+### Planning
+- MCTS
 
-0. Wrong calculation of eval value
+## data
+- Training style: Monte Carlo or temporal difference
 
-UCTNode.cpp
-```
-106     if (to_move == FastBoard::WHITE) {
-107         net_eval = 1.0f - net_eval;
-108     }
+- Reward shaping / advantage estimation function
 
-309     if (tomove == FastBoard::WHITE) {
-310         score = 1.0f - score;
-311     }
-```
+- Importance weighting
 
-1. create children only on leaf node
-
-UCTSearch.cpp
-```
- 60     if (!node->has_children() && m_nodes < MAX_TREE_SIZE) {
- 61         float eval;
- 62         auto success = node->create_children(m_nodes, currstate, eval);
- 63         if (success) {
- 64             result = SearchResult(eval);
- 65         }
- 66     }
-```
+- Multithreaded read/write
+
+## environment
+- Frame repeat (as in DQN), etc.
+
+## simulator
+- Go, Othello/Reversi, Warzone
+
+## TODO
+Parallelize search-based methods.
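
A few of the roadmap items above are concrete enough to illustrate. The sketches below are minimal NumPy examples of the named techniques; every function and variable name in them is invented for illustration and is not Tianshou API. First, the DQN entry under "Loss design": the loss regresses Q(s, a) toward a one-step bootstrapped target, which could look like:

```
import numpy as np

def dqn_targets(rewards, dones, q_next_target, gamma=0.99):
    # One-step TD target: r + gamma * max_a' Q_target(s', a'),
    # with the bootstrap term cut off at episode boundaries.
    return rewards + gamma * (1.0 - dones) * q_next_target.max(axis=1)

# Toy batch of three transitions in a two-action environment.
rewards = np.array([0.0, 1.0, -1.0])
dones = np.array([0.0, 0.0, 1.0])
q_next_target = np.array([[1.0, 2.0], [0.5, 0.2], [3.0, 1.0]])
print(dqn_targets(rewards, dones, q_next_target))  # [ 1.98   1.495 -1.   ]
```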
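
The "Planning" entry (MCTS) and the removed "Potential Bugs" notes hinge on the same two conventions: children are created only at a leaf node, and a network evaluation stored from one player's perspective must be flipped when the other side is to move (the `1.0f - net_eval` lines quoted in the removed notes). A sketch of the flip and of UCB1 child selection, with an invented dict-based node layout:

```
import math

def value_for_side_to_move(black_eval, to_move):
    # Same convention as the quoted `1.0f - net_eval` flip: an evaluation
    # in [0, 1] from Black's perspective becomes 1 - v on White's turn.
    return black_eval if to_move == "black" else 1.0 - black_eval

def select_child(children, c=1.4):
    # UCB1: prefer a high mean value, plus a bonus for rarely visited children.
    total = sum(child["visits"] for child in children)
    def ucb(child):
        if child["visits"] == 0:
            return float("inf")  # always try unvisited children first
        mean = child["value_sum"] / child["visits"]
        return mean + c * math.sqrt(math.log(total) / child["visits"])
    return max(children, key=ucb)
```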
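
Under "data", "Training style: Monte Carlo or temporal difference" is the choice of value target: the full discounted return of the episode, or a one-step target bootstrapped from the critic. Both in one sketch (helper names again assumed):

```
import numpy as np

def mc_returns(rewards, gamma=0.99):
    # Monte Carlo: discounted return from each step to the end of the episode.
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def td_targets(rewards, next_values, gamma=0.99):
    # Temporal difference (TD(0)): bootstrap from the next state's value.
    return rewards + gamma * next_values
```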
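
One common choice for the "advantage estimation function" item is generalized advantage estimation (GAE), which interpolates between the two training styles above with a parameter lambda. The sketch assumes `values` carries one extra bootstrap entry past the last reward:

```
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    # values has len(rewards) + 1 entries; the last one bootstraps the tail.
    deltas = rewards + gamma * values[1:] - values[:-1]  # per-step TD errors
    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```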
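
"Importance weighting" is what lets data gathered under an old policy train the current one: the ratio pi_new(a|s) / pi_old(a|s) reweights each sample. PPO, listed under "Loss design", clips exactly this ratio; a sketch:

```
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    # Importance weight pi_new / pi_old, computed in log space for stability.
    ratio = np.exp(logp_new - logp_old)
    # Clipping stops a single batch of reweighted data from moving the
    # policy too far from the one that collected it.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantages, clipped * advantages).mean()
```

Maximizing this objective (or minimizing its negative) with SGD or Adam is where the "Optimization method" bullet meets the loss design.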