Update README.md

Sub-module function of tianshou.

Commit 674ba4656b (parent 543d876f12), changes to README.md

Tianshou(天授) is a reinforcement learning platform.

## data

TODO:

Replay Memory

Multiple writer/reader

Importance sampling
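
The sketch below covers the replay-memory items above. It assumes that "Importance sampling" means the correction weights used with prioritized experience replay. `Transition`, `ReplayMemory`, `add`, and `sample` are hypothetical names, not Tianshou's API. A single mutex guards every read and write, which is the simplest way to let several writer and reader threads share one buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <mutex>
#include <random>
#include <vector>

// Hypothetical transition record; field names are illustrative only.
struct Transition {
    int state;
    int action;
    float reward;
    int next_state;
    bool done;
};

// Fixed-capacity ring buffer with per-item priorities (hypothetical class).
// One mutex serializes all access so multiple threads can add and sample.
class ReplayMemory {
public:
    explicit ReplayMemory(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
        data_.reserve(capacity);
        priorities_.reserve(capacity);
    }

    // Store a transition; once full, overwrite the oldest entry.
    void add(const Transition& t, double priority = 1.0) {
        assert(priority > 0.0);
        std::lock_guard<std::mutex> lock(mutex_);
        if (data_.size() < capacity_) {
            data_.push_back(t);
            priorities_.push_back(priority);
        } else {
            data_[next_] = t;
            priorities_[next_] = priority;
        }
        next_ = (next_ + 1) % capacity_;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_.size();
    }

    // Draw `batch` indices with probability P(i) proportional to priority and
    // fill `weights` with w_i = (N * P(i))^(-beta), divided by the batch max.
    std::vector<std::size_t> sample(std::size_t batch, double beta,
                                    std::mt19937& rng,
                                    std::vector<double>& weights) const {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(!data_.empty());
        std::discrete_distribution<std::size_t> dist(priorities_.begin(),
                                                     priorities_.end());
        double total = 0.0;
        for (double p : priorities_) total += p;
        const double n = static_cast<double>(data_.size());

        std::vector<std::size_t> idx(batch);
        weights.assign(batch, 0.0);
        double max_w = 0.0;
        for (std::size_t k = 0; k < batch; ++k) {
            idx[k] = dist(rng);
            const double prob = priorities_[idx[k]] / total;
            weights[k] = std::pow(n * prob, -beta);
            if (weights[k] > max_w) max_w = weights[k];
        }
        for (double& w : weights) w /= max_w;  // largest weight becomes 1
        return idx;
    }

private:
    std::size_t capacity_;
    std::size_t next_ = 0;
    std::vector<Transition> data_;
    std::vector<double> priorities_;
    mutable std::mutex mutex_;
};
```

Coarse locking is easy to reason about, but it serializes every access. Sharded buffers scale better under heavy contention, at the cost of more complex code.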

## simulator

Go (for AlphaGo)

## environment

gym

## agent

Examples

Self-play Framework

## core

TODO:

Optimizer

### Model

DQN, Policy-Value Network of AlphaGo Zero, PPO-specific, TRPO-specific

MCTS
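
The MCTS entry, and both planning bugs listed under Potential Bugs, depend on two invariants:

- Children are created only when a simulation reaches a leaf.
- A value must switch to the other player's point of view at every ply.

The sketch below is a minimal PUCT-style illustration of both, under stated assumptions. `Node`, `Evaluator`, `simulate`, and `c_puct = 1.5` are illustrative choices. The callback stands in for a policy-value network. None of this is code from this repository.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <memory>
#include <vector>

// Hypothetical node. `value_sum` is kept from the point of view of the player
// who made the move leading to this node, so a parent maximizes it directly.
struct Node {
    double prior = 0.0;
    int visits = 0;
    double value_sum = 0.0;
    std::vector<std::unique_ptr<Node>> children;

    bool is_leaf() const { return children.empty(); }
    double mean() const { return visits ? value_sum / visits : 0.0; }
};

// Hypothetical evaluator: given the actions from the root, return move priors
// and a value in [0, 1] for the player to move there. No priors = terminal.
using Evaluator = std::function<void(const std::vector<int>& path,
                                     std::vector<double>& priors,
                                     double& value)>;

// One simulation: PUCT selection, expansion at the leaf only, then backup.
void simulate(Node& root, const Evaluator& eval, double c_puct = 1.5) {
    std::vector<Node*> trail{&root};
    std::vector<int> path;
    Node* node = &root;
    while (!node->is_leaf()) {
        const double sqrt_n = std::sqrt(static_cast<double>(node->visits));
        int best = 0;
        double best_score = -std::numeric_limits<double>::infinity();
        for (int a = 0; a < static_cast<int>(node->children.size()); ++a) {
            const Node& ch = *node->children[a];
            const double u = c_puct * ch.prior * sqrt_n / (1.0 + ch.visits);
            if (ch.mean() + u > best_score) { best_score = ch.mean() + u; best = a; }
        }
        node = node->children[best].get();
        path.push_back(best);
        trail.push_back(node);
    }
    std::vector<double> priors;
    double value = 0.0;  // from the viewpoint of the player to move at `node`
    eval(path, priors, value);
    assert(value >= 0.0 && value <= 1.0);
    for (double p : priors) {  // children are created only here, at a leaf
        auto child = std::make_unique<Node>();
        child->prior = p;
        node->children.push_back(std::move(child));
    }
    // The leaf's statistic belongs to the player who moved into it, i.e. the
    // opponent of the player to move, so start from 1 - value and flip per ply.
    double v = 1.0 - value;
    for (auto it = trail.rbegin(); it != trail.rend(); ++it) {
        (*it)->visits += 1;
        (*it)->value_sum += v;
        v = 1.0 - v;
    }
}
```

Each node stores its value from the perspective of the player who moved into it. Selection can then maximize `mean() + u` at every depth without a sign check. This is the same kind of perspective bookkeeping that the `1.0f - net_eval` and `1.0f - score` lines perform.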

### Algorithm

#### Loss design

Actor-Critic (variations), DQN (variations), DDPG, TRPO, PPO

#### Optimization method

SGD, Adam, TRPO, natural gradient, etc.

## agent (optional)

DQNAgent etc.
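
The DQN entries under Model and Loss design, like a `DQNAgent`, rest on one temporal-difference loss. The sketch below assumes Q-values have already been computed by an online network and a target network. `QSample`, `huber`, and `dqn_loss` are hypothetical names, and the Huber loss is one common choice rather than the only one.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-transition record with precomputed Q-values.
struct QSample {
    std::vector<double> q_online;       // Q(s, .) from the online network
    std::vector<double> q_target_next;  // Q_target(s', .) from the target network
    int action;
    double reward;
    bool done;
};

// Huber loss: quadratic for |x| <= delta, linear beyond it.
double huber(double x, double delta = 1.0) {
    const double a = std::abs(x);
    return a <= delta ? 0.5 * x * x : delta * (a - 0.5 * delta);
}

// Mean loss over a batch with target y = r + gamma * max_a' Q_target(s', a'),
// where the bootstrap term is dropped for terminal transitions.
double dqn_loss(const std::vector<QSample>& batch, double gamma) {
    assert(!batch.empty());
    double total = 0.0;
    for (const QSample& s : batch) {
        double bootstrap = 0.0;
        if (!s.done) {
            assert(!s.q_target_next.empty());
            bootstrap = *std::max_element(s.q_target_next.begin(), s.q_target_next.end());
        }
        const double y = s.reward + gamma * bootstrap;
        total += huber(y - s.q_online[static_cast<std::size_t>(s.action)]);
    }
    return total / static_cast<double>(batch.size());
}
```

Taking the max over the target network's Q-values is the vanilla DQN target. Double DQN would instead choose the argmax action with the online network at s' and evaluate it with the target network.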

## Potential Bugs

### Planning

MCTS

0. Wrong calculation of eval value

UCTNode.cpp

```cpp
// Lines 106-108: the network evaluation is flipped when White is to move
if (to_move == FastBoard::WHITE) {
    net_eval = 1.0f - net_eval;
}

// Lines 309-311: the same flip applied to the score
if (tomove == FastBoard::WHITE) {
    score = 1.0f - score;
}
```

1. Create children only on leaf nodes

UCTSearch.cpp

```cpp
// Lines 60-66: expand a node only if it has no children yet and the tree
// is still below MAX_TREE_SIZE
if (!node->has_children() && m_nodes < MAX_TREE_SIZE) {
    float eval;
    auto success = node->create_children(m_nodes, currstate, eval);
    if (success) {
        result = SearchResult(eval);
    }
}
```

## data

Training style: Monte Carlo or temporal difference

Reward reshaping / advantage estimation function

Importance weight

Multithreaded read/write
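
The data items (training style, advantage estimation, importance weight) can be made concrete with a few small helpers. This sketch assumes that "advantage estimation function" refers to generalized advantage estimation (GAE). It also assumes "Importance weight" means a truncated per-step ratio pi(a|s) / mu(a|s). All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Monte Carlo: discounted return G_t = r_t + gamma * G_{t+1}, computed
// backwards over a complete episode, with no bootstrapping.
std::vector<double> mc_returns(const std::vector<double>& rewards, double gamma) {
    std::vector<double> g(rewards.size());
    double running = 0.0;
    for (std::size_t i = rewards.size(); i-- > 0;) {
        running = rewards[i] + gamma * running;
        g[i] = running;
    }
    return g;
}

// Temporal difference: one-step target r + gamma * V(s'), cut at terminals.
double td_target(double reward, double next_value, bool done, double gamma) {
    return reward + (done ? 0.0 : gamma * next_value);
}

// GAE over one segment. `values` holds one extra entry, V(s_T), used to
// bootstrap after the last step.
std::vector<double> gae(const std::vector<double>& rewards,
                        const std::vector<double>& values,
                        const std::vector<bool>& dones,
                        double gamma, double lambda) {
    assert(values.size() == rewards.size() + 1);
    assert(dones.size() == rewards.size());
    std::vector<double> adv(rewards.size());
    double running = 0.0;
    for (std::size_t t = rewards.size(); t-- > 0;) {
        const double not_done = dones[t] ? 0.0 : 1.0;
        const double delta = rewards[t] + gamma * values[t + 1] * not_done - values[t];
        running = delta + gamma * lambda * not_done * running;
        adv[t] = running;
    }
    return adv;
}

// Truncated importance weight min(clip, pi / mu) for data collected by a
// behavior policy mu and used to train a target policy pi.
double importance_weight(double pi_prob, double mu_prob, double clip = 1.0) {
    assert(mu_prob > 0.0);
    return std::min(clip, pi_prob / mu_prob);
}
```

With `lambda = 1` and zero value estimates, `gae` reduces to the Monte Carlo return. With `lambda = 0`, it gives the one-step TD error. This is the Monte Carlo versus temporal-difference trade-off named in the first item.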

## environment

DQN-style frame repetition (action repeat), etc.
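
DQN's frame repetition can be sketched as a wrapper that repeats one action for `k` environment steps and sums the rewards. It stops early if the episode ends. `StepResult` and `repeat_action` are hypothetical names, and the `step` callback stands in for a real environment. The sketch does not cover max-pooling over the last two frames or 4-frame stacking from the DQN Atari setup.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result of a single environment step.
struct StepResult {
    double reward;
    bool done;
};

// Apply `action` up to k times, accumulating reward until the episode ends.
StepResult repeat_action(const std::function<StepResult(int)>& step, int action, int k) {
    assert(k >= 1);
    StepResult total{0.0, false};
    for (int i = 0; i < k && !total.done; ++i) {
        const StepResult r = step(action);
        total.reward += r.reward;
        total.done = r.done;
    }
    return total;
}
```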

## simulator

Go, Othello/Reversi, Warzone

## TODO

Parallelize search-based methods.