Update README.md
Sub-module function of tianshou.
parent 543d876f12
commit 674ba4656b
65 README.md
@@ -3,59 +3,40 @@ Tianshou(天授) is a reinforcement learning platform.

-## data
-TODO:
-
-Replay Memory
+## agent
+Examples
+Self-play Framework

-Multiple writer/reader

-Importance sampling

-## simulator
-Go (for AlphaGo)

-## environment
-gym

 ## core
-TODO:

-Optimizer
+### Model
+DQN, Policy-Value Network of AlphaGo Zero, PPO-specific, TRPO-specific

-MCTS
+### Algorithm

-## agent (optional)
+#### Loss design
+Actor-Critic (Variations), DQN (Variations), DDPG, TRPO, PPO

-DQNAgent etc.
+#### Optimization method
+SGD, ADAM, TRPO, natural gradient, etc.

-## Potential Bugs:
+### Planning
+MCTS

-0. Wrong calculation of eval value
+## data
+Training style - Monte Carlo or Temporal Difference

-UCTNode.cpp (lines 106-108 and 309-311)
-```
-if (to_move == FastBoard::WHITE) {
-    net_eval = 1.0f - net_eval;
-}
-
-if (tomove == FastBoard::WHITE) {
-    score = 1.0f - score;
-}
-```
+Reward Reshaping / Advantage Estimation Function
+
+Importance weight

-1. Create children only on leaf node
+Multithread Read/Write

-UCTSearch.cpp (lines 60-66)
-```
-if (!node->has_children() && m_nodes < MAX_TREE_SIZE) {
-    float eval;
-    auto success = node->create_children(m_nodes, currstate, eval);
-    if (success) {
-        result = SearchResult(eval);
-    }
-}
-```

+## environment
+DQN repeat frames etc.

+## simulator
+Go, Othello/Reversi, Warzone

+## TODO
+Parallelize search-based methods.
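The two MCTS notes that this change drops from the README (flip the evaluation when White is to move, and create children only on a leaf node) can be illustrated with a minimal Python sketch. It is not Tianshou code and not a port of the quoted C++ files. `Node`, `eval_for_black`, `expand_if_leaf`, and the `evaluate` callback are hypothetical names, and the sketch assumes the network returns a win probability for the side to move while the tree stores values from Black's point of view.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

BLACK, WHITE = 0, 1  # hypothetical colour constants for this sketch


def eval_for_black(net_eval: float, to_move: int) -> float:
    """Convert a side-to-move win probability into Black's perspective.

    Assumption: net_eval is the win probability of the player to move,
    so it is mirrored (1 - eval) whenever White is to move.
    """
    return 1.0 - net_eval if to_move == WHITE else net_eval


@dataclass
class Node:
    """Bare-bones search-tree node; children are keyed by move id."""
    children: Dict[int, "Node"] = field(default_factory=dict)

    def has_children(self) -> bool:
        return bool(self.children)


def expand_if_leaf(
    node: Node,
    to_move: int,
    legal_moves: List[int],
    evaluate: Callable[[], float],
    tree_size: int,
    max_tree_size: int,
) -> Tuple[Optional[float], int]:
    """Create children only when `node` is a leaf and the tree has room.

    Returns (evaluation from Black's view, new tree size), or
    (None, unchanged tree size) when no expansion happens.
    """
    if node.has_children() or tree_size >= max_tree_size:
        return None, tree_size
    net_eval = evaluate()  # hypothetical network call for this position
    for move in legal_moves:
        node.children[move] = Node()
    return eval_for_black(net_eval, to_move), tree_size + len(legal_moves)


if __name__ == "__main__":
    root = Node()
    # First visit: root is a leaf, so it is expanded and evaluated.
    print(expand_if_leaf(root, WHITE, [1, 2, 3], lambda: 0.25, 1, 100))
    # Second visit: root already has children, so nothing is created.
    print(expand_if_leaf(root, BLACK, [1, 2, 3], lambda: 0.9, 4, 100))
```

Keeping the perspective flip in a single helper means every evaluation enters the tree through the same conversion, which is one way to avoid the kind of inconsistency that the "wrong calculation of eval value" note warns about.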