Merge branch 'master' of https://github.com/sproblvem/tianshou
commit e9beef46e4
68
README.md
@@ -1,61 +1,45 @@
# tianshou
Tianshou(天授) is a reinforcement learning platform.
Tianshou(天授) is a reinforcement learning platform. The following image illustrates its architecture.

<img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/tianshou_architecture.png" height="200"/>

## data
TODO:
## agent
Examples

Replay Memory

Multiple writer/reader

Importance sampling

## simulator
Go (for AlphaGo)

## environment
gym
Self-play Framework

## core
TODO:

Optimizer
### Model
DQN, Policy-Value Network of AlphaGo Zero, PPO-specific, TRPO-specific

MCTS
### Algorithm

## agent (optional)
#### Loss design
Actor-Critic (Variations), DQN (Variations), DDPG, TRPO, PPO

DQNAgent etc.
#### Optimization method
SGD, ADAM, TRPO, natural gradient, etc.

## Potential Bugs:
### Planning
MCTS

0. Wrong calculation of eval value
## data
Training style - Monte Carlo or Temporal Difference
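
The two training styles named above differ in how the learning target is built. A minimal sketch, assuming a single finished episode stored as plain Python lists; the function names `monte_carlo_returns` and `td0_targets` are hypothetical and are not part of tianshou:

```python
from typing import List


def monte_carlo_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Discounted return from each step to the end of the episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td0_targets(
    rewards: List[float],
    next_values: List[float],
    dones: List[bool],
    gamma: float = 0.99,
) -> List[float]:
    """One-step bootstrapped targets: r + gamma * V(s'), with V(s') = 0 at terminal steps."""
    return [
        r + gamma * v * (0.0 if d else 1.0)
        for r, v, d in zip(rewards, next_values, dones)
    ]
```

Monte Carlo targets need the whole episode before any update, while the temporal-difference targets only need a value estimate for the next state.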

UCTNode.cpp
```
106 if (to_move == FastBoard::WHITE) {
107     net_eval = 1.0f - net_eval;
108 }

309 if (tomove == FastBoard::WHITE) {
310     score = 1.0f - score;
311 }
```
Reward Reshaping / Advantage Estimation Function
Importance weight

1. Create children only on leaf nodes
Multithread Read/Write

UCTSearch.cpp
```
60 if (!node->has_children() && m_nodes < MAX_TREE_SIZE) {
61     float eval;
62     auto success = node->create_children(m_nodes, currstate, eval);
63     if (success) {
64         result = SearchResult(eval);
65     }
66 }
```
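
The guard in the snippet above (expand only a node that has no children, and only while the tree is below its size cap) can be sketched in Python as follows. This is an illustration under assumptions, not the UCTSearch.cpp code: `Node`, `maybe_expand`, the `evaluate` callback, and the value of `MAX_TREE_SIZE` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

MAX_TREE_SIZE = 10_000  # hypothetical cap on the number of nodes


@dataclass
class Node:
    prior: float = 1.0
    children: List["Node"] = field(default_factory=list)

    def has_children(self) -> bool:
        return bool(self.children)


def maybe_expand(
    node: Node,
    tree_size: int,
    evaluate: Callable[[], Tuple[List[float], float]],
) -> Tuple[Optional[float], int]:
    """Expand `node` only if it is a leaf and the tree still has room.

    Returns (value, new_tree_size); value is None when nothing was expanded.
    """
    if node.has_children() or tree_size >= MAX_TREE_SIZE:
        return None, tree_size
    # `evaluate` stands in for a network call returning (move priors, value).
    priors, value = evaluate()
    node.children = [Node(prior=p) for p in priors]
    return value, tree_size + len(priors)
```

Calling `maybe_expand` a second time on the same node leaves its children untouched, which is the property item 1 above asks for.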
## environment
DQN repeat frames etc.
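
Frame repetition, as used in DQN-style training, applies one chosen action for several consecutive environment steps and sums the rewards. A minimal sketch, assuming an environment whose `step(action)` returns the classic `(obs, reward, done, info)` tuple; the `RepeatAction` class name and the default `repeat=4` are assumptions, not tianshou's API:

```python
class RepeatAction:
    """Wrap an environment so each action is applied `repeat` times in a row."""

    def __init__(self, env, repeat: int = 4):
        if repeat < 1:
            raise ValueError("repeat must be at least 1")
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                # Stop early so we never step a finished episode.
                break
        return obs, total_reward, done, info
```

The agent then sees only every `repeat`-th observation, which shortens the effective horizon at the cost of coarser control.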

## simulator
Go, Othello/Reversi, Warzone

<img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/go.png" height="150"/> <img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/reversi.jpg" height="150"/> <img src="https://github.com/sproblvem/tianshou/blob/master/docs/figures/warzone.jpg" height="150"/>

## TODO
Parallelize search-based methods.
BIN
docs/figures/go.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 3.2 KiB |
BIN
docs/figures/reversi.jpg
Normal file
Binary file not shown.
After Width: | Height: | Size: 57 KiB |
BIN
docs/figures/warzone.jpg
Normal file
Binary file not shown.
After Width: | Height: | Size: 162 KiB |