# tianshou
Tianshou (天授) is a reinforcement learning platform.
![alt text](https://github.com/sproblvem/tianshou/blob/master/docs/figures/tianshou_architecture.png "Architecture of tianshou")
## data
TODO:
Replay Memory
Multiple writers/readers
Importance sampling
## simulator
Go (for AlphaGo)
## environment
gym
## core
TODO:
Optimizer
MCTS
## agent (optional)
DQNAgent etc.
## Potential Bugs:
0. Wrong calculation of eval value
UCTNode.cpp
```
if (to_move == FastBoard::WHITE) {
    net_eval = 1.0f - net_eval;
}
// ... (later in the same file)
if (tomove == FastBoard::WHITE) {
    score = 1.0f - score;
}
```
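To make the concern concrete, here is a minimal sketch of the flip used in both snippets. It assumes the value is a win probability in [0, 1]. `Color` and `flip_for_white` are hypothetical names for illustration, not part of `UCTNode.cpp`. Because the flip is its own inverse, the final eval is only correct if it is applied a consistent number of times along every code path.

```cpp
#include <cassert>

// Hypothetical stand-in for the FastBoard colour constants.
enum class Color { BLACK, WHITE };

// Mirror a win probability around 0.5 when WHITE is to move.
// Assumption: eval lies in [0, 1]. Applying the function twice with
// the same colour returns the original value.
inline float flip_for_white(float eval, Color to_move) {
    assert(eval >= 0.0f && eval <= 1.0f);
    return to_move == Color::WHITE ? 1.0f - eval : eval;
}
```

If one call site stores the flipped value and another flips it again on read, the two flips cancel. Whether that is intended depends on which perspective each caller expects.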
1. Children are created only on leaf nodes
UCTSearch.cpp
```
if (!node->has_children() && m_nodes < MAX_TREE_SIZE) {
    float eval;
    auto success = node->create_children(m_nodes, currstate, eval);
    if (success) {
        result = SearchResult(eval);
    }
}
```
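The guard in the `UCTSearch.cpp` snippet can be illustrated with a self-contained sketch. `Node` and `expand_if_leaf` are hypothetical simplifications, not the real `UCTNode` API. The sketch assumes a fixed branching factor and omits the network evaluation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical minimal tree node; the real UCTNode holds much more state.
struct Node {
    std::vector<std::unique_ptr<Node>> children;
    bool has_children() const { return !children.empty(); }
};

// Expand `node` only when it is a leaf and the tree is below its size
// budget, mirroring the condition in the snippet above.
// Returns true if children were created.
inline bool expand_if_leaf(Node& node, std::size_t branching,
                           std::size_t& node_count,
                           std::size_t max_tree_size) {
    assert(branching > 0);
    if (node.has_children() || node_count >= max_tree_size) {
        return false;  // already expanded, or tree budget exhausted
    }
    for (std::size_t i = 0; i < branching; ++i) {
        node.children.push_back(std::make_unique<Node>());
    }
    node_count += branching;
    return true;
}
```

With this guard a node is expanded at most once, so any node that already has children is never re-expanded or re-evaluated on later visits.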