# tianshou
Tianshou(天授) is a reinforcement learning platform.

![alt text](https://github.com/sproblvem/tianshou/blob/master/docs/figures/tianshou_architecture.png "Architecture of tianshou")

## data
TODO:

Replay Memory

Multiple wirter/reader

Importance sampling

## simulator
go(for AlphaGo)

## environment
gym

## core
TODO:

Optimizer

MCTS

## agent (optional)

DQNAgent etc.

## Pontential Bugs:

0. Wrong calculation of eval value

UCTNode.cpp
```
106     if (to_move == FastBoard::WHITE) {
107         net_eval = 1.0f - net_eval;
108     }

309         if (tomove == FastBoard::WHITE) {
310             score = 1.0f - score;
311         }
```

1. create children only on leaf node

UCTSearch.cpp
```
 60     if (!node->has_children() && m_nodes < MAX_TREE_SIZE) {
 61         float eval;
 62         auto success = node->create_children(m_nodes, currstate, eval);
 63         if (success) {
 64             result = SearchResult(eval);
 65         }
 66     }
```