hongshaorou/Tianshou


rtz19970824 f9f63e6609 combine gtp and network

2017-12-05 23:17:20 +08:00

AlphaGo

combine gtp and network

2017-12-05 23:17:20 +08:00

docs/figures

upload the architecture image

2017-11-06 15:56:16 +08:00

examples

architecture design patch two

2017-11-06 15:24:34 +08:00

tianshou

combine gtp and network

2017-12-05 23:17:20 +08:00

utils

remove .swp

2017-11-28 15:04:00 +08:00

__init__.py

add __init__.py

2017-12-01 01:38:11 +08:00

.gitignore

remove .swp

2017-11-28 15:04:00 +08:00

LICENSE

Initial commit

2017-11-04 01:38:59 +08:00

README.md

Update README.md

2017-11-06 20:39:09 +08:00

README.md

tianshou

Tianshou (天授) is a reinforcement learning platform.

data

TODO:

Replay Memory

Multiple writers/readers

Importance sampling

simulator

go (for AlphaGo)

environment

gym

core

TODO:

Optimizer

MCTS

agent (optional)

DQNAgent etc.

Potential Bugs:

Wrong calculation of eval value

UCTNode.cpp

// around line 106
if (to_move == FastBoard::WHITE) {
    net_eval = 1.0f - net_eval;
}

// around line 309
if (tomove == FastBoard::WHITE) {
    score = 1.0f - score;
}
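
Both snippets above apply the same transformation: when white is to move, the value is replaced by one minus itself. The following is a minimal Python sketch of that flip, not the engine's actual code. It assumes the value is a win probability in [0, 1] reported for the side to move, which this README does not state. `BLACK`, `WHITE`, and `to_black_perspective` are hypothetical names introduced only for illustration.

```python
# Hypothetical colour constants; the C++ code uses FastBoard::WHITE instead.
BLACK, WHITE = 0, 1


def to_black_perspective(eval_for_side_to_move: float, to_move: int) -> float:
    """Mirror the C++ snippets: flip the value when white is to move.

    Assumption: the input is a win probability in [0, 1] for the side to move,
    so the result is a win probability for black.
    """
    if to_move == WHITE:
        return 1.0 - eval_for_side_to_move
    return eval_for_side_to_move


black_view = to_black_perspective(0.7, BLACK)   # unchanged
white_view = to_black_perspective(0.7, WHITE)   # flipped once
twice = to_black_perspective(white_view, WHITE) # flipped twice
```

Applying the flip twice returns the original value, so when checking this potential bug it may be worth confirming that each evaluation passes through exactly one such flip.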

create children only on leaf node

UCTSearch.cpp

// around line 60
if (!node->has_children() && m_nodes < MAX_TREE_SIZE) {
    float eval;
    auto success = node->create_children(m_nodes, currstate, eval);
    if (success) {
        result = SearchResult(eval);
    }
}
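
The guard in the snippet above expands a node only when it has no children yet and the tree is still below its size cap. A rough Python sketch of the same control flow follows, as an illustration rather than a port. `Node`, `Tree`, `maybe_expand`, and the value of `MAX_TREE_SIZE` are hypothetical, and the evaluation is passed in as an argument here, whereas the C++ code receives it from `create_children`.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_TREE_SIZE = 4  # hypothetical cap, kept tiny so the limit is easy to hit


@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)

    def has_children(self) -> bool:
        return bool(self.children)


class Tree:
    def __init__(self) -> None:
        self.node_count = 1  # counts the root

    def maybe_expand(self, node: Node, n_moves: int,
                     eval_value: float) -> Optional[float]:
        """Expand only a childless node while the tree is below the cap.

        Returns the evaluation when expansion happens, otherwise None.
        """
        if not node.has_children() and self.node_count < MAX_TREE_SIZE:
            node.children = [Node() for _ in range(n_moves)]
            self.node_count += n_moves
            return eval_value
        return None


tree = Tree()
root = Node()
first = tree.maybe_expand(root, 2, 0.5)    # leaf, below cap: expands
second = tree.maybe_expand(root, 2, 0.9)   # already has children: skipped
```

In this sketch a skipped expansion returns `None`, so the caller has to obtain a result some other way, for example by descending into an existing child.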