Tianshou

Author	SHA1	Message	Date
Wenbo Hu	1e2567c174	fixing bug pass parameter	2017-12-21 19:31:51 +08:00
Wenbo Hu	40909fa994	forbid pass, if we have other choices	2017-12-20 22:10:47 +08:00
Wenbo Hu	0ab38743aa	minor revision.	2017-12-20 21:52:30 +08:00
Wenbo Hu	818da800e2	simulator process a valid set, instead of a single action	2017-12-20 21:35:35 +08:00
rtz19970824	112fd07b13	modify the mcts, refactor the network	2017-12-20 16:43:42 +08:00
Dong Yan	f8a70183b6	fix the copy bug in check_global_isomorphous; refactor code to eliminate side effect	2017-12-19 22:57:38 +08:00
Dong Yan	83f9e19fa5	merge flatten and deflatten, rename variable for clarity	2017-12-19 16:51:50 +08:00
宋世虹	d220f7f2a8	add comments and todos	2017-12-17 13:28:21 +08:00
宋世虹	3624cc9036	finished very naive dqn: changed the interface of replay buffer by adding collect and next_batch, but still need refactoring; added implementation of dqn.py, but still need to consider the interface to make it more extensive; slightly refactored the code style of the codebase; more comments and todos will be in the next commit	2017-12-17 12:52:00 +08:00
Dong Yan	31199c7d0d	0. code refactor, try to merge Go and GoEnv	2017-12-16 23:29:11 +08:00
Dong Yan	4fc50c5f1b	merge class strategy with class game. Next, merge Go with GoEnv	2017-12-15 22:19:44 +08:00
rtz19970824	e5bf7a9270	implement dqn loss and dpg loss, add TODO for separate actor and critic	2017-12-15 14:24:08 +08:00
Haosheng Zou	039c8140e2	add dqn.py to write	2017-12-13 22:43:45 +08:00
Haosheng Zou	7ab211b63c	preliminary design of dqn_example, dqn interface. identify the assign of networks	2017-12-13 20:47:45 +08:00
rtz19970824	0c4a83f3eb	vanilla policy gradient	2017-12-11 13:37:27 +08:00
haosheng	a00b930c2c	fix naming and comments of coding style, delete .json	2017-12-10 17:23:13 +08:00
songshshshsh	f1a7fd9ee1	replay buffer initial commit	2017-12-10 14:56:04 +08:00
rtz19970824	a8a12f1083	coding style	2017-12-10 14:23:40 +08:00
rtz19970824	18b3b0b850	add some TODO	2017-12-10 13:31:43 +08:00
rtz19970824	03a6880050	Merge branch 'master' of https://github.com/sproblvem/tianshou	2017-12-08 23:41:51 +08:00
rtz19970824	bc49d466d1	minor fixes	2017-12-08 23:41:31 +08:00
haosheng	ff4306ddb9	model-free rl first commit, with ppo_example.py in examples/ and task delegations in ppo_example.py and READMEs	2017-12-08 21:09:23 +08:00
rtz19970824	f9f63e6609	combine gtp and network	2017-12-05 23:17:20 +08:00
rtz19970824	543d876f12	merge gtp	2017-12-04 11:01:49 +08:00
rtz19970824	7a4c5c3c88	minor fixes	2017-12-03 19:16:21 +08:00
rtz19970824	ca0021083f	AlphaGo update	2017-11-26 13:36:52 +08:00
rtz19970824	e4e56d17d1	minor fixes	2017-11-21 22:52:17 +08:00
rtz19970824	31beb46563	mcts update	2017-11-21 22:19:52 +08:00
JialianLee	1e07cb1fac	modification of docs for mcts	2017-11-18 15:55:14 +08:00
JialianLee	3795c24be9	Merge branch 'master' of github.com:sproblvem/tianshou	2017-11-18 15:50:54 +08:00
JialianLee	d9a50569f5	modification to docs of mcts	2017-11-18 09:37:15 +08:00
Dong Yan	31bfc07dc2	mcts update	2017-11-17 19:35:20 +08:00
Tongzheng Ren	c5c2cdf0f3	mcts update	2017-11-17 15:09:07 +08:00
Dong Yan	767fd4ea20	mcts	2017-11-16 17:05:54 +08:00
Dong Yan	df57fdb411	mcts framework	2017-11-16 13:23:26 +08:00
Dong Yan	30427055d1	mcts framework	2017-11-16 13:21:27 +08:00
JialianLee	2f1035d899	update mcts docs	2017-11-16 12:38:51 +08:00
Tongzheng Ren	6d9c369a65	architecture design patch two	2017-11-06 15:24:34 +08:00
Tongzheng Ren	e6cad0bce9	architecture design patch	2017-11-06 15:17:55 +08:00
Tongzheng Ren	595e62e111	architecture design	2017-11-06 15:15:44 +08:00
Tongzheng Ren	4e4a7b74c1	update the optimizer README	2017-11-06 14:01:29 +08:00
Tongzheng Ren	48b830eda6	TODO: policy optimizer	2017-11-06 13:50:35 +08:00

42 Commits