haoshengzou | e68dcd3c64 | working on off-policy test. other parts of dqn_replay are runnable, but performance not tested. | 2018-03-08 16:51:12 +08:00
haoshengzou | 2a2274aeea | initial data_collector. working on getting examples/dqn_replay.py to run. | 2018-03-04 21:29:58 +08:00
Dong Yan | 0bc1b63e38 | add epsilon-greedy for dqn | 2018-02-25 16:31:35 +08:00
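
The epsilon-greedy exploration added in this commit is the standard scheme for DQN: with probability epsilon take a uniformly random action, otherwise take the greedy one. A minimal sketch of the idea, assuming hypothetical names (q_values, epsilon); this is illustrative, not the commit's actual code:

    import numpy as np

    def epsilon_greedy(q_values, epsilon):
        """Pick a random action with probability epsilon, else the greedy one.

        q_values: 1-D array of estimated Q-values, one entry per action.
        """
        if np.random.rand() < epsilon:
            return np.random.randint(len(q_values))  # explore
        return int(np.argmax(q_values))              # exploit
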
haoshengzou | 8fbde8283f | finish dqn example. advantage estimation module is not complete yet. | 2018-01-18 12:19:48 +08:00
haoshengzou | dfcea74fcf | fix memory growth and slowness caused by sess.run(tf.multinomial()); ppo examples now run OK with slight memory growth (~1 MB/min), which still needs investigation | 2018-01-03 20:32:05 +08:00
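
The bug named in this commit message is most likely the classic TF1 pitfall: constructing an op such as tf.multinomial() inside the training loop adds a new node to the graph on every iteration, so the graph, memory use, and sess.run latency all grow without bound. A minimal sketch of the broken versus fixed pattern, assuming TF1-era APIs and illustrative names; the commit's exact change is not shown here:

    import numpy as np
    import tensorflow as tf  # TF1-style graph/session API assumed

    num_actions = 4  # illustrative

    # Broken pattern: a fresh tf.multinomial op is created on every call,
    # so the graph (and memory) keeps growing:
    #   action = sess.run(tf.multinomial(logits_op, 1))

    # Fixed pattern: build the sampling op once, then only run it in the loop.
    logits_ph = tf.placeholder(tf.float32, shape=[None, num_actions])
    sample_op = tf.multinomial(logits_ph, num_samples=1)

    with tf.Session() as sess:
        for _ in range(3):  # stand-in for the training loop
            logits = np.random.randn(1, num_actions).astype(np.float32)
            action = sess.run(sample_op, feed_dict={logits_ph: logits})
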
haoshengzou | b33a141373 | towards policy/value refactor | 2017-12-23 17:25:16 +08:00
haoshengzou | 2addef41d2 | fix imports to support both python2 and python3. move contents out of __init__.py, deferring that work until after major development. | 2017-12-23 15:36:10 +08:00
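
Making intra-package imports behave identically under both interpreters is usually done with a __future__ import, since Python 2 defaults to implicit relative imports while Python 3 resolves them absolutely. A sketch of the common pattern (the commit's exact change is not shown):

    # At the top of each module: make Python 2 resolve imports the same way
    # Python 3 does (absolute by default), so a package-internal module name
    # no longer shadows a top-level one.
    from __future__ import absolute_import, division, print_function

    # Explicit relative imports then work identically under both, e.g.:
    # from .replay_buffer import ReplayBuffer
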
Haosheng Zou | 6611d948dd | add value_function (critic). value_function and policy not finished yet. | 2017-12-22 00:22:23 +08:00
宋世虹 | d220f7f2a8 | add comments and todos | 2017-12-17 13:28:21 +08:00
宋世虹 | 3624cc9036 | finished very naive dqn: changed the replay buffer interface by adding collect and next_batch, but it still needs refactoring; added an implementation of dqn.py, but its interface still needs thought to make it more extensible; slightly refactored the code style of the codebase; more comments and todos will come in the next commit | 2017-12-17 12:52:00 +08:00
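
The collect/next_batch interface described in this commit maps naturally onto a fixed-capacity ring buffer: collect stores one transition (overwriting the oldest once full), and next_batch samples a training batch uniformly at random. A minimal sketch with hypothetical signatures, not the commit's actual code:

    import random

    class ReplayBuffer:
        """Fixed-capacity buffer; oldest transitions are overwritten first."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.data = []
            self.index = 0  # next slot to overwrite once the buffer is full

        def collect(self, transition):
            """Store one (obs, action, reward, next_obs, done) tuple."""
            if len(self.data) < self.capacity:
                self.data.append(transition)
            else:
                self.data[self.index] = transition
            self.index = (self.index + 1) % self.capacity

        def next_batch(self, batch_size):
            """Sample a batch uniformly at random for one training step."""
            return random.sample(self.data, min(batch_size, len(self.data)))
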
Haosheng Zou | 039c8140e2 | add dqn.py as a stub to be written | 2017-12-13 22:43:45 +08:00