Tianshou

Author	SHA1	Message	Date
haoshengzou	8fbde8283f	finish dqn example. advantage estimation module is not complete yet.	2018-01-18 12:19:48 +08:00
haoshengzou	dfcea74fcf	fix memory growth and slowness caused by sess.run(tf.multinomial()), now ppo examples are working OK with slight memory growth (1M/min), which still needs research	2018-01-03 20:32:05 +08:00
haoshengzou	b33a141373	towards policy/value refactor	2017-12-23 17:25:16 +08:00
haoshengzou	2addef41d2	fix imports to support both python2 and python3. move contents out of __init__.py, leaving that work until after major development.	2017-12-23 15:36:10 +08:00
Haosheng Zou	6611d948dd	add value_function (critic). value_function and policy not finished yet.	2017-12-22 00:22:23 +08:00
宋世虹	d220f7f2a8	add comments and todos	2017-12-17 13:28:21 +08:00
宋世虹	3624cc9036	finished a very naive dqn: changed the replay buffer interface by adding collect and next_batch (still needs refactoring); added an implementation of dqn.py (the interface still needs work to make it more extensible); slightly refactored the code style of the codebase; more comments and todos will come in the next commit	2017-12-17 12:52:00 +08:00
Haosheng Zou	039c8140e2	add dqn.py, to be written	2017-12-13 22:43:45 +08:00