tianshou

Tianshou (天授) is a reinforcement learning platform. The outline below illustrates its architecture.

agent

    Examples

    Self-play Framework

core

Policy Wrapper

    Stochastic policies (OnehotCategorical, Gaussian), deterministic policies (policy as in DQN, DDPG)

    Specific network architectures in original paper of DQN, TRPO, A3C, etc. Policy-Value Network of AlphaGo Zero
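As a rough illustration of what a policy wrapper covers, here is a minimal NumPy sketch of sampling from a stochastic policy, both OnehotCategorical-style (discrete) and diagonal Gaussian (continuous). This is not the repository's actual interface; function names and shapes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def categorical_sample(logits):
    """Sample a discrete action from a OnehotCategorical-style policy.

    Softmax over logits gives the action distribution.
    """
    probs = np.exp(logits - logits.max())  # subtract max for stability
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

def gaussian_sample(mean, log_std):
    """Sample a continuous action from a diagonal Gaussian policy."""
    return mean + np.exp(log_std) * rng.standard_normal(mean.shape)

a_discrete = categorical_sample(np.array([1.0, 2.0, 0.5]))
a_cont = gaussian_sample(np.zeros(2), np.full(2, -1.0))
```

A deterministic policy (as in DQN or DDPG) would instead return `argmax` over Q-values or the network's output directly.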

Algorithm

losses

    policy gradient (and its variants), DQN (and its variants), DDPG, TRPO, PPO
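To make the loss list concrete, a NumPy sketch of two of these losses follows: the REINFORCE-style policy-gradient surrogate and the one-step DQN TD loss. These are textbook formulas written for illustration, not the module's actual signatures.

```python
import numpy as np

def pg_loss(log_probs, advantages):
    """REINFORCE surrogate: minimize -E[log pi(a|s) * A(s, a)]."""
    return -np.mean(log_probs * advantages)

def dqn_loss(q_values, actions, rewards, next_q_values, done, gamma=0.99):
    """Mean squared one-step TD error.

    Target is r + gamma * max_a' Q(s', a'), zeroed at terminal states.
    """
    target = rewards + gamma * (1.0 - done) * next_q_values.max(axis=1)
    q_taken = q_values[np.arange(len(actions)), actions]
    return np.mean((q_taken - target) ** 2)
```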

optimizer

    TRPO, natural gradient (and TensorFlow optimizers (sgd, adam))
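The natural-gradient optimizer amounts to preconditioning the gradient by the inverse Fisher information matrix. A minimal dense sketch (real implementations use conjugate gradient instead of a direct solve; `damping` and `lr` here are illustrative parameters):

```python
import numpy as np

def natural_gradient_step(grad, fisher, lr=0.1, damping=1e-3):
    """Return the update lr * (F + damping*I)^{-1} g."""
    n = len(grad)
    direction = np.linalg.solve(fisher + damping * np.eye(n), grad)
    return lr * direction
```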

Planning

    MCTS
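The core of MCTS is the tree-policy selection step. A hypothetical sketch of UCT child selection (children represented as plain dicts for illustration; the real tree node class would differ):

```python
import math

def uct_select(children, c=1.4):
    """Pick the child maximizing UCT: Q/n + c * sqrt(ln N / n)."""
    total = sum(ch["visits"] for ch in children)

    def score(ch):
        if ch["visits"] == 0:
            return float("inf")  # always expand unvisited children first
        exploit = ch["value"] / ch["visits"]
        explore = c * math.sqrt(math.log(total) / ch["visits"])
        return exploit + explore

    return max(children, key=score)
```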

data

    Training style - Batch, Replay (and its variants)

    Advantage Estimation Function

    Multithread Read/Write
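For the advantage-estimation piece, a self-contained sketch of GAE(lambda) over a single episode (standard formula; not the module's actual interface, and `values` is assumed to carry one extra bootstrap entry):

```python
import numpy as np

def gae_lambda(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one episode.

    `values` has length len(rewards) + 1 (last entry is the bootstrap
    value of the final state, 0 if terminal).
    """
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```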

environment

DQN repeat frames, reward shaping, image preprocessing (module placement not yet decided)
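As an example of such an environment transformation, here is a hypothetical action-repeat wrapper in the spirit of DQN's frame repetition. It assumes a gym-style `step(action)` returning `(obs, reward, done, info)`; the class and toy env below are illustrative, not part of the codebase.

```python
class RepeatAction:
    """Repeat each agent action for `n` consecutive frames,
    accumulating the reward, as in DQN's frame-repeat trick."""

    def __init__(self, env, n=4):
        self.env = env
        self.n = n

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.n):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break  # stop repeating once the episode ends
        return obs, total_reward, done, info

class _CountEnv:
    """Toy env for demonstration: reward 1 per frame, never done."""
    def step(self, action):
        return None, 1.0, False, {}

obs, r, done, info = RepeatAction(_CountEnv(), n=4).step(0)
```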

simulator

    Go, Othello/Reversi, Warzone

About coding style

You can follow the Google Python style guide.

Files should all be named with lowercase letters and underscores.

TODO

Parallelize search-based methods.

YongRen: Policy Wrapper, in order of Gaussian, DQN and DDPG

TongzhengRen: losses, in order of ppo, pg, DQN, DDPG with management of placeholders

YouQiaoben: data/Batch, implement num_timesteps, fix memory growth in num_episodes; adv_estimate.gae_lambda (need to write a value network in tf)

ShihongSong: data/Replay; then adv_estimate.dqn after YongRen's DQN

HaoshengZou: collaborate mainly on Policy and losses; interfaces and architecture

Note: install openai/gym first to run the Atari environment. Interfaces between modules may not be finalized, and the management of placeholders and feed_dict may have to be done manually for the time being.

Without preprocessing and other tricks, this example will not train to any meaningful result. Code should pass two tests: the individual module test and a full run of this example code.
