Tianshou

History

Adding an indicator (i.e. `self.learning`) of whether the policy is learning makes it convenient to distinguish the policy's state.
Meanwhile, the state of `self.training` remains unambiguous during the training stage.
Related issue: #211 

Others:
- fix a bug in DDQN: target_q could not be sampled from np.random.rand
- fix a bug in DQN atari net: a ReLU should be added before the last layer
- fix a bug in collector timing

Co-authored-by: n+e <463003665@qq.com>

2020-09-22 16:28:46 +08:00

batch.rst

fix docs and add docstring check (#210)

2020-09-11 07:55:37 +08:00

cheatsheet.rst

fix docs and add docstring check (#210)

2020-09-11 07:55:37 +08:00

concepts.rst

clarify updating state (#224)

2020-09-22 16:28:46 +08:00

dqn.rst

fix docs and add docstring check (#210)