tianshou arXiv tanh lr logits env envs optim eps timelimit TimeLimit maxsize timestep numpy ndarray stackoverflow len tac fqf iqn qrdqn rl quantile quantiles dqn param async subprocess nn equ cql fn boolean pre np rnn rew perceptron bsz dataset mujoco jit nstep preprocess repo ReLU namespace th utils NaN linesearch hyperparameters pseudocode entropies config cpu rms debias indice regularizer miniblock modularize serializable softmax vectorized optimizers undiscounted submodule subclasses submodules tfevent dirichlet docstring webpage formatter num py pythonic 中文文档位于 conda miniconda Amir Andreas Antonoglou Beattie Bellemare Charles Daan Demis Dharshan Fidjeland Georg Hassabis Helen Ioannis Kavukcuoglu King Koray Kumaran Legg Mnih Ostrovski Petersen Riedmiller Rusu Sadik Shane Stig Veness Volodymyr Wierstra Lillicrap Pritzel Heess Erez Yuval Tassa Schulman Filip Wolski Prafulla Dhariwal Radford Oleg Klimov Kaichao Jiayi Weng Duburcq Huayu Strens Ornstein Uhlenbeck