Closes #947

This removes all kwargs from all policy constructors. While doing that, I also improved several names and added a whole lot of TODOs.

## Functional changes:

1. Added possibility to pass None as `critic2` and `critic2_optim`. In fact, the default behavior then should cover the absolute majority of cases
2. Added a function called `clone_optimizer` as a temporary measure to support passing `critic2_optim=None`

## Breaking changes:

1. `action_space` is no longer optional. In fact, it already was non-optional, as there was a ValueError in BasePolicy.init. So now several examples were fixed to reflect that
2. `reward_normalization` removed from DDPG and children. It was never allowed to pass it as `True` there, an error would have been raised in `compute_n_step_reward`. Now I removed it from the interface
3. renamed `critic1` and similar to `critic`, in order to have uniform interfaces. Note that the `critic` in DDPG was optional for the sole reason that child classes used `critic1`. I removed this optionality (DDPG can't do anything with `critic=None`)
4. Several renamings of fields (mostly private to public, so backwards compatible)

## Additional changes:

1. Removed type and default declaration from docstring. This kind of duplication is really not necessary
2. Policy constructors are now only called using named arguments, not a fragile mixture of positional and named as before
3. Minor beautifications in typing and code
4. Generally shortened docstrings and made them uniform across all policies (hopefully)

## Comment:

With these changes, several problems in tianshou's inheritance hierarchy become more apparent. I tried highlighting them for future work.

---------

Co-authored-by: Dominik Jain <d.jain@appliedai.de>
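## Illustrative sketch:

To make the functional changes above concrete, here is a minimal, torch-free sketch of how `critic2=None` and `critic2_optim=None` could be resolved inside a constructor that only accepts named arguments. Only the name `clone_optimizer` and the parameter names come from this commit. `ToyCritic`, `ToyOptimizer`, `ToyTwinCriticPolicy`, the `clone_optimizer` signature and body, and the choice to deep-copy the first critic as the default are all assumptions made for illustration, not tianshou's actual implementation.

```python
from copy import deepcopy


class ToyCritic:
    """Hypothetical stand-in for a critic network (not a tianshou class)."""

    def __init__(self, params):
        self.params = list(params)

    def parameters(self):
        return self.params


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer that keeps its settings in `defaults`."""

    def __init__(self, params, lr=1e-3):
        self.params = params
        self.defaults = {"lr": lr}


def clone_optimizer(optim, new_params):
    """Assumed behavior: same optimizer class and settings, bound to new parameters."""
    return type(optim)(new_params, **optim.defaults)


class ToyTwinCriticPolicy:
    # The bare `*` makes every argument keyword-only, so positional calls fail.
    def __init__(self, *, critic, critic_optim, critic2=None, critic2_optim=None):
        self.critic = critic
        self.critic_optim = critic_optim
        # Assumed default: the second critic is an independent copy of the first.
        self.critic2 = critic2 if critic2 is not None else deepcopy(critic)
        # Assumed default: reuse the first optimizer's class and settings.
        self.critic2_optim = (
            critic2_optim
            if critic2_optim is not None
            else clone_optimizer(critic_optim, self.critic2.parameters())
        )


critic = ToyCritic([0.1, 0.2])
policy = ToyTwinCriticPolicy(
    critic=critic,
    critic_optim=ToyOptimizer(critic.parameters(), lr=3e-4),
)
```

With both optional arguments omitted, the sketch ends up with a second critic that is a separate object from the first and an optimizer of the same class and learning rate attached to that copy's parameters.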
183 lines
1.3 KiB
Plaintext
tianshou
arXiv
tanh
lr
logits
env
envs
optim
eps
timelimit
TimeLimit
envpool
EnvPool
maxsize
timestep
timesteps
numpy
ndarray
stackoverflow
tensorboard
state_dict
len
tac
fqf
iqn
qrdqn
rl
offpolicy
onpolicy
quantile
quantiles
dqn
param
async
subprocess
deque
nn
equ
cql
fn
boolean
pre
np
cuda
rnn
rew
pre
perceptron
bsz
dataset
mujoco
jit
nstep
preprocess
preprocessing
repo
ReLU
namespace
recv
th
utils
NaN
linesearch
hyperparameters
pseudocode
entropies
nn
config
cpu
rms
debias
indice
regularizer
miniblock
modularize
serializable
softmax
vectorized
optimizers
undiscounted
submodule
subclasses
submodules
tfevent
dirichlet
docstring
webpage
formatter
num
py
pythonic
中文文档位于
conda
miniconda
Amir
Andreas
Antonoglou
Beattie
Bellemare
Charles
Daan
Demis
Dharshan
Fidjeland
Georg
Hassabis
Helen
Ioannis
Kavukcuoglu
King
Koray
Kumaran
Legg
Mnih
Ostrovski
Petersen
Riedmiller
Rusu
Sadik
Shane
Stig
Veness
Volodymyr
Wierstra
Lillicrap
Pritzel
Heess
Erez
Yuval
Tassa
Schulman
Filip
Wolski
Prafulla
Dhariwal
Radford
Oleg
Klimov
Kaichao
Jiayi
Weng
Duburcq
Huayu
Yi
Su
Strens
Ornstein
Uhlenbeck
mse
gail
airl
ppo
Jupyter
Colab
Colaboratory
IPendulum
Reacher
Runtime
Nvidia
Enduro
Qbert
Seaquest
subnets
subprocesses
isort
yapf
pydocstyle
Args
tuples
tuple
Multi
multi
parameterized
Proximal
metadata
GPU
Dopamine
builtin
params
inplace
deepcopy
Gaussian