Typo docstring (#1132)
This commit is contained in:
parent
61426acf07
commit
f31a91df5d
@ -556,7 +556,7 @@ class BasePolicy(nn.Module, Generic[TTrainingStats], ABC):
|
|||||||
advantage + value, which is exactly equivalent to using :math:`TD(\lambda)`
|
advantage + value, which is exactly equivalent to using :math:`TD(\lambda)`
|
||||||
for estimating returns.
|
for estimating returns.
|
||||||
|
|
||||||
Setting v_s_ and v_s to None (or all zeros) and gae_lambda to 1.0 calculates the
|
Setting `v_s_` and `v_s` to None (or all zeros) and `gae_lambda` to 1.0 calculates the
|
||||||
discounted return-to-go/ Monte-Carlo return.
|
discounted return-to-go/ Monte-Carlo return.
|
||||||
|
|
||||||
:param batch: a data batch which contains several episodes of data in
|
:param batch: a data batch which contains several episodes of data in
|
||||||
@ -564,12 +564,12 @@ class BasePolicy(nn.Module, Generic[TTrainingStats], ABC):
|
|||||||
should be marked by done flag, unfinished (or collecting) episodes will be
|
should be marked by done flag, unfinished (or collecting) episodes will be
|
||||||
recognized by buffer.unfinished_index().
|
recognized by buffer.unfinished_index().
|
||||||
:param buffer: the corresponding replay buffer.
|
:param buffer: the corresponding replay buffer.
|
||||||
:param numpy.ndarray indices: tell batch's location in buffer, batch is equal
|
:param indices: tells the batch's location in buffer, batch is equal
|
||||||
to buffer[indices].
|
to buffer[indices].
|
||||||
:param np.ndarray v_s_: the value function of all next states :math:`V(s')`.
|
:param v_s_: the value function of all next states :math:`V(s')`.
|
||||||
If None, it will be set to an array of 0.
|
If None, it will be set to an array of 0.
|
||||||
:param v_s: the value function of all current states :math:`V(s)`. If None,
|
:param v_s: the value function of all current states :math:`V(s)`. If None,
|
||||||
it is set based upon v_s_ rolled by 1.
|
it is set based upon `v_s_` rolled by 1.
|
||||||
:param gamma: the discount factor, should be in [0, 1].
|
:param gamma: the discount factor, should be in [0, 1].
|
||||||
:param gae_lambda: the parameter for Generalized Advantage Estimation,
|
:param gae_lambda: the parameter for Generalized Advantage Estimation,
|
||||||
should be in [0, 1].
|
should be in [0, 1].
|
||||||
|
Loading…
x
Reference in New Issue
Block a user