Tianshou/tianshou/utils/statistics.py

import torch
import numpy as np
from numbers import Number
from typing import List, Union


class MovAvg(object):
    """Class for moving average.

    It automatically excludes infinity and NaN. Usage:
    ::

        >>> stat = MovAvg(size=66)
        >>> stat.add(torch.tensor(5))
        5.0
        >>> stat.add(float('inf'))  # inf is excluded from stat
        5.0
        >>> stat.add([6, 7, 8])
        6.5
        >>> stat.get()
        6.5
        >>> print(f'{stat.mean():.2f}±{stat.std():.2f}')
        6.50±1.12
    """

    def __init__(self, size: int = 100) -> None:
        super().__init__()
        self.size = size
        self.cache: List[np.number] = []
        self.banned = [np.inf, np.nan, -np.inf]

    def add(
        self, x: Union[Number, np.number, list, np.ndarray, torch.Tensor]
    ) -> float:
        """Add one or more scalars into :class:`MovAvg`.

        You can add a ``torch.Tensor`` (flattened before use), a python
        scalar, a numpy array, or a list of python scalars.
        """
        if isinstance(x, torch.Tensor):
            x = x.flatten().cpu().numpy()
        if np.isscalar(x):
            x = [x]
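        for i in x:  # type: ignore
            # ``in`` only catches the ``np.nan`` singleton; np.isfinite also
            # rejects NaN values created elsewhere, e.g. float("nan").
            if i not in self.banned and np.isfinite(i):
                self.cache.append(i)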
        if self.size > 0 and len(self.cache) > self.size:
            self.cache = self.cache[-self.size:]
        return self.get()

    def get(self) -> float:
        """Get the average."""
        if len(self.cache) == 0:
            return 0.0
        return float(np.mean(self.cache))

    def mean(self) -> float:
        """Get the average. Same as :meth:`get`."""
        return self.get()

    def std(self) -> float:
        """Get the standard deviation."""
        if len(self.cache) == 0:
            return 0.0
        return float(np.std(self.cache))


class RunningMeanStd(object):
    """Calculates the running mean and std of a data stream.

    https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
    """

    def __init__(
        self, mean: Union[float, np.ndarray] = 0.0, std: Union[float, np.ndarray] = 1.0
    ) -> None:
        # ``std`` is a standard deviation, so square it to obtain the variance.
        self.mean, self.var = mean, std ** 2
        self.count = 0

    def update(self, x: np.ndarray) -> None:
        """Add a batch of items with the same shape and update mean/var/count."""
        batch_mean, batch_var = np.mean(x, axis=0), np.var(x, axis=0)
        batch_count = len(x)

        delta = batch_mean - self.mean
        total_count = self.count + batch_count

        new_mean = self.mean + delta * batch_count / total_count
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m_2 = m_a + m_b + delta ** 2 * self.count * batch_count / total_count
        new_var = m_2 / total_count

        self.mean, self.var = new_mean, new_var
        self.count = total_count
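
The update rule in `RunningMeanStd.update` follows the parallel variance algorithm linked in the class docstring. The block below is a minimal pure-Python sketch of the same merge step for scalar streams, assuming population variance (the `np.var` default) and using only the standard library. `merge_batch` is a hypothetical helper name chosen for illustration and is not part of Tianshou. It checks that folding in two batches reproduces the statistics of the full data set.

```python
import math
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def merge_batch(
    mean: float, var: float, count: int, batch: Sequence[float]
) -> Tuple[float, float, int]:
    """Fold one batch into running (mean, population variance, count)."""
    batch_mean, batch_var = fmean(batch), pvariance(batch)
    batch_count = len(batch)

    delta = batch_mean - mean
    total_count = count + batch_count

    new_mean = mean + delta * batch_count / total_count
    # Sum of squared deviations of both parts, plus a correction term
    # for the gap between the two means.
    m_2 = (
        var * count
        + batch_var * batch_count
        + delta ** 2 * count * batch_count / total_count
    )
    return new_mean, m_2 / total_count, total_count


data = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
state = (0.0, 1.0, 0)  # assumed start: mean=0, var=1, count=0
for batch in (data[:2], data[2:]):
    state = merge_batch(*state, batch)

mean, var, count = state
# The merged statistics should match a single pass over all the data.
assert count == len(data)
assert math.isclose(mean, fmean(data))
assert math.isclose(var, pvariance(data))
```

Because `count` starts at 0, the initial mean and variance carry zero weight in the first merge, so the result after the first batch is just that batch's own statistics.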