Tianshou/tianshou/data/stats.py

from collections.abc import Sequence
from dataclasses import dataclass
from typing import TYPE_CHECKING, Optional

import numpy as np

from tianshou.utils.print import DataclassPPrintMixin

if TYPE_CHECKING:
    from tianshou.data import CollectStats, CollectStatsBase
    from tianshou.policy.base import TrainingStats


@dataclass(kw_only=True)
class SequenceSummaryStats(DataclassPPrintMixin):
    """A data structure for storing the statistics of a sequence."""

    mean: float
    std: float
    max: float
    min: float

    @classmethod
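    def from_sequence(cls, sequence: Sequence[float | int] | np.ndarray) -> "SequenceSummaryStats":
        """Compute the mean, standard deviation, maximum, and minimum of a sequence.

        The standard deviation is the population value (``np.std`` with its default ``ddof=0``).
        """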
        return cls(
            mean=float(np.mean(sequence)),
            std=float(np.std(sequence)),
            max=float(np.max(sequence)),
            min=float(np.min(sequence)),
        )


@dataclass(kw_only=True)
class TimingStats(DataclassPPrintMixin):
    """A data structure for storing timing statistics."""

    total_time: float = 0.0
    """The total time elapsed."""
    train_time: float = 0.0
    """The total time elapsed for training (collecting samples plus model update)."""
    train_time_collect: float = 0.0
    """The total time elapsed for collecting training transitions."""
    train_time_update: float = 0.0
    """The total time elapsed for updating models."""
    test_time: float = 0.0
    """The total time elapsed for testing models."""
    update_speed: float = 0.0
    """The speed of updating (env_step per second)."""


@dataclass(kw_only=True)
class InfoStats(DataclassPPrintMixin):
    """A data structure for storing information about the learning process."""

    gradient_step: int
    """The total gradient step."""
    best_reward: float
    """The best reward over the test results."""
    best_reward_std: float
    """Standard deviation of the best reward over the test results."""
    train_step: int
    """The total collected step of training collector."""
    train_episode: int
    """The total collected episode of training collector."""
    test_step: int
    """The total collected step of test collector."""
    test_episode: int
    """The total collected episode of test collector."""

    timing: TimingStats
    """The timing statistics."""


@dataclass(kw_only=True)
class EpochStats(DataclassPPrintMixin):
    """A data structure for storing epoch statistics."""

    epoch: int
    """The current epoch."""

    train_collect_stat: "CollectStatsBase"
    """The statistics of the last call to the training collector."""
    test_collect_stat: Optional["CollectStats"]
    """The statistics of the last call to the test collector."""
    training_stat: "TrainingStats"
    """The statistics of the last model update step."""
    info_stat: InfoStats
    """The information of the collector."""
Feature/dataclasses (#996)

This PR adds strict typing to the output of `update` and `learn` in all policies. This will likely be the last large refactoring PR before the next release (0.6.0, not 1.0.0), so it requires some attention.

Several difficulties were encountered on the path to that goal:

1. The policy hierarchy is actually "broken" in the sense that the keys of dicts that were output by `learn` did not follow the same enhancement (inheritance) pattern as the policies. This is a real problem and should be addressed in the near future. Generally, several aspects of the policy design and hierarchy might deserve a dedicated discussion.
2. Each policy needs to be generic in the stats return type, because one might want to extend it at some point and then also extend the stats. Even within the source code base this pattern is necessary in many places.
3. The interaction between learn and update is a bit quirky, we currently handle it by having update modify special field inside TrainingStats, whereas all other fields are handled by learn.
4. The IQM module is a policy wrapper and required a TrainingStatsWrapper. The latter relies on a bunch of black magic.

They were addressed by:

1. Live with the broken hierarchy, which is now made visible by bounds in generics. We use type: ignore where appropriate.
2. Make all policies generic with bounds following the policy inheritance hierarchy (which is incorrect, see above). We experimented a bit with nested TrainingStats classes, but that seemed to add more complexity and be harder to understand. Unfortunately, mypy thinks that the code below is wrong, wherefore we have to add `type: ignore` to the return of each `learn`

   ```python
   T = TypeVar("T", bound=int)

   def f() -> T:
       return 3
   ```

3. See above
4. Write representative tests for the `TrainingStatsWrapper`. Still, the black magic might cause nasty surprises down the line (I am not proud of it)...

Closes #933

Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>

2023-12-30 11:09:03 +01:00

Docs/use nbqa on notebooks (#1041)

- Added nbqa to pyproject.toml
- Resolved mypy issues on notebooks and related files
- Conducting ruff checks on notebooks
- Add DataclassPPrintMixin for better stats representation
- Improved Notebooks wording and explanations

Resolve: #1004
Related to #974

2024-02-07 17:28:16 +01:00
