
from abc import ABC, abstractmethod
from numbers import Number
from typing import Callable, Optional, Union

import numpy as np

LOG_DATA_TYPE = dict[str, Union[int, Number, np.number, np.ndarray]]
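
# Illustrative only (not part of the original module): a dict matching
# LOG_DATA_TYPE maps string keys to plain or numpy numbers/arrays, e.g.
# {"train/reward": 10.5, "train/episode": 2, "train/length": np.float64(200.0)}.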


class BaseLogger(ABC):
    """The base class for any logger which is compatible with the trainer.

    Try to overwrite write() method to use your own writer.

    :param int train_interval: the log interval in log_train_data(). Defaults to 1000.
    :param int test_interval: the log interval in log_test_data(). Defaults to 1.
    :param int update_interval: the log interval in log_update_data(). Defaults to 1000.
    """

    def __init__(
        self,
        train_interval: int = 1000,
        test_interval: int = 1,
        update_interval: int = 1000,
    ) -> None:
        super().__init__()
        self.train_interval = train_interval
        self.test_interval = test_interval
        self.update_interval = update_interval
        self.last_log_train_step = -1
        self.last_log_test_step = -1
        self.last_log_update_step = -1

    @abstractmethod
    def write(self, step_type: str, step: int, data: LOG_DATA_TYPE) -> None:
        """Specify how the writer is used to log data.

        :param str step_type: namespace which the data dict belongs to.
        :param int step: stands for the ordinate of the data dict.
        :param dict data: the data to write with format ``{key: value}``.
        """

    def log_train_data(self, collect_result: dict, step: int) -> None:
        """Use writer to log statistics generated during training.

        :param collect_result: a dict containing information on the data collected
            in the training stage, i.e., the return of collector.collect().
        :param int step: the timestep at which the collect_result is logged.
        """
        if collect_result["n/ep"] > 0 and step - self.last_log_train_step >= self.train_interval:
            log_data = {
                "train/episode": collect_result["n/ep"],
                "train/reward": collect_result["rew"],
                "train/length": collect_result["len"],
            }
            self.write("train/env_step", step, log_data)
            self.last_log_train_step = step

    def log_test_data(self, collect_result: dict, step: int) -> None:
        """Use writer to log statistics generated during evaluation.

        :param collect_result: a dict containing information on the data collected
            in the evaluation stage, i.e., the return of collector.collect().
        :param int step: the timestep at which the collect_result is logged.
        """
        assert collect_result["n/ep"] > 0
        if step - self.last_log_test_step >= self.test_interval:
            log_data = {
                "test/env_step": step,
                "test/reward": collect_result["rew"],
                "test/length": collect_result["len"],
                "test/reward_std": collect_result["rew_std"],
                "test/length_std": collect_result["len_std"],
            }
            self.write("test/env_step", step, log_data)
            self.last_log_test_step = step

    def log_update_data(self, update_result: dict, step: int) -> None:
        """Use writer to log statistics generated during policy updating.

        :param update_result: a dict containing information on the update step,
            i.e., the return of policy.update().
        :param int step: the gradient step at which the update_result is logged.
        """
        if step - self.last_log_update_step >= self.update_interval:
            log_data = {f"update/{k}": v for k, v in update_result.items()}
            self.write("update/gradient_step", step, log_data)
            self.last_log_update_step = step

    @abstractmethod
    def save_data(
        self,
        epoch: int,
        env_step: int,
        gradient_step: int,
        save_checkpoint_fn: Optional[Callable[[int, int, int], str]] = None,
    ) -> None:
        """Use writer to log metadata when calling ``save_checkpoint_fn`` in trainer.

        :param int epoch: the epoch in trainer.
        :param int env_step: the env_step in trainer.
        :param int gradient_step: the gradient_step in trainer.
        :param function save_checkpoint_fn: a hook defined by the user, see the trainer
            documentation for details.
        """

    @abstractmethod
    def restore_data(self) -> tuple[int, int, int]:
        """Return the metadata from an existing log.

        If it finds nothing or an error occurs during the recovery process, it will
        return the default parameters.

        :return: epoch, env_step, gradient_step.
        """


class LazyLogger(BaseLogger):
    """A logger that does nothing. Used as the placeholder in trainer."""

    def __init__(self) -> None:
        super().__init__()

    def write(self, step_type: str, step: int, data: LOG_DATA_TYPE) -> None:
        """The LazyLogger writes nothing."""

    def save_data(
        self,
        epoch: int,
        env_step: int,
        gradient_step: int,
        save_checkpoint_fn: Optional[Callable[[int, int, int], str]] = None,
    ) -> None:
        pass

    def restore_data(self) -> tuple[int, int, int]:
        return 0, 0, 0
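

# Usage sketch (not part of the original module): how a trainer-style loop might
# drive a logger. The result dicts below are hand-written stand-ins for the real
# outputs of collector.collect() and policy.update(), and PrintLogger is the
# example subclass sketched above.
if __name__ == "__main__":
    logger = PrintLogger(train_interval=1, update_interval=1)
    # Pretend two episodes were collected at env_step 1000.
    fake_collect_result = {"n/ep": 2, "rew": 10.5, "len": 200}
    logger.log_train_data(fake_collect_result, step=1000)
    # Pretend one gradient step produced a scalar loss.
    fake_update_result = {"loss": 0.42}
    logger.log_update_data(fake_update_result, step=1)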