Tianshou/tianshou/env/gym_wrappers.py

from typing import Any, Dict, List, SupportsFloat, Tuple, Union

import gymnasium as gym
import numpy as np
from packaging import version


class ContinuousToDiscrete(gym.ActionWrapper):
    """Gym environment wrapper to take discrete actions in a continuous environment.

    :param gym.Env env: gym environment with continuous action space.
    :param action_per_dim: number of discrete actions in each dimension
        of the action space; a single int is applied to every dimension.
    """

    def __init__(self, env: gym.Env, action_per_dim: Union[int, List[int]]) -> None:
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.Box)
        low, high = env.action_space.low, env.action_space.high
        if isinstance(action_per_dim, int):
            action_per_dim = [action_per_dim] * env.action_space.shape[0]
        assert len(action_per_dim) == env.action_space.shape[0]
        self.action_space = gym.spaces.MultiDiscrete(action_per_dim)
        self.mesh = np.array(
            [np.linspace(lo, hi, a) for lo, hi, a in zip(low, high, action_per_dim)],
            dtype=object
        )

    def action(self, act: np.ndarray) -> np.ndarray:
        # map each discrete index to its value on the per-dimension grid
        assert len(act.shape) <= 2, f"Unknown action format with shape {act.shape}."
        if len(act.shape) == 1:
            return np.array([self.mesh[i][a] for i, a in enumerate(act)])
        return np.array([[self.mesh[i][a] for i, a in enumerate(a_)] for a_ in act])


class MultiDiscreteToDiscrete(gym.ActionWrapper):
    """Gym environment wrapper to take discrete actions in a multidiscrete environment.

    :param gym.Env env: gym environment with multidiscrete action space.
    """

    def __init__(self, env: gym.Env) -> None:
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.MultiDiscrete)
        nvec = env.action_space.nvec
        assert nvec.ndim == 1
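        # bases[i] is the place value of the i-th dimension counted from the end,
        # so the last dimension of nvec varies fastest in the flat index.
        self.bases = np.ones_like(nvec)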
        for i in range(1, len(self.bases)):
            self.bases[i] = self.bases[i - 1] * nvec[-i]
        self.action_space = gym.spaces.Discrete(np.prod(nvec))

    def action(self, act: np.ndarray) -> np.ndarray:
        # peel off one digit per dimension, most significant dimension first
        converted_act = []
        for b in np.flip(self.bases):
            converted_act.append(act // b)
            act = act % b
        return np.array(converted_act).transpose()


class TruncatedAsTerminated(gym.Wrapper):
    """A wrapper that sets ``terminated = terminated or truncated`` for ``step()``.

    It is intended to be used with ``gym.wrappers.TimeLimit``.

    :param gym.Env env: gym environment.
    """

    def __init__(self, env: gym.Env):
        super().__init__(env)
        if version.parse(gym.__version__) < version.parse('0.26.0'):
            raise EnvironmentError(
                "TruncatedAsTerminated is not applicable with gym version "
                f"{gym.__version__}"
            )

    def step(self,
             act: np.ndarray) -> Tuple[Any, SupportsFloat, bool, bool, Dict[str, Any]]:
        observation, reward, terminated, truncated, info = super().step(act)
        terminated = (terminated or truncated)
        return observation, reward, terminated, truncated, info
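
The index arithmetic in the two action wrappers above can be checked without building an environment. The following is a minimal standalone sketch, assuming plain Python lists in place of NumPy arrays; the helper names `linspace`, `discrete_to_continuous` and `flat_to_multidiscrete` are hypothetical and are not part of Tianshou or Gymnasium. It mirrors the grid lookup in `ContinuousToDiscrete.action` and the mixed-radix decoding in `MultiDiscreteToDiscrete.action` for a single, unbatched action.

```python
from typing import List, Sequence


def linspace(lo: float, hi: float, n: int) -> List[float]:
    """Evenly spaced grid of n points from lo to hi, endpoints included."""
    if n == 1:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


def discrete_to_continuous(
    act: Sequence[int],
    low: Sequence[float],
    high: Sequence[float],
    action_per_dim: Sequence[int],
) -> List[float]:
    """Look up each per-dimension index in that dimension's grid."""
    mesh = [linspace(lo, hi, n) for lo, hi, n in zip(low, high, action_per_dim)]
    return [mesh[i][a] for i, a in enumerate(act)]


def flat_to_multidiscrete(act: int, nvec: Sequence[int]) -> List[int]:
    """Decode a flat index into one digit per dimension (last dimension fastest)."""
    # Same recurrence as the wrapper: bases = [1, n_last, n_last * n_prev, ...]
    bases = [1] * len(nvec)
    for i in range(1, len(bases)):
        bases[i] = bases[i - 1] * nvec[-i]
    digits = []
    for b in reversed(bases):
        digits.append(act // b)
        act = act % b
    return digits


if __name__ == "__main__":
    # Two dimensions with three grid points each: indices (0, 2) pick low and high.
    print(discrete_to_continuous([0, 2], [-1.0, 0.0], [1.0, 2.0], [3, 3]))
    # nvec = [2, 3, 4] has 24 flat actions; 7 = 0 * 12 + 1 * 4 + 3.
    print(flat_to_multidiscrete(7, [2, 3, 4]))
```

With `nvec = [2, 3, 4]`, the place values are 12, 4 and 1, so the flat action 7 decodes to `[0, 1, 3]` and the largest flat action 23 decodes to `[1, 2, 3]`.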