Tianshou/examples/mujoco/mujoco_env.py

import logging
import pickle

from tianshou.env import BaseVectorEnv, VectorEnvNormObs
from tianshou.highlevel.env import (
    ContinuousEnvironments,
    EnvFactoryRegistered,
    EnvMode,
    EnvPoolFactory,
    VectorEnvType,
)
from tianshou.highlevel.persistence import Persistence, PersistEvent, RestoreEvent
from tianshou.highlevel.world import World

envpool_is_available = True
try:
    import envpool
except ImportError:
    envpool_is_available = False
    envpool = None

log = logging.getLogger(__name__)


def make_mujoco_env(task: str, seed: int, num_train_envs: int, num_test_envs: int, obs_norm: bool):
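    """Create MuJoCo environments for training and testing.

    If EnvPool is installed, EnvPool's MuJoCo environments are used automatically.

    :param task: the name of the registered task
    :param seed: the random seed
    :param num_train_envs: the number of training environments
    :param num_test_envs: the number of test environments
    :param obs_norm: whether to normalize observations
    :return: a tuple of (single env, training envs, test envs).
    """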
    envs = MujocoEnvFactory(task, seed, obs_norm=obs_norm).create_envs(
        num_train_envs,
        num_test_envs,
    )
    return envs.env, envs.train_envs, envs.test_envs


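class MujocoEnvObsRmsPersistence(Persistence):
    """Saves and restores the observation statistics (obs_rms) whenever the policy is persisted or restored."""

    FILENAME = "env_obs_rms.pkl"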

    def persist(self, event: PersistEvent, world: World) -> None:
        if event != PersistEvent.PERSIST_POLICY:
            return
        obs_rms = world.envs.train_envs.get_obs_rms()
        path = world.persist_path(self.FILENAME)
        log.info(f"Saving environment obs_rms value to {path}")
        with open(path, "wb") as f:
            pickle.dump(obs_rms, f)

    def restore(self, event: RestoreEvent, world: World) -> None:
        if event != RestoreEvent.RESTORE_POLICY:
            return
        path = world.restore_path(self.FILENAME)
        log.info(f"Restoring environment obs_rms value from {path}")
        with open(path, "rb") as f:
            obs_rms = pickle.load(f)
        world.envs.train_envs.set_obs_rms(obs_rms)
        world.envs.test_envs.set_obs_rms(obs_rms)
        if world.envs.watch_env is not None:
            world.envs.watch_env.set_obs_rms(obs_rms)
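

# Illustrative sketch, not part of the original module: the persist/restore
# pair above boils down to a pickle round trip of the statistics object.
# The helper name, the plain dict standing in for the value returned by
# get_obs_rms(), and the temporary directory are assumptions made for this
# example only; the real paths come from world.persist_path/restore_path.
def _sketch_obs_rms_roundtrip(obs_rms):
    """Write ``obs_rms`` to a pickle file and return what is read back."""
    import os
    import pickle
    import tempfile

    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "env_obs_rms.pkl")
        # Same write mode as in persist().
        with open(path, "wb") as f:
            pickle.dump(obs_rms, f)
        # Same read mode as in restore().
        with open(path, "rb") as f:
            return pickle.load(f)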


class MujocoEnvFactory(EnvFactoryRegistered):
    """Environment factory for MuJoCo tasks with optional observation normalization."""

    def __init__(
        self,
        task: str,
        seed: int,
        obs_norm: bool = True,
        venv_type: VectorEnvType = VectorEnvType.SUBPROC_SHARED_MEM,
    ) -> None:
        super().__init__(
            task=task,
            seed=seed,
            venv_type=venv_type,
            envpool_factory=EnvPoolFactory() if envpool_is_available else None,
        )
        self.obs_norm = obs_norm

    def create_venv(self, num_envs: int, mode: EnvMode) -> BaseVectorEnv:
        """Create vectorized environments.

        :param num_envs: the number of environments
        :param mode: the mode for which to create the environments
        :return: the vectorized environments
        """
        env = super().create_venv(num_envs, mode)
        # Wrap with observation normalization; statistics are updated only in training mode.
        if self.obs_norm:
            env = VectorEnvNormObs(env, update_obs_rms=mode == EnvMode.TRAIN)
        return env

    def create_envs(
        self,
        num_training_envs: int,
        num_test_envs: int,
        create_watch_env: bool = False,
    ) -> ContinuousEnvironments:
        envs = super().create_envs(num_training_envs, num_test_envs, create_watch_env)
        assert isinstance(envs, ContinuousEnvironments)

        if self.obs_norm:
            # Test and watch environments reuse the statistics gathered by the training environments.
            envs.test_envs.set_obs_rms(envs.train_envs.get_obs_rms())
            if envs.watch_env is not None:
                envs.watch_env.set_obs_rms(envs.train_envs.get_obs_rms())
            envs.set_persistence(MujocoEnvObsRmsPersistence())
        return envs
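

# Illustrative sketch, not part of the original module and not tianshou's
# implementation: a simplified pure-Python stand-in for the idea behind
# create_venv/create_envs above. Only the wrapper created for training updates
# the running statistics; evaluation wrappers receive the training statistics
# via set_obs_rms and apply them read-only. The class names, the scalar
# observations, and the eps value are assumptions made for this example.
class _SketchRunningStats:
    """Running mean and population variance of scalars (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def var(self) -> float:
        # Fall back to unit variance before any observation has been seen.
        return self._m2 / self.count if self.count > 0 else 1.0


class _SketchNormObs:
    """Normalizes scalar observations, updating statistics only if requested."""

    def __init__(self, update_obs_rms: bool) -> None:
        self.update_obs_rms = update_obs_rms
        self.obs_rms = _SketchRunningStats()

    def get_obs_rms(self) -> _SketchRunningStats:
        return self.obs_rms

    def set_obs_rms(self, obs_rms: _SketchRunningStats) -> None:
        # In this sketch the object is shared by reference, not copied.
        self.obs_rms = obs_rms

    def normalize(self, obs: float, eps: float = 1e-8) -> float:
        if self.update_obs_rms:
            self.obs_rms.update(obs)
        return (obs - self.obs_rms.mean) / (self.obs_rms.var + eps) ** 0.5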

# Commit history of this file (oldest first):
# 2022-05-05 07:55:15 -04:00  Add vecenv wrappers for obs_norm to support running mujoco experiment with envpool (#628)
#     - add VectorEnvWrapper and VectorEnvNormObs
#     - obs_rms store in policy save/load
#     - align mujoco scripts with atari: obs_norm, envpool, wandb and README
# 2022-05-17 17:41:59 +02:00  Add show_progress option for trainer (#641)
#     - A DummyTqdm class added to utils: it replicates the interface used by trainers, but does not show the progress bar
#     - Added a show_progress argument to the base trainer: when show_progress == True, dummy_tqdm is used in place of tqdm
# 2023-09-19 18:53:11 +02:00  Initial high-level interfaces, demonstrated in mujoco_ppo_hl
# 2023-09-20 09:29:34 +02:00  Add SAC high-level interface
# 2023-10-12 15:01:49 +02:00  Support obs_rms persistence for MuJoCo by adding a general mechanism for attaching persistence to Environments instances
# 2023-10-12 17:40:16 +02:00  Improve persistence handling
#     - Add persistence/restoration of Experiment instance
#     - Add file logging in experiment
#     - Allow all persistence/logging to be disabled
#     - Disable persistence in tests
# 2023-10-18 13:20:26 +02:00  Handle obs_norm setting in MuJoCo envs
# 2023-10-24 13:52:30 +02:00  Fix: Add MujocoEnvObsRmsPersistence only if obs_norm is enabled
# 2024-01-10 15:37:58 +01:00  Improve environment factory abstractions in high-level API
#     - EnvFactory now uses the creation of a single environment as the basic functionality which the more high-level functions build upon
#     - Introduce enum EnvMode to indicate the purpose for which an env is created, allowing the factory creation process to change its behaviour accordingly
#     - Add EnvFactoryGymnasium to provide direct support for envs that can be created via gymnasium.make
#         - EnvPool is supported via an injectible EnvPoolFactory
#         - Existing EnvFactory implementations are now derived from EnvFactoryGymnasium
#     - Use a separate environment (which uses new EnvMode.WATCH) for watching agent performance after training (instead of using test environments, which the user may want to configure differently)
# 2024-01-16 12:16:46 +01:00  Make envpool usage configuration more explicit
# 2024-01-16 12:22:07 +01:00  Refactoring, improving class name EnvFactoryGymnasium -> EnvFactoryRegistered
# 2024-02-29 15:59:11 +01:00  Fix/add watch env with obs rms (#1061)
#     Supports deciding whether to watch the agent performing on the env using high-level interfaces
# 2024-03-14 11:07:56 +01:00  Allow explicit setting of multiprocessing context for SubprocEnvWorker (#1072)
#     Running multiple training runs in parallel (with, for example, joblib) fails on macOS due to a change in the
#     standard context for multiprocessing (see
#     https://stackoverflow.com/questions/65098398/why-using-fork-works-but-using-spawn-fails-in-python3-8-multiprocessing
#     or https://www.reddit.com/r/learnpython/comments/g5372v/multiprocessing_with_fork_on_macos/).
#     This PR adds the ability to explicitly set a multiprocessing context for the SubProcEnvWorker (similar to
#     gymnasium's AsyncVecEnv, https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/vector/async_vector_env.py).
#     Co-authored-by: Maximilian Huettenrauch <m.huettenrauch@appliedai.de>
#     Co-authored-by: Michael Panchenko <35432522+MischaPanch@users.noreply.github.com>