Tianshou/examples/mujoco/mujoco_env.py

import warnings

import gymnasium as gym

from tianshou.config import BasicExperimentConfig, RLSamplingConfig
from tianshou.env import ShmemVectorEnv, VectorEnvNormObs
from tianshou.highlevel.env import ContinuousEnvironments, EnvFactory

try:
    import envpool
except ImportError:
    envpool = None


def make_mujoco_env(
    task: str, seed: int, num_train_envs: int, num_test_envs: int, obs_norm: bool
):
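    """Create the MuJoCo environments used for training and testing.

    If EnvPool is installed, EnvPool's MuJoCo environments are used; otherwise this
    falls back to Gymnasium environments wrapped in a ShmemVectorEnv.

    :param task: the ID of the environment to create.
    :param seed: the random seed applied to the training and test environments.
    :param num_train_envs: the number of parallel training environments.
    :param num_test_envs: the number of parallel test environments.
    :param obs_norm: whether to normalize observations; the test environments then
        reuse the running statistics of the training environments without updating them.
    :return: a tuple of (env, training envs, test envs). With EnvPool, env is the
        unwrapped training environment pool rather than a separate single environment.
    """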
    if envpool is not None:
        train_envs = env = envpool.make_gymnasium(task, num_envs=num_train_envs, seed=seed)
        test_envs = envpool.make_gymnasium(task, num_envs=num_test_envs, seed=seed)
    else:
        warnings.warn(
            "Recommend using envpool (pip install envpool) "
            "to run Mujoco environments more efficiently.",
        )
        env = gym.make(task)
        train_envs = ShmemVectorEnv([lambda: gym.make(task) for _ in range(num_train_envs)])
        test_envs = ShmemVectorEnv([lambda: gym.make(task) for _ in range(num_test_envs)])
        train_envs.seed(seed)
        test_envs.seed(seed)
    if obs_norm:
        # Normalize observations: only the training envs update the running statistics,
        # and the test envs reuse them.
        train_envs = VectorEnvNormObs(train_envs)
        test_envs = VectorEnvNormObs(test_envs, update_obs_rms=False)
        test_envs.set_obs_rms(train_envs.get_obs_rms())
    return env, train_envs, test_envs


class MujocoEnvFactory(EnvFactory):
    """Creates MuJoCo environments, with observation normalization enabled, from the configs."""

    def __init__(self, experiment_config: BasicExperimentConfig, sampling_config: RLSamplingConfig):
        self.sampling_config = sampling_config
        self.experiment_config = experiment_config

    def create_envs(self) -> ContinuousEnvironments:
        env, train_envs, test_envs = make_mujoco_env(
            task=self.experiment_config.task,
            seed=self.experiment_config.seed,
            num_train_envs=self.sampling_config.num_train_envs,
            num_test_envs=self.sampling_config.num_test_envs,
            obs_norm=True,
        )
        return ContinuousEnvironments(env=env, train_envs=train_envs, test_envs=test_envs)
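

# Illustrative sketch only, not part of the Tianshou API. The obs_norm branch of
# make_mujoco_env lets the training envs update running observation statistics
# while the test envs reuse them without updating. The hypothetical class below
# mimics that idea for scalar observations using Welford's online algorithm; the
# real VectorEnvNormObs may differ in detail (batched updates, clipping, epsilon).
class RunningStatSketch:
    """Hypothetical running mean/variance tracker for scalar observations."""

    def __init__(self, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean
        self.eps = eps  # assumed small constant to avoid division by zero

    def update(self, obs: float) -> None:
        """Fold one observation into the statistics (Welford's update)."""
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (obs - self.mean)

    @property
    def var(self) -> float:
        """Population variance of the observations seen so far (1.0 if none)."""
        return self.m2 / self.count if self.count > 0 else 1.0

    def normalize(self, obs: float) -> float:
        """Standardize an observation without changing the statistics."""
        return (obs - self.mean) / (self.var + self.eps) ** 0.5


def demo_shared_obs_stats() -> float:
    """Hypothetical demo: update on the training side, only read on the test side."""
    train_stats = RunningStatSketch()
    for obs in (1.0, 2.0, 3.0, 4.0):
        train_stats.update(obs)  # training side keeps updating the statistics
    test_stats = train_stats  # test side shares the same object and never updates it
    return test_stats.normalize(2.5)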