Tianshou/examples/mujoco/mujoco_ddpg_hl.py

#!/usr/bin/env python3

import functools
import os
from collections.abc import Sequence

from examples.mujoco.mujoco_env import MujocoEnvFactory
from tianshou.highlevel.config import SamplingConfig
from tianshou.highlevel.experiment import (
    DDPGExperimentBuilder,
    ExperimentConfig,
)
from tianshou.highlevel.params.noise import MaxActionScaledGaussian
from tianshou.highlevel.params.policy_params import DDPGParams
from tianshou.utils import logging
from tianshou.utils.logging import datetime_tag


def main(
    experiment_config: ExperimentConfig,
    task: str = "Ant-v4",
    buffer_size: int = 1000000,
    hidden_sizes: Sequence[int] = (256, 256),
    actor_lr: float = 1e-3,
    critic_lr: float = 1e-3,
    gamma: float = 0.99,
    tau: float = 0.005,
    exploration_noise: float = 0.1,
    start_timesteps: int = 25000,
    epoch: int = 200,
    step_per_epoch: int = 5000,
    step_per_collect: int = 1,
    update_per_step: int = 1,
    n_step: int = 1,
    batch_size: int = 256,
    training_num: int = 1,
    test_num: int = 10,
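) -> None:
    """Train DDPG on a MuJoCo task using Tianshou's high-level experiment API."""
    # Logs are grouped by task, algorithm, seed, and launch time.
    log_name = os.path.join(task, "ddpg", str(experiment_config.seed), datetime_tag())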

    # Training schedule: epochs, steps per epoch, environment counts, and replay buffer size.
    sampling_config = SamplingConfig(
        num_epochs=epoch,
        step_per_epoch=step_per_epoch,
        batch_size=batch_size,
        num_train_envs=training_num,
        num_test_envs=test_num,
        buffer_size=buffer_size,
        step_per_collect=step_per_collect,
        update_per_step=update_per_step,
        repeat_per_collect=None,
        start_timesteps=start_timesteps,
        start_timesteps_random=True,
    )

    # Environment factory for the chosen MuJoCo task; observation normalization is disabled.
    env_factory = MujocoEnvFactory(task, experiment_config.seed, obs_norm=False)

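    # Assemble the experiment: DDPG hyperparameters plus default actor and critic
    # networks that share the same hidden layer sizes.
    experiment = (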
        DDPGExperimentBuilder(env_factory, experiment_config, sampling_config)
        .with_ddpg_params(
            DDPGParams(
                actor_lr=actor_lr,
                critic_lr=critic_lr,
                gamma=gamma,
                tau=tau,
                exploration_noise=MaxActionScaledGaussian(exploration_noise),
                estimation_step=n_step,
            ),
        )
        .with_actor_factory_default(hidden_sizes)
        .with_critic_factory_default(hidden_sizes)
        .build()
    )
    experiment.run(log_name)


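if __name__ == "__main__":
    # Bind the default ExperimentConfig so that the remaining parameters of main
    # can be supplied on the command line.
    run_with_default_config = functools.partial(main, experiment_config=ExperimentConfig())
    logging.run_cli(run_with_default_config)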
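
The `tau` and `exploration_noise` defaults above are easier to read with the underlying arithmetic in view. The following is a minimal, library-free sketch of what these two parameters conventionally mean in DDPG: a soft (Polyak) target update and Gaussian action noise whose standard deviation is a fraction of the maximum action. The helper names `soft_update` and `noisy_action` are hypothetical and are not part of Tianshou; the library's own implementation may differ in detail.

```python
import random


def soft_update(target: list[float], online: list[float], tau: float) -> list[float]:
    """Polyak averaging: move each target weight a fraction tau toward the online weight."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def noisy_action(
    action: list[float],
    max_action: float,
    noise_fraction: float,
    rng: random.Random,
) -> list[float]:
    """Add zero-mean Gaussian noise with std = noise_fraction * max_action, then clip.

    Assumption: the action space is symmetric, i.e. [-max_action, max_action].
    """
    sigma = noise_fraction * max_action
    noisy = [a + rng.gauss(0.0, sigma) for a in action]
    return [max(-max_action, min(max_action, a)) for a in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    # With tau = 0.005 the target weights move 0.5% of the way per update.
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.005))
    # With exploration_noise = 0.1 the noise std is 10% of the maximum action.
    print(noisy_action([0.5, -0.5], max_action=1.0, noise_fraction=0.1, rng=rng))
```

A small `tau` keeps the target networks changing slowly, which is the usual motivation for soft updates; scaling the noise by the maximum action keeps the exploration magnitude comparable across tasks with different action ranges.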

Commit history (from blame view):
2023-10-03 20:26:39 +02:00  Add DDPG high-level API and MuJoCo example
2023-10-06 13:50:23 +02:00  Remove 'RL' prefix from class names
2023-10-06 13:53:45 +02:00  Reorder ExperimentBuilder args (EnvFactory first)
2023-10-18 13:20:26 +02:00  Handle obs_norm setting in MuJoCo envs
2023-10-18 23:55:23 +02:00  Change interface of EnvFactory to ensure that configuration of number of environments in SamplingConfig is used (values are now passed to factory method). This is clearer and removes the need to pass otherwise unnecessary configuration to environment factories at construction
2023-11-07 10:54:22 +01:00  Revert "Depend on sensAI instead of copying its utils (logging, string)". This reverts commit fdb0eba93d81fa5e698770b4f7088c87fc1238da.
2024-01-10 15:39:53 +01:00  Update MuJoCo examples to use Ant-v4 instead of Ant-v3
2024-02-06 14:24:30 +01:00  Refactoring/mypy issues test (#1017). Improves typing in examples and tests, towards mypy passing there. Introduces the SpaceInfo utility