# Tianshou/tianshou/highlevel/params/alpha.py

from abc import ABC, abstractmethod

import numpy as np
import torch

from tianshou.highlevel.env import Environments
from tianshou.highlevel.module.core import TDevice
from tianshou.highlevel.optim import OptimizerFactory
from tianshou.utils.string import ToStringMixin


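class AutoAlphaFactory(ToStringMixin, ABC):
    """Factory for the components required to tune the entropy coefficient alpha automatically (e.g. in SAC)."""

    @abstractmethod
    def create_auto_alpha(
        self,
        envs: Environments,
        optim_factory: OptimizerFactory,
        device: TDevice,
    ) -> tuple[float, torch.Tensor, torch.optim.Optimizer]:
        """Creates the target entropy, the trainable log-alpha tensor and the optimizer that updates it.

        :param envs: the environments (e.g. for deriving the action shape)
        :param optim_factory: the optimizer factory
        :param device: the torch device on which to create the log-alpha tensor
        :return: a tuple `(target_entropy, log_alpha, alpha_optim)`
        """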


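class AutoAlphaFactoryDefault(AutoAlphaFactory):  # TODO better name?
    """Uses the negative action dimensionality as the target entropy and Adam to optimize log(alpha)."""

    def __init__(self, lr: float = 3e-4):
        """:param lr: the learning rate of the Adam optimizer that updates log(alpha)"""
        self.lr = lr

    def create_auto_alpha(
        self,
        envs: Environments,
        optim_factory: OptimizerFactory,
        device: TDevice,
    ) -> tuple[float, torch.Tensor, torch.optim.Optimizer]:
        # Heuristic target entropy: minus the number of action dimensions, i.e. -prod(action_shape)
        target_entropy = float(-np.prod(envs.get_action_shape()))
        # Optimize log(alpha) instead of alpha so that alpha = exp(log_alpha) stays positive;
        # initializing it to zero yields alpha = 1
        log_alpha = torch.zeros(1, requires_grad=True, device=device)
        # Note: `optim_factory` is not used here; the optimizer is always Adam with learning rate `self.lr`
        alpha_optim = torch.optim.Adam([log_alpha], lr=self.lr)
        return target_entropy, log_alpha, alpha_optim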
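

# Usage sketch (illustrative only, not part of the library API): how a policy might consume the
# triple returned by `create_auto_alpha` in the log-alpha form of the SAC temperature loss that
# many implementations use:
#   J(log_alpha) = mean(-log_alpha * (log_prob + target_entropy))
# Assumptions: a hypothetical 2-dimensional continuous action space and made-up log-probabilities
# standing in for the actor's output. The triple is rebuilt inline, mirroring
# AutoAlphaFactoryDefault, so that the sketch runs without constructing an `Environments` object.
if __name__ == "__main__":
    # Already imported at module level; repeated so that the sketch is self-contained
    import numpy as np
    import torch

    action_shape = (2,)  # hypothetical action shape
    target_entropy = float(-np.prod(action_shape))  # -2.0
    log_alpha = torch.zeros(1, requires_grad=True)  # alpha = exp(0) = 1
    alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)

    # Stand-in for log pi(a|s) of a sampled batch; detached because only alpha is updated here
    log_prob = torch.tensor([-0.5, -1.0, -1.5])
    alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    # The policy entropy estimate (1.0) exceeds the target (-2.0), so alpha decreases slightly
    alpha = log_alpha.detach().exp()
    print(f"alpha after one update: {alpha.item():.6f}")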