Tianshou/test/base/env.py

import gym
import time
import random
import numpy as np
from gym.spaces import Discrete, MultiDiscrete, Box, Dict, Tuple


class MyTestEnv(gym.Env):
    """A "going right" task: the episode ends once ``index`` reaches ``size``,
    i.e. after the agent has moved right ``size`` steps from the start.
    """

    def __init__(self, size, sleep=0, dict_state=False, recurse_state=False,
                 ma_rew=0, multidiscrete_action=False, random_sleep=False):
        assert not (
            dict_state and recurse_state), \
            "dict_state and recurse_state cannot both be true"
        self.size = size
        self.sleep = sleep
        self.random_sleep = random_sleep
        self.dict_state = dict_state
        self.recurse_state = recurse_state
        self.ma_rew = ma_rew
        self._md_action = multidiscrete_action
        if dict_state:
            self.observation_space = Dict(
                {"index": Box(shape=(1, ), low=0, high=size - 1),
                 "rand": Box(shape=(1,), low=0, high=1, dtype=np.float64)})
        elif recurse_state:
            self.observation_space = Dict(
                {"index": Box(shape=(1, ), low=0, high=size - 1),
                 "dict": Dict({
                     "tuple": Tuple((Discrete(2), Box(shape=(2,),
                                     low=0, high=1, dtype=np.float64))),
                     "rand": Box(shape=(1, 2), low=0, high=1,
                                 dtype=np.float64)})
                 })
        else:
            self.observation_space = Box(shape=(1, ), low=0, high=size - 1)
        if multidiscrete_action:
            self.action_space = MultiDiscrete([2, 2])
        else:
            self.action_space = Discrete(2)
        self.done = False
        self.index = 0
        self.seed()

    def seed(self, seed=0):
        self.rng = np.random.RandomState(seed)
        return [seed]

    def reset(self, state=0):
        self.done = False
        self.index = state
        return self._get_state()

    def _get_reward(self):
        """Return 1 once the episode is done, else 0.

        If ``ma_rew > 0``, return a list of length ``ma_rew`` instead of a
        scalar (a non-scalar reward).
        """
        x = int(self.done)
        if self.ma_rew > 0:
            return [x] * self.ma_rew
        return x

    def _get_state(self):
        """Generate the state (observation) of MyTestEnv."""
        if self.dict_state:
            return {'index': np.array([self.index], dtype=np.float32),
                    'rand': self.rng.rand(1)}
        elif self.recurse_state:
            return {'index': np.array([self.index], dtype=np.float32),
                    'dict': {"tuple": (np.array([1],
                                       dtype=np.int64), self.rng.rand(2)),
                             "rand": self.rng.rand(1, 2)}}
        else:
            return np.array([self.index], dtype=np.float32)

    def step(self, action):
        if self._md_action:
            action = action[0]
        if self.done:
            raise ValueError('step after done !!!')
        if self.sleep > 0:
            sleep_time = random.random() if self.random_sleep else 1
            sleep_time *= self.sleep
            time.sleep(sleep_time)
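        # Already at the goal before acting (e.g. after reset(state=self.size)):
        # end the episode without moving.
        if self.index == self.size:
            self.done = True
            return self._get_state(), self._get_reward(), self.done, {}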
        if action == 0:
            self.index = max(self.index - 1, 0)
            return self._get_state(), self._get_reward(), self.done, \
                {'key': 1, 'env': self} if self.dict_state else {}
        elif action == 1:
            self.index += 1
            self.done = self.index == self.size
            return self._get_state(), self._get_reward(), \
                self.done, {'key': 1, 'env': self}