update readme and force flake8

Trinkle23897 2020-03-28 13:27:01 +08:00
parent 068c4068ec
commit f68f23292e
9 changed files with 95 additions and 92 deletions

View File

@ -26,15 +26,10 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install .
pip install ".[dev]"
- name: Lint with flake8
run: |
pip install flake8
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=30 --max-line-length=79 --statistics
flake8 . --count --show-source --statistics
- name: Test with pytest
run: |
pip install pytest pytest-cov
pytest test --cov tianshou

View File

@ -24,8 +24,7 @@ pytest test --cov tianshou -s
We follow the PEP8 Python code style. To check it, run the following in the main directory:
```bash
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
flake8 . --count --exit-zero --max-complexity=30 --max-line-length=79 --statistics
flake8 . --count --show-source --statistics
```
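These are the same two passes the CI workflow runs. If you prefer to trigger them from Python (for example in a local pre-commit hook), here is a minimal sketch; the `run_checks.py` name and the subprocess-based invocation are illustrative assumptions, not part of the repository.
```python
# run_checks.py -- illustrative helper (not part of the repository).
# Runs the same two flake8 passes as CI and stops at the first failure.
import subprocess
import sys

CHECKS = [
    # hard errors: syntax errors and undefined names always fail
    ["flake8", ".", "--count", "--select=E9,F63,F7,F82",
     "--show-source", "--statistics"],
    # full PEP8 style pass
    ["flake8", ".", "--count", "--show-source", "--statistics"],
]

if __name__ == "__main__":
    for cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            sys.exit(1)
```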
#### Documentation
@ -40,4 +39,4 @@ To compile docs into webpages, Run
make html
```
under the `docs/` directory. The generated webpages are in `docs/_build` and
can be viewed with browsers.
can be viewed with browsers.

View File

@ -1,9 +1,9 @@
<h1 align="center">Tianshou</h1>
![PyPI](https://img.shields.io/pypi/v/tianshou)
![Unittest](https://github.com/thu-ml/tianshou/workflows/Unittest/badge.svg?branch=master)
[![Documentation Status](https://readthedocs.org/projects/tianshou/badge/?version=latest)](https://tianshou.readthedocs.io/en/latest/?badge=latest)
[![PyPI](https://img.shields.io/pypi/v/tianshou)](https://pypi.org/project/tianshou/)
[![Unittest](https://github.com/thu-ml/tianshou/workflows/Unittest/badge.svg?branch=master)](https://github.com/thu-ml/tianshou/actions)
[![Documentation Status](https://readthedocs.org/projects/tianshou/badge/?version=latest)](https://tianshou.readthedocs.io)
[![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers)
[![GitHub forks](https://img.shields.io/github/forks/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/network)
[![GitHub issues](https://img.shields.io/github/issues/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/issues)
@ -35,7 +35,7 @@ pip3 install tianshou
## Documentation
The tutorials and API documentation are hosted on [https://tianshou.readthedocs.io](https://tianshou.readthedocs.io). It is under construction currently.
The tutorials and API documentation are hosted on [https://tianshou.readthedocs.io](https://tianshou.readthedocs.io).
The example scripts are under the [test/](/test/) and [examples/](/examples/) folders.
@ -53,16 +53,18 @@ We select some of famous (>1k stars) reinforcement learning platforms. Here is t
| --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| GitHub Stars | [![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/openai/baselines)](https://github.com/openai/baselines/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/ray-project/ray)](https://github.com/ray-project/ray/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch)](https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/stargazers) | [![GitHub stars](https://img.shields.io/github/stars/astooke/rlpyt)](https://github.com/astooke/rlpyt/stargazers) |
| Algo - Task | PyTorch | TensorFlow | TF/PyTorch | PyTorch | PyTorch |
| PG - CartPole | 9.03±4.18s | None | 15.77±6.28s | None | |
| DQN - CartPole | 10.61±5.51s | 1046.34±291.27s | 40.16±12.79s | 175.55±53.81s | |
| A2C - CartPole | 11.72±3.85s | *(~1612s) | 46.15±6.64s | Runtime Error | |
| PPO - CartPole | 35.25±16.47s | *(~1179s) | 62.21±13.31s (APPO) | 29.16±15.46s | |
| PG - CartPole | 9.03±4.18s | None | 15.77±6.28s | None | ? |
| DQN - CartPole | 10.61±5.51s | 1046.34±291.27s | 40.16±12.79s | 175.55±53.81s | ? |
| A2C - CartPole | 11.72±3.85s | *(~1612s) | 46.15±6.64s | Runtime Error | ? |
| PPO - CartPole | 35.25±16.47s | *(~1179s) | 62.21±13.31s (APPO) | 29.16±15.46s | ? |
| DDPG - Pendulum | 46.95±24.31s | *(>1h) | 377.99±13.79s | 652.83±471.28s | 172.18±62.48s |
| TD3 - Pendulum | 48.39±7.22s | None | 620.83±248.43s | 619.33±324.97s | 210.31±76.30s |
| SAC - Pendulum | 38.92±2.09s | None | 92.68±4.48s | 808.21±405.70s | 295.92±140.85s |
*: Could not reach the target reward threshold within 1e6 steps in any of 10 runs. The total runtime is shown in brackets.
?: We tried, but running non-Atari games on rlpyt is nontrivial. See [here](https://github.com/astooke/rlpyt/issues/127#issuecomment-601741210).
All platforms use 10 different seeds for testing. We exclude trials in which training failed. The reward threshold is 195.0 for CartPole and -250.0 for Pendulum, measured as the mean return over 100 consecutive episodes.
Tianshou's and RLlib's configurations are very similar: both use multiple workers for sampling. Indeed, both RLlib and rlpyt are excellent reinforcement learning platforms :)
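The stopping rule described above (mean return over 100 consecutive episodes reaching the threshold) can be written as a small check. This is a sketch of the criterion only, not code from any of the benchmarked platforms:
```python
import numpy as np

def reached_threshold(episode_returns, threshold, window=100):
    """True once the mean return over the last `window` consecutive
    episodes reaches `threshold` (195.0 for CartPole, -250.0 for Pendulum)."""
    if len(episode_returns) < window:
        return False
    return float(np.mean(episode_returns[-window:])) >= threshold
```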

examples/README.md Normal file
View File

@ -0,0 +1,3 @@
Result of Ant-v2:
![](/docs/_static/images/Ant-v2.png)

View File

@ -74,7 +74,7 @@ class DQN(nn.Module):
def forward(self, x, state=None, info={}):
if not isinstance(x, torch.Tensor):
s = torch.tensor(x, device=self.device, dtype=torch.float)
x = torch.tensor(x, device=self.device, dtype=torch.float)
x = F.relu(self.bn1(self.conv1(x)))
x = F.relu(self.bn2(self.conv2(x)))
x = F.relu(self.bn3(self.conv3(x)))
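The fix above matters because the old code converted the observation into a tensor `s` that was never used, so the raw ndarray reached the convolutions. Below is a minimal self-contained sketch of the corrected pattern, using a toy network with hypothetical shapes rather than the real Tianshou DQN:
```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Toy module illustrating the ndarray -> tensor conversion in forward."""

    def __init__(self, device='cpu'):
        super().__init__()
        self.device = device
        self.fc = nn.Linear(4, 2)

    def forward(self, x, state=None, info={}):
        if not isinstance(x, torch.Tensor):
            # convert and reassign x, otherwise the conversion is lost
            x = torch.tensor(x, device=self.device, dtype=torch.float)
        return F.relu(self.fc(x)), state

net = TinyNet()
obs = np.random.rand(3, 4)   # a batch of 3 four-dimensional observations
logits, _ = net(obs)         # works for both ndarray and tensor input
```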

tianshou/env/atari.py vendored
View File

@ -1,13 +1,11 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2
import gym
import numpy as np
from gym.spaces.box import Box
def create_atari_environment(name=None, sticky_actions=True, max_episode_steps=2000):
def create_atari_environment(name=None, sticky_actions=True,
max_episode_steps=2000):
game_version = 'v0' if sticky_actions else 'v4'
name = '{}NoFrameskip-{}'.format(name, game_version)
env = gym.make(name)
@ -61,7 +59,8 @@ class preprocessing(object):
self._grayscale_obs(self.screen_buffer[0])
self.screen_buffer[1].fill(0)
return np.stack([self._pool_and_resize() for _ in range(self.frame_skip)], axis=-1)
return np.stack([
self._pool_and_resize() for _ in range(self.frame_skip)], axis=-1)
def render(self, mode):
@ -95,7 +94,9 @@ class preprocessing(object):
if len(observation) > 0:
observation = np.stack(observation, axis=-1)
else:
observation = np.stack([self._pool_and_resize() for _ in range(self.frame_skip)], axis=-1)
observation = np.stack([
self._pool_and_resize() for _ in range(self.frame_skip)],
axis=-1)
if self.count >= self.max_episode_steps:
terminal = True
self.terminal = terminal
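A hedged usage sketch of the factory above: with sticky actions disabled, the name formatting shown yields e.g. 'PongNoFrameskip-v4'. The import path follows the file shown here; that the function returns a gym-style environment with the usual reset/step interface is an assumption.
```python
# Illustrative only; assumes create_atari_environment returns the
# preprocessed environment and that it exposes the standard gym interface.
from tianshou.env.atari import create_atari_environment

env = create_atari_environment(name='Pong', sticky_actions=False,
                               max_episode_steps=2000)  # PongNoFrameskip-v4
obs = env.reset()
obs, reward, done, info = env.step(0)  # action 0 is NOOP in Atari
```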

View File

@ -1,5 +1,4 @@
from gym.envs.registration import register
import gym
register(
id='PointMaze-v0',

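Once the `register()` call above has executed (i.e. the module containing it has been imported), the environment can be constructed by id. A minimal sketch, assuming the MuJoCo dependencies are installed:
```python
import gym

# assumes the module with the register() call has been imported,
# so 'PointMaze-v0' is present in gym's registry
env = gym.make('PointMaze-v0')
obs = env.reset()
```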
View File

@ -1,5 +1,4 @@
"""Adapted from rllab maze_env_utils.py."""
import numpy as np
import math
@ -111,25 +110,24 @@ def construct_maze(maze_id='Maze'):
[1, 1, 1, 1],
]
elif maze_id == 'Block':
O = 'r'
structure = [
[1, 1, 1, 1, 1],
[1, O, 0, 0, 1],
[1, 'r', 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 1, 1, 1, 1],
]
elif maze_id == 'BlockMaze':
O = 'r'
structure = [
[1, 1, 1, 1],
[1, O, 0, 1],
[1, 'r', 0, 1],
[1, 1, 0, 1],
[1, 0, 0, 1],
[1, 1, 1, 1],
]
else:
raise NotImplementedError('The provided MazeId %s is not recognized' % maze_id)
raise NotImplementedError(
'The provided MazeId %s is not recognized' % maze_id)
return structure
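With the change above, `construct_maze('Block')` returns the 5x5 grid shown in the diff, where 1 is a wall, 0 is free space, and 'r' marks the robot start. A sketch of the expected value; the import path is an assumption, adjust it to wherever maze_env_utils.py lives:
```python
from maze_env_utils import construct_maze  # import path is an assumption

structure = construct_maze(maze_id='Block')
# expected:
# [[1, 1,   1, 1, 1],
#  [1, 'r', 0, 0, 1],
#  [1, 0,   0, 0, 1],
#  [1, 0,   0, 0, 1],
#  [1, 1,   1, 1, 1]]
```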
@ -157,7 +155,8 @@ def line_intersect(pt1, pt2, ptA, ptB):
DET = (-dx1 * dy + dy1 * dx)
if math.fabs(DET) < DET_TOLERANCE: return (0, 0, 0, 0, 0)
if math.fabs(DET) < DET_TOLERANCE:
return (0, 0, 0, 0, 0)
# now, the determinant should be OK
DETinv = 1.0 / DET
@ -176,8 +175,9 @@ def line_intersect(pt1, pt2, ptA, ptB):
def ray_segment_intersect(ray, segment):
"""
Check if the ray originated from (x, y) with direction theta intersects the line segment (x1, y1) -- (x2, y2),
and return the intersection point if there is one
Check whether the ray originating from (x, y) with direction theta
intersects the line segment (x1, y1) -- (x2, y2), and return
the intersection point if there is one
"""
(x, y), theta = ray
# (x1, y1), (x2, y2) = segment

View File

@ -108,7 +108,7 @@ class PointMazeEnv(gym.Env):
rgba="0.9 0.9 0.9 1",
)
if struct == 1: # Unmovable block.
# Offset all coordinates so that robot starts at the origin.
# Offset all coordinates so that robot starts at the origin
ET.SubElement(
worldbody, "geom",
name="block_%d_%d" % (i, j),
@ -134,13 +134,13 @@ class PointMazeEnv(gym.Env):
y_offset = 0.0
shrink = 0.1 if spinning else 0.99 if falling else 1.0
height_shrink = 0.1 if spinning else 1.0
_x = j * size_scaling - torso_x + x_offset
_y = i * size_scaling - torso_y + y_offset
_z = height / 2 * size_scaling * height_shrink
movable_body = ET.SubElement(
worldbody, "body",
name=name,
pos="%f %f %f" % (j * size_scaling - torso_x + x_offset,
i * size_scaling - torso_y + y_offset,
height_offset +
height / 2 * size_scaling * height_shrink),
pos="%f %f %f" % (_x, _y, height_offset + _z),
)
ET.SubElement(
movable_body, "geom",
@ -148,7 +148,7 @@ class PointMazeEnv(gym.Env):
pos="0 0 0",
size="%f %f %f" % (0.5 * size_scaling * shrink,
0.5 * size_scaling * shrink,
height / 2 * size_scaling * height_shrink),
_z),
type="box",
material="",
mass="0.001" if falling else "0.0002",
@ -232,13 +232,13 @@ class PointMazeEnv(gym.Env):
self._view = np.zeros_like(self._view)
def valid(row, col):
return self._view.shape[0] > row >= 0 and self._view.shape[1] > col >= 0
return self._view.shape[0] > row >= 0 \
and self._view.shape[1] > col >= 0
def update_view(x, y, d, row=None, col=None):
if row is None or col is None:
x = x - self._robot_x
y = y - self._robot_y
th = self._robot_ori
row, col = self._xy_to_rowcol(x, y)
update_view(x, y, d, row=row, col=col)
@ -252,36 +252,36 @@ class PointMazeEnv(gym.Env):
if valid(row, col):
self._view[row, col, d] += (
(min(1., row_frac + 0.5) - max(0., row_frac - 0.5)) *
(min(1., col_frac + 0.5) - max(0., col_frac - 0.5)))
(min(1., row_frac + 0.5) - max(0., row_frac - 0.5)) *
(min(1., col_frac + 0.5) - max(0., col_frac - 0.5)))
if valid(row - 1, col):
self._view[row - 1, col, d] += (
(max(0., 0.5 - row_frac)) *
(min(1., col_frac + 0.5) - max(0., col_frac - 0.5)))
(max(0., 0.5 - row_frac)) *
(min(1., col_frac + 0.5) - max(0., col_frac - 0.5)))
if valid(row + 1, col):
self._view[row + 1, col, d] += (
(max(0., row_frac - 0.5)) *
(min(1., col_frac + 0.5) - max(0., col_frac - 0.5)))
(max(0., row_frac - 0.5)) *
(min(1., col_frac + 0.5) - max(0., col_frac - 0.5)))
if valid(row, col - 1):
self._view[row, col - 1, d] += (
(min(1., row_frac + 0.5) - max(0., row_frac - 0.5)) *
(max(0., 0.5 - col_frac)))
(min(1., row_frac + 0.5) - max(0., row_frac - 0.5)) *
(max(0., 0.5 - col_frac)))
if valid(row, col + 1):
self._view[row, col + 1, d] += (
(min(1., row_frac + 0.5) - max(0., row_frac - 0.5)) *
(max(0., col_frac - 0.5)))
(min(1., row_frac + 0.5) - max(0., row_frac - 0.5)) *
(max(0., col_frac - 0.5)))
if valid(row - 1, col - 1):
self._view[row - 1, col - 1, d] += (
(max(0., 0.5 - row_frac)) * max(0., 0.5 - col_frac))
(max(0., 0.5 - row_frac)) * max(0., 0.5 - col_frac))
if valid(row - 1, col + 1):
self._view[row - 1, col + 1, d] += (
(max(0., 0.5 - row_frac)) * max(0., col_frac - 0.5))
(max(0., 0.5 - row_frac)) * max(0., col_frac - 0.5))
if valid(row + 1, col + 1):
self._view[row + 1, col + 1, d] += (
(max(0., row_frac - 0.5)) * max(0., col_frac - 0.5))
(max(0., row_frac - 0.5)) * max(0., col_frac - 0.5))
if valid(row + 1, col - 1):
self._view[row + 1, col - 1, d] += (
(max(0., row_frac - 0.5)) * max(0., 0.5 - col_frac))
(max(0., row_frac - 0.5)) * max(0., 0.5 - col_frac))
# Draw ant.
robot_x, robot_y = self.wrapped_env.get_body_com("torso")[:2]
@ -291,7 +291,6 @@ class PointMazeEnv(gym.Env):
structure = self.MAZE_STRUCTURE
size_scaling = self.MAZE_SIZE_SCALING
height = self.MAZE_HEIGHT
# Draw immovable blocks and chasms.
for i in range(len(structure)):
@ -311,7 +310,9 @@ class PointMazeEnv(gym.Env):
update_view(block_x, block_y, 2)
import cv2
cv2.imshow('x.jpg', cv2.resize(np.uint8(self._view * 255), (512, 512), interpolation=cv2.INTER_CUBIC))
cv2.imshow('x.jpg', cv2.resize(
np.uint8(self._view * 255), (512, 512),
interpolation=cv2.INTER_CUBIC))
cv2.waitKey(0)
return self._view
@ -350,10 +351,11 @@ class PointMazeEnv(gym.Env):
))
for block_name, block_type in self.movable_blocks:
block_x, block_y, block_z = self.wrapped_env.get_body_com(block_name)[
:3]
block_x, block_y, block_z = \
self.wrapped_env.get_body_com(block_name)[:3]
if (block_z + height * size_scaling / 2 >= robot_z and
robot_z >= block_z - height * size_scaling / 2): # Block in view.
robot_z >= block_z - height * size_scaling / 2):
# Block in view.
x1 = block_x - 0.5 * size_scaling
x2 = block_x + 0.5 * size_scaling
y1 = block_y - 0.5 * size_scaling
@ -373,8 +375,8 @@ class PointMazeEnv(gym.Env):
# 3 for wall, drop-off, block
sensor_readings = np.zeros((self._n_bins, 3))
for ray_idx in range(self._n_bins):
ray_ori = (ori - self._sensor_span * 0.5 +
(2 * ray_idx + 1.0) / (2 * self._n_bins) * self._sensor_span)
ray_ori = (ori - self._sensor_span * 0.5 + (
2 * ray_idx + 1.0) / (2 * self._n_bins) * self._sensor_span)
ray_segments = []
# Get all segments that intersect with ray.
for seg in segments:
@ -400,17 +402,14 @@ class PointMazeEnv(gym.Env):
None)
if first_seg["distance"] <= self._sensor_range:
sensor_readings[ray_idx][idx] = (
self._sensor_range - first_seg[
"distance"]) / self._sensor_range
self._sensor_range - first_seg[
"distance"]) / self._sensor_range
return sensor_readings
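The rays above are spread evenly across the sensor span, one per bin and centered within it. A standalone sketch of that angle computation with example values (not environment code):
```python
import math

ori = 0.0              # robot orientation in radians
sensor_span = math.pi  # total angular span covered by the range sensors
n_bins = 8             # number of rays

ray_oris = [
    ori - sensor_span * 0.5
    + (2 * ray_idx + 1.0) / (2 * n_bins) * sensor_span
    for ray_idx in range(n_bins)
]
# every ray sits at the center of its bin: the first at ori - span/2 + span/16,
# the last at ori + span/2 - span/16
```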
def _get_obs(self):
wrapped_obs = self.wrapped_env._get_obs()
# print("ant obs", wrapped_obs)
if self._top_down_view:
view = [self.get_top_down_view().flat]
else:
view = []
self.get_top_down_view()
if self._observe_blocks:
additional_obs = []
@ -420,7 +419,7 @@ class PointMazeEnv(gym.Env):
wrapped_obs = np.concatenate([wrapped_obs[:3]] + additional_obs +
[wrapped_obs[3:]])
range_sensor_obs = self.get_range_sensor_obs()
self.get_range_sensor_obs()
return wrapped_obs
def seed(self, seed=None):
@ -446,7 +445,8 @@ class PointMazeEnv(gym.Env):
pos="%f %f %f" % (goal_x,
goal_y,
self.MAZE_HEIGHT / 2 * size_scaling),
size="%f %f %f" % (0.1 * size_scaling, # smaller than the block to prevent collision
# smaller than the block to prevent collision
size="%f %f %f" % (0.1 * size_scaling,
0.1 * size_scaling,
self.MAZE_HEIGHT / 2 * size_scaling),
type="box",
@ -455,7 +455,8 @@ class PointMazeEnv(gym.Env):
conaffinity="1",
rgba="1.0 0.0 0.0 0.5"
)
# Note: running the lines below will make the robot position wrong! (because the graph is rebuilt)
# Note: running the lines below will make the robot position wrong!
# (because the graph is rebuilt)
torso = self.tree.find(".//body[@name='torso']")
geoms = torso.findall(".//geom")
for geom in geoms:
@ -463,18 +464,21 @@ class PointMazeEnv(gym.Env):
raise Exception("Every geom of the torso must have a name "
"defined")
_, file_path = tempfile.mkstemp(text=True, suffix='.xml')
self.tree.write(
file_path) # here we write a temporal file with the robot specifications. Why not the original one??
self.tree.write(file_path)
# here we write a temporary file with the robot specifications.
# Why not the original one??
model_cls = self.__class__.MODEL_CLASS
self.wrapped_env = model_cls(*self.args, file_path=file_path,
**self.kwargs) # file to the robot specifications; model_cls is AntEnv
# file to the robot specifications; model_cls is AntEnv
self.wrapped_env = model_cls(
*self.args, file_path=file_path, **self.kwargs)
self.t = 0
self.trajectory = []
self.wrapped_env.reset()
if len(self._init_positions) > 1:
xy = self._init_positions[self.np_random.randint(len(self._init_positions))]
xy = self._init_positions[self.np_random.randint(
len(self._init_positions))]
self.wrapped_env.set_xy(xy)
return self._get_obs()
@ -518,38 +522,38 @@ class PointMazeEnv(gym.Env):
def _is_in_collision(self, pos):
x, y = pos
structure = self.MAZE_STRUCTURE
size_scaling = self.MAZE_SIZE_SCALING
scale = self.MAZE_SIZE_SCALING
for i in range(len(structure)):
for j in range(len(structure[0])):
if structure[i][j] == 1:
minx = j * size_scaling - size_scaling * 0.5 - self._init_torso_x
maxx = j * size_scaling + size_scaling * 0.5 - self._init_torso_x
miny = i * size_scaling - size_scaling * 0.5 - self._init_torso_y
maxy = i * size_scaling + size_scaling * 0.5 - self._init_torso_y
minx = j * scale - scale * 0.5 - self._init_torso_x
maxx = j * scale + scale * 0.5 - self._init_torso_x
miny = i * scale - scale * 0.5 - self._init_torso_y
maxy = i * scale + scale * 0.5 - self._init_torso_y
if minx <= x <= maxx and miny <= y <= maxy:
return True
return False
def _rowcol_to_xy(self, j, i):
size_scaling = self.MAZE_SIZE_SCALING
minx = j * size_scaling - size_scaling * 0.5 - self._init_torso_x
maxx = j * size_scaling + size_scaling * 0.5 - self._init_torso_x
miny = i * size_scaling - size_scaling * 0.5 - self._init_torso_y
maxy = i * size_scaling + size_scaling * 0.5 - self._init_torso_y
scale = self.MAZE_SIZE_SCALING
minx = j * scale - scale * 0.5 - self._init_torso_x
maxx = j * scale + scale * 0.5 - self._init_torso_x
miny = i * scale - scale * 0.5 - self._init_torso_y
maxy = i * scale + scale * 0.5 - self._init_torso_y
return (minx + maxx) / 2, (miny + maxy) / 2
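Both `_is_in_collision` and `_rowcol_to_xy` use the same cell geometry: cell (i, j) spans half a scale unit on each side of its center, shifted by the initial torso position. A standalone numeric sketch with illustrative values:
```python
scale = 8.0                       # MAZE_SIZE_SCALING
init_torso_x, init_torso_y = 4.0, 4.0
i, j = 2, 3                       # row, column of the maze cell

minx = j * scale - scale * 0.5 - init_torso_x    # 16.0
maxx = j * scale + scale * 0.5 - init_torso_x    # 24.0
miny = i * scale - scale * 0.5 - init_torso_y    # 8.0
maxy = i * scale + scale * 0.5 - init_torso_y    # 16.0
center = ((minx + maxx) / 2, (miny + maxy) / 2)  # (20.0, 12.0)
```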
def step(self, action):
self.t += 1
if self._manual_collision:
old_pos = self.wrapped_env.get_xy()
inner_next_obs, inner_reward, inner_done, info = self.wrapped_env.step(
action)
inner_next_obs, inner_reward, inner_done, info = \
self.wrapped_env.step(action)
new_pos = self.wrapped_env.get_xy()
if self._is_in_collision(new_pos):
self.wrapped_env.set_xy(old_pos)
else:
inner_next_obs, inner_reward, inner_done, info = self.wrapped_env.step(
action)
inner_next_obs, inner_reward, inner_done, info = \
self.wrapped_env.step(action)
next_obs = self._get_obs()
done = False
if self.goal is not None: