Tianshou/docs/02_notebooks/L3_Vectorized__Environment.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "W5V7z3fVX7_b"
   },
   "source": [
    "# Vectorized Environment\n",
    "In reinforcement learning, the agent interacts with environments to improve itself. In this tutorial we will concentrate on the environment part. Although there are many kinds of environments or their libraries in DRL research, Tianshou chooses to keep a consistent API with [OPENAI Gym](https://gym.openai.com/).\n",
    "\n",
    "<div align=center>\n",
    "<img src=\"https://tianshou.readthedocs.io/en/master/_images/rl-loop.jpg\", title=\"The agents interacting with the environment\">\n",
    "\n",
    "<a> The agents interacting with the environment </a>\n",
    "</div>\n",
    "\n",
    "In Gym, an environment receives an action and returns next observation and reward. This process is slow and sometimes can be the throughput bottleneck in a DRL experiment.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "A0NGWZ8adBwt"
   },
   "source": [
    "Tianshou provides vectorized environment wrapper for a Gym environment. This wrapper allows you to make use of multiple cpu cores in your server to accelerate the data sampling."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "editable": true,
    "id": "67wKtkiNi3lb",
    "outputId": "1e04353b-7a91-4c32-e2ae-f3889d58aa5e",
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-output",
     "hide-cell"
    ]
   },
   "outputs": [],
   "source": [
    "%%capture\n",
    "\n",
    "import time\n",
    "\n",
    "import gymnasium as gym\n",
    "import numpy as np\n",
    "\n",
    "from tianshou.env import DummyVectorEnv, SubprocVectorEnv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_cpus = [1, 2, 5]\n",
    "for num_cpu in num_cpus:\n",
    "    env = SubprocVectorEnv([lambda: gym.make(\"CartPole-v1\") for _ in range(num_cpu)])\n",
    "    env.reset()\n",
    "    sampled_steps = 0\n",
    "    time_start = time.time()\n",
    "    while sampled_steps < 1000:\n",
    "        act = np.random.choice(2, size=num_cpu)\n",
    "        obs, rew, terminated, truncated, info = env.step(act)\n",
    "        if np.sum(terminated):\n",
    "            env.reset(np.where(terminated)[0])\n",
    "        sampled_steps += num_cpu\n",
    "    time_used = time.time() - time_start\n",
    "    print(f\"{time_used}s used to sample 1000 steps if using {num_cpu} cpus.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "S1b6vxp9nEUS"
   },
   "source": [
    "You may notice that the speed doesn't increase linearly when we add subprocess numbers. There are multiple reasons behind this. One reason is that synchronize exception causes straggler effect. One way to solve this would be to use asynchronous mode. We leave this for further reading if you feel interested.\n",
    "\n",
    "Note that SubprocVectorEnv should only be used when the environment execution is slow. In practice, DummyVectorEnv (or raw Gym environment) is actually more efficient for a simple environment like CartPole because now you avoid both straggler effect and the overhead of communication between subprocesses."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Z6yPxdqFp18j"
   },
   "source": [
    "## Usages\n",
    "### Initialization\n",
    "Just pass in a list of functions which return the initialized environment upon called."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ssLcrL_pq24-"
   },
   "outputs": [],
   "source": [
    "# In Gym\n",
    "gym_env = gym.make(\"CartPole-v1\")\n",
    "\n",
    "\n",
    "# In Tianshou\n",
    "def create_cartpole_env() -> gym.Env:\n",
    "    return gym.make(\"CartPole-v1\")\n",
    "\n",
    "\n",
    "# We can distribute the environments on the available cpus, which we assume to be 5 in this case\n",
    "vector_env = DummyVectorEnv([create_cartpole_env for _ in range(5)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "X7p8csjdrwIN"
   },
   "source": [
    "### EnvPool supporting\n",
    "Besides integrated environment wrappers, Tianshou also fully supports [EnvPool](https://github.com/sail-sg/envpool/). Explore its Github page yourself."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kvIfqh0vqAR5"
   },
   "source": [
    "### Environment execution and resetting\n",
    "The only difference between Vectorized environments and standard Gym environments is that passed in actions and returned rewards/observations are also vectorized."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "BH1ZnPG6tkdD"
   },
   "outputs": [],
   "source": [
    "# In gymnasium, env.reset() returns an observation, info tuple\n",
    "print(\"In Gym, env.reset() returns a single observation.\")\n",
    "print(gym_env.reset())\n",
    "\n",
    "# In Tianshou, envs.reset() returns stacked observations.\n",
    "print(\"========================================\")\n",
    "print(\"In Tianshou, a VectorEnv's reset() returns stacked observations.\")\n",
    "print(vector_env.reset())\n",
    "\n",
    "info = vector_env.step(np.random.choice(2, size=vector_env.env_num))[4]\n",
    "print(info)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "qXroB7KluvP9"
   },
   "source": [
    "If we only want to execute several environments. The `id` argument can be used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ufvFViKTu8d_"
   },
   "outputs": [],
   "source": [
    "info = vector_env.step(np.random.choice(2, size=3), id=[0, 3, 1])[4]\n",
    "print(info)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fekHR1a6X_HB"
   },
   "source": [
    "## Further Reading\n",
    "### Other environment wrappers in Tianshou\n",
    "\n",
    "\n",
    "*   ShmemVectorEnv: use share memory instead of pipe based on SubprocVectorEnv;\n",
    "*   RayVectorEnv: use Ray for concurrent activities and is currently the only choice for parallel simulation in a cluster with multiple machines.\n",
    "\n",
    "Check the [documentation](https://tianshou.readthedocs.io/en/master/api/tianshou.env.html) for details.\n",
    "\n",
    "### Difference between synchronous and asynchronous mode (How to choose?)\n",
    "Explanation can be found at the [Parallel Sampling](https://tianshou.readthedocs.io/en/master/tutorials/cheatsheet.html#parallel-sampling) tutorial."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}