Tianshou/docs/notebooks/L3_Vectorized__Environment.ipynb
2023-12-04 11:42:38 +01:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "W5V7z3fVX7_b"
},
"source": [
"# Vectorized Environment\n",
"In reinforcement learning, an agent interacts with environments to improve itself. This tutorial concentrates on the environment side. Although DRL research uses many kinds of environments and environment libraries, Tianshou keeps its API consistent with [OpenAI Gym](https://gym.openai.com/) (now maintained as Gymnasium, which the code below imports).\n",
"\n",
"<div align=center>\n",
"<img src=\"https://tianshou.readthedocs.io/en/master/_images/rl-loop.jpg\" title=\"The agents interacting with the environment\">\n",
"\n",
"<a> The agents interacting with the environment </a>\n",
"</div>\n",
"\n",
"In Gym, an environment receives an action and returns the next observation and reward. This process can be slow and is sometimes the throughput bottleneck in a DRL experiment.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "A0NGWZ8adBwt"
},
"source": [
"Tianshou provides vectorized environment wrappers for Gym environments. These wrappers let you use multiple CPU cores on your server to accelerate data sampling."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"# ToDo: Clarify if it has to be done, or truncated. Also, in the function description of `SubprocVectorEnv.step()`, the output is not clear"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"editable": true,
"id": "67wKtkiNi3lb",
"outputId": "1e04353b-7a91-4c32-e2ae-f3889d58aa5e",
"slideshow": {
"slide_type": ""
},
"tags": [
"remove-output",
"hide-cell"
]
},
"outputs": [],
"source": [
"from tianshou.env import SubprocVectorEnv\n",
"import numpy as np\n",
"import gymnasium as gym\n",
"import time\n",
"from tianshou.env import DummyVectorEnv"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_cpus = [1, 2, 5]\n",
"for num_cpu in num_cpus:\n",
"    env = SubprocVectorEnv([lambda: gym.make(\"CartPole-v1\") for _ in range(num_cpu)])\n",
"    env.reset()\n",
"    sampled_steps = 0\n",
"    time_start = time.time()\n",
"    while sampled_steps < 1000:\n",
"        act = np.random.choice(2, size=num_cpu)\n",
"        obs, rew, terminated, truncated, info = env.step(act)\n",
"        # An episode is over when it is either terminated or truncated.\n",
"        done = np.logical_or(terminated, truncated)\n",
"        if np.any(done):\n",
"            # Reset only the sub-environments whose episodes have ended.\n",
"            env.reset(np.where(done)[0])\n",
"        sampled_steps += num_cpu\n",
"    time_used = time.time() - time_start\n",
"    env.close()  # shut down the worker subprocesses\n",
"    print(\"{}s used to sample 1000 steps if using {} cpus.\".format(time_used, num_cpu))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "S1b6vxp9nEUS"
},
"source": [
"You may notice that the speed doesn't increase linearly as we add subprocesses. There are multiple reasons for this. One is that synchronous execution causes a straggler effect: each step has to wait for the slowest environment to finish. One way to address this is to use asynchronous mode, which we leave for further reading.\n",
"\n",
"Note that SubprocVectorEnv should only be used when environment execution is slow. In practice, DummyVectorEnv (or a raw Gym environment) is actually more efficient for a simple environment like CartPole, because it avoids both the straggler effect and the overhead of communication between subprocesses."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Z6yPxdqFp18j"
},
"source": [
"## Usage\n",
"### Initialization\n",
"Just pass in a list of functions, each of which returns an initialized environment when called."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ssLcrL_pq24-"
},
"outputs": [],
"source": [
"# In Gym\n",
"env = gym.make(\"CartPole-v1\")\n",
"\n",
"\n",
"# In Tianshou\n",
"def helper_function():\n",
"    env = gym.make(\"CartPole-v1\")\n",
"    # other operations, e.g. seeding (Gymnasium seeds via env.reset(seed=...), not env.seed())\n",
"    return env\n",
"\n",
"\n",
"envs = DummyVectorEnv([helper_function for _ in range(5)])\n",
"print(envs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "X7p8csjdrwIN"
},
"source": [
"### EnvPool support\n",
"Besides its built-in environment wrappers, Tianshou also fully supports [EnvPool](https://github.com/sail-sg/envpool/). See its GitHub page for details."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kvIfqh0vqAR5"
},
"source": [
"### Environment execution and resetting\n",
"The only difference between vectorized environments and standard Gym environments is that the actions passed in and the rewards/observations returned are also vectorized."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"# ToDo: num_cpu is defined for a for loop, not as a parameter..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BH1ZnPG6tkdD"
},
"outputs": [],
"source": [
"# In Gym, env.reset() returns a single observation.\n",
"print(\"In Gym, env.reset() returns a single observation.\")\n",
"print(env.reset())\n",
"\n",
"# In Tianshou, envs.reset() returns stacked observations.\n",
"print(\"========================================\")\n",
"print(\"In Tianshou, envs.reset() returns stacked observations.\")\n",
"print(envs.reset())\n",
"\n",
"obs, rew, terminated, truncated, info = envs.step(np.random.choice(2, size=len(envs)))\n",
"print(info)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qXroB7KluvP9"
},
"source": [
"If we only want to step a subset of the environments, the `id` argument can be used."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ufvFViKTu8d_"
},
"outputs": [],
"source": [
"print(envs.step(np.random.choice(2, size=3), id=[0, 3, 1]))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fekHR1a6X_HB"
},
"source": [
"## Further Reading\n",
"### Other environment wrappers in Tianshou\n",
"\n",
"\n",
"* ShmemVectorEnv: builds on SubprocVectorEnv but uses shared memory instead of pipes;\n",
"* RayVectorEnv: uses Ray for concurrency and is currently the only choice for parallel simulation in a cluster with multiple machines.\n",
"\n",
"Check the [documentation](https://tianshou.readthedocs.io/en/master/api/tianshou.env.html) for details.\n",
"\n",
"### Difference between synchronous and asynchronous mode (How to choose?)\n",
"An explanation can be found in the [Parallel Sampling](https://tianshou.readthedocs.io/en/master/tutorials/cheatsheet.html#parallel-sampling) tutorial."
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}