"Replay Buffer is a very common module in DRL implementations. In Tianshou, you can consider Buffer module as as a specialized form of Batch, which helps you track all data trajectories and provide utilities such as sampling method besides the basic storage.\n",
"\n",
"There are many kinds of Buffer modules in Tianshou, two most basic ones are ReplayBuffer and VectorReplayBuffer. The later one is specially designed for parallelized environments (will introduce in tutorial L3)."
],
"metadata": {
"id": "xoPiGVD8LNma"
}
},
{
"cell_type": "markdown",
"source": [
"# Usages"
],
"metadata": {
"id": "OdesCAxANehZ"
}
},
{
"cell_type": "markdown",
"source": [
"## Basic usages as a batch\n",
"Usually a buffer stores all the data in a batch with circular-queue style."
"ReplayBuffer can also be saved into local disk, still keeping track of the trajectories. This is extremely helpful in offline DRL settings."
],
"metadata": {
"id": "vqldap-2WQBh"
}
},
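{
"cell_type": "markdown",
"source": [
"As a minimal sketch, we can create a small ReplayBuffer and fill it with a few dummy transitions (the trajectory lengths here are illustrative and simply match the episodes discussed later in this tutorial): a finished 3-step episode, a finished 5-step episode and an unfinished 2-step episode. The resulting `buf` is reused by the cells below."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"from tianshou.data import Batch, ReplayBuffer\n",
"\n",
"buf = ReplayBuffer(size=10)\n",
"\n",
"# Three trajectories: two finished episodes (3 and 5 steps) and one\n",
"# unfinished episode (2 steps so far). \"done\" marks the end of an episode.\n",
"step = 0\n",
"for length, finished in [(3, True), (5, True), (2, False)]:\n",
"    for i in range(length):\n",
"        done = finished and i == length - 1\n",
"        buf.add(Batch(obs=step, act=0, rew=1.0, done=done, obs_next=step + 1, info={}))\n",
"        step += 1\n",
"\n",
"print(buf)\n",
"print(f\"maxsize: {buf.maxsize}, data length: {len(buf)}\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
},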
{
"cell_type": "code",
"source": [
"import pickle\n",
"_buf = pickle.loads(pickle.dumps(buf))"
],
"metadata": {
"id": "Ppx0L3niNT5K"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Understanding reserved keys for buffer\n",
"As I have explained, ReplayBuffer is specially designed to utilize the implementations of DRL algorithms. So, for convenience, we reserve certain seven reserved keys in Batch.\n",
"\n",
"* `obs`\n",
"* `act`\n",
"* `rew`\n",
"* `done`\n",
"* `obs_next`\n",
"* `info`\n",
"* `policy`\n",
"\n",
"The meaning of these seven reserved keys are consistent with the meaning in [OPENAI Gym](https://gym.openai.com/). We would recommend you simply use these seven keys when adding batched data into ReplayBuffer, because\n",
"some of them are tracked in ReplayBuffer (e.g. \"done\" value is tracked to help us determine a trajectory's start index and end index, together with its total reward and episode length.)\n",
"\n",
"```\n",
"buf.add(Batch(......, extro_info=0)) # This is okay but not recommended.\n",
"We keep a replay buffer in DRL for one purpose:\"sample data from it for training\". `ReplayBuffer.sample()` and `ReplayBuffer.split(..., shuffle=True)` can both fullfill this need."
"Notice that `ReplayBuffer.add()` returns a tuple of 4 numbers every time it returns, meaning `(current_index, episode_reward, episode_length, episode_start_index)`. `episode_reward` and `episode_length` are valid only when a trajectory is finished. This might save developers some trouble.\n",
"\n"
],
"metadata": {
"id": "dO7PWdb_hkXA"
}
},
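{
"cell_type": "markdown",
"source": [
"As a small illustrative sketch (using a throwaway buffer so that `buf` above stays untouched), we can add one transition that uses only the reserved keys, look at what `add()` returns, and then sample from the buffer:"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# A throwaway buffer just for this demonstration.\n",
"demo_buf = ReplayBuffer(size=4)\n",
"\n",
"# Add a single 1-step episode using only reserved keys.\n",
"result = demo_buf.add(Batch(obs=0, act=0, rew=1.0, done=True, obs_next=1, info={}))\n",
"# (current_index, episode_reward, episode_length, episode_start_index)\n",
"print(result)\n",
"\n",
"# Sample a mini-batch of 2 steps for training.\n",
"sampled_batch, sampled_indices = demo_buf.sample(batch_size=2)\n",
"print(sampled_batch)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},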
{
"cell_type": "markdown",
"source": [
"### Episode index management\n",
"In the ReplayBuffer above, we can get access to any data step by indexing.\n"
"Now we know that step \"6\" is not the start of an episode (it should be step 4, 4-7 is the second trajectory we add into the ReplayBuffer), we wonder what is the earliest index of the that episode.\n",
"\n",
"This may seem easy but actually it is not. We cannot simply look at the \"done\" flag now, because we can see that since the third-added trajectory is not finished yet, step \"4\" is surrounded by flag \"False\". There are many things to consider. Things could get more nasty if you are using more advanced ReplayBuffer like VectorReplayBuffer, because now the data is not stored in a simple circular-queue.\n",
"\n",
"Luckily, all ReplayBuffer instances help you identify step indexes through a unified API."
],
"metadata": {
"id": "p5Co_Fmzj8Sw"
}
},
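{
"cell_type": "markdown",
"source": [
"Before turning to that API, here is a quick illustrative check on the `buf` built earlier, showing why the raw \"done\" flags are not enough on their own:"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# The \"done\" flags alone do not reveal where the unfinished episode starts.\n",
"print(buf.done)\n",
"\n",
"# Indexing gives direct access to a single step, e.g. step \"6\".\n",
"print(buf[6])"
],
"metadata": {},
"execution_count": null,
"outputs": []
},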
{
"cell_type": "code",
"source": [
"# Search for the previous index of index \"6\"\n",
"Using `ReplayBuffer.prev()`, we know that the earliest step of that episode is step \"3\". Similarly, `ReplayBuffer.next()` helps us indentify the last index of an episode regardless of which kind of ReplayBuffer we are using."
"Aforementioned APIs will be helpful when we calculate quantities like GAE and n-step-returns in DRL algorithms ([Example usage in Tianshou](https://github.com/thu-ml/tianshou/blob/6fc68578127387522424460790cbcb32a2bd43c4/tianshou/policy/base.py#L384)). The unified APIs ensure a modular design and a flexible interface."
],
"metadata": {
"id": "8_lMr0j3pOmn"
}
},
{
"cell_type": "markdown",
"source": [
"# Further Reading\n",
"## Other Buffer Module\n",
"\n",
"* PrioritizedReplayBuffer, which helps you implement [prioritized experience replay](https://arxiv.org/abs/1511.05952)\n",
"* CachedReplayBuffer, one main buffer with several cached buffers (higher sample efficiency in some scenarios)\n",
"* ReplayBufferManager, A base class that can be inherited (may help you manage multiple buffers).\n",
"\n",
"Check the documentation and the source code for more details.\n",
"\n",
"## Support for steps stacking to use RNN in DRL.\n",
"There is an option called `stack_num` (default to 1) when initialising the ReplayBuffer, which may help you use RNN in your algorithm. Check the documentation for details."