diff --git a/docs/_static/images/policy_table.png b/docs/_static/images/policy_table.png deleted file mode 100644 index ccd3b35..0000000 Binary files a/docs/_static/images/policy_table.png and /dev/null differ diff --git a/docs/_static/images/policy_table.svg b/docs/_static/images/policy_table.svg new file mode 100644 index 0000000..7abf43b Binary files /dev/null and b/docs/_static/images/policy_table.svg differ diff --git a/docs/_static/images/pseudocode_off_policy.png b/docs/_static/images/pseudocode_off_policy.png deleted file mode 100644 index c07aecd..0000000 Binary files a/docs/_static/images/pseudocode_off_policy.png and /dev/null differ diff --git a/docs/_static/images/pseudocode_off_policy.svg b/docs/_static/images/pseudocode_off_policy.svg new file mode 100644 index 0000000..c16037a Binary files /dev/null and b/docs/_static/images/pseudocode_off_policy.svg differ diff --git a/docs/_static/images/structure.png b/docs/_static/images/structure.png deleted file mode 100644 index 867cc4d..0000000 Binary files a/docs/_static/images/structure.png and /dev/null differ diff --git a/docs/_static/images/structure.svg b/docs/_static/images/structure.svg new file mode 100644 index 0000000..2119ec1 --- /dev/null +++ b/docs/_static/images/structure.svg @@ -0,0 +1,3 @@ + + +
[structure.svg: data-flow diagram replacing structure.png. The Trainer coordinates a Collector and a Policy (a PyTorch module), reporting statistics to a Logger. Collection: collector.collect() feeds observations from the VecEnv (Env 1 ... Env n) through policy.forward() to obtain actions, calls env.step(), and stores the resulting batch in the VecBuffer (Buf 1 ... Buf n) via buffer.add(). Update: policy.update() draws a batch via buffer.sample(), then runs policy.process_fn() and policy.learn(), splitting and concatenating data as needed.]
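The diagram's collect/update loop can be sketched in plain Python. Everything below is a hypothetical stand-in (ToyEnv, ToyPolicy, ToyBuffer, collect), not the actual Tianshou API; it only illustrates the order of calls shown in structure.svg:

```python
import random

class ToyEnv:
    """Trivial environment: state counts steps, reward is always 1."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 5  # obs, reward, done

class ToyPolicy:
    def forward(self, obs):
        return random.choice([0, 1])     # policy.forward(): obs -> action
    def learn(self, batch):
        return {"loss": 0.0}             # policy.learn(): update from a batch

class ToyBuffer:
    def __init__(self):
        self.data = []
    def add(self, transition):
        self.data.append(transition)     # buffer.add()
    def sample(self, n):
        # n == 0 returns everything, mirroring buffer.sample(0) in the docs
        return self.data if n == 0 else random.sample(self.data, n)

def collect(policy, env, buffer, n_step):
    """collector.collect(): run the interaction loop and fill the buffer."""
    obs = env.reset()
    for _ in range(n_step):
        act = policy.forward(obs)
        next_obs, rew, done = env.step(act)
        buffer.add((obs, act, rew, next_obs, done))
        obs = env.reset() if done else next_obs

env, policy, buffer = ToyEnv(), ToyPolicy(), ToyBuffer()
collect(policy, env, buffer, n_step=8)   # collection phase
stats = policy.learn(buffer.sample(0))   # update phase
print(len(buffer.data), stats["loss"])   # -> 8 0.0
```

The real Collector batches these calls over vectorized environments, but the call order (forward, step, add, then sample and learn) is the same.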
\ No newline at end of file diff --git a/docs/_static/images/timelimit.png b/docs/_static/images/timelimit.png deleted file mode 100644 index 798150a..0000000 Binary files a/docs/_static/images/timelimit.png and /dev/null differ diff --git a/docs/_static/images/timelimit.svg b/docs/_static/images/timelimit.svg new file mode 100644 index 0000000..b35ddd6 --- /dev/null +++ b/docs/_static/images/timelimit.svg @@ -0,0 +1,3 @@ + + +
[timelimit.svg: diagram replacing timelimit.png. It shows the three ways a data segment can end (normally, because of the environment's time limit, or because enough timesteps were collected) and how buffer.sample(0) links all data segments sequentially across Env 1 ... Env n, yielding one data batch b_{1}, ..., b_{h}, b_{h+1}, ..., b_{nh}.]
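The segment ordering in timelimit.svg can be illustrated with a small sketch. This is a hypothetical layout (n environments, each owning a contiguous segment of h slots), not the actual Tianshou buffer internals:

```python
# Each of n environments owns a contiguous segment of h slots; sample(0)
# returns every stored transition with the per-env segments linked
# sequentially: b_1 ... b_h (Env 1), b_{h+1} ... b_{2h} (Env 2), ..., b_{nh}.
n, h = 3, 4  # 3 environments, segment length 4
segments = [[f"b_{i * h + j + 1}" for j in range(h)] for i in range(n)]

def sample_all(segments):
    """buffer.sample(0): concatenate all per-env segments in env order."""
    out = []
    for seg in segments:
        out.extend(seg)
    return out

print(sample_all(segments))  # -> ['b_1', 'b_2', ..., 'b_12']
```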
\ No newline at end of file diff --git a/docs/notebooks/L4_Policy.ipynb b/docs/notebooks/L4_Policy.ipynb index 4d356e4..4a815d0 100644 --- a/docs/notebooks/L4_Policy.ipynb +++ b/docs/notebooks/L4_Policy.ipynb @@ -825,10 +825,10 @@ }, "source": [ "
\n", - "\n", + "\n", "
\n", "
\n", - "\n", + "\n", "
" ] } diff --git a/docs/notebooks/L5_Collector.ipynb b/docs/notebooks/L5_Collector.ipynb index 38dc40d..0dbb77d 100644 --- a/docs/notebooks/L5_Collector.ipynb +++ b/docs/notebooks/L5_Collector.ipynb @@ -10,7 +10,7 @@ "From its literal meaning, we can easily know that the Collector in Tianshou is used to collect training data. More specifically, the Collector controls the interaction between Policy (agent) and the environment. It also helps save the interaction data into the ReplayBuffer and returns episode statistics.\n", "\n", "
\n", - "\n", + "\n", "
\n", "\n" ] diff --git a/docs/notebooks/L6_Trainer.ipynb b/docs/notebooks/L6_Trainer.ipynb index f81c6ff..308e35a 100644 --- a/docs/notebooks/L6_Trainer.ipynb +++ b/docs/notebooks/L6_Trainer.ipynb @@ -10,7 +10,7 @@ "Trainer is the highest-level encapsulation in Tianshou. It controls the training loop and the evaluation method. It also controls the interaction between the Collector and the Policy, with the ReplayBuffer serving as the media.\n", "\n", "
\n", - "\n", + "\n", "
\n", "\n", "\n" @@ -34,7 +34,7 @@ "source": [ "### Pseudocode\n", "
\n", - "\n", + "\n", "
\n", "\n", "For the on-policy trainer, the main difference is that we clear the buffer after Line 10."