The proposed feature turns trainers into generators, so a training run can be consumed one epoch at a time.
The usage pattern is:
```python
from tianshou.trainer import OnpolicyTrainer

trainer = OnpolicyTrainer(...)
for epoch, epoch_stat, info in trainer:
    print(f"Epoch: {epoch}")
    print(epoch_stat)
    print(info)
    # Placeholders: any per-epoch inspection or plotting can go here.
    do_something_with_policy()
    query_something_about_policy()
    make_a_plot_with(epoch_stat)
    display(info)
```
Each iteration yields three values:
- `epoch` (int): the epoch number
- `epoch_stat` (dict): a large collection of metrics of the current epoch, including stat
- `info` (dict): the usual dict returned by the non-generator version of the trainer
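To make the yielded triple concrete, here is a minimal sketch of how an iterable trainer could be structured. It assumes nothing about the actual tianshou implementation: `ToyTrainer`, its `run` method, and the metric keys `test_reward` and `best_reward` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Iterator, Tuple

# One item per epoch: (epoch, epoch_stat, info)
EpochResult = Tuple[int, Dict[str, Any], Dict[str, Any]]


class ToyTrainer:
    """Hypothetical stand-in for a trainer; not part of tianshou."""

    def __init__(self, max_epoch: int) -> None:
        self.max_epoch = max_epoch
        self.epoch = 0
        self.best_reward = float("-inf")

    def __iter__(self) -> Iterator[EpochResult]:
        return self

    def __next__(self) -> EpochResult:
        if self.epoch >= self.max_epoch:
            raise StopIteration
        self.epoch += 1
        reward = float(self.epoch)  # placeholder for a real test reward
        self.best_reward = max(self.best_reward, reward)
        epoch_stat = {"test_reward": reward}
        info = {"best_reward": self.best_reward}
        return self.epoch, epoch_stat, info

    def run(self) -> Dict[str, Any]:
        """Exhaust the iterator and return the last info dict."""
        info: Dict[str, Any] = {}
        for _, _, info in self:
            pass
        return info


# Generator-style use: one result per epoch.
results = list(ToyTrainer(max_epoch=3))
# Non-generator-style use: only the final info dict.
final_info = ToyTrainer(max_epoch=3).run()
```

In this sketch the non-generator behaviour falls out of the generator one: `run` simply drains the iterator and keeps the last `info`.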
You can even iterate over several trainers at the same time:
```python
trainer1 = OnpolicyTrainer(...)
trainer2 = OnpolicyTrainer(...)
# Pass more trainers to zip() to compare more than two runs.
for result1, result2 in zip(trainer1, trainer2):
    compare_results(result1, result2)
```
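One detail worth keeping in mind when zipping trainers: `zip` stops as soon as the shortest iterator is exhausted, so trainers with different epoch counts are truncated to the shorter run. The sketch below illustrates this with a hypothetical `toy_trainer` generator (not a tianshou API) and shows `itertools.zip_longest` as one possible alternative:

```python
from itertools import zip_longest
from typing import Any, Dict, Iterator, Tuple

EpochResult = Tuple[int, Dict[str, Any], Dict[str, Any]]


def toy_trainer(max_epoch: int, scale: float) -> Iterator[EpochResult]:
    """Hypothetical generator yielding (epoch, epoch_stat, info) per epoch."""
    for epoch in range(1, max_epoch + 1):
        epoch_stat = {"test_reward": scale * epoch}  # placeholder metric
        info = {"best_reward": scale * epoch}
        yield epoch, epoch_stat, info


# zip() stops after 2 epochs because the first trainer is exhausted.
paired = list(zip(toy_trainer(2, 1.0), toy_trainer(4, 2.0)))

# zip_longest() keeps going and fills the finished trainer's slot with None.
padded = list(zip_longest(toy_trainer(2, 1.0), toy_trainer(4, 2.0)))
```

With real trainers each step of the loop runs a full epoch, so the choice between truncating and padding decides how much training actually happens.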
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
50 lines
842 B
ReStructuredText
tianshou.trainer
================


On-policy
---------

.. autoclass:: tianshou.trainer.OnpolicyTrainer
    :members:
    :undoc-members:
    :show-inheritance:

.. autofunction:: tianshou.trainer.onpolicy_trainer

.. autoclass:: tianshou.trainer.onpolicy_trainer_iter


Off-policy
----------

.. autoclass:: tianshou.trainer.OffpolicyTrainer
    :members:
    :undoc-members:
    :show-inheritance:

.. autofunction:: tianshou.trainer.offpolicy_trainer

.. autoclass:: tianshou.trainer.offpolicy_trainer_iter


Offline
-------

.. autoclass:: tianshou.trainer.OfflineTrainer
    :members:
    :undoc-members:
    :show-inheritance:

.. autofunction:: tianshou.trainer.offline_trainer

.. autoclass:: tianshou.trainer.offline_trainer_iter


utils
-----

.. autofunction:: tianshou.trainer.test_episode

.. autofunction:: tianshou.trainer.gather_info