The new proposed feature is to have trainers as generators. The usage pattern is:

```python
trainer = OnpolicyTrainer(...)
for epoch, epoch_stat, info in trainer:
    print(f"Epoch: {epoch}")
    print(epoch_stat)
    print(info)
    do_something_with_policy()
    query_something_about_policy()
    make_a_plot_with(epoch_stat)
    display(info)
```

- `epoch` (int): the epoch number
- `epoch_stat` (dict): a large collection of metrics for the current epoch, including `stat`
- `info` (dict): the usual dict returned by the non-generator version of the trainer

You can even iterate over several different trainers at the same time:

```python
trainer1 = OnpolicyTrainer(...)
trainer2 = OnpolicyTrainer(...)
for result1, result2, ... in zip(trainer1, trainer2, ...):
    compare_results(result1, result2, ...)
```

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
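Because the trainer is an ordinary Python iterable, the loop can also be abandoned early. A minimal sketch of that pattern, assuming the trainer exposes the policy as a `policy` attribute, with `checkpoint` and `should_stop` as hypothetical user-defined helpers:

```python
trainer = OnpolicyTrainer(...)  # constructor arguments omitted

for epoch, epoch_stat, info in trainer:
    # Hypothetical helper: save the current policy after every epoch
    # (the `policy` attribute is an assumption, not confirmed here).
    checkpoint(trainer.policy, epoch)
    # Hypothetical stop criterion based on the per-epoch statistics.
    if epoch >= 10 or should_stop(epoch_stat):
        break  # simply leaving the loop stops training early
```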
The trainer package `__init__.py` (35 lines, 768 B, Python):
"""Trainer package."""
|
|
|
|
from tianshou.trainer.base import BaseTrainer
|
|
from tianshou.trainer.offline import (
|
|
OfflineTrainer,
|
|
offline_trainer,
|
|
offline_trainer_iter,
|
|
)
|
|
from tianshou.trainer.offpolicy import (
|
|
OffpolicyTrainer,
|
|
offpolicy_trainer,
|
|
offpolicy_trainer_iter,
|
|
)
|
|
from tianshou.trainer.onpolicy import (
|
|
OnpolicyTrainer,
|
|
onpolicy_trainer,
|
|
onpolicy_trainer_iter,
|
|
)
|
|
from tianshou.trainer.utils import gather_info, test_episode
|
|
|
|
__all__ = [
|
|
"BaseTrainer",
|
|
"offpolicy_trainer",
|
|
"offpolicy_trainer_iter",
|
|
"OffpolicyTrainer",
|
|
"onpolicy_trainer",
|
|
"onpolicy_trainer_iter",
|
|
"OnpolicyTrainer",
|
|
"offline_trainer",
|
|
"offline_trainer_iter",
|
|
"OfflineTrainer",
|
|
"test_episode",
|
|
"gather_info",
|
|
]
|
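For orientation, a sketch of how the exported names could be used side by side; it assumes that `BaseTrainer.run()` iterates the trainer to completion and returns the final info dict, and that the functional `onpolicy_trainer` helper remains a backwards-compatible wrapper around the class (both are assumptions, not confirmed by this listing alone):

```python
from tianshou.trainer import OnpolicyTrainer, onpolicy_trainer

# 1. New generator-style usage from this PR (constructor arguments omitted).
for epoch, epoch_stat, info in OnpolicyTrainer(...):
    print(epoch, epoch_stat, info)

# 2. One-shot usage: run() is assumed to consume the generator and
#    return the final info dict.
result = OnpolicyTrainer(...).run()

# 3. Functional call, assumed to stay backwards compatible with the
#    pre-generator API and to return the same info dict as 2.
result = onpolicy_trainer(...)
```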