The proposed feature makes trainers usable as generators. The usage pattern is:

```python
from tianshou.trainer import OnpolicyTrainer

trainer = OnpolicyTrainer(...)
for epoch, epoch_stat, info in trainer:
    print(f"Epoch: {epoch}")
    print(epoch_stat)
    print(info)
    do_something_with_policy()
    query_something_about_policy()
    make_a_plot_with(epoch_stat)
    display(info)
```

- epoch int: the epoch number
- epoch_stat dict: a large collection of metrics of the current epoch, including stat
- info dict: the usual dict returned by the non-generator version of the trainer

You can even iterate over several different trainers at the same time:

```python
trainer1 = OnpolicyTrainer(...)
trainer2 = OnpolicyTrainer(...)
# zip accepts any number of trainers; two are shown here
for result1, result2 in zip(trainer1, trainer2):
    compare_results(result1, result2)
```

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
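The iteration pattern described above can be tried without a full training setup. The following is a minimal sketch, not tianshou code: `ToyTrainer`, its `step_per_epoch` bookkeeping, the keys of the returned dicts, and `run_side_by_side` are all hypothetical names invented for illustration. It only assumes that a trainer yields one `(epoch, epoch_stat, info)` tuple per epoch and stops after `max_epoch` epochs.

```python
from typing import Any, Dict, Iterator, List, Tuple

EpochResult = Tuple[int, Dict[str, Any], Dict[str, Any]]


class ToyTrainer:
    """Hypothetical stand-in for a trainer; NOT part of tianshou."""

    def __init__(self, max_epoch: int, step_per_epoch: int = 10) -> None:
        self.max_epoch = max_epoch
        self.step_per_epoch = step_per_epoch
        self.epoch = 0
        self.total_steps = 0

    def __iter__(self) -> Iterator[EpochResult]:
        return self

    def __next__(self) -> EpochResult:
        # Stop iterating once the configured number of epochs is reached.
        if self.epoch >= self.max_epoch:
            raise StopIteration
        # "Train" for one epoch; a real trainer would collect data and
        # update the policy here.
        self.epoch += 1
        self.total_steps += self.step_per_epoch
        epoch_stat = {"epoch": self.epoch, "env_step": self.total_steps}
        info = {"train_step": self.total_steps}
        return self.epoch, epoch_stat, info


def run_side_by_side(
    trainer1: ToyTrainer, trainer2: ToyTrainer
) -> List[Tuple[int, int, int]]:
    """Advance two trainers in lockstep and pair up their per-epoch steps."""
    return [
        (r1[0], r1[1]["env_step"], r2[1]["env_step"])
        for r1, r2 in zip(trainer1, trainer2)
    ]


if __name__ == "__main__":
    for epoch, epoch_stat, info in ToyTrainer(max_epoch=3):
        print(f"Epoch: {epoch}", epoch_stat, info)
    print(run_side_by_side(ToyTrainer(3, 10), ToyTrainer(5, 20)))
```

Because `zip` stops at the shortest iterable, trainers with different `max_epoch` values are compared only for the epochs they have in common.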
50 lines
842 B
ReStructuredText
tianshou.trainer
================


On-policy
---------

.. autoclass:: tianshou.trainer.OnpolicyTrainer
    :members:
    :undoc-members:
    :show-inheritance:

.. autofunction:: tianshou.trainer.onpolicy_trainer

.. autoclass:: tianshou.trainer.onpolicy_trainer_iter


Off-policy
----------

.. autoclass:: tianshou.trainer.OffpolicyTrainer
    :members:
    :undoc-members:
    :show-inheritance:

.. autofunction:: tianshou.trainer.offpolicy_trainer

.. autoclass:: tianshou.trainer.offpolicy_trainer_iter


Offline
-------

.. autoclass:: tianshou.trainer.OfflineTrainer
    :members:
    :undoc-members:
    :show-inheritance:

.. autofunction:: tianshou.trainer.offline_trainer

.. autoclass:: tianshou.trainer.offline_trainer_iter


utils
-----

.. autofunction:: tianshou.trainer.test_episode

.. autofunction:: tianshou.trainer.gather_info