The newly proposed feature is to have trainers as generators. The usage pattern is:

```python
trainer = OnPolicyTrainer(...)
for epoch, epoch_stat, info in trainer:
    print(f"Epoch: {epoch}")
    print(epoch_stat)
    print(info)
    do_something_with_policy()
    query_something_about_policy()
    make_a_plot_with(epoch_stat)
    display(info)
```

- `epoch` (int): the epoch number
- `epoch_stat` (dict): a large collection of metrics for the current epoch, including `stat`
- `info` (dict): the usual dict returned by the non-generator version of the trainer

You can even iterate over several different trainers at the same time:

```python
trainer1 = OnPolicyTrainer(...)
trainer2 = OnPolicyTrainer(...)
for result1, result2, ... in zip(trainer1, trainer2, ...):
    compare_results(result1, result2, ...)
```

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
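For reference, a trainer supports this pattern by implementing the iterator protocol, which is also why `zip` over several trainers works. Below is only a minimal sketch of the idea, not tianshou's actual implementation; the names `ToyOnPolicyTrainer`, `train_one_epoch`, and `gather_info` are hypothetical placeholders.

```python
# Minimal sketch of a generator-style trainer (NOT tianshou's real code).
# ToyOnPolicyTrainer, train_one_epoch, and gather_info are hypothetical names
# used only to illustrate how an iterable trainer could be structured.
from typing import Iterator, Tuple


class ToyOnPolicyTrainer:
    def __init__(self, max_epoch: int = 10):
        self.max_epoch = max_epoch
        self.epoch = 0

    def train_one_epoch(self) -> dict:
        # Placeholder for one epoch of collecting data and updating the policy.
        return {"loss": 0.0, "rew": 0.0}

    def gather_info(self) -> dict:
        # Placeholder for the summary dict the non-generator trainer would return.
        return {"best_reward": 0.0, "duration": "0s"}

    def __iter__(self) -> Iterator[Tuple[int, dict, dict]]:
        return self

    def __next__(self) -> Tuple[int, dict, dict]:
        if self.epoch >= self.max_epoch:
            raise StopIteration
        self.epoch += 1
        epoch_stat = self.train_one_epoch()
        return self.epoch, epoch_stat, self.gather_info()


# Usage mirrors the pattern above:
for epoch, epoch_stat, info in ToyOnPolicyTrainer(max_epoch=3):
    print(epoch, epoch_stat, info)
```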
- [ ] I have marked all applicable categories:
    + [ ] exception-raising bug
    + [ ] RL algorithm bug
    + [ ] documentation request (i.e. "X is missing from the documentation.")
    + [ ] new feature request
- [ ] I have visited the [source website](https://github.com/thu-ml/tianshou/)
- [ ] I have searched through the [issue tracker](https://github.com/thu-ml/tianshou/issues) for duplicates
- [ ] I have mentioned version numbers, operating system and environment, where applicable:
  ```python
  import tianshou, gym, torch, numpy, sys
  print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
  ```