6 Commits

Author SHA1 Message Date
Wenhao Chen
f270e88461
Do not allow async simulation for test collector (#705) 2022-07-22 16:23:55 -07:00
Yi Su
df35718992
Implement TD3+BC for offline RL (#660)
- implement TD3+BC for offline RL;
- fix a bug in trainer about test reward not logged because self.env_step is not set for offline setting;
2022-06-07 00:39:37 +08:00
Jiayi Weng
5ecea2402e
Fix save_checkpoint_fn return value (#659)
- Fix save_checkpoint_fn return value to checkpoint_path;
- Fix wrong link in doc;
- Fix an off-by-one bug in trainer iterator.
2022-06-03 01:07:07 +08:00
Michal Gregor
c87b9f49bc
Add show_progress option for trainer (#641)
- A DummyTqdm class added to utils: it replicates the interface used by trainers, but does not show the progress bar;
- Added a show_progress argument to the base trainer: when show_progress == True, dummy_tqdm is used in place of tqdm.
2022-05-17 23:41:59 +08:00
Jiayi Weng
2a9c9289e5
rename save_fn to save_best_fn to avoid ambiguity (#575)
This PR also introduces `tianshou.utils.deprecation` for a unified deprecation wrapper.
2022-03-22 04:29:27 +08:00
Jose Antonio Martin H
10d919052b
Add Trainers as generators (#559)
The new proposed feature is to have trainers as generators.
The usage pattern is:

```python
trainer = OnPolicyTrainer(...)
for epoch, epoch_stat, info in trainer:
    print(f"Epoch: {epoch}")
    print(epoch_stat)
    print(info)
    do_something_with_policy()
    query_something_about_policy()
    make_a_plot_with(epoch_stat)
    display(info)
```

- epoch int: the epoch number
- epoch_stat dict: a large collection of metrics of the current epoch, including stat
- info dict: the usual dict out of the non-generator version of the trainer

You can even iterate on several different trainers at the same time:

```python
trainer1 = OnPolicyTrainer(...)
trainer2 = OnPolicyTrainer(...)
for result1, result2, ... in zip(trainer1, trainer2, ...):
    compare_results(result1, result2, ...)
```

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
2022-03-18 00:26:14 +08:00