The new proposed feature is to have trainers as generators.
The usage pattern is:
```python
trainer = OnPolicyTrainer(...)
for epoch, epoch_stat, info in trainer:
print(f"Epoch: {epoch}")
print(epoch_stat)
print(info)
do_something_with_policy()
query_something_about_policy()
make_a_plot_with(epoch_stat)
display(info)
```
- epoch int: the epoch number
- epoch_stat dict: a large collection of metrics of the current epoch, including stat
- info dict: the usual dict out of the non-generator version of the trainer
You can even iterate on several different trainers at the same time:
```python
trainer1 = OnPolicyTrainer(...)
trainer2 = OnPolicyTrainer(...)
for result1, result2, ... in zip(trainer1, trainer2, ...):
compare_results(result1, result2, ...)
```
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
This PR implements BCQPolicy, which could be used to train an offline agent in the environment of continuous action space. An experimental result 'halfcheetah-expert-v1' is provided, which is a d4rl environment (for Offline Reinforcement Learning).
Example usage is in the examples/offline/offline_bcq.py.
This is the third PR of 6 commits mentioned in #274, which features refactor of Collector to fix#245. You can check #274 for more detail.
Things changed in this PR:
1. refactor collector to be more cleaner, split AsyncCollector to support asyncvenv;
2. change buffer.add api to add(batch, bffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.)
3. add policy.exploration_noise(act, batch) -> act
4. small change in BasePolicy.compute_*_returns
5. move reward_metric from collector to trainer
6. fix np.asanyarray issue (different version's numpy will result in different output)
7. flake8 maxlength=88
8. polish docs and fix test
Co-authored-by: n+e <trinkle23897@gmail.com>
- Refacor code to remove duplicate code
- Enable async simulation for all vector envs
- Remove `collector.close` and rename `VectorEnv` to `DummyVectorEnv`
The abstraction of vector env changed.
Prior to this pr, each vector env is almost independent.
After this pr, each env is wrapped into a worker, and vector envs differ with their worker type. In fact, users can just use `BaseVectorEnv` with different workers, I keep `SubprocVectorEnv`, `ShmemVectorEnv` for backward compatibility.
Co-authored-by: n+e <463003665@qq.com>
Co-authored-by: magicly <magicly007@gmail.com>
* remove dummy net; delete two files
* split code to have backbone and head
* rename class
* change torch.float to torch.float32
* use flatten(1) instead of view(batch, -1)
* remove dummy net in docs
* bugfix for rnn
* fix cuda error
* minor fix of docs
* do not change the example code in dqn tutorial, since it is for demonstration
Co-authored-by: Trinkle23897 <463003665@qq.com>