Change the behavior of to_numpy and to_torch: from now on, dict is automatically converted to Batch and list is automatically converted to np.ndarray (if an error occurs, raise the exception instead of converting each element in the list).
Cherry-pick from #200
- update the function signature
- format code-style
- move _compile into separate functions
- fix a bug in to_torch and to_numpy (Batch)
- remove None in action_range
In short, the code formatting only changes the function-signature style and `'` -> `"` (picked up from [black](https://github.com/psf/black)).
1. add policy.eval() in all test scripts' "watch performance"
2. remove dict return support for collector preprocess_fn
3. add `__contains__` and `pop` in batch: `key in batch`, `batch.pop(key, deft)`
4. collect exactly n_episode episodes when n_episode is given as a list of limits, and save fake data in cache_buffer when self.buffer is None (#184)
5. fix tensorboard logging: the horizontal axis now stands for env step instead of gradient step; add test results into tensorboard
6. add test_returns (both GAE and nstep)
7. change the type-checking order in batch.py and converter.py so that the most common case is checked first
8. fix shape inconsistency for torch.Tensor in replay buffer
9. remove `**kwargs` in ReplayBuffer
10. remove default value in batch.split() and add merge_last argument (#185)
11. improve nstep efficiency
12. add max_batchsize in onpolicy algorithms
13. potential bugfix for subproc.wait
14. fix RecurrentActorProb
15. improve the code coverage (from 90% to 95%) and remove dead code
16. fix some incorrect type annotations
The above improvements also increase the training FPS: on my computer, the previous version reaches only ~1800 FPS, while this version reaches ~2050 (faster than v0.2.4.post1).
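To illustrate items 3 and 10, here is a minimal sketch of the intended semantics, not the actual `Batch` implementation. `MiniBatch` is a hypothetical dict-backed stand-in, `pop` returning `None` when no default is given is an assumption, and the `merge_last` behavior shown (folding a short trailing chunk into the previous one) is an assumption about the new argument.

```python
from typing import Any, Dict, Iterator, List


class MiniBatch:
    """Hypothetical dict-backed stand-in for Batch (illustration only)."""

    def __init__(self, **data: List[Any]) -> None:
        self._data: Dict[str, List[Any]] = dict(data)

    def __contains__(self, key: str) -> bool:
        # Enables `key in batch`.
        return key in self._data

    def pop(self, key: str, deft: Any = None) -> Any:
        # Enables `batch.pop(key, deft)`: remove and return the value,
        # or return `deft` if the key is absent (assumed behavior).
        return self._data.pop(key, deft)

    def __len__(self) -> int:
        # Assumes every stored list has the same length.
        return len(next(iter(self._data.values()), []))

    def _slice(self, start: int, stop: int) -> "MiniBatch":
        return MiniBatch(**{k: v[start:stop] for k, v in self._data.items()})

    def split(self, size: int, merge_last: bool = False) -> Iterator["MiniBatch"]:
        # `size` has no default value; `merge_last` is assumed to merge a
        # short trailing chunk into the chunk before it.
        length = len(self)
        merge_last = merge_last and length % size > 0
        for start in range(0, length, size):
            if merge_last and start + 2 * size >= length:
                yield self._slice(start, length)
                break
            yield self._slice(start, start + size)


batch = MiniBatch(obs=[0, 1, 2, 3, 4], act=[5, 6, 7, 8, 9])
plain_chunks = [len(b) for b in batch.split(2)]
merged_chunks = [len(b) for b in batch.split(2, merge_last=True)]
```

With five entries and `size=2`, the plain split ends with a chunk of one element, while `merge_last=True` yields a final chunk of three instead.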
- Refactor code to remove duplicate code
- Enable async simulation for all vector envs
- Remove `collector.close` and rename `VectorEnv` to `DummyVectorEnv`
The abstraction of vector env changed.
Prior to this PR, each vector env was almost independent.
After this PR, each env is wrapped into a worker, and vector envs differ by their worker type. In fact, users can just use `BaseVectorEnv` with different workers; `SubprocVectorEnv` and `ShmemVectorEnv` are kept for backward compatibility.
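The worker-based design can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `CounterEnv`, `EnvWorker`, `DummyWorker`, and `SketchVectorEnv` are hypothetical names, and only an in-process worker is shown.

```python
from typing import Any, Callable, List, Tuple, Type


class CounterEnv:
    """Hypothetical toy env: the observation is a step counter."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.t += 1
        return self.t, float(action), self.t >= 3, {}


class EnvWorker:
    """Hypothetical worker interface: each worker owns exactly one env."""

    def __init__(self, env_fn: Callable[[], Any]) -> None:
        self.env = env_fn()

    def reset(self) -> Any:
        raise NotImplementedError

    def step(self, action: Any) -> Any:
        raise NotImplementedError


class DummyWorker(EnvWorker):
    """Runs the env directly in the calling process."""

    def reset(self) -> Any:
        return self.env.reset()

    def step(self, action: Any) -> Any:
        return self.env.step(action)


class SketchVectorEnv:
    """Vector env that only differs by the worker class it is given."""

    def __init__(
        self,
        env_fns: List[Callable[[], Any]],
        worker_cls: Type[EnvWorker] = DummyWorker,
    ) -> None:
        self.workers = [worker_cls(fn) for fn in env_fns]

    def reset(self) -> List[Any]:
        return [w.reset() for w in self.workers]

    def step(self, actions: List[Any]) -> List[Any]:
        return [w.step(a) for w, a in zip(self.workers, actions)]


venv = SketchVectorEnv([CounterEnv, CounterEnv])
first_obs = venv.reset()
results = venv.step([1, 2])
```

A subprocess or shared-memory variant would, under this sketch, only need another `EnvWorker` subclass; the vector env class itself stays unchanged.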
Co-authored-by: n+e <463003665@qq.com>
Co-authored-by: magicly <magicly007@gmail.com>
Add classes BaseNoise and GaussianNoise for the concept of exploration noise.
Add a new SAC test on MountainCarContinuous-v0,
which should benefit from the two new features above.
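A rough, stdlib-only sketch of the exploration-noise idea follows. The class names mirror the ones above, but the constructor arguments (`mu`, `sigma`, `seed`), the call signature, and the `explore` helper are assumptions made for illustration, not the API added in this commit.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Optional


class BaseNoise(ABC):
    """Assumed interface: calling the object returns `size` noise samples."""

    @abstractmethod
    def __call__(self, size: int) -> List[float]:
        ...


class GaussianNoise(BaseNoise):
    """Independent Gaussian samples with mean `mu` and std `sigma`."""

    def __init__(
        self, mu: float = 0.0, sigma: float = 1.0, seed: Optional[int] = None
    ) -> None:
        if sigma < 0:
            raise ValueError("sigma must be non-negative")
        self.mu = mu
        self.sigma = sigma
        # `seed` is a hypothetical convenience for reproducible sketches.
        self._rng = random.Random(seed)

    def __call__(self, size: int) -> List[float]:
        return [self._rng.gauss(self.mu, self.sigma) for _ in range(size)]


def explore(
    action: List[float], noise: BaseNoise, low: float, high: float
) -> List[float]:
    """Hypothetical helper: add noise to an action, then clip to [low, high]."""
    eps = noise(len(action))
    return [min(max(a + e, low), high) for a, e in zip(action, eps)]


noisy_action = explore([0.5, -0.2], GaussianNoise(sigma=0.1, seed=0), -1.0, 1.0)
```

Clipping after adding the noise keeps the perturbed action inside the environment's bounds, which matters for a bounded task such as MountainCarContinuous-v0.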
* update atari.py
* fix setup.py
pass the pytest
* add args "render"
* change the tensorboard writer
* change device, render, tensorboard log location
* remove some wrong local files
* fix some tab mistakes and the env names in continuous/test_xx.py
* add examples and point robot maze environment
* fix some bugs during testing examples
* add dqn network and fix some args
* change back the tensorboard writer's frequency to ensure ppo and a2c can write things normally
* add a warning to collector
* rm some unrelated files
* reformat
* fix a bug in test_dqn caused by selecting the wrong model