- Fixes an inconsistency in the implementation of Discrete CRR. Now it uses `Critic` class for its critic, following conventions in other actor-critic policies;
- Updates several offline policies to use `ActorCritic` class for its optimizer to eliminate randomness caused by parameter sharing between actor and critic;
- Add `writer.flush()` in TensorboardLogger to ensure real-time result;
- Enable `test_collector=None` in 3 trainers to turn off testing during training;
- Updates the Atari offline results in README.md;
- Moves Atari offline RL examples to `examples/offline`; tests to `test/offline` per review comments.
This PR implements BCQPolicy, which could be used to train an offline agent in the environment of continuous action space. An experimental result 'halfcheetah-expert-v1' is provided, which is a d4rl environment (for Offline Reinforcement Learning).
Example usage is in the examples/offline/offline_bcq.py.
Change the behavior of to_numpy and to_torch: from now on, dict is automatically converted to Batch and list is automatically converted to np.ndarray (if an error occurs, raise the exception instead of converting each element in the list).
This is the third PR of 6 commits mentioned in #274, which features refactor of Collector to fix#245. You can check #274 for more detail.
Things changed in this PR:
1. refactor collector to be more cleaner, split AsyncCollector to support asyncvenv;
2. change buffer.add api to add(batch, bffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.)
3. add policy.exploration_noise(act, batch) -> act
4. small change in BasePolicy.compute_*_returns
5. move reward_metric from collector to trainer
6. fix np.asanyarray issue (different version's numpy will result in different output)
7. flake8 maxlength=88
8. polish docs and fix test
Co-authored-by: n+e <trinkle23897@gmail.com>
This is the PR for QR-DQN algorithm: https://arxiv.org/abs/1710.10044
1. add QR-DQN policy in tianshou/policy/modelfree/qrdqn.py.
2. add QR-DQN net in examples/atari/atari_network.py.
3. add QR-DQN atari example in examples/atari/atari_qrdqn.py.
4. add QR-DQN statement in tianshou/policy/init.py.
5. add QR-DQN unit test in test/discrete/test_qrdqn.py.
6. add QR-DQN atari results in examples/atari/results/qrdqn/.
7. add compute_q_value in DQNPolicy and C51Policy for simplify forward function.
8. move `with torch.no_grad():` from `_target_q` to BasePolicy
By running "python3 atari_qrdqn.py --task "PongNoFrameskip-v4" --batch-size 64", get best_result': '19.8 ± 0.40', in epoch 8.
Cherry-pick from #200
- update the function signature
- format code-style
- move _compile into separate functions
- fix a bug in to_torch and to_numpy (Batch)
- remove None in action_range
In short, the code-format only contains function-signature style and `'` -> `"`. (pick up from [black](https://github.com/psf/black))
- fix a bug in MAPolicy: `buffer.rew = Batch()` doesn't change `buffer.rew` (thanks mypy)
- polish examples/box2d/bipedal_hardcore_sac.py
- several docs update
- format setup.py and bump version to 0.2.7
* code refactor; remove unused kwargs; add reward_normalization for dqn
* bugfix for __setitem__ with torch.Tensor; add Batch.condense
* minor fix
* support cat with empty Batch
* remove the dependency of is_empty on len; specify the semantic of empty Batch by test cases
* support stack with empty Batch
* remove condense
* refactor code to reflect the shared / partial / reserved categories of keys
* add is_empty(recursive=False)
* doc fix
* docfix and bugfix for _is_batch_set
* add doc for key reservation
* bugfix for algebra operators
* fix cat with lens hint
* code refactor
* bugfix for storing None
* use ValueError instead of exception
* hide lens away from users
* add comment for __cat
* move the computation of the initial value of lens in cat_ itself.
* change the place of doc string
* doc fix for Batch doc string
* change recursive to recurse
* doc string fix
* minor fix for batch doc
* remove dummy net; delete two files
* split code to have backbone and head
* rename class
* change torch.float to torch.float32
* use flatten(1) instead of view(batch, -1)
* remove dummy net in docs
* bugfix for rnn
* fix cuda error
* minor fix of docs
* do not change the example code in dqn tutorial, since it is for demonstration
Co-authored-by: Trinkle23897 <463003665@qq.com>
* Enable to convert Batch data back to torch.
* Add torch converter to collector.
* Fix
* Move to_numpy/to_torch convert in dedicated utils.py.
* Use to_numpy/to_torch to convert arrays.
* fix lint
* fix
* Add unit test to check Batch from/to numpy.
* Fix Batch over Batch.
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>