Erni 1aee41fa9c
Using dist.mode instead of logits.argmax (#1066)
Changed every place where an action is selected deterministically:

- **from**: using the outputs of the actor network.
- **to**: using the mode of the PyTorch distribution.
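
A minimal sketch of the difference, not the code touched by this commit. The tensors and variable names are hypothetical, and it assumes a PyTorch version in which `torch.distributions` exposes the `mode` property. For a categorical policy the two approaches give the same action; for a continuous policy only the mode is meaningful.

```python
import torch
from torch.distributions import Categorical, Normal

# Hypothetical actor outputs: a batch of 2 states with 3 discrete actions each.
logits = torch.tensor([[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]])

# Before: pick the greedy action directly from the network outputs.
greedy_from_logits = logits.argmax(dim=-1)

# After: build the distribution and take its mode.
dist = Categorical(logits=logits)
greedy_from_mode = dist.mode

# For a categorical distribution both give the same action indices.
same_for_categorical = torch.equal(greedy_from_logits, greedy_from_mode)

# For a Gaussian policy, argmax over the outputs is not an action,
# whereas the mode is well defined (it equals the mean).
mean = torch.tensor([[0.5, -0.3]])
std = torch.tensor([[1.0, 2.0]])
continuous_mode = Normal(mean, std).mode
```

Going through the distribution keeps the deterministic path independent of the action space, so the same call works for discrete and continuous policies.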

---------

Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com>
2024-03-03 00:09:39 +01:00