Tianshou


Add map_action_inverse for fixing error of storing random action (#568)

(Issue #512) During random start, the Collector samples actions directly from the action space, whereas policies output actions in a fixed range (typically [-1, 1]) and then map them to the action space. The buffer stores only unmapped actions, so the randomly sampled initial actions are stored incorrectly whenever the action range is not [-1, 1]. This can affect policy learning and, in particular, model learning in model-based methods.

This PR fixes the problem by applying an inverse mapping to random initial actions before they are added to the buffer.
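The mismatch and its fix can be illustrated with a minimal sketch. It assumes a plain linear scaling between [-1, 1] and the environment bounds; the function bodies below are written for illustration and are not copied from the Tianshou source, and `low`, `high`, and the variable names are hypothetical example values.

```python
import numpy as np


def map_action(act, low, high):
    """Illustrative forward mapping: policy output in [-1, 1] -> [low, high]."""
    act = np.clip(act, -1.0, 1.0)
    return low + (high - low) * (act + 1.0) / 2.0


def map_action_inverse(act, low, high):
    """Illustrative inverse mapping: environment action in [low, high] -> [-1, 1]."""
    return 2.0 * (act - low) / (high - low) - 1.0


# Hypothetical action bounds that differ from [-1, 1].
low = np.array([0.0, -2.0])
high = np.array([10.0, 2.0])

# A random-start action is sampled in environment units.
rng = np.random.default_rng(0)
random_env_action = rng.uniform(low, high)

# Invert the mapping before storing, so the buffer holds an unmapped action
# in the same range as the policy's own outputs.
stored_action = map_action_inverse(random_env_action, low, high)

# Mapping the stored action forward again recovers the executed action.
round_trip = map_action(stored_action, low, high)
```

Without the inverse step, the buffer would mix policy outputs in [-1, 1] with random actions expressed in environment units, so the same stored value would correspond to different executed actions.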

2022-03-12 22:26:00 +08:00

imitation

Implement Generative Adversarial Imitation Learning (GAIL) (#550)

2022-03-06 23:57:15 +08:00

modelbased

Formalize variable names (#509)

2022-01-30 00:53:56 +08:00

modelfree

Add a comment before SAC alpha loss (#565)

2022-03-09 06:38:42 +08:00

multiagent

fix conda support and keep API compatibility (#536)