This PR fixes #766.

Co-authored-by: Yi Su <yi_su@apple.com>
- Implement TD3+BC for offline RL.
- Fix a trainer bug where the test reward was not logged because `self.env_step` was not set in the offline setting.
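
For reviewers unfamiliar with TD3+BC, here is a minimal sketch of the actor objective as described in the TD3+BC paper: the usual TD3 policy term scaled by `lambda = alpha / mean(|Q|)`, plus a behavior-cloning MSE term. It is pure Python with no autograd and is not the code added in this PR; the function name `td3_bc_actor_loss` and its arguments are hypothetical, and `alpha=2.5` is the paper's default rather than a value confirmed by this PR.

```python
from typing import Sequence


def td3_bc_actor_loss(
    q_values: Sequence[float],
    policy_actions: Sequence[Sequence[float]],
    dataset_actions: Sequence[Sequence[float]],
    alpha: float = 2.5,
) -> float:
    """Scalar TD3+BC actor loss for one batch (illustrative only).

    q_values[i]        : Q(s_i, pi(s_i)) from the critic
    policy_actions[i]  : pi(s_i), the action proposed by the actor
    dataset_actions[i] : a_i, the action stored in the offline dataset
    """
    n = len(q_values)
    # lambda = alpha / mean(|Q|); in a real implementation this factor is
    # treated as a constant (detached) when backpropagating.
    # Assumes the mean absolute Q-value is non-zero.
    lam = alpha / (sum(abs(q) for q in q_values) / n)
    # RL term: maximizing Q is minimizing -lambda * mean(Q).
    q_term = -lam * sum(q_values) / n
    # BC term: mean squared error over every action dimension in the batch.
    squared_errors = [
        (p - d) ** 2
        for pa, da in zip(policy_actions, dataset_actions)
        for p, d in zip(pa, da)
    ]
    bc_term = sum(squared_errors) / len(squared_errors)
    return q_term + bc_term
```

A larger `alpha` weights the Q-maximization term more heavily, while a smaller one keeps the policy closer to the dataset actions.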