3 Commits

Author SHA1 Message Date
Anas BELFADIL
d976a5aa91
Fixed hardcoded reward_treshold (#548)
2022-03-04 10:35:39 +08:00
Jiayi Weng
3d697aa4c6
make unit test faster (#522)
* test cache expert data in offline training
* faster cql test
* faster tests
* use dummy
* test ray dependency
2022-02-09 00:24:52 +08:00
Yi Su
3592f45446
Fix critic network for Discrete CRR (#485)
- Fixes an inconsistency in the implementation of Discrete CRR. It now uses the `Critic` class for its critic, following the convention in other actor-critic policies;
- Updates several offline policies to use the `ActorCritic` class for their optimizers to eliminate randomness caused by parameter sharing between actor and critic;
- Adds `writer.flush()` in TensorboardLogger to ensure results are written in real time;
- Enables `test_collector=None` in 3 trainers to turn off testing during training;
- Updates the Atari offline results in README.md;
- Moves Atari offline RL examples to `examples/offline` and tests to `test/offline`, per review comments.
2021-11-28 23:10:28 +08:00