ChenDRAG e27b5a26f3
Refactor PG algorithm and change behavior of compute_episodic_return (#319)
- simplify code
- apply value normalization (global) and advantage normalization (per-batch) in on-policy algorithms
2021-03-23 22:05:48 +08:00
2020-03-21 10:58:01 +08:00
2021-02-19 10:33:49 +08:00
2020-10-04 21:55:43 +08:00