This is the first commit of 6 commits mentioned in #274, which features 1. Refactor of `Class Net` to support any form of MLP. 2. Enable type check in utils.network. 3. Relative change in docs/test/examples. 4. Move atari-related network to examples/atari/atari_network.py Co-authored-by: Trinkle23897 <trinkle23897@gmail.com>
Mujoco Result
SAC (single run)
The best reward computes from 100 episodes returns in the test phase.
SAC on Swimmer-v3 always stops at 47~48.
| task | 3M best reward | parameters | time cost (3M) | 
|---|---|---|---|
| HalfCheetah-v3 | 10157.70 ± 171.70 | python3 mujoco_sac.py --task HalfCheetah-v3 | 2~3h | 
| Walker2d-v3 | 5143.04 ± 15.57 | python3 mujoco_sac.py --task Walker2d-v3 | 2~3h | 
| Hopper-v3 | 3604.19 ± 169.55 | python3 mujoco_sac.py --task Hopper-v3 | 2~3h | 
| Humanoid-v3 | 6579.20 ± 1470.57 | python3 mujoco_sac.py --task Humanoid-v3 --alpha 0.05 | 2~3h | 
| Ant-v3 | 6281.65 ± 686.28 | python3 mujoco_sac.py --task Ant-v3 | 2~3h | 
Which parts are important?
- DO NOT share the same network with two critic networks.
- The sigma (of the Gaussian policy) MUST be conditioned on input.
- The network size should not be less than 256.
- The deterministic evaluation helps a lot :)
