History

Fix mypy issues in tests and examples (#1077 )

Closes #952 

- `SamplingConfig` supports `batch_size=None`. #1077
- tests and examples are covered by `mypy`. #1077
- `NetBase` is more used, stricter typing by making it generic. #1077
- `utils.net.common.Recurrent` now receives and returns a
`RecurrentStateBatch` instead of a dict. #1077

---------

Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>

2024-04-03 18:07:51 +02:00

results

Add BranchingDQN for large discrete action spaces (#618 )

2022-05-15 21:40:32 +08:00

acrobot_dualdqn.py

Fix mypy issues in tests and examples (#1077 )

2024-04-03 18:07:51 +02:00

bipedal_bdq.py

Fix mypy issues in tests and examples (#1077 )

2024-04-03 18:07:51 +02:00

bipedal_hardcore_sac.py

Fix mypy issues in tests and examples (#1077 )

2024-04-03 18:07:51 +02:00

lunarlander_dqn.py

Fix mypy issues in tests and examples (#1077 )

2024-04-03 18:07:51 +02:00

mcc_sac.py

Fix mypy issues in tests and examples (#1077 )

2024-04-03 18:07:51 +02:00

README.md

Add BranchingDQN for large discrete action spaces (#618 )

2022-05-15 21:40:32 +08:00

README.md

Bipedal-Hardcore-SAC

Our default choice: remove the done flag penalty, will soon converge to ~280 reward within 100 epochs (10M env steps, 3~4 hours, see the image below)
If the done penalty is not removed, it converges much slower than before, about 200 epochs (20M env steps) to reach the same performance (~200 reward)

BipedalWalker-BDQ

To demonstrate the cpabilities of the BDQ to scale up to big discrete action spaces, we run it on a discretized version of the BipedalWalker-v3 environment, where the number of possible actions in each dimension is 25, for a total of 25^4 = 390 625 possible actions. A usaual DQN architecture would use 25^4 output neurons for the Q-network, thus scaling exponentially with the number of action space dimensions, while the Branching architecture scales linearly and uses only 25*4 output neurons.