lucidrains | 2e92c0121a | they employ two stability measures, qk rmsnorm and softclamping of attention logits | 2025-10-01 09:40:24 -07:00
lucidrains | e8678364ba | swish glu feedforward from shazeer et al | 2025-10-01 09:28:25 -07:00
lucidrains | 8ebb8a9661 | finished a first pass at digesting the paper, start with transformer | 2025-10-01 09:21:55 -07:00
lucidrains | e0dd4cfeaa | they replace the recurrent state-space model with a transformer, with the implication that the former does not scale | 2025-10-01 07:59:02 -07:00
lucidrains | bdc7dd30a6 | scaffold | 2025-10-01 07:18:23 -07:00
Phil Wang | 62e9c4eecf | project page | 2025-10-01 06:56:03 -07:00
lucidrains | febbc73284 | dreamer fig2 | 2025-10-01 06:30:29 -07:00
Phil Wang | deecd30f52 | wip | 2025-09-30 05:59:20 -07:00
Phil Wang | 4eeb4ee7fc | Initial commit | 2025-09-30 05:58:16 -07:00