11 Commits

Author SHA1 Message Date
lucidrains
ceb1af263e oops 2025-10-01 09:49:04 -07:00
lucidrains
c979883f21 ready the block causal mask 2025-10-01 09:45:54 -07:00
lucidrains
2e92c0121a they employ two stability measures, qk rmsnorm and softclamping of attention logits 2025-10-01 09:40:24 -07:00
lucidrains
e8678364ba swish glu feedforward from shazeer et al 2025-10-01 09:28:25 -07:00
lucidrains
8ebb8a9661 finished a first pass at digesting the paper, start with transformer 2025-10-01 09:21:55 -07:00
lucidrains
e0dd4cfeaa they replace the recurrent state-space model with a transformer, with the implication that the former does not scale 2025-10-01 07:59:02 -07:00
lucidrains
bdc7dd30a6 scaffold 2025-10-01 07:18:23 -07:00
Phil Wang
62e9c4eecf project page 2025-10-01 06:56:03 -07:00
lucidrains
febbc73284 dreamer fig2 2025-10-01 06:30:29 -07:00
Phil Wang
deecd30f52 wip 2025-09-30 05:59:20 -07:00
Phil Wang
4eeb4ee7fc Initial commit 2025-09-30 05:58:16 -07:00