15 Commits

Author SHA1 Message Date
lucidrains
8e7a35b89c cover the attention masking for tokenizer encoder, decoder, as well as dynamics model (latent and agent tokens are "special" and placed on the right) 2025-10-01 12:11:06 -07:00
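A minimal sketch of the pattern this commit describes: the "special" (latent / agent) tokens are appended on the right of the regular tokens, and the attention mask is built over the combined sequence. The specific rule below (regular tokens do not attend to the special read-out tokens, special tokens attend to everything) is an assumption for illustration, not taken from the repo.

```python
import torch

def mask_with_right_special_tokens(num_regular, num_special):
    # boolean mask, True = query may attend to key
    n = num_regular + num_special
    is_special = torch.arange(n) >= num_regular               # special tokens sit on the right
    # mask out (regular query -> special key); everything else is allowed
    return ~((~is_special)[:, None] & is_special[None, :])

# usage: 64 patch tokens followed by 4 latent tokens -> (68, 68) mask
mask = mask_with_right_special_tokens(num_regular=64, num_special=4)
```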
lucidrains
c18c624be6 their latent bottleneck is tanh it seems, constraining it to -1 to 1 for flow matching in the dynamics model. please open an issue if mistaken 2025-10-01 10:39:16 -07:00
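A sketch of the tanh bottleneck this commit refers to: the tokenizer latent is squashed into (-1, 1), giving the dynamics model a bounded target to flow-match toward. The dimensions and the simple linear-interpolation flow matching objective below are assumptions.

```python
import torch
from torch import nn

class TanhLatentBottleneck(nn.Module):
    def __init__(self, dim, dim_latent):
        super().__init__()
        self.to_latent = nn.Linear(dim, dim_latent)

    def forward(self, x):
        return torch.tanh(self.to_latent(x))      # latents constrained to (-1, 1)

# flow matching toward the bounded latent: interpolate noise -> latent, regress the velocity
bottleneck = TanhLatentBottleneck(dim=512, dim_latent=32)
latent = bottleneck(torch.randn(2, 512))
noise  = torch.randn_like(latent)
times  = torch.rand(latent.shape[0], 1)
noised = noise.lerp(latent, times)                # x_t = (1 - t) * noise + t * latent
target_velocity = latent - noise                  # what the dynamics model would regress
```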
lucidrains
e3cbcd94c6 sketch out top down 2025-10-01 10:25:56 -07:00
lucidrains
882e63511b will apply the golden gate rotary for this work as an option 2025-10-01 10:07:54 -07:00
lucidrains
ceb1af263e oops 2025-10-01 09:49:04 -07:00
lucidrains
c979883f21 ready the block causal mask 2025-10-01 09:45:54 -07:00
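A sketch of a block causal mask: tokens attend freely within their own block (time step) and causally to all earlier blocks. The block size and the boolean-mask convention are assumptions.

```python
import torch

def block_causal_mask(seq_len, block_size, device=None):
    # True = may attend; queries see their own block and every earlier block
    block_ids = torch.arange(seq_len, device=device) // block_size
    return block_ids[:, None] >= block_ids[None, :]

# usage: 3 time steps of 4 tokens each -> (12, 12) mask
mask = block_causal_mask(seq_len=12, block_size=4)
```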
lucidrains
2e92c0121a they employ two stability measures, qk rmsnorm and softclamping of attention logits 2025-10-01 09:40:24 -07:00
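A sketch of the two stability measures named here, folded into one attention module: RMSNorm applied to queries and keys, and tanh soft-clamping of the attention logits. The clamp value, head layout, and use of nn.RMSNorm (PyTorch >= 2.4) are assumptions, not taken from the repo.

```python
import torch
from torch import nn

class StableAttention(nn.Module):
    def __init__(self, dim, heads=8, softclamp_value=50.):
        super().__init__()
        dim_head = dim // heads
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.softclamp_value = softclamp_value
        self.q_norm = nn.RMSNorm(dim_head)         # qk rmsnorm, applied per head
        self.k_norm = nn.RMSNorm(dim_head)
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_out = nn.Linear(dim, dim, bias=False)

    def forward(self, x, mask=None):
        b, n, d = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))

        q, k = self.q_norm(q), self.k_norm(k)

        sim = q @ k.transpose(-2, -1) * self.scale
        # softclamp: logits saturate smoothly at +/- softclamp_value
        sim = self.softclamp_value * torch.tanh(sim / self.softclamp_value)

        if mask is not None:
            sim = sim.masked_fill(~mask, -torch.finfo(sim.dtype).max)

        attn = sim.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)
```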
lucidrains
e8678364ba swish glu feedforward from shazeer et al 2025-10-01 09:28:25 -07:00
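A sketch of the SwiGLU feedforward from "GLU Variants Improve Transformer" (Shazeer, 2020); the 2/3 inner-dimension correction and the expansion factor are conventional choices, not taken from the repo.

```python
import torch
from torch import nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    def __init__(self, dim, mult=4):
        super().__init__()
        dim_inner = int(dim * mult * 2 / 3)            # keep params close to a plain 4x MLP
        self.proj_in = nn.Linear(dim, dim_inner * 2)   # value and gate in one projection
        self.proj_out = nn.Linear(dim_inner, dim)

    def forward(self, x):
        x, gate = self.proj_in(x).chunk(2, dim=-1)
        return self.proj_out(x * F.silu(gate))         # swish-gated linear unit

# usage
ff = SwiGLUFeedForward(dim=512)
out = ff(torch.randn(2, 16, 512))                      # (batch, seq, dim)
```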
lucidrains
8ebb8a9661 finished a first pass at digesting the paper, start with transformer 2025-10-01 09:21:55 -07:00
lucidrains
e0dd4cfeaa they replace the recurrent state-space model with a transformer, with the implication that the former does not scale 2025-10-01 07:59:02 -07:00
lucidrains
bdc7dd30a6 scaffold 2025-10-01 07:18:23 -07:00
Phil Wang
62e9c4eecf project page 2025-10-01 06:56:03 -07:00
lucidrains
febbc73284 dreamer fig2 2025-10-01 06:30:29 -07:00
Phil Wang
deecd30f52 wip 2025-09-30 05:59:20 -07:00
Phil Wang
4eeb4ee7fc Initial commit 2025-09-30 05:58:16 -07:00