49 Commits

Author SHA1 Message Date
lucidrains
338def693d oops 2025-10-05 11:52:54 -07:00
lucidrains
f507afa0d3 last commit for the day - take care of the task embed 2025-10-05 11:40:48 -07:00
lucidrains
fe99efecba make a first pass through the shortcut training logic (Frans et al from Berkeley) maintaining both v-space and x-space 2025-10-05 11:17:36 -07:00
lucidrains
971637673b complete all the types of attention masking patterns as proposed in the paper 2025-10-04 12:45:54 -07:00
lucidrains
5c6be4d979 take care of blocked causal in video tokenizer, still need the special attention pattern for latents to and from though 2025-10-04 12:03:50 -07:00
lucidrains
6c994db341 first nail down the attention masking for the dynamics transformer model using a factory function 2025-10-04 11:20:57 -07:00
lucidrains
ca700ba8e1 prepare for the learning in dreams 2025-10-04 09:44:46 -07:00
lucidrains
e04f9ffec6 for the temporal attention in dynamics model, do rotary the traditional way 2025-10-04 09:41:36 -07:00
lucidrains
1b7f6e787d rotate in the 3d rotary embeddings for the video tokenizer for both encoder / decoder 2025-10-04 09:22:06 -07:00
lucidrains
93f6738c9c given the special attention patterns, attend function needs to be constructed before traversing the transformer layers 2025-10-04 08:31:51 -07:00
lucidrains
7cac3d28c5 cleanup 2025-10-04 08:04:42 -07:00
lucidrains
0f4783f23c use a newly built module from x-mlps for multi token prediction 2025-10-04 07:56:56 -07:00
lucidrains
0a26e0f92f complete the lpips loss used for the video tokenizer 2025-10-04 07:47:27 -07:00
Phil Wang
92e55a90b4 temporary discord 2025-10-04 07:28:36 -07:00
lucidrains
85eea216fd cleanup 2025-10-04 06:59:09 -07:00
lucidrains
895a867a66 able to accept raw video for dynamics model, if tokenizer passed in 2025-10-04 06:57:54 -07:00
lucidrains
8373cb13ec grouped query attention is necessary 2025-10-04 06:31:32 -07:00
lucidrains
58a6964dd9 the dynamics model has a spatial attention with a non-causal attention pattern but nothing else attending to agent tokens 2025-10-03 11:59:22 -07:00
lucidrains
77ad96ded2 make attention masking correct for dynamics model 2025-10-03 11:18:44 -07:00
lucidrains
986bf4c529 allow for the video tokenizer to accept any spatial dimensions by parameterizing the decoder positional embedding with an MLP 2025-10-03 10:08:05 -07:00
lucidrains
90bf19f076 take care of the loss weight proposed in eq 8 2025-10-03 08:19:38 -07:00
lucidrains
046f8927d1 complete the symexp two hot proposed by Hafner from the previous versions of Dreamer, but will also bring in hl gauss 2025-10-03 08:08:44 -07:00
lucidrains
2a896ab01d last commit for the day 2025-10-02 12:39:20 -07:00
lucidrains
8d1cd311bb Revert "address https://github.com/lucidrains/dreamer4/issues/1" 2025-10-02 12:25:05 -07:00
This reverts commit e23a5294ec2f49d58d3ccb936c498eb86939fa96.
lucidrains
e23a5294ec address https://github.com/lucidrains/dreamer4/issues/1 2025-10-02 11:49:22 -07:00
lucidrains
51e0852604 cleanup 2025-10-02 09:43:30 -07:00
lucidrains
0b503d880d ellipsis 2025-10-02 09:14:39 -07:00
lucidrains
e6c808960f take care of the MAE portion from Kaiming He 2025-10-02 08:57:44 -07:00
lucidrains
49082d8629 x-space and v-space prediction in dynamics model 2025-10-02 08:36:00 -07:00
lucidrains
8b66b703e0 add the discretized signal level + step size embeddings necessary for diffusion forcing + shortcut 2025-10-02 07:39:34 -07:00
lucidrains
bb7a5d1680 sketch out the axial space time transformer in dynamics model 2025-10-02 07:17:58 -07:00
lucidrains
0285bba821 flesh out tokenizer even more 2025-10-02 06:11:04 -07:00
lucidrains
31c4aa28c7 start setting up tokenizer 2025-10-02 05:37:43 -07:00
lucidrains
67519a451d softclamping in flex 2025-10-01 12:19:41 -07:00
lucidrains
8e7a35b89c cover the attention masking for tokenizer encoder, decoder, as well as dynamics model (latent and agent tokens are "special" and placed on the right) 2025-10-01 12:11:06 -07:00
lucidrains
c18c624be6 their latent bottleneck is tanh it seems, constraining it to -1 to 1 for flow matching in dynamics model. please open an issue if mistaken 2025-10-01 10:39:16 -07:00
lucidrains
e3cbcd94c6 sketch out top down 2025-10-01 10:25:56 -07:00
lucidrains
882e63511b will apply the golden gate rotary for this work as an option 2025-10-01 10:07:54 -07:00
lucidrains
ceb1af263e oops 2025-10-01 09:49:04 -07:00
lucidrains
c979883f21 ready the block causal mask 2025-10-01 09:45:54 -07:00
lucidrains
2e92c0121a they employ two stability measures, qk rmsnorm and softclamping of attention logits 2025-10-01 09:40:24 -07:00
lucidrains
e8678364ba swish glu feedforward from shazeer et al 2025-10-01 09:28:25 -07:00
lucidrains
8ebb8a9661 finished a first pass at digesting the paper, start with transformer 2025-10-01 09:21:55 -07:00
lucidrains
e0dd4cfeaa they replace the recurrent state-space model with a transformer, with the implication that the former does not scale 2025-10-01 07:59:02 -07:00
lucidrains
bdc7dd30a6 scaffold 2025-10-01 07:18:23 -07:00
Phil Wang
62e9c4eecf project page 2025-10-01 06:56:03 -07:00
lucidrains
febbc73284 dreamer fig2 2025-10-01 06:30:29 -07:00
Phil Wang
deecd30f52 wip 2025-09-30 05:59:20 -07:00
Phil Wang
4eeb4ee7fc Initial commit 2025-09-30 05:58:16 -07:00