ecbe13efe8 allow for setting different loss weights for each MTP head (perhaps more weight on the next vs some far-out prediction)
0.0.43
lucidrains
2025-10-19 08:37:56 -07:00
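A minimal sketch of what per-head MTP loss weighting could look like; the geometric decay scheme and the `mtp_loss_weights` / `weighted_mtp_loss` names are assumptions, not the repo's actual API.

```python
def mtp_loss_weights(num_heads, decay=0.5):
    # hypothetical scheme: geometrically decaying weight per MTP head,
    # so the immediate next prediction dominates far-out ones;
    # normalized to sum to 1 to keep the overall loss scale unchanged
    raw = [decay ** i for i in range(num_heads)]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_mtp_loss(per_head_losses, decay=0.5):
    # combine the per-head losses with the decaying weights
    weights = mtp_loss_weights(len(per_head_losses), decay)
    return sum(w * l for w, l in zip(weights, per_head_losses))
```

Because the weights are normalized, equal per-head losses produce the same total as a single head would.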
f651d779e3 able to control the update of the loss ema from the dynamics model forward
0.0.42
lucidrains
2025-10-19 08:25:50 -07:00
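A sketch of a loss EMA whose update can be switched off from the forward pass (e.g. during evaluation); the `update_loss_ema` name and the decay value are assumptions.

```python
def update_loss_ema(ema, loss, decay=0.99, update=True):
    # running exponential moving average of the loss; the caller
    # controls whether this forward pass moves the ema via `update`
    if ema is None:     # first observation seeds the ema
        return loss
    if not update:      # frozen: forward leaves the ema untouched
        return ema
    return decay * ema + (1.0 - decay) * loss
```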
374667d8a9 take care of the loss normalization mentioned at the end of the first paragraph of section 3
0.0.41
lucidrains
2025-10-19 08:24:41 -07:00
0c1b067f97 if an optimizer is passed into the learn-from-dreams function, take the optimizer steps; otherwise let the researcher handle it externally. also ready Muon
lucidrains
2025-10-17 08:55:20 -07:00
cb416c0d44 handle the entropies during policy optimization
0.0.30
lucidrains
2025-10-17 08:47:26 -07:00
61773c8219 eventually we will need to learn from the outside stream of experience
0.0.29
lucidrains
2025-10-17 08:06:24 -07:00
c382307963 eventually we will need to learn from the outside stream of experience
0.0.28
lucidrains
2025-10-17 08:05:43 -07:00
0dba734280 start the learning in dreams portion
0.0.27
lucidrains
2025-10-17 08:00:47 -07:00
a0161760a0 extract the log probs and predicted values (symexp two-hot encoded) for the phase 3 RL training
0.0.26
lucidrains
2025-10-16 10:40:59 -07:00
2d20d0a6c1 able to roll out actions from one agent within the dreams of a world model
0.0.25
lucidrains
2025-10-16 10:15:43 -07:00
d74f09f0b3 a researcher in discord pointed out that the tokenizer also uses the axial space-time transformer. redo without the 3d rotary and block causal, greatly simplifying the implementation
0.0.24
lucidrains
2025-10-16 09:40:14 -07:00
2ccb290e26 pass the attend kwargs for the block causal masking in tokenizer
0.0.23
lucidrains
2025-10-16 08:33:26 -07:00
2a902eaaf7 allow reward tokens to be attended to as state optionally, DT-esque. figure out multi-agent scenario once i get around to it
0.0.21
lucidrains
2025-10-16 06:41:02 -07:00
d28251e9f9 another consideration before knocking out the RL logic
0.0.20
lucidrains
2025-10-14 11:10:26 -07:00
ff81dd761b separate action and agent embeds
0.0.19
lucidrains
2025-10-13 11:36:21 -07:00
6dbdc3d7d8 correct a misunderstanding where the past action is a separate action token, while the agent token is used for the prediction of next action, rewards, values
0.0.18
lucidrains
2025-10-12 16:16:18 -07:00
32aa355e37 prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL
0.0.9
lucidrains
2025-10-10 10:41:48 -07:00
9101a49cdd handle continuous value normalization if stats are passed in
lucidrains
2025-10-09 08:59:54 -07:00
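The stats-conditional normalization could be sketched as below; the `(mean, std)` tuple layout and function names are assumptions for illustration.

```python
def normalize(value, stats, eps=1e-6):
    # stats = (mean, std) gathered externally; only applied if provided
    if stats is None:
        return value
    mean, std = stats
    return (value - mean) / max(std, eps)

def unnormalize(value, stats):
    # inverse transform, mapping a normalized value back to raw scale
    if stats is None:
        return value
    mean, std = stats
    return value * std + mean
```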
31f4363be7 must be able to do phase 1 and phase 2 training
lucidrains
2025-10-09 08:04:36 -07:00
e2d86a4543 add a complete action embedder that can accept any number of discrete actions with variable bins as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous hack in dreamer v3 with sticky action)
0.0.8
lucidrains
2025-10-09 07:53:42 -07:00
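A toy sketch of such an action embedder: one table per discrete action (each with its own bin count), one learned direction per continuous action, everything mean-pooled into a single vector to add onto the agent token. The class shape and names are hypothetical, not the repo's `ActionEmbedder`.

```python
import random

class ToyActionEmbedder:
    # hypothetical sketch: per-discrete-action embedding tables with
    # variable bin counts, plus a scaled direction per continuous action;
    # all embeddings are mean-pooled into one vector
    def __init__(self, dim, discrete_bins, num_continuous, seed=0):
        rng = random.Random(seed)
        vec = lambda: [rng.gauss(0.0, 1.0) for _ in range(dim)]
        self.tables = [[vec() for _ in range(bins)] for bins in discrete_bins]
        self.cont_dirs = [vec() for _ in range(num_continuous)]
        self.dim = dim

    def __call__(self, discrete, continuous):
        # look up each discrete action, scale each continuous direction
        embeds = [table[a] for table, a in zip(self.tables, discrete)]
        embeds += [[c * w for w in d] for c, d in zip(continuous, self.cont_dirs)]
        n = len(embeds)
        # mean-pool into the vector that would be added to the agent token
        return [sum(e[i] for e in embeds) / n for i in range(self.dim)]
```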
b62c08be65 fix task embed in the presence of multiple agent tokens
lucidrains
2025-10-08 08:42:25 -07:00
ed0918c974 prepare for evolution within dreams
lucidrains
2025-10-08 08:13:16 -07:00
892654d442 multiple agent tokens sharing the same state
lucidrains
2025-10-08 08:06:13 -07:00
c4e0f46528 for the value head, we will go for symexp encoding as well (following the "stop regressing" paper from Farebrother et al.), also use a layernormed MLP given recent papers
lucidrains
2025-10-08 07:37:34 -07:00
a50e360502 makes more sense for the noise to be fixed
lucidrains
2025-10-08 07:17:05 -07:00
612f5f5dd1 a bit of dropout to rewards as state
lucidrains
2025-10-08 06:45:25 -07:00
c8f75caa40 although not in the paper, it would be interesting for each agent (will extend to multi-agent) to consider its own past rewards as state
lucidrains
2025-10-08 06:40:15 -07:00
187edc1414 all set for generating the perceived rewards once the RL components fall into place
lucidrains
2025-10-08 06:33:28 -07:00
f7bdaddbbb one more incision before knocking out reward decoding
lucidrains
2025-10-08 06:11:02 -07:00
4de357b6c2 tiny change needed to have the world model produce both the video and predicted rewards (after phase 2 finetuning)
lucidrains
2025-10-08 05:52:13 -07:00
0fdb67bafa add the noising of the latent context during generation, a technique i think was from EPFL, or perhaps some google group that built on top of the EPFL work
0.0.4
lucidrains
2025-10-07 09:37:37 -07:00
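One way the context noising might look, as a rough sketch: blend each past context latent toward gaussian noise at a fixed level before generation. The exact formulation and schedule in the repo may differ; the function name is hypothetical.

```python
import random

def noise_context(latents, noise_level, seed=0):
    # hypothetical sketch: interpolate each past latent toward fresh
    # gaussian noise; noise_level = 0 leaves the context untouched
    rng = random.Random(seed)
    keep = 1.0 - noise_level
    return [[keep * x + noise_level * rng.gauss(0.0, 1.0) for x in latent]
            for latent in latents]
```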
36ccb08500 allow for step_sizes to be passed in, as log2 is not that intuitive
0.0.3
lucidrains
2025-10-07 08:36:46 -07:00
1176269927 correct the signal levels when doing teacher-forced generation
0.0.2
lucidrains
2025-10-07 07:41:02 -07:00
c6bef85984 generating video with raw teacher forcing
0.0.1
lucidrains
2025-10-07 07:22:57 -07:00
83ba9a285a reorganize tokenizer to generate video from the dynamics model
lucidrains
2025-10-06 11:37:45 -07:00
7180a8cf43 start carving into the reinforcement learning portion, starting with reward prediction head (single for now)
lucidrains
2025-10-06 11:17:25 -07:00
77724049e2 fix latent / modality attention pattern in video tokenizer, thanks to another researcher
lucidrains
2025-10-06 09:43:16 -07:00
25b8de91cc handle fewer spatial tokens than latent tokens in the dynamics model
lucidrains
2025-10-06 09:19:27 -07:00
bfbecb4968 an anonymous researcher pointed out that the video tokenizer may be using multiple latents per frame
lucidrains
2025-10-06 08:16:55 -07:00
f507afa0d3 last commit for the day - take care of the task embed
lucidrains
2025-10-05 11:40:48 -07:00
fe99efecba make a first pass through the shortcut training logic (Frans et al. from Berkeley), maintaining both v-space and x-space
lucidrains
2025-10-05 11:17:36 -07:00
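The core self-consistency idea in shortcut models is that one step of size 2d should match two consecutive steps of size d. A sketch of how that target could be formed in v-space, with hypothetical names and a generic `velocity_fn(x, t, d)` signature:

```python
def shortcut_target(velocity_fn, x, t, d):
    # shortcut-model self-consistency: the velocity for a step of size
    # 2*d is supervised by the average of two consecutive size-d steps
    v1 = velocity_fn(x, t, d)
    # advance x by a step of size d along the first velocity
    x_mid = [xi + d * vi for xi, vi in zip(x, v1)]
    v2 = velocity_fn(x_mid, t + d, d)
    return [(a + b) / 2.0 for a, b in zip(v1, v2)]
```

For a constant velocity field the target simply reproduces that velocity, as expected.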
971637673b complete all the types of attention masking patterns as proposed in the paper
lucidrains
2025-10-04 12:45:54 -07:00
5c6be4d979 take care of block causal in the video tokenizer; still need the special attention pattern for latents to and from, though
lucidrains
2025-10-04 12:03:50 -07:00
6c994db341 first nail down the attention masking for the dynamics transformer model using a factory function
lucidrains
2025-10-04 11:20:57 -07:00
ca700ba8e1 prepare for the learning in dreams
lucidrains
2025-10-04 09:44:46 -07:00
e04f9ffec6 for the temporal attention in dynamics model, do rotary the traditional way
lucidrains
2025-10-04 09:41:36 -07:00
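"Rotary the traditional way" means rotating consecutive feature pairs by an angle proportional to the (here, temporal) position, so that query/key dot products depend only on relative position. A minimal single-vector sketch, with an assumed `apply_rotary` name:

```python
import math

def apply_rotary(q, pos, theta=10000.0):
    # rotate each consecutive feature pair by pos * freq, with frequencies
    # decaying geometrically across pairs (standard RoPE layout)
    out = list(q)
    half = len(q) // 2
    for i in range(half):
        freq = theta ** (-2.0 * i / len(q))
        angle = pos * freq
        c, s = math.cos(angle), math.sin(angle)
        x, y = q[2 * i], q[2 * i + 1]
        out[2 * i] = x * c - y * s
        out[2 * i + 1] = x * s + y * c
    return out
```

The rotation preserves vector norms, and rotating q at position m and k at position n gives the same dot product as leaving q fixed and rotating k by n - m.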
1b7f6e787d rotate in the 3d rotary embeddings for the video tokenizer for both encoder / decoder
lucidrains
2025-10-04 09:22:06 -07:00
93f6738c9c given the special attention patterns, the attend function needs to be constructed before traversing the transformer layers
lucidrains
2025-10-04 08:31:51 -07:00
895a867a66 able to accept raw video for the dynamics model, if a tokenizer is passed in
lucidrains
2025-10-04 06:57:54 -07:00
8373cb13ec grouped query attention is necessary
lucidrains
2025-10-04 06:31:32 -07:00
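In grouped query attention, query heads are partitioned into groups that each share a single key/value head, cutting the KV cache size. The index mapping is all there is to it; the helper name here is illustrative:

```python
def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    # GQA: query heads split into num_kv_heads equal groups, and every
    # query head in a group attends against the same key/value head
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size
```

With 8 query heads and 2 KV heads, heads 0-3 share KV head 0 and heads 4-7 share KV head 1.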
58a6964dd9 the dynamics model has a spatial attention with a non-causal attention pattern but nothing else attending to agent tokens
lucidrains
2025-10-03 11:59:07 -07:00
77ad96ded2 make attention masking correct for dynamics model
lucidrains
2025-10-03 11:18:44 -07:00
986bf4c529 allow the video tokenizer to accept any spatial dimensions by parameterizing the decoder positional embedding with an MLP
lucidrains
2025-10-03 10:08:05 -07:00
90bf19f076 take care of the loss weight proposed in eq 8
lucidrains
2025-10-03 08:19:38 -07:00
046f8927d1 complete the symexp two-hot proposed by Hafner from the previous versions of Dreamer, but will also bring in HL-Gauss
lucidrains
2025-10-03 08:07:57 -07:00
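The symexp two-hot scheme from the earlier Dreamer line compresses targets with symlog, spreads each scalar across the two adjacent bins straddling it, and decodes by taking the expected bin center back through symexp. A self-contained sketch (the helper names are illustrative, not the repo's API):

```python
import math

def symlog(x):
    # symmetric log compression: sign(x) * log(1 + |x|)
    return math.copysign(math.log1p(abs(x)), x)

def symexp(x):
    # inverse of symlog: sign(x) * (exp(|x|) - 1)
    return math.copysign(math.expm1(abs(x)), x)

def two_hot(value, bins):
    # spread a scalar (in symlog space) over the two adjacent bins that
    # straddle it, weighted by proximity; `bins` are sorted bin centers
    v = min(max(symlog(value), bins[0]), bins[-1])
    probs = [0.0] * len(bins)
    for i in range(len(bins) - 1):
        lo, hi = bins[i], bins[i + 1]
        if lo <= v <= hi:
            w_hi = (v - lo) / (hi - lo)
            probs[i], probs[i + 1] = 1.0 - w_hi, w_hi
            return probs
    probs[-1] = 1.0
    return probs

def decode_two_hot(probs, bins):
    # expected bin center, mapped back to raw scale through symexp
    return symexp(sum(p * b for p, b in zip(probs, bins)))
```

Encoding then decoding a value inside the bin range recovers it up to float precision, which is the property that makes the classification-style value head a drop-in for regression.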
2a896ab01d last commit for the day
lucidrains
2025-10-02 12:39:20 -07:00