82 Commits

Author SHA1 Message Date
lucidrains
563b269f8a bring in hyper connections 0.0.12 2025-10-11 06:52:57 -07:00
lucidrains
5df3e69583 last commit for the day 0.0.11 2025-10-10 11:59:18 -07:00
lucidrains
9230267d34 handle subset of discrete action unembedding 0.0.10 2025-10-10 11:27:05 -07:00
lucidrains
c68942b026 cleanup 2025-10-10 10:42:54 -07:00
lucidrains
32aa355e37 prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL 0.0.9 2025-10-10 10:41:48 -07:00
lucidrains
9101a49cdd handle continuous value normalization if stats passed in 2025-10-09 08:59:54 -07:00
lucidrains
31f4363be7 must be able to do phase1 and phase2 training 2025-10-09 08:04:36 -07:00
lucidrains
e2d86a4543 add a complete action embedder that can accept any number of discrete actions with variable bins as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous hack in dreamer v3 with sticky action) 0.0.8 2025-10-09 07:53:42 -07:00
lucidrains
b62c08be65 fix task embed in presence of multiple agent tokens 2025-10-08 08:42:25 -07:00
lucidrains
4c2ed100a3 fix masking for multiple agent tokens 0.0.7 2025-10-08 08:26:44 -07:00
lucidrains
ed0918c974 prepare for evolution within dreams 2025-10-08 08:13:16 -07:00
lucidrains
892654d442 multiple agent tokens sharing the same state 2025-10-08 08:06:13 -07:00
lucidrains
c4e0f46528 for the value head, we will go for symexp encoding as well (following the "stop regressing" paper from Farebrother et al), also use layernormed mlp given recent papers 2025-10-08 07:37:34 -07:00
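The symexp encoding referenced in the commit above is the symlog/symexp transform used for reward and value targets in the Dreamer line of work (and motivated by the "stop regressing" paper). A minimal sketch of the transform pair, for illustration only and not the repository's actual implementation:

```python
import math

def symlog(x: float) -> float:
    # symmetric log compression: sign(x) * ln(1 + |x|)
    # squashes large-magnitude targets while staying linear near zero
    return math.copysign(math.log1p(abs(x)), x)

def symexp(x: float) -> float:
    # exact inverse of symlog: sign(x) * (exp(|x|) - 1)
    return math.copysign(math.expm1(abs(x)), x)
```

The head regresses (or classifies over bins of) symlog-space targets and decodes predictions back through symexp, which keeps the loss well-conditioned across reward scales.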
lucidrains
a50e360502 makes more sense for the noise to be fixed 2025-10-08 07:17:05 -07:00
Phil Wang
9c56ba0c9d Merge pull request #3 from lucidrains/pytest-shard (add pytest shard) 2025-10-08 07:03:11 -07:00
lucidrains
b5744237bf fix 2025-10-08 06:58:46 -07:00
lucidrains
63b63dfedd add shard 2025-10-08 06:56:03 -07:00
lucidrains
612f5f5dd1 a bit of dropout to rewards as state 2025-10-08 06:45:25 -07:00
lucidrains
c8f75caa40 although not in the paper, it would be interesting for each agent (will extend to multi-agent) to consider its own past rewards as state 2025-10-08 06:40:43 -07:00
lucidrains
187edc1414 all set for generating the perceived rewards once the RL components fall into place 2025-10-08 06:33:28 -07:00
lucidrains
f7bdaddbbb one more incision before knocking out reward decoding 2025-10-08 06:11:02 -07:00
lucidrains
c056835aea address https://github.com/lucidrains/dreamer4/issues/2 0.0.5 2025-10-08 05:55:22 -07:00
lucidrains
4de357b6c2 tiny change needed to have the world model produce both the video and predicted rewards (after phase 2 finetuning) 2025-10-08 05:52:13 -07:00
lucidrains
0fdb67bafa add the noising of the latent context during generation, technique i think was from EPFL, or perhaps some google group that built on top of EPFL work 0.0.4 2025-10-07 09:37:37 -07:00
lucidrains
36ccb08500 allow for step_sizes to be passed in, log2 is not that intuitive 0.0.3 2025-10-07 08:36:46 -07:00
lucidrains
a8e14f4b7c oops 2025-10-07 08:09:33 -07:00
lucidrains
1176269927 correct signal levels when doing teacher forcing generation 0.0.2 2025-10-07 07:41:02 -07:00
lucidrains
c6bef85984 generating video with raw teacher forcing 0.0.1 2025-10-07 07:22:57 -07:00
lucidrains
83ba9a285a reorganize tokenizer to generate video from the dynamics model 2025-10-06 11:37:45 -07:00
lucidrains
7180a8cf43 start carving into the reinforcement learning portion, starting with reward prediction head (single for now) 2025-10-06 11:17:25 -07:00
lucidrains
77724049e2 fix latent / modality attention pattern in video tokenizer, thanks to another researcher 2025-10-06 09:44:12 -07:00
lucidrains
25b8de91cc handle spatial tokens less than latent tokens in dynamics model 2025-10-06 09:19:27 -07:00
lucidrains
bfbecb4968 an anonymous researcher pointed out that the video tokenizer may be using multiple latents per frame 2025-10-06 08:16:55 -07:00
lucidrains
338def693d oops 2025-10-05 11:52:54 -07:00
lucidrains
f507afa0d3 last commit for the day - take care of the task embed 2025-10-05 11:40:48 -07:00
lucidrains
fe99efecba make a first pass through the shortcut training logic (Frans et al from Berkeley) maintaining both v-space and x-space 2025-10-05 11:17:36 -07:00
lucidrains
971637673b complete all the types of attention masking patterns as proposed in the paper 2025-10-04 12:45:54 -07:00
lucidrains
5c6be4d979 take care of blocked causal in video tokenizer, still need the special attention pattern for latents to and from though 2025-10-04 12:03:50 -07:00
lucidrains
6c994db341 first nail down the attention masking for the dynamics transformer model using a factory function 2025-10-04 11:20:57 -07:00
lucidrains
ca700ba8e1 prepare for the learning in dreams 2025-10-04 09:44:46 -07:00
lucidrains
e04f9ffec6 for the temporal attention in dynamics model, do rotary the traditional way 2025-10-04 09:41:36 -07:00
lucidrains
1b7f6e787d rotate in the 3d rotary embeddings for the video tokenizer for both encoder / decoder 2025-10-04 09:22:06 -07:00
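The rotary embeddings mentioned in the commit above rotate pairs of feature channels by position-dependent angles before attention. A minimal 1D sketch of the idea (the repository applies an axial/3D variant; this single-axis version is only an assumption-level illustration):

```python
import numpy as np

def rotary_embed(x, positions, base=10000.0):
    # x: (seq, dim) with even dim; pair channel j with channel j + dim//2
    # and rotate each pair by an angle that grows with position
    seq, dim = x.shape
    half = dim // 2
    freqs = positions[:, None] / (base ** (np.arange(half) / half))
    cos, sin = np.cos(freqs), np.sin(freqs)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied channel-pair-wise; norms are preserved
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because each channel pair undergoes a pure rotation, token norms are unchanged and relative position falls out of the query-key dot product.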
lucidrains
93f6738c9c given the special attention patterns, attend function needs to be constructed before traversing the transformer layers 2025-10-04 08:31:51 -07:00
lucidrains
7cac3d28c5 cleanup 2025-10-04 08:04:42 -07:00
lucidrains
0f4783f23c use a newly built module from x-mlps for multi token prediction 2025-10-04 07:56:56 -07:00
lucidrains
0a26e0f92f complete the lpips loss used for the video tokenizer 2025-10-04 07:47:27 -07:00
Phil Wang
92e55a90b4 temporary discord 2025-10-04 07:28:36 -07:00
lucidrains
85eea216fd cleanup 2025-10-04 06:59:09 -07:00
lucidrains
895a867a66 able to accept raw video for dynamics model, if tokenizer passed in 2025-10-04 06:57:54 -07:00
lucidrains
8373cb13ec grouped query attention is necessary 2025-10-04 06:31:32 -07:00
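The grouped query attention from the last commit shares one key/value head across a group of query heads, cutting KV-cache size. A small numpy sketch of the mechanism, as an illustration rather than the repository's implementation:

```python
import numpy as np

def grouped_query_attention(q, k, v):
    # q: (q_heads, seq, dim); k, v: (kv_heads, seq, dim), kv_heads divides q_heads
    q_heads, seq, dim = q.shape
    kv_heads = k.shape[0]
    group = q_heads // kv_heads
    # expand each KV head so its group of query heads attends against it
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dim)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v
```

With kv_heads equal to q_heads this reduces to standard multi-head attention; with kv_heads of 1 it is multi-query attention.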