37 Commits

Author SHA1 Message Date
lucidrains
2d20d0a6c1 able to roll out actions from one agent within the dreams of a world model 2025-10-16 10:15:43 -07:00
lucidrains
d28251e9f9 another consideration before knocking out the RL logic 2025-10-14 11:10:26 -07:00
lucidrains
9c78962736 sampling actions 2025-10-12 11:27:12 -07:00
lucidrains
8a73a27fc7 add nested tensor way for getting log prob of multiple discrete actions 2025-10-11 10:53:24 -07:00
lucidrains
b2725d9b6e complete behavior cloning for one agent 2025-10-11 09:24:49 -07:00
lucidrains
563b269f8a bring in hyper connections 2025-10-11 06:52:57 -07:00
lucidrains
5df3e69583 last commit for the day 2025-10-10 11:59:18 -07:00
lucidrains
9230267d34 handle subset of discrete action unembedding 2025-10-10 11:27:05 -07:00
lucidrains
32aa355e37 prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL 2025-10-10 10:41:48 -07:00
lucidrains
9101a49cdd handle continuous value normalization if stats passed in 2025-10-09 08:59:54 -07:00
lucidrains
31f4363be7 must be able to do phase1 and phase2 training 2025-10-09 08:04:36 -07:00
lucidrains
e2d86a4543 add a complete action embedder that can accept any number of discrete actions with variable bins as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous hack in dreamer v3 with sticky action) 2025-10-09 07:53:42 -07:00
lucidrains
c4e0f46528 for the value head, we will go for symexp encoding as well (following the "stop regressing" paper from Farebrother et al), also use layernormed mlp given recent papers 2025-10-08 07:37:34 -07:00
lucidrains
187edc1414 all set for generating the perceived rewards once the RL components fall into place 2025-10-08 06:33:28 -07:00
lucidrains
36ccb08500 allow for step_sizes to be passed in, log2 is not that intuitive 2025-10-07 08:36:46 -07:00
lucidrains
a8e14f4b7c oops 2025-10-07 08:09:33 -07:00
lucidrains
c6bef85984 generating video with raw teacher forcing 2025-10-07 07:22:57 -07:00
lucidrains
83ba9a285a reorganize tokenizer to generate video from the dynamics model 2025-10-06 11:37:45 -07:00
lucidrains
7180a8cf43 start carving into the reinforcement learning portion, starting with reward prediction head (single for now) 2025-10-06 11:17:25 -07:00
lucidrains
25b8de91cc handle spatial tokens less than latent tokens in dynamics model 2025-10-06 09:19:27 -07:00
lucidrains
f507afa0d3 last commit for the day - take care of the task embed 2025-10-05 11:40:48 -07:00
lucidrains
fe99efecba make a first pass through the shortcut training logic (Frans et al from Berkeley) maintaining both v-space and x-space 2025-10-05 11:17:36 -07:00
lucidrains
971637673b complete all the types of attention masking patterns as proposed in the paper 2025-10-04 12:45:54 -07:00
lucidrains
5c6be4d979 take care of blocked causal in video tokenizer, still need the special attention pattern for latents to and from though 2025-10-04 12:03:50 -07:00
lucidrains
6c994db341 first nail down the attention masking for the dynamics transformer model using a factory function 2025-10-04 11:20:57 -07:00
lucidrains
895a867a66 able to accept raw video for dynamics model, if tokenizer passed in 2025-10-04 06:57:54 -07:00
lucidrains
8373cb13ec grouped query attention is necessary 2025-10-04 06:31:32 -07:00
lucidrains
046f8927d1 complete the symexp two hot proposed by Hafner from the previous versions of Dreamer, but will also bring in hl gauss 2025-10-03 08:08:44 -07:00
lucidrains
8d1cd311bb Revert "address https://github.com/lucidrains/dreamer4/issues/1" (this reverts commit e23a5294ec2f49d58d3ccb936c498eb86939fa96) 2025-10-02 12:25:05 -07:00
lucidrains
e23a5294ec address https://github.com/lucidrains/dreamer4/issues/1 2025-10-02 11:49:22 -07:00
lucidrains
49082d8629 x-space and v-space prediction in dynamics model 2025-10-02 08:36:00 -07:00
lucidrains
8b66b703e0 add the discretized signal level + step size embeddings necessary for diffusion forcing + shortcut 2025-10-02 07:39:34 -07:00
lucidrains
bb7a5d1680 sketch out the axial space time transformer in dynamics model 2025-10-02 07:17:58 -07:00
lucidrains
0285bba821 flesh out tokenizer even more 2025-10-02 06:11:04 -07:00
lucidrains
31c4aa28c7 start setting up tokenizer 2025-10-02 05:37:43 -07:00
lucidrains
e8678364ba swish glu feedforward from shazeer et al 2025-10-01 09:28:25 -07:00
lucidrains
bdc7dd30a6 scaffold 2025-10-01 07:18:23 -07:00