ecbe13efe8 allow for setting different loss weights for each MTP head (perhaps more weight on the next vs some far-out prediction)
0.0.43
lucidrains
2025-10-19 08:37:56 -07:00
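A minimal sketch of what per-head MTP loss weighting could look like; the geometric decay scheme and the `mtp_loss_weights` / `weighted_mtp_loss` names are assumptions, not the repo's actual API.

```python
def mtp_loss_weights(num_heads, decay=0.5):
    # hypothetical scheme: geometrically decaying weight per MTP head,
    # so the immediate next prediction dominates far-out ones;
    # normalized to sum to 1 to keep the overall loss scale unchanged
    raw = [decay ** i for i in range(num_heads)]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_mtp_loss(per_head_losses, decay=0.5):
    # combine the per-head losses with the decaying weights
    weights = mtp_loss_weights(len(per_head_losses), decay)
    return sum(w * l for w, l in zip(weights, per_head_losses))
```

Because the weights are normalized, equal per-head losses produce the same total as a single head would.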
f651d779e3 able to control the update of the loss ema from the dynamics model forward
0.0.42
lucidrains
2025-10-19 08:25:50 -07:00
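A sketch of a loss EMA whose update can be switched off from the forward pass (e.g. during evaluation); the `update_loss_ema` name and the decay value are assumptions.

```python
def update_loss_ema(ema, loss, decay=0.99, update=True):
    # running exponential moving average of the loss; the caller
    # controls whether this forward pass moves the ema via `update`
    if ema is None:     # first observation seeds the ema
        return loss
    if not update:      # frozen: forward leaves the ema untouched
        return ema
    return decay * ema + (1.0 - decay) * loss
```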
374667d8a9 take care of the loss normalization mentioned at the end of the first paragraph of section 3
0.0.41
lucidrains
2025-10-19 08:24:41 -07:00
0c1b067f97 if an optimizer is passed into the learn-from-dreams function, take the optimizer steps; otherwise let the researcher handle it externally. also ready Muon
lucidrains
2025-10-17 08:55:20 -07:00
cb416c0d44 handle the entropies during policy optimization
0.0.30
lucidrains
2025-10-17 08:47:26 -07:00
61773c8219 eventually we will need to learn from the outside stream of experience
0.0.29
lucidrains
2025-10-17 08:06:24 -07:00
c382307963 eventually we will need to learn from the outside stream of experience
0.0.28
lucidrains
2025-10-17 08:05:43 -07:00
0dba734280 start the learning in dreams portion
0.0.27
lucidrains
2025-10-17 08:00:47 -07:00
a0161760a0 extract the log probs and predicted values (symexp two-hot encoded) for the phase 3 RL training
0.0.26
lucidrains
2025-10-16 10:40:59 -07:00
2d20d0a6c1 able to roll out actions from one agent within the dreams of a world model
0.0.25
lucidrains
2025-10-16 10:15:43 -07:00
d74f09f0b3 a researcher in discord pointed out that the tokenizer also uses the axial space-time transformer. redo without the 3d rotary and block causal, greatly simplifying the implementation
0.0.24
lucidrains
2025-10-16 09:40:14 -07:00
2ccb290e26 pass the attend kwargs for the block causal masking in tokenizer
0.0.23
lucidrains
2025-10-16 08:33:26 -07:00
2a902eaaf7 allow reward tokens to be attended to as state optionally, DT-esque. figure out multi-agent scenario once i get around to it
0.0.21
lucidrains
2025-10-16 06:41:02 -07:00
d28251e9f9 another consideration before knocking out the RL logic
0.0.20
lucidrains
2025-10-14 11:10:26 -07:00
ff81dd761b separate action and agent embeds
0.0.19
lucidrains
2025-10-13 11:36:21 -07:00
6dbdc3d7d8 correct a misunderstanding where the past action is a separate action token, while the agent token is used for the prediction of next action, rewards, values
0.0.18
lucidrains
2025-10-12 16:16:18 -07:00
32aa355e37 prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL
0.0.9
lucidrains
2025-10-10 10:41:48 -07:00
9101a49cdd handle continuous value normalization if stats are passed in
lucidrains
2025-10-09 08:59:54 -07:00
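The stats-conditional normalization could be sketched as below; the `(mean, std)` tuple layout and function names are assumptions for illustration.

```python
def normalize(value, stats, eps=1e-6):
    # stats = (mean, std) gathered externally; only applied if provided
    if stats is None:
        return value
    mean, std = stats
    return (value - mean) / max(std, eps)

def unnormalize(value, stats):
    # inverse transform, mapping a normalized value back to raw scale
    if stats is None:
        return value
    mean, std = stats
    return value * std + mean
```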
31f4363be7 must be able to do phase 1 and phase 2 training
lucidrains
2025-10-09 08:04:36 -07:00
e2d86a4543 add a complete action embedder that can accept any number of discrete actions with variable bins as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous hack in dreamer v3 with sticky action)
0.0.8
lucidrains
2025-10-09 07:53:42 -07:00
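A toy sketch of such an action embedder: one table per discrete action (each with its own bin count), one learned direction per continuous action, everything mean-pooled into a single vector to add onto the agent token. The class shape and names are hypothetical, not the repo's `ActionEmbedder`.

```python
import random

class ToyActionEmbedder:
    # hypothetical sketch: per-discrete-action embedding tables with
    # variable bin counts, plus a scaled direction per continuous action;
    # all embeddings are mean-pooled into one vector
    def __init__(self, dim, discrete_bins, num_continuous, seed=0):
        rng = random.Random(seed)
        vec = lambda: [rng.gauss(0.0, 1.0) for _ in range(dim)]
        self.tables = [[vec() for _ in range(bins)] for bins in discrete_bins]
        self.cont_dirs = [vec() for _ in range(num_continuous)]
        self.dim = dim

    def __call__(self, discrete, continuous):
        # look up each discrete action, scale each continuous direction
        embeds = [table[a] for table, a in zip(self.tables, discrete)]
        embeds += [[c * w for w in d] for c, d in zip(continuous, self.cont_dirs)]
        n = len(embeds)
        # mean-pool into the vector that would be added to the agent token
        return [sum(e[i] for e in embeds) / n for i in range(self.dim)]
```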
b62c08be65 fix task embed in the presence of multiple agent tokens
lucidrains
2025-10-08 08:42:25 -07:00
ed0918c974 prepare for evolution within dreams
lucidrains
2025-10-08 08:13:16 -07:00
892654d442 multiple agent tokens sharing the same state
lucidrains
2025-10-08 08:06:13 -07:00
c4e0f46528 for the value head, we will go for symexp encoding as well (following the "stop regressing" paper from Farebrother et al.), also use a layernormed MLP given recent papers
lucidrains
2025-10-08 07:37:34 -07:00
a50e360502 makes more sense for the noise to be fixed
lucidrains
2025-10-08 07:17:05 -07:00
612f5f5dd1 a bit of dropout to rewards as state
lucidrains
2025-10-08 06:45:25 -07:00
c8f75caa40 although not in the paper, it would be interesting for each agent (will extend to multi-agent) to consider its own past rewards as state
lucidrains
2025-10-08 06:40:15 -07:00
187edc1414 all set for generating the perceived rewards once the RL components fall into place
lucidrains
2025-10-08 06:33:28 -07:00
f7bdaddbbb one more incision before knocking out reward decoding
lucidrains
2025-10-08 06:11:02 -07:00
4de357b6c2 tiny change needed to have the world model produce both the video and predicted rewards (after phase 2 finetuning)
lucidrains
2025-10-08 05:52:13 -07:00
0fdb67bafa add the noising of the latent context during generation, a technique i think was from EPFL, or perhaps some google group that built on top of the EPFL work
0.0.4
lucidrains
2025-10-07 09:37:37 -07:00
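One way the context noising might look, as a rough sketch: blend each past context latent toward gaussian noise at a fixed level before generation. The exact formulation and schedule in the repo may differ; the function name is hypothetical.

```python
import random

def noise_context(latents, noise_level, seed=0):
    # hypothetical sketch: interpolate each past latent toward fresh
    # gaussian noise; noise_level = 0 leaves the context untouched
    rng = random.Random(seed)
    keep = 1.0 - noise_level
    return [[keep * x + noise_level * rng.gauss(0.0, 1.0) for x in latent]
            for latent in latents]
```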
36ccb08500 allow for step_sizes to be passed in, as log2 is not that intuitive
0.0.3
lucidrains
2025-10-07 08:36:46 -07:00
1176269927 correct the signal levels when doing teacher-forced generation
0.0.2
lucidrains
2025-10-07 07:41:02 -07:00
c6bef85984 generating video with raw teacher forcing
0.0.1
lucidrains
2025-10-07 07:22:57 -07:00
83ba9a285a reorganize tokenizer to generate video from the dynamics model
lucidrains
2025-10-06 11:37:45 -07:00
7180a8cf43 start carving into the reinforcement learning portion, starting with reward prediction head (single for now)
lucidrains
2025-10-06 11:17:25 -07:00
77724049e2 fix latent / modality attention pattern in video tokenizer, thanks to another researcher
lucidrains
2025-10-06 09:43:16 -07:00
25b8de91cc handle fewer spatial tokens than latent tokens in the dynamics model
lucidrains
2025-10-06 09:19:27 -07:00
bfbecb4968 an anonymous researcher pointed out that the video tokenizer may be using multiple latents per frame
lucidrains
2025-10-06 08:16:55 -07:00
f507afa0d3 last commit for the day - take care of the task embed
lucidrains
2025-10-05 11:40:48 -07:00
fe99efecba make a first pass through the shortcut training logic (Frans et al. from Berkeley), maintaining both v-space and x-space
lucidrains
2025-10-05 11:17:36 -07:00
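The core self-consistency idea in shortcut models is that one step of size 2d should match two consecutive steps of size d. A sketch of how that target could be formed in v-space, with hypothetical names and a generic `velocity_fn(x, t, d)` signature:

```python
def shortcut_target(velocity_fn, x, t, d):
    # shortcut-model self-consistency: the velocity for a step of size
    # 2*d is supervised by the average of two consecutive size-d steps
    v1 = velocity_fn(x, t, d)
    # advance x by a step of size d along the first velocity
    x_mid = [xi + d * vi for xi, vi in zip(x, v1)]
    v2 = velocity_fn(x_mid, t + d, d)
    return [(a + b) / 2.0 for a, b in zip(v1, v2)]
```

For a constant velocity field the target simply reproduces that velocity, as expected.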
971637673b complete all the types of attention masking patterns as proposed in the paper
lucidrains
2025-10-04 12:45:54 -07:00
5c6be4d979 take care of block causal in the video tokenizer; still need the special attention pattern for latents to and from, though
lucidrains
2025-10-04 12:03:50 -07:00
6c994db341 first nail down the attention masking for the dynamics transformer model using a factory function
lucidrains
2025-10-04 11:20:57 -07:00
ca700ba8e1 prepare for the learning in dreams
lucidrains
2025-10-04 09:44:46 -07:00
e04f9ffec6 for the temporal attention in dynamics model, do rotary the traditional way
lucidrains
2025-10-04 09:41:36 -07:00
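"Rotary the traditional way" means rotating consecutive feature pairs by an angle proportional to the (here, temporal) position, so that query/key dot products depend only on relative position. A minimal single-vector sketch, with an assumed `apply_rotary` name:

```python
import math

def apply_rotary(q, pos, theta=10000.0):
    # rotate each consecutive feature pair by pos * freq, with frequencies
    # decaying geometrically across pairs (standard RoPE layout)
    out = list(q)
    half = len(q) // 2
    for i in range(half):
        freq = theta ** (-2.0 * i / len(q))
        angle = pos * freq
        c, s = math.cos(angle), math.sin(angle)
        x, y = q[2 * i], q[2 * i + 1]
        out[2 * i] = x * c - y * s
        out[2 * i + 1] = x * s + y * c
    return out
```

The rotation preserves vector norms, and rotating q at position m and k at position n gives the same dot product as leaving q fixed and rotating k by n - m.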
1b7f6e787d rotate in the 3d rotary embeddings for the video tokenizer for both encoder / decoder
lucidrains
2025-10-04 09:22:06 -07:00
93f6738c9c given the special attention patterns, the attend function needs to be constructed before traversing the transformer layers
lucidrains
2025-10-04 08:31:51 -07:00
895a867a66 able to accept raw video for the dynamics model, if a tokenizer is passed in
lucidrains
2025-10-04 06:57:54 -07:00
8373cb13ec grouped query attention is necessary
lucidrains
2025-10-04 06:31:32 -07:00
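In grouped query attention, query heads are partitioned into groups that each share a single key/value head, cutting the KV cache size. The index mapping is all there is to it; the helper name here is illustrative:

```python
def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    # GQA: query heads split into num_kv_heads equal groups, and every
    # query head in a group attends against the same key/value head
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size
```

With 8 query heads and 2 KV heads, heads 0-3 share KV head 0 and heads 4-7 share KV head 1.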
58a6964dd9 the dynamics model has a spatial attention with a non-causal attention pattern but nothing else attending to agent tokens
lucidrains
2025-10-03 11:59:07 -07:00
77ad96ded2 make attention masking correct for dynamics model
lucidrains
2025-10-03 11:18:44 -07:00
986bf4c529 allow the video tokenizer to accept any spatial dimensions by parameterizing the decoder positional embedding with an MLP
lucidrains
2025-10-03 10:08:05 -07:00
90bf19f076 take care of the loss weight proposed in eq 8
lucidrains
2025-10-03 08:19:38 -07:00
046f8927d1 complete the symexp two-hot proposed by Hafner from the previous versions of Dreamer, but will also bring in HL-Gauss
lucidrains
2025-10-03 08:07:57 -07:00
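The symexp two-hot scheme from the earlier Dreamer line compresses targets with symlog, spreads each scalar across the two adjacent bins straddling it, and decodes by taking the expected bin center back through symexp. A self-contained sketch (the helper names are illustrative, not the repo's API):

```python
import math

def symlog(x):
    # symmetric log compression: sign(x) * log(1 + |x|)
    return math.copysign(math.log1p(abs(x)), x)

def symexp(x):
    # inverse of symlog: sign(x) * (exp(|x|) - 1)
    return math.copysign(math.expm1(abs(x)), x)

def two_hot(value, bins):
    # spread a scalar (in symlog space) over the two adjacent bins that
    # straddle it, weighted by proximity; `bins` are sorted bin centers
    v = min(max(symlog(value), bins[0]), bins[-1])
    probs = [0.0] * len(bins)
    for i in range(len(bins) - 1):
        lo, hi = bins[i], bins[i + 1]
        if lo <= v <= hi:
            w_hi = (v - lo) / (hi - lo)
            probs[i], probs[i + 1] = 1.0 - w_hi, w_hi
            return probs
    probs[-1] = 1.0
    return probs

def decode_two_hot(probs, bins):
    # expected bin center, mapped back to raw scale through symexp
    return symexp(sum(p * b for p, b in zip(probs, bins)))
```

Encoding then decoding a value inside the bin range recovers it up to float precision, which is the property that makes the classification-style value head a drop-in for regression.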
2a896ab01d last commit for the day
lucidrains
2025-10-02 12:39:20 -07:00