55 Commits

Author | SHA1 | Message | Date
lucidrains | ca244a290c | first pass through the kv cache for the time block in the dynamics model | 2025-10-20 12:25:50 -07:00
lucidrains | a7e0c395c3 | allow for only rmsnorm for keys in attention | 2025-10-20 11:20:49 -07:00
lucidrains | 1345326656 | another measure for the attending to nothing issue | 2025-10-20 10:32:31 -07:00
lucidrains | 55574c054e | assert | 2025-10-19 09:59:42 -07:00
lucidrains | 27ed6d0ba5 | fix time kv cache | 2025-10-19 09:16:06 -07:00
lucidrains | 4930002e99 | bit of progress on time kv cache | 2025-10-19 09:04:26 -07:00
lucidrains | ecbe13efe8 | allow for setting different loss weights for each MTP head (perhaps more weight on the next vs some far out prediction) | 2025-10-19 08:37:56 -07:00
lucidrains | f651d779e3 | able to control the update of the loss ema from dynamics model forward | 2025-10-19 08:25:50 -07:00
lucidrains | 374667d8a9 | take care of the loss normalization mentioned at the end of the first paragraph of section 3 | 2025-10-19 08:24:41 -07:00
lucidrains | 79a1b1c46e | oops | 2025-10-18 10:31:48 -07:00
lucidrains | b6aa19f31e | complete multi-token prediction for actions, tackle loss balancing another day | 2025-10-18 10:23:14 -07:00
lucidrains | bc629d78b1 | inverse norm for continuous actions when sampling | 2025-10-18 08:55:04 -07:00
lucidrains | 0ee475d2df | oops | 2025-10-18 08:50:53 -07:00
lucidrains | 8c88a33d3b | complete multi token prediction for the reward head | 2025-10-18 08:33:06 -07:00
lucidrains | 911a1a8434 | oops | 2025-10-18 08:07:06 -07:00
lucidrains | 83cfd2cd1b | task conditioning when dreaming | 2025-10-18 07:47:13 -07:00
lucidrains | 22e13c45fc | rename | 2025-10-17 14:44:25 -07:00
lucidrains | c967404471 | 0.0.31 | 2025-10-17 08:55:42 -07:00
lucidrains | cb416c0d44 | handle the entropies during policy optimization | 2025-10-17 08:47:26 -07:00
lucidrains | 61773c8219 | eventually we will need to learn from the outside stream of experience | 2025-10-17 08:06:24 -07:00
lucidrains | 0dba734280 | start the learning in dreams portion | 2025-10-17 08:00:47 -07:00
lucidrains | a0161760a0 | extract the log probs and predicted values (symexp two hot encoded) for the phase 3 RL training | 2025-10-16 10:40:59 -07:00
lucidrains | 2d20d0a6c1 | able to roll out actions from one agent within the dreams of a world model | 2025-10-16 10:15:43 -07:00
lucidrains | d74f09f0b3 | a researcher in discord pointed out that the tokenizer also uses the axial space time transformer. redo without the 3d rotary and block causal, greatly simplifying the implementation | 2025-10-16 09:40:14 -07:00
lucidrains | 2ccb290e26 | pass the attend kwargs for the block causal masking in tokenizer | 2025-10-16 08:33:26 -07:00
lucidrains | 517ef6b94b | oops | 2025-10-16 07:03:51 -07:00
lucidrains | 2a902eaaf7 | allow reward tokens to be attended to as state optionally, DT-esque. figure out multi-agent scenario once i get around to it | 2025-10-16 06:41:02 -07:00
lucidrains | d28251e9f9 | another consideration before knocking out the RL logic | 2025-10-14 11:10:26 -07:00
lucidrains | ff81dd761b | separate action and agent embeds | 2025-10-13 11:36:21 -07:00
lucidrains | 6dbdc3d7d8 | correct a misunderstanding where past actions is a separate action token, while agent token is used for the prediction of next action, rewards, values | 2025-10-12 16:16:18 -07:00
lucidrains | 9c78962736 | sampling actions | 2025-10-12 11:27:12 -07:00
lucidrains | c5e64ff4ce | separate out the key from the value projections in attention for muon | 2025-10-12 09:42:22 -07:00
lucidrains | ab5de6795f | bring in muon | 2025-10-12 09:35:06 -07:00
lucidrains | 8a73a27fc7 | add nested tensor way for getting log prob of multiple discrete actions | 2025-10-11 10:53:24 -07:00
lucidrains | 01bf70e18a | 0.0.14 | 2025-10-11 09:24:58 -07:00
lucidrains | 563b269f8a | bring in hyper connections | 2025-10-11 06:52:57 -07:00
lucidrains | 5df3e69583 | last commit for the day | 2025-10-10 11:59:18 -07:00
lucidrains | 9230267d34 | handle subset of discrete action unembedding | 2025-10-10 11:27:05 -07:00
lucidrains | 32aa355e37 | prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL | 2025-10-10 10:41:48 -07:00
lucidrains | e2d86a4543 | add a complete action embedder that can accept any number of discrete actions with variable bins as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous hack in dreamer v3 with sticky action) | 2025-10-09 07:53:42 -07:00
lucidrains | 4c2ed100a3 | fix masking for multiple agent tokens | 2025-10-08 08:26:44 -07:00
lucidrains | 63b63dfedd | add shard | 2025-10-08 06:56:03 -07:00
lucidrains | 187edc1414 | all set for generating the perceived rewards once the RL components fall into place | 2025-10-08 06:33:28 -07:00
lucidrains | c056835aea | address https://github.com/lucidrains/dreamer4/issues/2 | 2025-10-08 05:55:22 -07:00
lucidrains | 0fdb67bafa | add the noising of the latent context during generation, technique i think was from EPFL, or perhaps some google group that built on top of EPFL work | 2025-10-07 09:37:37 -07:00
lucidrains | 36ccb08500 | allow for step_sizes to be passed in, log2 is not that intuitive | 2025-10-07 08:36:46 -07:00
lucidrains | 1176269927 | correct signal levels when doing teacher forcing generation | 2025-10-07 07:41:02 -07:00
lucidrains | 0f4783f23c | use a newly built module from x-mlps for multi token prediction | 2025-10-04 07:56:56 -07:00
lucidrains | 0a26e0f92f | complete the lpips loss used for the video tokenizer | 2025-10-04 07:47:27 -07:00
lucidrains | 986bf4c529 | allow for the video tokenizer to accept any spatial dimensions by parameterizing the decoder positional embedding with an MLP | 2025-10-03 10:08:05 -07:00