lucidrains | 2d20d0a6c1 | able to roll out actions from one agent within the dreams of a world model | 2025-10-16 10:15:43 -07:00
lucidrains | d74f09f0b3 | a researcher in discord pointed out that the tokenizer also uses the axial space time transformer. redo without the 3d rotary and block causal, greatly simplifying the implementation | 2025-10-16 09:40:14 -07:00
lucidrains | 2ccb290e26 | pass the attend kwargs for the block causal masking in tokenizer | 2025-10-16 08:33:26 -07:00
lucidrains | 517ef6b94b | oops | 2025-10-16 07:03:51 -07:00
lucidrains | 2a902eaaf7 | allow reward tokens to be attended to as state optionally, DT-esque. figure out multi-agent scenario once i get around to it | 2025-10-16 06:41:02 -07:00
lucidrains | d28251e9f9 | another consideration before knocking out the RL logic | 2025-10-14 11:10:26 -07:00
lucidrains | ff81dd761b | separate action and agent embeds | 2025-10-13 11:36:21 -07:00
lucidrains | 6dbdc3d7d8 | correct a misunderstanding where past actions is a separate action token, while agent token is used for the prediction of next action, rewards, values | 2025-10-12 16:16:18 -07:00
lucidrains | 9c78962736 | sampling actions | 2025-10-12 11:27:12 -07:00
lucidrains | c5e64ff4ce | separate out the key from the value projections in attention for muon | 2025-10-12 09:42:22 -07:00
lucidrains | ab5de6795f | bring in muon | 2025-10-12 09:35:06 -07:00
lucidrains | 8a73a27fc7 | add nested tensor way for getting log prob of multiple discrete actions | 2025-10-11 10:53:24 -07:00
lucidrains | 01bf70e18a | 0.0.14 | 2025-10-11 09:24:58 -07:00
lucidrains | 563b269f8a | bring in hyper connections | 2025-10-11 06:52:57 -07:00
lucidrains | 5df3e69583 | last commit for the day | 2025-10-10 11:59:18 -07:00
lucidrains | 9230267d34 | handle subset of discrete action unembedding | 2025-10-10 11:27:05 -07:00
lucidrains | 32aa355e37 | prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL | 2025-10-10 10:41:48 -07:00
lucidrains | e2d86a4543 | add a complete action embedder that can accept any number of discrete actions with variable bins as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous hack in dreamer v3 with sticky action) | 2025-10-09 07:53:42 -07:00
lucidrains | 4c2ed100a3 | fix masking for multiple agent tokens | 2025-10-08 08:26:44 -07:00
lucidrains | 63b63dfedd | add shard | 2025-10-08 06:56:03 -07:00
lucidrains | 187edc1414 | all set for generating the perceived rewards once the RL components fall into place | 2025-10-08 06:33:28 -07:00
lucidrains | c056835aea | address https://github.com/lucidrains/dreamer4/issues/2 | 2025-10-08 05:55:22 -07:00
lucidrains | 0fdb67bafa | add the noising of the latent context during generation, technique i think was from EPFL, or perhaps some google group that built on top of EPFL work | 2025-10-07 09:37:37 -07:00
lucidrains | 36ccb08500 | allow for step_sizes to be passed in, log2 is not that intuitive | 2025-10-07 08:36:46 -07:00
lucidrains | 1176269927 | correct signal levels when doing teacher forcing generation | 2025-10-07 07:41:02 -07:00
lucidrains | 0f4783f23c | use a newly built module from x-mlps for multi token prediction | 2025-10-04 07:56:56 -07:00
lucidrains | 0a26e0f92f | complete the lpips loss used for the video tokenizer | 2025-10-04 07:47:27 -07:00
lucidrains | 986bf4c529 | allow for the video tokenizer to accept any spatial dimensions by parameterizing the decoder positional embedding with an MLP | 2025-10-03 10:08:05 -07:00
lucidrains | 046f8927d1 | complete the symexp two hot proposed by Hafner from the previous versions of Dreamer, but will also bring in hl gauss | 2025-10-03 08:08:44 -07:00
lucidrains | 8b66b703e0 | add the discretized signal level + step size embeddings necessary for diffusion forcing + shortcut | 2025-10-02 07:39:34 -07:00
lucidrains | e3cbcd94c6 | sketch out top down | 2025-10-01 10:25:56 -07:00
lucidrains | 2e92c0121a | they employ two stability measures, qk rmsnorm and softclamping of attention logits | 2025-10-01 09:40:24 -07:00
lucidrains | bdc7dd30a6 | scaffold | 2025-10-01 07:18:23 -07:00
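
Commit 2e92c0121a names two attention stability measures: qk rmsnorm and softclamping of attention logits. The following is a minimal pure-Python sketch of both for a single attention head. It is not the repository's code; the function names are hypothetical, and the use of RMSNorm without a learned gain and a softclamp value of 50 are assumptions made for illustration.

```python
import math


def rms_norm(vec, eps=1e-6):
    # scale a vector so its root-mean-square is ~1
    # (assumption: no learned gain in this sketch)
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def softclamp(x, value=50.0):
    # smoothly bound x to the open interval (-value, value); unlike a hard clip,
    # tanh has a nonzero derivative everywhere
    # (assumption: value=50.0 is an illustrative default)
    return math.tanh(x / value) * value


def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values, softclamp_value=50.0):
    # queries, keys, values: lists of vectors (lists of floats);
    # queries and keys share one dimension, values may have another
    scale = len(queries[0]) ** -0.5
    # stability measure 1: RMS-normalize queries and keys before the dot product
    qs = [rms_norm(q) for q in queries]
    ks = [rms_norm(k) for k in keys]
    value_dim = len(values[0])
    out = []
    for q in qs:
        # stability measure 2: softclamp the scaled attention logits
        logits = [
            softclamp(sum(a * b for a, b in zip(q, k)) * scale, softclamp_value)
            for k in ks
        ]
        weights = softmax(logits)
        # weighted sum of value vectors, one output component at a time
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(value_dim)
        ])
    return out


# usage: two queries attending over two keys with identical values
print(attend([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [1.0, 2.0]]))
```

Normalizing queries and keys keeps their dot products from growing with activation scale, and the tanh-based softclamp bounds whatever logits remain without the zero gradient a hard clip would introduce outside its range.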