author | commit | message (tag) | date

lucidrains | 2ccb290e26 | pass the attend kwargs for the block causal masking in tokenizer (0.0.23) | 2025-10-16 08:33:26 -07:00
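A minimal sketch of what block causal masking refers to here, assuming the tokenizer attends bidirectionally within a frame's block of tokens and causally across frames; the function name and signature below are illustrative, not the repository's API.

```python
import torch

def block_causal_mask(seq_len: int, block_size: int) -> torch.Tensor:
    # block index of every position, e.g. block_size tokens per video frame
    block_ids = torch.arange(seq_len) // block_size
    # True where the query's block >= the key's block, i.e. attention is allowed
    return block_ids[:, None] >= block_ids[None, :]

mask = block_causal_mask(seq_len = 6, block_size = 2)
# row 0 (block 0): [T, T, F, F, F, F]
# row 2 (block 1): [T, T, T, T, F, F]
```

Such a boolean mask can be passed through attention kwargs down to the attention call (e.g. as `attn_mask`), which appears to be what this commit wires up.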
lucidrains | 517ef6b94b | oops (0.0.22) | 2025-10-16 07:03:51 -07:00
lucidrains | ec18bc0fa4 | cleanup | 2025-10-16 06:44:28 -07:00
lucidrains | 2a902eaaf7 | allow reward tokens to be attended to as state optionally, DT-esque. figure out multi-agent scenario once i get around to it (0.0.21) | 2025-10-16 06:41:02 -07:00
lucidrains | d28251e9f9 | another consideration before knocking out the RL logic (0.0.20) | 2025-10-14 11:10:26 -07:00
lucidrains | ff81dd761b | separate action and agent embeds (0.0.19) | 2025-10-13 11:36:21 -07:00
lucidrains | 6dbdc3d7d8 | correct a misunderstanding where past actions is a separate action token, while agent token is used for the prediction of next action, rewards, values (0.0.18) | 2025-10-12 16:16:18 -07:00
lucidrains | 9c78962736 | sampling actions (0.0.17) | 2025-10-12 11:27:12 -07:00
lucidrains | c5e64ff4ce | separate out the key from the value projections in attention for muon (0.0.16) | 2025-10-12 09:42:22 -07:00
lucidrains | ab5de6795f | bring in muon | 2025-10-12 09:35:06 -07:00
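An illustrative sketch of the design choice behind these two commits, not the repository's code: Muon-style optimizers apply a Newton-Schulz orthogonalization to each 2D weight matrix independently, so keeping query, key, and value as separate Linear weights (rather than a fused kv or qkv matrix) gives the optimizer one clean matrix per projection. The class and layout below are assumptions.

```python
import torch
from torch import nn
import torch.nn.functional as F

class Attention(nn.Module):
    def __init__(self, dim: int, dim_head: int = 64, heads: int = 8):
        super().__init__()
        inner = dim_head * heads
        self.heads = heads
        self.to_q = nn.Linear(dim, inner, bias = False)
        self.to_k = nn.Linear(dim, inner, bias = False)   # separate weight matrix for Muon
        self.to_v = nn.Linear(dim, inner, bias = False)   # separate weight matrix for Muon
        self.to_out = nn.Linear(inner, dim, bias = False)

    def forward(self, x, mask = None):
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        # (batch, seq, heads * dim_head) -> (batch, heads, seq, dim_head)
        q, k, v = (t.unflatten(-1, (self.heads, -1)).transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, attn_mask = mask)  # e.g. a block causal mask
        return self.to_out(out.transpose(1, 2).flatten(-2))
```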
lucidrains | 8a73a27fc7 | add nested tensor way for getting log prob of multiple discrete actions (0.0.15) | 2025-10-11 10:53:24 -07:00
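A hedged sketch of the computation this commit vectorizes: with several discrete action heads that have different numbers of bins, take the log probability of each chosen bin and sum them into a joint log prob. The names and the explicit loop here are illustrative; the repository packs the ragged per-action logits into a nested tensor instead.

```python
import torch
import torch.nn.functional as F

def multi_discrete_log_prob(logits_per_action, actions):
    # logits_per_action: list of (batch, num_bins_i) tensors, num_bins_i varies per action
    # actions: (batch, num_actions) integer tensor of chosen bins
    log_probs = []
    for i, logits in enumerate(logits_per_action):
        logp = F.log_softmax(logits, dim = -1)                # (batch, num_bins_i)
        log_probs.append(logp.gather(-1, actions[:, i:i+1]))  # (batch, 1)
    return torch.cat(log_probs, dim = -1).sum(dim = -1)       # (batch,) joint log prob

logits = [torch.randn(2, 5), torch.randn(2, 3)]      # two discrete actions with 5 and 3 bins
acts = torch.tensor([[4, 0], [1, 2]])
print(multi_discrete_log_prob(logits, acts).shape)   # torch.Size([2])
```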
lucidrains | 01bf70e18a | 0.0.14 (0.0.14) | 2025-10-11 09:24:58 -07:00
lucidrains | b2725d9b6e | complete behavior cloning for one agent | 2025-10-11 09:24:49 -07:00
lucidrains | 02558d1f08 | will organize the unembedding parameters under the actor optimizer | 2025-10-11 06:55:57 -07:00
lucidrains | 563b269f8a | bring in hyper connections (0.0.12) | 2025-10-11 06:52:57 -07:00
lucidrains | 5df3e69583 | last commit for the day (0.0.11) | 2025-10-10 11:59:18 -07:00
lucidrains | 9230267d34 | handle subset of discrete action unembedding (0.0.10) | 2025-10-10 11:27:05 -07:00
lucidrains | c68942b026 | cleanup | 2025-10-10 10:42:54 -07:00
lucidrains | 32aa355e37 | prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL (0.0.9) | 2025-10-10 10:41:48 -07:00
lucidrains | 9101a49cdd | handle continuous value normalization if stats passed in | 2025-10-09 08:59:54 -07:00
lucidrains | 31f4363be7 | must be able to do phase1 and phase2 training | 2025-10-09 08:04:36 -07:00
lucidrains | e2d86a4543 | add a complete action embedder that can accept any number of discrete actions with variable bins as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous hack in dreamer v3 with sticky action) (0.0.8) | 2025-10-09 07:53:42 -07:00
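A rough sketch of the action-embedder idea described in commit e2d86a4543, with assumed names and layout rather than the repository's actual ActionEmbedder: each discrete action gets its own embedding table sized to its bin count, continuous actions get a small projection, and everything is pooled into one vector added to the agent token.

```python
import torch
from torch import nn

class ActionEmbedderSketch(nn.Module):
    def __init__(self, dim: int, discrete_bins = (9, 5), num_continuous: int = 2):
        super().__init__()
        # one embedding table per discrete action, each with its own number of bins
        self.discrete_embeds = nn.ModuleList([nn.Embedding(bins, dim) for bins in discrete_bins])
        # single projection for all continuous action values
        self.continuous_proj = nn.Linear(num_continuous, dim) if num_continuous > 0 else None

    def forward(self, discrete_actions, continuous_actions, agent_token):
        # discrete_actions: (batch, num_discrete) ints, continuous_actions: (batch, num_continuous)
        embeds = [embed(discrete_actions[:, i]) for i, embed in enumerate(self.discrete_embeds)]
        if self.continuous_proj is not None:
            embeds.append(self.continuous_proj(continuous_actions))
        pooled = torch.stack(embeds, dim = 1).mean(dim = 1)   # mean-pool across actions
        return agent_token + pooled                           # fold action info into the agent token

embedder = ActionEmbedderSketch(dim = 512)
out = embedder(torch.randint(0, 5, (2, 2)), torch.randn(2, 2), torch.zeros(2, 512))
```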
lucidrains | b62c08be65 | fix task embed in presence of multiple agent tokens | 2025-10-08 08:42:25 -07:00
lucidrains | 4c2ed100a3 | fix masking for multiple agent tokens (0.0.7) | 2025-10-08 08:26:44 -07:00
lucidrains | ed0918c974 | prepare for evolution within dreams | 2025-10-08 08:13:16 -07:00
lucidrains | 892654d442 | multiple agent tokens sharing the same state | 2025-10-08 08:06:13 -07:00
lucidrains | c4e0f46528 | for the value head, we will go for symexp encoding as well (following the "stop regressing" paper from Farebrother et al), also use layernormed mlp given recent papers | 2025-10-08 07:37:34 -07:00
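A hedged sketch of the value-head idea referenced in commit c4e0f46528, in the spirit of DreamerV3's symlog two-hot value targets and the "Stop Regressing" (Farebrother et al.) recommendation to treat value regression as classification. The bin range, bin count, and target encoding below are assumptions, not the repository's implementation.

```python
import torch

def symlog(x):
    return torch.sign(x) * torch.log1p(x.abs())

def symexp(x):
    return torch.sign(x) * (torch.exp(x.abs()) - 1.)

# the value head predicts logits over bins laid out in symlog space
bins = torch.linspace(-20., 20., 255)

def two_hot(target, bins):
    # encode symlog(target) as a two-hot distribution over the two neighbouring bins
    t = symlog(target).clamp(bins[0], bins[-1])
    idx = torch.searchsorted(bins, t).clamp(1, len(bins) - 1)
    lo, hi = bins[idx - 1], bins[idx]
    weight_hi = (t - lo) / (hi - lo)
    dist = torch.zeros(*target.shape, len(bins))
    dist.scatter_(-1, (idx - 1).unsqueeze(-1), (1 - weight_hi).unsqueeze(-1))
    dist.scatter_(-1, idx.unsqueeze(-1), weight_hi.unsqueeze(-1))
    return dist  # train the value logits against this with cross entropy

def decode_value(logits, bins):
    # expected bin position in symlog space, mapped back to value space with symexp
    return symexp((logits.softmax(dim = -1) * bins).sum(dim = -1))
```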
lucidrains | a50e360502 | makes more sense for the noise to be fixed | 2025-10-08 07:17:05 -07:00
Phil Wang | 9c56ba0c9d | Merge pull request #3 from lucidrains/pytest-shard: add pytest shard | 2025-10-08 07:03:11 -07:00
lucidrains | b5744237bf | fix | 2025-10-08 06:58:46 -07:00
lucidrains | 63b63dfedd | add shard | 2025-10-08 06:56:03 -07:00
lucidrains | 612f5f5dd1 | a bit of dropout to rewards as state | 2025-10-08 06:45:25 -07:00
lucidrains | c8f75caa40 | although not in the paper, it would be interesting for each agent (will extend to multi-agent) to consider its own past rewards as state | 2025-10-08 06:40:43 -07:00
lucidrains | 187edc1414 | all set for generating the perceived rewards once the RL components fall into place | 2025-10-08 06:33:28 -07:00
lucidrains | f7bdaddbbb | one more incision before knocking out reward decoding | 2025-10-08 06:11:02 -07:00
lucidrains | c056835aea | address https://github.com/lucidrains/dreamer4/issues/2 (0.0.5) | 2025-10-08 05:55:22 -07:00
lucidrains | 4de357b6c2 | tiny change needed to have the world model produce both the video and predicted rewards (after phase 2 finetuning) | 2025-10-08 05:52:13 -07:00
lucidrains | 0fdb67bafa | add the noising of the latent context during generation, technique i think was from EPFL, or perhaps some google group that built on top of EPFL work (0.0.4) | 2025-10-07 09:37:37 -07:00
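A small illustrative sketch of the context-noising idea in commit 0fdb67bafa, not the repository's exact code: during autoregressive generation, previously produced context latents are lightly re-noised before being fed back in as conditioning, which tends to keep rollouts closer to the noisy inputs seen during training. The interpolation scheme and noise level below are assumed hyperparameters.

```python
import torch

def noise_context(context_latents: torch.Tensor, noise_level: float = 0.1) -> torch.Tensor:
    # context_latents: (batch, time, dim) latents generated on earlier steps
    noise = torch.randn_like(context_latents)
    # flow-matching style interpolation toward pure noise at the given level
    return context_latents * (1. - noise_level) + noise * noise_level
```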
lucidrains | 36ccb08500 | allow for step_sizes to be passed in, log2 is not that intuitive (0.0.3) | 2025-10-07 08:36:46 -07:00
lucidrains | a8e14f4b7c | oops | 2025-10-07 08:09:33 -07:00
lucidrains | 1176269927 | correct signal levels when doing teacher forcing generation (0.0.2) | 2025-10-07 07:41:02 -07:00
lucidrains | c6bef85984 | generating video with raw teacher forcing (0.0.1) | 2025-10-07 07:22:57 -07:00
lucidrains | 83ba9a285a | reorganize tokenizer to generate video from the dynamics model | 2025-10-06 11:37:45 -07:00
lucidrains | 7180a8cf43 | start carving into the reinforcement learning portion, starting with reward prediction head (single for now) | 2025-10-06 11:17:25 -07:00
lucidrains | 77724049e2 | fix latent / modality attention pattern in video tokenizer, thanks to another researcher | 2025-10-06 09:44:12 -07:00
lucidrains | 25b8de91cc | handle spatial tokens less than latent tokens in dynamics model | 2025-10-06 09:19:27 -07:00
lucidrains | bfbecb4968 | an anonymous researcher pointed out that the video tokenizer may be using multiple latents per frame | 2025-10-06 08:16:55 -07:00
lucidrains | 338def693d | oops | 2025-10-05 11:52:54 -07:00
lucidrains | f507afa0d3 | last commit for the day - take care of the task embed | 2025-10-05 11:40:48 -07:00
lucidrains | fe99efecba | make a first pass through the shortcut training logic (Frans et al from Berkeley) maintaining both v-space and x-space | 2025-10-05 11:17:36 -07:00
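A hedged sketch of the shortcut-model self-consistency idea from Frans et al. ("One Step Diffusion via Shortcut Models") that commit fe99efecba adapts: the network predicts a velocity conditioned on a step size d, and a single jump of size 2d is trained to match two chained jumps of size d. The model interface, variable names, and the d=scalar assumption below are illustrative, not the repository's; the repo additionally maintains both v-space and x-space parameterizations.

```python
import torch
import torch.nn.functional as F

def shortcut_consistency_target(model, x_t, t, d):
    # model(x, t, d) -> predicted velocity (v-space) for a jump of size d; d is a scalar step size
    with torch.no_grad():
        v1 = model(x_t, t, d)
        x_mid = x_t + v1 * d            # take the first half step in x-space
        v2 = model(x_mid, t + d, d)
        return (v1 + v2) / 2            # average velocity over the two half steps

def shortcut_loss(model, x_t, t, d):
    target = shortcut_consistency_target(model, x_t, t, d)
    pred = model(x_t, t, 2 * d)         # one jump of size 2d should match the chained target
    return F.mse_loss(pred, target)
```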