lucidrains | 35c1db4c7d | sketch of training from sim env | 2025-10-24 09:13:09 -07:00
lucidrains | 27ac05efb0 | function for combining experiences | 2025-10-24 08:00:10 -07:00
lucidrains | d0ffc6bfed | with or without signed advantage | 2025-10-23 16:24:29 -07:00
lucidrains | fb3e026fe0 | handle vectorized env | 2025-10-22 11:19:44 -07:00
lucidrains | d82debb7a6 | first pass through gathering experience with a mock env for online rl | 2025-10-22 08:32:46 -07:00
lucidrains | 03b16a48f2 | sketch out the dream trainer, seems like they only fine tune the heads | 2025-10-22 06:41:10 -07:00
lucidrains | 6f1a7a24ed | try to fix ci | 2025-10-21 11:47:39 -07:00
lucidrains | 2fc3b17149 | take a gradient step with behavioral clone trainer, make sure it works with and without actions and rewards | 2025-10-21 10:20:08 -07:00
lucidrains | 283d59d75a | oops | 2025-10-21 09:50:07 -07:00
lucidrains | 4a5465eeb6 | fix ci | 2025-10-21 09:17:53 -07:00
lucidrains | b34128d3d0 | make sure time kv cache can be passed back in during generation | 2025-10-21 09:15:32 -07:00
lucidrains | ea13d4fcab | take a gradient step with video tokenizer trainer | 2025-10-21 08:52:22 -07:00
lucidrains | ca244a290c | first pass through the kv cache for the time block in the dynamics model | 2025-10-20 12:25:50 -07:00
lucidrains | 374667d8a9 | take care of the loss normalization mentioned at the end of the first paragraph of section 3 | 2025-10-19 08:24:41 -07:00
lucidrains | b6aa19f31e | complete multi-token prediction for actions, tackle loss balancing another day | 2025-10-18 10:23:14 -07:00
lucidrains | 5fc0022bbf | the function for generating the MTP targets, as well as the mask for the losses | 2025-10-18 08:04:51 -07:00
lucidrains | 22e13c45fc | rename | 2025-10-17 14:44:25 -07:00
lucidrains | cb416c0d44 | handle the entropies during policy optimization | 2025-10-17 08:47:26 -07:00
lucidrains | 0dba734280 | start the learning in dreams portion | 2025-10-17 08:00:47 -07:00
lucidrains | a0161760a0 | extract the log probs and predicted values (symexp two hot encoded) for the phase 3 RL training | 2025-10-16 10:40:59 -07:00
lucidrains | 2d20d0a6c1 | able to roll out actions from one agent within the dreams of a world model | 2025-10-16 10:15:43 -07:00
lucidrains | d28251e9f9 | another consideration before knocking out the RL logic | 2025-10-14 11:10:26 -07:00
lucidrains | 9c78962736 | sampling actions | 2025-10-12 11:27:12 -07:00
lucidrains | 8a73a27fc7 | add nested tensor way for getting log prob of multiple discrete actions | 2025-10-11 10:53:24 -07:00
lucidrains | b2725d9b6e | complete behavior cloning for one agent | 2025-10-11 09:24:49 -07:00
lucidrains | 563b269f8a | bring in hyper connections | 2025-10-11 06:52:57 -07:00
lucidrains | 5df3e69583 | last commit for the day | 2025-10-10 11:59:18 -07:00
lucidrains | 9230267d34 | handle subset of discrete action unembedding | 2025-10-10 11:27:05 -07:00
lucidrains | 32aa355e37 | prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL | 2025-10-10 10:41:48 -07:00
lucidrains | 9101a49cdd | handle continuous value normalization if stats passed in | 2025-10-09 08:59:54 -07:00
lucidrains | 31f4363be7 | must be able to do phase1 and phase2 training | 2025-10-09 08:04:36 -07:00
lucidrains | e2d86a4543 | add a complete action embedder that can accept any number of discrete actions with variable bins as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous hack in dreamer v3 with sticky action) | 2025-10-09 07:53:42 -07:00
lucidrains | c4e0f46528 | for the value head, we will go for symexp encoding as well (following the "stop regressing" paper from Farebrother et al), also use layernormed mlp given recent papers | 2025-10-08 07:37:34 -07:00
lucidrains | 187edc1414 | all set for generating the perceived rewards once the RL components fall into place | 2025-10-08 06:33:28 -07:00
lucidrains | 36ccb08500 | allow for step_sizes to be passed in, log2 is not that intuitive | 2025-10-07 08:36:46 -07:00
lucidrains | a8e14f4b7c | oops | 2025-10-07 08:09:33 -07:00
lucidrains | c6bef85984 | generating video with raw teacher forcing | 2025-10-07 07:22:57 -07:00
lucidrains | 83ba9a285a | reorganize tokenizer to generate video from the dynamics model | 2025-10-06 11:37:45 -07:00
lucidrains | 7180a8cf43 | start carving into the reinforcement learning portion, starting with reward prediction head (single for now) | 2025-10-06 11:17:25 -07:00
lucidrains | 25b8de91cc | handle spatial tokens less than latent tokens in dynamics model | 2025-10-06 09:19:27 -07:00
lucidrains | f507afa0d3 | last commit for the day - take care of the task embed | 2025-10-05 11:40:48 -07:00
lucidrains | fe99efecba | make a first pass through the shortcut training logic (Frans et al from Berkeley) maintaining both v-space and x-space | 2025-10-05 11:17:36 -07:00
lucidrains | 971637673b | complete all the types of attention masking patterns as proposed in the paper | 2025-10-04 12:45:54 -07:00
lucidrains | 5c6be4d979 | take care of blocked causal in video tokenizer, still need the special attention pattern for latents to and from though | 2025-10-04 12:03:50 -07:00
lucidrains | 6c994db341 | first nail down the attention masking for the dynamics transformer model using a factory function | 2025-10-04 11:20:57 -07:00
lucidrains | 895a867a66 | able to accept raw video for dynamics model, if tokenizer passed in | 2025-10-04 06:57:54 -07:00
lucidrains | 8373cb13ec | grouped query attention is necessary | 2025-10-04 06:31:32 -07:00
lucidrains | 046f8927d1 | complete the symexp two hot proposed by Hafner from the previous versions of Dreamer, but will also bring in hl gauss | 2025-10-03 08:08:44 -07:00
lucidrains | 8d1cd311bb | Revert "address https://github.com/lucidrains/dreamer4/issues/1" This reverts commit e23a5294ec2f49d58d3ccb936c498eb86939fa96. | 2025-10-02 12:25:05 -07:00
lucidrains | e23a5294ec | address https://github.com/lucidrains/dreamer4/issues/1 | 2025-10-02 11:49:22 -07:00