lucidrains | 5bb027b386 | allow for image pretraining on video tokenizer | 2025-12-04 10:34:15 -08:00
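Image pretraining on a video tokenizer usually reduces to treating each image as a one-frame video before it enters the tokenizer. A minimal sketch of that reduction, assuming a (batch, channels, time, height, width) layout; the tokenizer API itself is not part of this sketch and the layout is an assumption, not the repository's documented one.

```python
import torch

def images_to_video(images: torch.Tensor) -> torch.Tensor:
    # treat an image batch as a batch of single-frame videos
    # (b, c, h, w) -> (b, c, 1, h, w)
    return images.unsqueeze(2)

images = torch.randn(4, 3, 64, 64)
video = images_to_video(images)
assert video.shape == (4, 3, 1, 64, 64)
```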
lucidrains | fb8c3793b4 | complete the addition of a state entropy bonus | 2025-12-03 07:52:30 -08:00
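A common way to realize a state entropy bonus is the particle-based k-nearest-neighbor estimator (as in APT / RE3), where each state's bonus grows with its distance to the k-th nearest neighbor in the batch. A sketch under that assumption; the repository's exact estimator may differ.

```python
import torch

def knn_state_entropy_bonus(states: torch.Tensor, k: int = 3) -> torch.Tensor:
    # states: (num_states, dim) -> per-state bonus (num_states,)
    dists = torch.cdist(states, states)                           # pairwise distances
    knn_dist = dists.topk(k + 1, largest = False).values[:, -1]   # skip self at distance 0
    return torch.log(knn_dist + 1.)                               # +1 keeps the bonus non-negative

states = torch.randn(128, 32)
bonus = knn_state_entropy_bonus(states)   # added to the reward as an intrinsic bonus
```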
lucidrains | 2e7f406d49 | allow for the combining of experiences from environment and dream | 2025-11-13 16:37:35 -08:00
lucidrains | 690ecf07dc | fix the rnn time caching issue | 2025-11-11 17:04:02 -08:00
lucidrains | c3532fa797 | add learned value residual | 2025-11-10 09:33:58 -08:00
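The learned value residual refers to the ResFormer-style trick of blending each attention layer's values with the values computed at the first layer, through a gate learned from the token itself. A sketch of the mixing step only; the module and argument names are illustrative, not the repository's.

```python
import torch
from torch import nn

class LearnedValueResidual(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # per-token gate in [0, 1], learned from the residual stream
        self.to_mix = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, values, first_layer_values, tokens):
        # values, first_layer_values: (batch, seq, dim_value); tokens: (batch, seq, dim)
        mix = self.to_mix(tokens)  # (batch, seq, 1)
        return values * mix + first_layer_values * (1. - mix)
```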
lucidrains | 0c4224da18 | add a decorrelation loss for temporal attention in encoder of video tokenizer | 2025-11-09 09:47:47 -08:00
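One plausible reading of the decorrelation loss: penalize off-diagonal covariance between per-timestep features coming out of the temporal attention, pushing different timesteps toward decorrelated representations. The exact activations and axis being decorrelated are assumptions here.

```python
import torch
import torch.nn.functional as F

def decorrelation_loss(features: torch.Tensor) -> torch.Tensor:
    # features: (batch, time, dim)
    feats = F.normalize(features - features.mean(dim = 1, keepdim = True), dim = -1)
    cov = torch.einsum('b t d, b s d -> b t s', feats, feats)     # (batch, time, time)
    # zero the diagonal, penalize everything else
    off_diag = cov - torch.diag_embed(torch.diagonal(cov, dim1 = -2, dim2 = -1))
    return off_diag.pow(2).mean()
```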
lucidrains | cfd34f1eba | able to move the experience to cpu easily, and automatically move it to the device of the dynamics world model when learning from it | 2025-11-09 16:16:13 +00:00
lucidrains | 586379f2c8 | sum the kl div loss across the number of actions by default for ActionEmbedder.kl_div | 2025-10-29 10:46:42 -07:00
lucidrains | 691d9ca007 | add kl div on action embedder, working toward the kl div loss in pmpo | 2025-10-29 10:02:25 -07:00
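For multiple discrete action dimensions, the KL divergence is computed per action head and, per the commits above, summed across the action dimension by default. A sketch of what such a `kl_div` could compute; the signature and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def action_kl_div(logits, old_logits, sum_over_actions: bool = True):
    # logits, old_logits: (batch, num_actions, num_choices)
    log_p = F.log_softmax(old_logits, dim = -1)
    log_q = F.log_softmax(logits, dim = -1)
    kl = (log_p.exp() * (log_p - log_q)).sum(dim = -1)   # KL(old || new), per action
    return kl.sum(dim = -1) if sum_over_actions else kl
```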
lucidrains | 91d697f8ca | fix pmpo | 2025-10-28 18:55:22 -07:00
lucidrains | c0450359f3 | allow for evolutionary policy optimization | 2025-10-28 10:11:13 -07:00
lucidrains | d476fa7b14 | able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning the world model for the heads) | 2025-10-28 09:02:26 -07:00
lucidrains | 789f091c63 | redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse the same logic | 2025-10-28 08:04:48 -07:00
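The distinction the commit above draws is the standard one: a termination zeroes the value bootstrap, while a truncation (including hitting max timesteps) merely cuts the recording off, so the next state's value is still bootstrapped. Reusing one code path means max timesteps simply sets the truncation flag at the last step. A one-step TD sketch; names are illustrative.

```python
import torch

def td_target(reward: torch.Tensor,
              next_value: torch.Tensor,
              terminated: torch.Tensor,
              gamma: float = 0.99) -> torch.Tensor:
    # termination: no future exists, zero the bootstrap.
    # truncation (incl. max timesteps): the episode is merely cut off,
    # so the critic's next-state value is still bootstrapped -
    # no special case is needed in the target itself
    return reward + gamma * next_value * (~terminated).float()
```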
lucidrains | 995b1f64e5 | handle environments that return a terminate flag, also make sure episode lengths are logged in vectorized env | 2025-10-27 10:14:28 -07:00
lucidrains | 46432aee9b | fix an issue with bc | 2025-10-25 12:30:08 -07:00
lucidrains | 32cf142b4d | take another step for variable-length experiences | 2025-10-25 11:31:41 -07:00
lucidrains | 3d5617d769 | take a step towards variable-length experiences during training | 2025-10-25 10:45:34 -07:00
lucidrains | 77a40e8701 | validate that we can generate multiple video streams for the robotics use-case | 2025-10-25 09:23:07 -07:00
lucidrains | 4ce82f34df | given the VAT paper, add multiple video streams (third person, wrist camera, etc), geared for robotics. need to manage an extra dimension for multiple viewpoints | 2025-10-25 09:20:55 -07:00
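Managing the extra viewpoint dimension typically means folding viewpoints into the batch wherever a single-stream module (like the tokenizer) is reused, then unfolding for any stage that attends across viewpoints. A sketch with einops; the (batch, viewpoints, channels, time, height, width) layout is an assumption.

```python
import torch
from einops import rearrange

videos = torch.randn(2, 3, 3, 8, 64, 64)   # (batch, viewpoints, channels, time, height, width)

# fold viewpoints into the batch to reuse a single-stream tokenizer ...
folded = rearrange(videos, 'b v c t h w -> (b v) c t h w')

# ... then unfold so later stages can attend across viewpoints
unfolded = rearrange(folded, '(b v) c t h w -> b v c t h w', v = 3)
assert torch.equal(videos, unfolded)
```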
lucidrains | a9b728c611 | incorporate proprioception into the dynamics world model | 2025-10-24 11:24:22 -07:00
lucidrains | 35c1db4c7d | sketch of training from sim env | 2025-10-24 09:13:09 -07:00
lucidrains | 27ac05efb0 | function for combining experiences | 2025-10-24 08:00:10 -07:00
lucidrains | d0ffc6bfed | with or without signed advantage | 2025-10-23 16:24:29 -07:00
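"With or without signed advantage" suggests a toggle between weighting log probs by the raw advantage and by only its sign; the latter discards magnitude, making the update invariant to reward scale. A hedged sketch of such a toggle, not the repository's actual loss.

```python
import torch

def policy_loss(log_probs: torch.Tensor,
                advantages: torch.Tensor,
                signed_advantage: bool = True) -> torch.Tensor:
    if signed_advantage:
        advantages = advantages.sign()   # keep only the direction of the advantage
    return -(log_probs * advantages.detach()).mean()
```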
lucidrains | fb3e026fe0 | handle vectorized env | 2025-10-22 11:19:44 -07:00
lucidrains | d82debb7a6 | first pass through gathering experience with a mock env for online rl | 2025-10-22 08:32:46 -07:00
lucidrains | 03b16a48f2 | sketch out the dream trainer, seems like they only fine-tune the heads | 2025-10-22 06:41:10 -07:00
lucidrains | 6f1a7a24ed | try to fix ci | 2025-10-21 11:47:39 -07:00
lucidrains | 2fc3b17149 | take a gradient step with the behavioral cloning trainer, make sure it works with and without actions and rewards | 2025-10-21 10:20:08 -07:00
lucidrains | 283d59d75a | oops | 2025-10-21 09:50:07 -07:00
lucidrains | 4a5465eeb6 | fix ci | 2025-10-21 09:17:53 -07:00
lucidrains | b34128d3d0 | make sure the time kv cache can be passed back in during generation | 2025-10-21 09:15:32 -07:00
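The contract being tested above: each forward pass returns the key/value cache for the time block, and the caller passes it back in so generation processes only the newest timestep. The model signature and greedy decoding below are assumptions for illustration.

```python
import torch

@torch.no_grad()
def generate(model, first_token: torch.Tensor, num_steps: int) -> torch.Tensor:
    token, cache = first_token, None
    tokens = [token]
    for _ in range(num_steps):
        # only the newest timestep is processed; the cache carries the rest
        logits, cache = model(token, cache = cache)
        token = logits.argmax(dim = -1)   # greedy decode, for simplicity
        tokens.append(token)
    return torch.stack(tokens, dim = 1)
```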
lucidrains | ea13d4fcab | take a gradient step with video tokenizer trainer | 2025-10-21 08:52:22 -07:00
lucidrains | ca244a290c | first pass through the kv cache for the time block in the dynamics model | 2025-10-20 12:25:50 -07:00
lucidrains | 374667d8a9 | take care of the loss normalization mentioned at the end of the first paragraph of section 3 | 2025-10-19 08:24:41 -07:00
lucidrains | b6aa19f31e | complete multi-token prediction for actions, tackle loss balancing another day | 2025-10-18 10:23:14 -07:00
lucidrains | 5fc0022bbf | the function for generating the MTP targets, as well as the mask for the losses | 2025-10-18 08:04:51 -07:00
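Generating multi-token prediction targets amounts to, for each timestep, windowing out the next `num_preds` actions and masking out windows that run past the end of the sequence. A sketch of such a function; the name, padding value, and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def mtp_targets(actions: torch.Tensor, num_preds: int, pad_value: int = -1):
    # actions: (batch, time) -> targets: (batch, time, num_preds), mask same shape
    batch, time = actions.shape
    padded = F.pad(actions, (0, num_preds), value = pad_value)     # (batch, time + num_preds)
    targets = padded.unfold(1, num_preds, 1)[:, 1:(time + 1)]      # shift by one, window per step
    mask = targets != pad_value                                    # invalidate padded positions
    return targets, mask

actions = torch.arange(5).unsqueeze(0)          # one sequence: [0, 1, 2, 3, 4]
targets, mask = mtp_targets(actions, num_preds = 2)
# targets[0, 0] == [1, 2]; the final windows are partially masked
```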
lucidrains | 22e13c45fc | rename | 2025-10-17 14:44:25 -07:00
lucidrains | cb416c0d44 | handle the entropies during policy optimization | 2025-10-17 08:47:26 -07:00
lucidrains | 0dba734280 | start the learning in dreams portion | 2025-10-17 08:00:47 -07:00
lucidrains | a0161760a0 | extract the log probs and predicted values (symexp two-hot encoded) for the phase 3 RL training | 2025-10-16 10:40:59 -07:00
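Symexp two-hot encoding is the Dreamer-style value parameterization: the scalar target is symlog-transformed, then spread as a two-hot distribution over a fixed bin grid, and predictions are decoded back with symexp. A sketch with an assumed bin grid:

```python
import torch

def symlog(x): return x.sign() * torch.log1p(x.abs())
def symexp(x): return x.sign() * (x.abs().exp() - 1.)

def two_hot(value: torch.Tensor, bins: torch.Tensor) -> torch.Tensor:
    # value: (batch,), bins: (num_bins,) ascending
    value = symlog(value).clamp(bins[0], bins[-1])
    idx = torch.searchsorted(bins, value).clamp(1, len(bins) - 1)
    lo, hi = bins[idx - 1], bins[idx]
    weight_hi = (value - lo) / (hi - lo)          # split mass between the two nearest bins
    encoding = torch.zeros(value.shape[0], len(bins))
    encoding.scatter_(1, (idx - 1).unsqueeze(1), (1. - weight_hi).unsqueeze(1))
    encoding.scatter_(1, idx.unsqueeze(1), weight_hi.unsqueeze(1))
    return encoding

bins = torch.linspace(-5., 5., steps = 41)
target = two_hot(torch.tensor([3.7]), bins)   # decode a prediction with symexp(expected bin)
```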
lucidrains | 2d20d0a6c1 | able to roll out actions from one agent within the dreams of a world model | 2025-10-16 10:15:43 -07:00
lucidrains | d28251e9f9 | another consideration before knocking out the RL logic | 2025-10-14 11:10:26 -07:00
lucidrains | 9c78962736 | sampling actions | 2025-10-12 11:27:12 -07:00
lucidrains | 8a73a27fc7 | add a nested tensor way of getting the log prob of multiple discrete actions | 2025-10-11 10:53:24 -07:00
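With several discrete action heads of different sizes the logits are jagged, which is what nested tensors avoid padding for. The sketch below computes the same joint log prob with a plain unbind-and-loop; it does not reproduce the nested tensor packing itself, and the shapes are illustrative.

```python
import torch
import torch.nn.functional as F

# three discrete action heads with different numbers of choices
logits_per_head = [torch.randn(2, 6), torch.randn(2, 4), torch.randn(2, 9)]  # (batch, choices_i)
actions = torch.tensor([[3, 1, 0], [5, 2, 8]])                               # (batch, num_heads)

log_probs = [
    F.log_softmax(logits, dim = -1).gather(-1, act.unsqueeze(-1)).squeeze(-1)
    for logits, act in zip(logits_per_head, actions.unbind(dim = -1))
]

# joint log prob of the full multi-discrete action, per batch element
joint_log_prob = torch.stack(log_probs, dim = -1).sum(dim = -1)   # (batch,)
```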
lucidrains | b2725d9b6e | complete behavior cloning for one agent | 2025-10-11 09:24:49 -07:00
lucidrains | 563b269f8a | bring in hyper connections | 2025-10-11 06:52:57 -07:00
lucidrains | 5df3e69583 | last commit for the day | 2025-10-10 11:59:18 -07:00
lucidrains | 9230267d34 | handle subset of discrete action unembedding | 2025-10-10 11:27:05 -07:00
lucidrains | 32aa355e37 | prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL | 2025-10-10 10:41:48 -07:00
lucidrains | 9101a49cdd | handle continuous value normalization if stats are passed in | 2025-10-09 08:59:54 -07:00
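The last commit's behavior is conditional: if (mean, std) stats are passed in, continuous values are normalized with them (and can be denormalized on the way out); otherwise they pass through untouched. A sketch; the module name and interface are assumptions, not the repository's.

```python
import torch
from torch import nn

class ContinuousNorm(nn.Module):
    def __init__(self, stats: tuple[torch.Tensor, torch.Tensor] | None = None):
        super().__init__()
        self.stats = stats   # (mean, std), or None to pass values through

    def forward(self, values: torch.Tensor) -> torch.Tensor:
        if self.stats is None:
            return values
        mean, std = self.stats
        return (values - mean) / std.clamp(min = 1e-5)

    def inverse(self, values: torch.Tensor) -> torch.Tensor:
        if self.stats is None:
            return values
        mean, std = self.stats
        return values * std.clamp(min = 1e-5) + mean
```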