143 Commits

Author SHA1 Message Date
lucidrains
4ce82f34df given the VAT paper, add multiple video streams (third person, wrist camera, etc), geared for robotics. need to manage an extra dimension for multiple viewpoints 0.0.71 2025-10-25 09:20:55 -07:00
lucidrains
a9b728c611 incorporate proprioception into the dynamics world model 0.0.70 2025-10-24 11:24:22 -07:00
lucidrains
35c1db4c7d sketch of training from sim env 0.0.69 2025-10-24 09:13:09 -07:00
lucidrains
27ac05efb0 function for combining experiences 0.0.67 2025-10-24 08:00:10 -07:00
lucidrains
d0ffc6bfed with or without signed advantage 0.0.66 2025-10-23 16:24:29 -07:00
lucidrains
fb3e026fe0 handle vectorized env 0.0.65 2025-10-22 11:19:44 -07:00
lucidrains
7ecc5d03e8 wire up the time kv cache when interacting with sim / env 0.0.62 2025-10-22 08:39:11 -07:00
lucidrains
d82debb7a6 first pass through gathering experience with a mock env for online rl 0.0.61 2025-10-22 08:32:46 -07:00
lucidrains
03b16a48f2 sketch out the dream trainer, seems like they only fine tune the heads 0.0.60 2025-10-22 06:41:10 -07:00
lucidrains
6f1a7a24ed try to fix ci 2025-10-21 11:47:39 -07:00
lucidrains
e316499047 naming 2025-10-21 10:57:55 -07:00
lucidrains
40da985c6b tweak bc trainer 0.0.59 2025-10-21 10:55:24 -07:00
lucidrains
2fc3b17149 take a gradient step with behavioral clone trainer, make sure it works with and without actions and rewards 0.0.57 2025-10-21 10:20:08 -07:00
lucidrains
283d59d75a oops 2025-10-21 09:50:07 -07:00
lucidrains
4a5465eeb6 fix ci 2025-10-21 09:17:53 -07:00
lucidrains
b34128d3d0 make sure time kv cache can be passed back in during generation 0.0.55 2025-10-21 09:15:32 -07:00
lucidrains
7ba3988fb9 prepare a mock for interacting with online env 2025-10-21 09:03:20 -07:00
lucidrains
ea13d4fcab take a gradient step with video tokenizer trainer 0.0.54 2025-10-21 08:52:22 -07:00
lucidrains
15876d34cf more muon prep 0.0.53 2025-10-21 08:23:59 -07:00
lucidrains
b4763caff9 fix rotary embeddings in presence of kv caching 2025-10-21 07:10:21 -07:00
lucidrains
7195bbb196 oops 0.0.50 2025-10-20 12:42:27 -07:00
lucidrains
ca244a290c first pass through the kv cache for the time block in the dynamics model 0.0.49 2025-10-20 12:25:50 -07:00
lucidrains
a7e0c395c3 allow for only rmsnorm for keys in attention 0.0.48 2025-10-20 11:20:49 -07:00
lucidrains
1345326656 another measure for the attending to nothing issue 0.0.47 2025-10-20 10:32:31 -07:00
lucidrains
55574c054e assert 0.0.46 2025-10-19 09:59:42 -07:00
lucidrains
27ed6d0ba5 fix time kv cache 0.0.45 2025-10-19 09:16:06 -07:00
lucidrains
4930002e99 bit of progress on time kv cache 0.0.44 2025-10-19 09:04:26 -07:00
lucidrains
ecbe13efe8 allow for setting different loss weights for each MTP head (perhaps more weight on the next vs some far out prediction) 0.0.43 2025-10-19 08:37:56 -07:00
lucidrains
f651d779e3 able to control the update of the loss ema from dynamics model forward 0.0.42 2025-10-19 08:25:50 -07:00
lucidrains
374667d8a9 take care of the loss normalization mentioned at the end of the first paragraph of section 3 0.0.41 2025-10-19 08:24:41 -07:00
lucidrains
79a1b1c46e oops 0.0.40 2025-10-18 10:31:48 -07:00
lucidrains
b6aa19f31e complete multi-token prediction for actions, tackle loss balancing another day 0.0.38 2025-10-18 10:23:14 -07:00
lucidrains
bc629d78b1 inverse norm for continuous actions when sampling 0.0.37 2025-10-18 08:55:04 -07:00
lucidrains
0ee475d2df oops 0.0.36 2025-10-18 08:50:53 -07:00
lucidrains
8c88a33d3b complete multi token prediction for the reward head 0.0.35 2025-10-18 08:33:06 -07:00
lucidrains
911a1a8434 oops 0.0.34 2025-10-18 08:07:06 -07:00
lucidrains
5fc0022bbf the function for generating the MTP targets, as well as the mask for the losses 2025-10-18 08:04:51 -07:00
lucidrains
83cfd2cd1b task conditioning when dreaming 0.0.33 2025-10-18 07:47:13 -07:00
lucidrains
22e13c45fc rename 0.0.32 2025-10-17 14:44:25 -07:00
lucidrains
c967404471 0.0.31 0.0.31 2025-10-17 08:55:42 -07:00
lucidrains
0c1b067f97 if optimizer is passed into the learn from dreams function, take the optimizer steps, otherwise let the researcher handle it externally. also ready muon 2025-10-17 08:55:20 -07:00
lucidrains
cb416c0d44 handle the entropies during policy optimization 0.0.30 2025-10-17 08:47:26 -07:00
lucidrains
61773c8219 eventually we will need to learn from the outside stream of experience 0.0.29 2025-10-17 08:06:24 -07:00
lucidrains
0dba734280 start the learning in dreams portion 0.0.27 2025-10-17 08:00:47 -07:00
lucidrains
a0161760a0 extract the log probs and predicted values (symexp two hot encoded) for the phase 3 RL training 0.0.26 2025-10-16 10:40:59 -07:00
lucidrains
2d20d0a6c1 able to roll out actions from one agent within the dreams of a world model 0.0.25 2025-10-16 10:15:43 -07:00
lucidrains
d74f09f0b3 a researcher in discord pointed out that the tokenizer also uses the axial space time transformer. redo without the 3d rotary and block causal, greatly simplifying the implementation 0.0.24 2025-10-16 09:40:14 -07:00
lucidrains
2ccb290e26 pass the attend kwargs for the block causal masking in tokenizer 0.0.23 2025-10-16 08:33:26 -07:00
lucidrains
517ef6b94b oops 0.0.22 2025-10-16 07:03:51 -07:00
lucidrains
ec18bc0fa4 cleanup 2025-10-16 06:44:28 -07:00