77 Commits

Author SHA1 Message Date
lucidrains
5bb027b386 allow for image pretraining on video tokenizer 2025-12-04 10:34:15 -08:00
lucidrains
fb8c3793b4 complete the addition of a state entropy bonus 2025-12-03 07:52:30 -08:00
lucidrains
2e7f406d49 allow for the combining of experiences from environment and dream 2025-11-13 16:37:35 -08:00
lucidrains
690ecf07dc fix the rnn time caching issue 2025-11-11 17:04:02 -08:00
lucidrains
c3532fa797 add learned value residual 2025-11-10 09:33:58 -08:00
lucidrains
0c4224da18 add a decorrelation loss for temporal attention in encoder of video tokenizer 2025-11-09 09:47:47 -08:00
lucidrains
cfd34f1eba able to move the experience to cpu easily, and automatically move it to the device of the dynamics world model when learning from it 2025-11-09 16:16:13 +00:00
lucidrains
586379f2c8 sum the kl div loss across the number of actions by default for the action embedder's .kl_div 2025-10-29 10:46:42 -07:00
lucidrains
691d9ca007 add kl div on the action embedder, working towards the kl div loss in pmpo 2025-10-29 10:02:25 -07:00
lucidrains
91d697f8ca fix pmpo 2025-10-28 18:55:22 -07:00
lucidrains
c0450359f3 allow for evolutionary policy optimization 2025-10-28 10:11:13 -07:00
lucidrains
d476fa7b14 able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads) 2025-10-28 09:02:26 -07:00
lucidrains
789f091c63 redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse same logic 2025-10-28 08:04:48 -07:00
lucidrains
995b1f64e5 handle environments that return a terminate flag, also make sure episode lengths are logged in the vectorized env 2025-10-27 10:14:28 -07:00
lucidrains
46432aee9b fix an issue with bc 2025-10-25 12:30:08 -07:00
lucidrains
32cf142b4d take another step towards variable length experiences 2025-10-25 11:31:41 -07:00
lucidrains
3d5617d769 take a step towards variable length experiences during training 2025-10-25 10:45:34 -07:00
lucidrains
77a40e8701 validate that we can generate multiple video streams for robotics use-case 2025-10-25 09:23:07 -07:00
lucidrains
4ce82f34df given the VAT paper, add multiple video streams (third person, wrist camera, etc.), geared for robotics. need to manage an extra dimension for multiple viewpoints 2025-10-25 09:20:55 -07:00
lucidrains
a9b728c611 incorporate proprioception into the dynamics world model 2025-10-24 11:24:22 -07:00
lucidrains
35c1db4c7d sketch of training from sim env 2025-10-24 09:13:09 -07:00
lucidrains
27ac05efb0 function for combining experiences 2025-10-24 08:00:10 -07:00
lucidrains
d0ffc6bfed with or without signed advantage 2025-10-23 16:24:29 -07:00
lucidrains
fb3e026fe0 handle vectorized env 2025-10-22 11:19:44 -07:00
lucidrains
d82debb7a6 first pass through gathering experience with a mock env for online rl 2025-10-22 08:32:46 -07:00
lucidrains
03b16a48f2 sketch out the dream trainer, seems like they only fine tune the heads 2025-10-22 06:41:10 -07:00
lucidrains
6f1a7a24ed try to fix ci 2025-10-21 11:47:39 -07:00
lucidrains
2fc3b17149 take a gradient step with the behavioral cloning trainer, make sure it works with and without actions and rewards 2025-10-21 10:20:08 -07:00
lucidrains
283d59d75a oops 2025-10-21 09:50:07 -07:00
lucidrains
4a5465eeb6 fix ci 2025-10-21 09:17:53 -07:00
lucidrains
b34128d3d0 make sure time kv cache can be passed back in during generation 2025-10-21 09:15:32 -07:00
lucidrains
ea13d4fcab take a gradient step with video tokenizer trainer 2025-10-21 08:52:22 -07:00
lucidrains
ca244a290c first pass through the kv cache for the time block in the dynamics model 2025-10-20 12:25:50 -07:00
lucidrains
374667d8a9 take care of the loss normalization mentioned at the end of the first paragraph of section 3 2025-10-19 08:24:41 -07:00
lucidrains
b6aa19f31e complete multi-token prediction for actions, tackle loss balancing another day 2025-10-18 10:23:14 -07:00
lucidrains
5fc0022bbf the function for generating the MTP targets, as well as the mask for the losses 2025-10-18 08:04:51 -07:00
lucidrains
22e13c45fc rename 2025-10-17 14:44:25 -07:00
lucidrains
cb416c0d44 handle the entropies during policy optimization 2025-10-17 08:47:26 -07:00
lucidrains
0dba734280 start the learning in dreams portion 2025-10-17 08:00:47 -07:00
lucidrains
a0161760a0 extract the log probs and predicted values (symexp two hot encoded) for the phase 3 RL training 2025-10-16 10:40:59 -07:00
lucidrains
2d20d0a6c1 able to roll out actions from one agent within the dreams of a world model 2025-10-16 10:15:43 -07:00
lucidrains
d28251e9f9 another consideration before knocking out the RL logic 2025-10-14 11:10:26 -07:00
lucidrains
9c78962736 sampling actions 2025-10-12 11:27:12 -07:00
lucidrains
8a73a27fc7 add a nested tensor way of getting the log prob of multiple discrete actions 2025-10-11 10:53:24 -07:00
lucidrains
b2725d9b6e complete behavior cloning for one agent 2025-10-11 09:24:49 -07:00
lucidrains
563b269f8a bring in hyper connections 2025-10-11 06:52:57 -07:00
lucidrains
5df3e69583 last commit for the day 2025-10-10 11:59:18 -07:00
lucidrains
9230267d34 handle subset of discrete action unembedding 2025-10-10 11:27:05 -07:00
lucidrains
32aa355e37 prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL 2025-10-10 10:41:48 -07:00
lucidrains
9101a49cdd handle continuous value normalization if stats passed in 2025-10-09 08:59:54 -07:00