141 Commits

Author  SHA1  Message  [Tag]  Date
lucidrains  35c1db4c7d  sketch of training from sim env  [0.0.69]  2025-10-24 09:13:09 -07:00
lucidrains  27ac05efb0  function for combining experiences  [0.0.67]  2025-10-24 08:00:10 -07:00
lucidrains  d0ffc6bfed  with or without signed advantage  [0.0.66]  2025-10-23 16:24:29 -07:00
lucidrains  fb3e026fe0  handle vectorized env  [0.0.65]  2025-10-22 11:19:44 -07:00
lucidrains  7ecc5d03e8  wire up the time kv cache when interacting with sim / env  [0.0.62]  2025-10-22 08:39:11 -07:00
lucidrains  d82debb7a6  first pass through gathering experience with a mock env for online rl  [0.0.61]  2025-10-22 08:32:46 -07:00
lucidrains  03b16a48f2  sketch out the dream trainer, seems like they only fine tune the heads  [0.0.60]  2025-10-22 06:41:10 -07:00
lucidrains  6f1a7a24ed  try to fix ci  2025-10-21 11:47:39 -07:00
lucidrains  e316499047  naming  2025-10-21 10:57:55 -07:00
lucidrains  40da985c6b  tweak bc trainer  [0.0.59]  2025-10-21 10:55:24 -07:00
lucidrains  2fc3b17149  take a gradient step with behavioral clone trainer, make sure it works with and without actions and rewards  [0.0.57]  2025-10-21 10:20:08 -07:00
lucidrains  283d59d75a  oops  2025-10-21 09:50:07 -07:00
lucidrains  4a5465eeb6  fix ci  2025-10-21 09:17:53 -07:00
lucidrains  b34128d3d0  make sure time kv cache can be passed back in during generation  [0.0.55]  2025-10-21 09:15:32 -07:00
lucidrains  7ba3988fb9  prepare a mock for interacting with online env  2025-10-21 09:03:20 -07:00
lucidrains  ea13d4fcab  take a gradient step with video tokenizer trainer  [0.0.54]  2025-10-21 08:52:22 -07:00
lucidrains  15876d34cf  more muon prep  [0.0.53]  2025-10-21 08:23:59 -07:00
lucidrains  b4763caff9  fix rotary embeddings in presence of kv caching  2025-10-21 07:10:21 -07:00
lucidrains  7195bbb196  oops  [0.0.50]  2025-10-20 12:42:27 -07:00
lucidrains  ca244a290c  first pass through the kv cache for the time block in the dynamics model  [0.0.49]  2025-10-20 12:25:50 -07:00
lucidrains  a7e0c395c3  allow for only rmsnorm for keys in attention  [0.0.48]  2025-10-20 11:20:49 -07:00
lucidrains  1345326656  another measure for the attending to nothing issue  [0.0.47]  2025-10-20 10:32:31 -07:00
lucidrains  55574c054e  assert  [0.0.46]  2025-10-19 09:59:42 -07:00
lucidrains  27ed6d0ba5  fix time kv cache  [0.0.45]  2025-10-19 09:16:06 -07:00
lucidrains  4930002e99  bit of progress on time kv cache  [0.0.44]  2025-10-19 09:04:26 -07:00
lucidrains  ecbe13efe8  allow for setting different loss weights for each MTP head (perhaps more weight on the next vs some far out prediction)  [0.0.43]  2025-10-19 08:37:56 -07:00
lucidrains  f651d779e3  able to control the update of the loss ema from dynamics model forward  [0.0.42]  2025-10-19 08:25:50 -07:00
lucidrains  374667d8a9  take care of the loss normalization mentioned at the end of the first paragraph of section 3  [0.0.41]  2025-10-19 08:24:41 -07:00
lucidrains  79a1b1c46e  oops  [0.0.40]  2025-10-18 10:31:48 -07:00
lucidrains  b6aa19f31e  complete multi-token prediction for actions, tackle loss balancing another day  [0.0.38]  2025-10-18 10:23:14 -07:00
lucidrains  bc629d78b1  inverse norm for continuous actions when sampling  [0.0.37]  2025-10-18 08:55:04 -07:00
lucidrains  0ee475d2df  oops  [0.0.36]  2025-10-18 08:50:53 -07:00
lucidrains  8c88a33d3b  complete multi token prediction for the reward head  [0.0.35]  2025-10-18 08:33:06 -07:00
lucidrains  911a1a8434  oops  [0.0.34]  2025-10-18 08:07:06 -07:00
lucidrains  5fc0022bbf  the function for generating the MTP targets, as well as the mask for the losses  2025-10-18 08:04:51 -07:00
lucidrains  83cfd2cd1b  task conditioning when dreaming  [0.0.33]  2025-10-18 07:47:13 -07:00
lucidrains  22e13c45fc  rename  [0.0.32]  2025-10-17 14:44:25 -07:00
lucidrains  c967404471  0.0.31  [0.0.31]  2025-10-17 08:55:42 -07:00
lucidrains  0c1b067f97  if optimizer is passed into the learn from dreams function, take the optimizer steps, otherwise let the researcher handle it externally. also ready muon  2025-10-17 08:55:20 -07:00
lucidrains  cb416c0d44  handle the entropies during policy optimization  [0.0.30]  2025-10-17 08:47:26 -07:00
lucidrains  61773c8219  eventually we will need to learn from the outside stream of experience  [0.0.29]  2025-10-17 08:06:24 -07:00
lucidrains  0dba734280  start the learning in dreams portion  [0.0.27]  2025-10-17 08:00:47 -07:00
lucidrains  a0161760a0  extract the log probs and predicted values (symexp two hot encoded) for the phase 3 RL training  [0.0.26]  2025-10-16 10:40:59 -07:00
lucidrains  2d20d0a6c1  able to roll out actions from one agent within the dreams of a world model  [0.0.25]  2025-10-16 10:15:43 -07:00
lucidrains  d74f09f0b3  a researcher in discord pointed out that the tokenizer also uses the axial space time transformer. redo without the 3d rotary and block causal, greatly simplifying the implementation  [0.0.24]  2025-10-16 09:40:14 -07:00
lucidrains  2ccb290e26  pass the attend kwargs for the block causal masking in tokenizer  [0.0.23]  2025-10-16 08:33:26 -07:00
lucidrains  517ef6b94b  oops  [0.0.22]  2025-10-16 07:03:51 -07:00
lucidrains  ec18bc0fa4  cleanup  2025-10-16 06:44:28 -07:00
lucidrains  2a902eaaf7  allow reward tokens to be attended to as state optionally, DT-esque. figure out multi-agent scenario once i get around to it  [0.0.21]  2025-10-16 06:41:02 -07:00
lucidrains  d28251e9f9  another consideration before knocking out the RL logic  [0.0.20]  2025-10-14 11:10:26 -07:00