160 Commits

Author SHA1 Message Date
lucidrains
d476fa7b14 able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads) 0.0.85 2025-10-28 09:02:26 -07:00
lucidrains
789f091c63 redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse same logic 0.0.83 2025-10-28 08:04:48 -07:00
lucidrains
41ab83f691 fix mock 2025-10-27 10:47:24 -07:00
lucidrains
995b1f64e5 handle environments that return a terminate flag, also make sure episode lens are logged in vectorized env 0.0.82 2025-10-27 10:14:28 -07:00
lucidrains
fd1e87983b quantile filter 2025-10-27 09:08:26 -07:00
lucidrains
fe79bfa951 optionally keep track of returns statistics and normalize with them before advantage 0.0.81 2025-10-27 09:02:08 -07:00
lucidrains
f808b1c1d2 oops 0.0.80 2025-10-27 08:34:22 -07:00
lucidrains
349a03acd7 redo so lens is always the episode length, including the bootstrap value timestep, and use is_truncated to mask out the bootstrap node from being learned on 0.0.79 2025-10-27 08:06:21 -07:00
lucidrains
59c458aea3 introduce an is_truncated field on Experience, and mask out rewards and values before calculating gae appropriately 0.0.78 2025-10-27 07:55:00 -07:00
lucidrains
fbfd59e42f handle variable-length experiences when doing policy optimization 0.0.77 2025-10-27 06:09:09 -07:00
lucidrains
46432aee9b fix an issue with bc 2025-10-25 12:30:08 -07:00
lucidrains
f97d9adc97 oops, forgot to add the view embedding for robotics 0.0.75 2025-10-25 11:39:06 -07:00
lucidrains
32cf142b4d take another step for variable len experiences 0.0.74 2025-10-25 11:31:41 -07:00
lucidrains
1ed6a15cb0 fix tests 2025-10-25 11:13:22 -07:00
lucidrains
4d8f5613cc start storing the experience lens 0.0.73 2025-10-25 10:55:47 -07:00
lucidrains
3d5617d769 take a step towards variable-length experiences during training 0.0.72 2025-10-25 10:45:34 -07:00
lucidrains
77a40e8701 validate that we can generate multiple video streams for robotics use-case 2025-10-25 09:23:07 -07:00
lucidrains
4ce82f34df given the VAT paper, add multiple video streams (third person, wrist camera, etc), geared for robotics. need to manage an extra dimension for multiple viewpoints 0.0.71 2025-10-25 09:20:55 -07:00
lucidrains
a9b728c611 incorporate proprioception into the dynamics world model 0.0.70 2025-10-24 11:24:22 -07:00
lucidrains
35c1db4c7d sketch of training from sim env 0.0.69 2025-10-24 09:13:09 -07:00
lucidrains
27ac05efb0 function for combining experiences 0.0.67 2025-10-24 08:00:10 -07:00
lucidrains
d0ffc6bfed with or without signed advantage 0.0.66 2025-10-23 16:24:29 -07:00
lucidrains
fb3e026fe0 handle vectorized env 0.0.65 2025-10-22 11:19:44 -07:00
lucidrains
7ecc5d03e8 wire up the time kv cache when interacting with sim / env 0.0.62 2025-10-22 08:39:11 -07:00
lucidrains
d82debb7a6 first pass through gathering experience with a mock env for online rl 0.0.61 2025-10-22 08:32:46 -07:00
lucidrains
03b16a48f2 sketch out the dream trainer, seems like they only fine tune the heads 0.0.60 2025-10-22 06:41:10 -07:00
lucidrains
6f1a7a24ed try to fix ci 2025-10-21 11:47:39 -07:00
lucidrains
e316499047 naming 2025-10-21 10:57:55 -07:00
lucidrains
40da985c6b tweak bc trainer 0.0.59 2025-10-21 10:55:24 -07:00
lucidrains
2fc3b17149 take a gradient step with behavioral clone trainer, make sure it works with and without actions and rewards 0.0.57 2025-10-21 10:20:08 -07:00
lucidrains
283d59d75a oops 2025-10-21 09:50:07 -07:00
lucidrains
4a5465eeb6 fix ci 2025-10-21 09:17:53 -07:00
lucidrains
b34128d3d0 make sure time kv cache can be passed back in during generation 0.0.55 2025-10-21 09:15:32 -07:00
lucidrains
7ba3988fb9 prepare a mock for interacting with online env 2025-10-21 09:03:20 -07:00
lucidrains
ea13d4fcab take a gradient step with video tokenizer trainer 0.0.54 2025-10-21 08:52:22 -07:00
lucidrains
15876d34cf more muon prep 0.0.53 2025-10-21 08:23:59 -07:00
lucidrains
b4763caff9 fix rotary embeddings in presence of kv caching 2025-10-21 07:10:21 -07:00
lucidrains
7195bbb196 oops 0.0.50 2025-10-20 12:42:27 -07:00
lucidrains
ca244a290c first pass through the kv cache for the time block in the dynamics model 0.0.49 2025-10-20 12:25:50 -07:00
lucidrains
a7e0c395c3 allow for only rmsnorm for keys in attention 0.0.48 2025-10-20 11:20:49 -07:00
lucidrains
1345326656 another measure for the attending to nothing issue 0.0.47 2025-10-20 10:32:31 -07:00
lucidrains
55574c054e assert 0.0.46 2025-10-19 09:59:42 -07:00
lucidrains
27ed6d0ba5 fix time kv cache 0.0.45 2025-10-19 09:16:06 -07:00
lucidrains
4930002e99 bit of progress on time kv cache 0.0.44 2025-10-19 09:04:26 -07:00
lucidrains
ecbe13efe8 allow for setting different loss weights for each MTP head (perhaps more weight on the next vs some far out prediction) 0.0.43 2025-10-19 08:37:56 -07:00
lucidrains
f651d779e3 able to control the update of the loss ema from dynamics model forward 0.0.42 2025-10-19 08:25:50 -07:00
lucidrains
374667d8a9 take care of the loss normalization mentioned at the end of the first paragraph of section 3 0.0.41 2025-10-19 08:24:41 -07:00
lucidrains
79a1b1c46e oops 0.0.40 2025-10-18 10:31:48 -07:00
lucidrains
b6aa19f31e complete multi-token prediction for actions, tackle loss balancing another day 0.0.38 2025-10-18 10:23:14 -07:00
lucidrains
bc629d78b1 inverse norm for continuous actions when sampling 0.0.37 2025-10-18 08:55:04 -07:00