105 Commits

Author SHA1 Message Date
lucidrains
cfd34f1eba able to move the experience to cpu easily, and automatically move it to the device of the dynamics world model when learning from it 2025-11-09 16:16:13 +00:00
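The commit above concerns moving stored experience between devices. A minimal sketch of what such a device-movable experience container could look like, with hypothetical field names (states, actions, rewards) that are not taken from the repo:

```python
from dataclasses import dataclass, fields
import torch

@dataclass
class Experience:
    # hypothetical fields for illustration - the real Experience holds more
    states: torch.Tensor
    actions: torch.Tensor
    rewards: torch.Tensor

    def to(self, device):
        # return a copy with every tensor field moved to `device`, e.g. 'cpu' for storage
        return Experience(**{f.name: getattr(self, f.name).to(device) for f in fields(self)})

# when learning, the experience would be moved to wherever the world model lives, e.g.
# experience = experience.to(next(world_model.parameters()).device)
```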
lucidrains
24ef72d528 0.1.4 2025-11-04 15:21:20 -08:00
lucidrains
d756d1bb8c addressing issues raised by an independent researcher with llm assistance 2025-10-31 08:37:39 -07:00
lucidrains
60681fce1d fix generation so that one more step is taken to decode agent embeds off the final cleaned set of latents, update readme 2025-10-31 06:48:49 -07:00
lucidrains
3beae186da some more control over whether to normalize advantages 2025-10-30 08:46:03 -07:00
lucidrains
0904e224ab make the reverse kl optional 2025-10-30 08:22:50 -07:00
lucidrains
767789d0ca they decided on 0.3 for the behavioral prior loss weight 2025-10-29 13:24:58 -07:00
lucidrains
35b87c4fa1 oops 2025-10-29 13:04:02 -07:00
lucidrains
c4a3cb09d5 swap for discrete kl div, thanks to Dirk for pointing this out on the discord 2025-10-29 11:54:18 -07:00
lucidrains
cb54121ace sim trainer needs to take care of agent embedding and old actions 2025-10-29 11:15:11 -07:00
lucidrains
586379f2c8 sum the kl div loss across number of actions by default for the action embedder's .kl_div 2025-10-29 10:46:42 -07:00
lucidrains
a358a44a53 always store old agent embeds and old action parameters when possible 2025-10-29 10:39:15 -07:00
lucidrains
3547344312 take care of storing the old action logits and mean log var, and calculate kl div for pmpo based off that during learn from experience 2025-10-29 10:31:32 -07:00
lucidrains
691d9ca007 add kl div on action embedder, working way towards the kl div loss in pmpo 2025-10-29 10:02:25 -07:00
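The KL-related commits above store the behavior-time action distribution parameters (logits for discrete actions, mean / log-variance for continuous ones) and penalize divergence from them during PMPO learning. A rough sketch of such KL terms, assuming these parameterizations; this is not the repo's actual .kl_div implementation:

```python
import torch

def discrete_kl(old_logits, new_logits, sum_actions = True):
    # logits: (..., num_actions, num_bins); KL(old || new) per discrete action head
    old_logp = old_logits.log_softmax(dim = -1)
    new_logp = new_logits.log_softmax(dim = -1)
    kl = (old_logp.exp() * (old_logp - new_logp)).sum(dim = -1)
    return kl.sum(dim = -1) if sum_actions else kl  # optionally sum across action heads

def gaussian_kl(old_mean, old_log_var, new_mean, new_log_var, sum_actions = True):
    # KL(old || new) between diagonal Gaussians given mean and log-variance
    kl = 0.5 * (
        new_log_var - old_log_var
        + (old_log_var.exp() + (old_mean - new_mean) ** 2) / new_log_var.exp()
        - 1.
    )
    return kl.sum(dim = -1) if sum_actions else kl
```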
lucidrains
91d697f8ca fix pmpo 2025-10-28 18:55:22 -07:00
lucidrains
7acaa764f6 evolutionary policy optimization on dreams will be interesting 2025-10-28 10:17:01 -07:00
lucidrains
c0450359f3 allow for evolutionary policy optimization 2025-10-28 10:11:13 -07:00
lucidrains
46f86cd247 fix storing of agent embedding 2025-10-28 09:36:58 -07:00
lucidrains
903c43b770 use the agent embeds off the stored experience if available 2025-10-28 09:14:02 -07:00
lucidrains
d476fa7b14 able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning the world model for the heads) 2025-10-28 09:02:26 -07:00
lucidrains
789f091c63 redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse the same logic 2025-10-28 08:04:48 -07:00
lucidrains
995b1f64e5 handle environments that return a terminate flag, also make sure episode lens are logged in vectorized env 2025-10-27 10:14:28 -07:00
lucidrains
fe79bfa951 optionally keep track of returns statistics and normalize with them before computing the advantage 2025-10-27 09:02:08 -07:00
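"Returns statistics" here presumably means a running mean / variance over returns used to rescale them before the advantage calculation. A sketch with a simple running estimate (parallel variance update); the repo may track something different:

```python
import torch

class RunningReturnStats:
    # running mean / variance of returns, updated every rollout
    def __init__(self, eps = 1e-5):
        self.mean, self.var, self.count = 0., 1., eps

    def update(self, returns: torch.Tensor):
        batch_mean = returns.mean().item()
        batch_var = returns.var(unbiased = False).item()
        batch_count = returns.numel()

        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean += delta * batch_count / total
        # combine the two variances with the parallel (Chan et al.) formula
        m2 = self.var * self.count + batch_var * batch_count + delta ** 2 * self.count * batch_count / total
        self.var, self.count = m2 / total, total

    def normalize(self, returns: torch.Tensor):
        return (returns - self.mean) / (self.var ** 0.5 + 1e-5)
```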
lucidrains
f808b1c1d2 oops 2025-10-27 08:34:22 -07:00
lucidrains
349a03acd7 redo so lens is always the episode length, including the bootstrap value timestep, and use is_truncated to mask out the bootstrap node from being learned on 2025-10-27 08:06:21 -07:00
lucidrains
59c458aea3 introduce an is_truncated field on Experience, and mask out rewards and values before calculating gae appropriately 2025-10-27 07:55:00 -07:00
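These two truncation commits describe storing an extra bootstrap-value timestep at the end of each experience and masking it out of learning while still letting its value seed the returns. A sketch of GAE under that convention, for a single episode and with assumed shapes; not the repo's exact code:

```python
import torch

def gae_with_bootstrap_mask(rewards, values, is_truncated, gamma = 0.99, lam = 0.95):
    # rewards, values: (timesteps,) where the final timestep is the bootstrap node
    # is_truncated: True if the episode was cut off, so the final value is trusted;
    #               False if it genuinely terminated, so the bootstrap value is zeroed
    values = values.clone()
    if not is_truncated:
        values[-1] = 0.

    timesteps = rewards.shape[0]
    advantages = torch.zeros_like(rewards)
    running = 0.

    for t in reversed(range(timesteps - 1)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running

    returns = advantages + values

    # the bootstrap node only seeds the returns - it is never learned on
    learn_mask = torch.ones(timesteps, dtype = torch.bool)
    learn_mask[-1] = False
    return advantages, returns, learn_mask
```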
lucidrains
fbfd59e42f handle variable-length experiences when doing policy optimization 2025-10-27 06:09:09 -07:00
lucidrains
46432aee9b fix an issue with bc 2025-10-25 12:30:08 -07:00
lucidrains
f97d9adc97 oops, forgot to add the view embedding for robotics 2025-10-25 11:39:06 -07:00
lucidrains
32cf142b4d take another step for variable-length experiences 2025-10-25 11:31:41 -07:00
lucidrains
4d8f5613cc start storing the experience lens 2025-10-25 10:55:47 -07:00
lucidrains
3d5617d769 take a step towards variable-length experiences during training 2025-10-25 10:45:34 -07:00
lucidrains
4ce82f34df given the VAT paper, add multiple video streams (third person, wrist camera, etc.), geared for robotics. need to manage an extra dimension for multiple viewpoints 2025-10-25 09:20:55 -07:00
lucidrains
a9b728c611 incorporate proprioception into the dynamics world model 2025-10-24 11:24:22 -07:00
lucidrains
35c1db4c7d sketch of training from sim env 2025-10-24 09:13:09 -07:00
lucidrains
27ac05efb0 function for combining experiences 2025-10-24 08:00:10 -07:00
lucidrains
d0ffc6bfed with or without signed advantage 2025-10-23 16:24:29 -07:00
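The "signed advantage" toggle, together with the advantage-normalization option added further up the log, plausibly amounts to something like the following (a hedged one-function sketch, not the repo's API):

```python
import torch

def prepare_advantages(advantages, use_sign = True, normalize = False, eps = 1e-5):
    # optionally standardize advantages, and / or collapse them to their sign (+1 / -1 / 0)
    if normalize:
        advantages = (advantages - advantages.mean()) / advantages.std().clamp(min = eps)
    return advantages.sign() if use_sign else advantages
```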
lucidrains
fb3e026fe0 handle vectorized env 2025-10-22 11:19:44 -07:00
lucidrains
7ecc5d03e8 wire up the time kv cache when interacting with sim / env 2025-10-22 08:39:11 -07:00
lucidrains
d82debb7a6 first pass through gathering experience with a mock env for online rl 2025-10-22 08:32:46 -07:00
lucidrains
03b16a48f2 sketch out the dream trainer, seems like they only finetune the heads 2025-10-22 06:41:10 -07:00
lucidrains
40da985c6b tweak bc trainer 2025-10-21 10:55:24 -07:00
lucidrains
2fc3b17149 take a gradient step with the behavioral cloning trainer, make sure it works with and without actions and rewards 2025-10-21 10:20:08 -07:00
lucidrains
283d59d75a oops 2025-10-21 09:50:07 -07:00
lucidrains
b34128d3d0 make sure time kv cache can be passed back in during generation 2025-10-21 09:15:32 -07:00
lucidrains
7ba3988fb9 prepare a mock for interacting with online env 2025-10-21 09:03:20 -07:00
lucidrains
ea13d4fcab take a gradient step with video tokenizer trainer 2025-10-21 08:52:22 -07:00
lucidrains
15876d34cf more muon prep 2025-10-21 08:23:59 -07:00
lucidrains
b4763caff9 fix rotary embeddings in presence of kv caching 2025-10-21 07:10:21 -07:00
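Fixing rotary embeddings under KV caching usually means offsetting the rotary positions by the number of cached timesteps, so newly generated tokens keep their absolute positions instead of restarting at zero. A generic sketch of that offset, not the repo's implementation:

```python
import torch

def rotary_freqs(seq_len, dim, offset = 0, theta = 10000.):
    # positions start at `offset` = number of timesteps already in the kv cache
    positions = torch.arange(offset, offset + seq_len).float()
    inv_freq = 1. / (theta ** (torch.arange(0, dim, 2).float() / dim))
    freqs = torch.einsum('i, j -> i j', positions, inv_freq)
    return torch.cat((freqs, freqs), dim = -1)

def apply_rotary(x, freqs):
    # x: (..., seq_len, dim); standard rotate-half formulation
    x1, x2 = x.chunk(2, dim = -1)
    rotated = torch.cat((-x2, x1), dim = -1)
    return x * freqs.cos() + rotated * freqs.sin()
```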
lucidrains
7195bbb196 oops 2025-10-20 12:42:27 -07:00