180 Commits

Author | SHA1 | Message | Date
j | b0f6b8583d | fix a few typo bugs. Support info in return signature of environment step. Temporarily turn off flex attention when the kv_cache is used to avoid bug. | 2025-11-04 17:29:12 -05:00
lucidrains | c0a6cd56a1 | link to new discord | 2025-10-31 09:06:44 -07:00
lucidrains | d756d1bb8c | addressing issues raised by an independent researcher with llm assistance 0.1.2 | 2025-10-31 08:37:39 -07:00
lucidrains | 60681fce1d | fix generation so that one more step is taken to decode agent embeds off the final cleaned set of latents, update readme | 2025-10-31 06:48:49 -07:00
Phil Wang | 6870294d95 | no longer needed | 2025-10-30 09:23:27 -07:00
lucidrains | 3beae186da | some more control over whether to normalize advantages 0.0.102 | 2025-10-30 08:46:03 -07:00
lucidrains | 0904e224ab | make the reverse kl optional 0.0.101 | 2025-10-30 08:22:50 -07:00
lucidrains | 767789d0ca | they decided on 0.3 for the behavioral prior loss weight 0.0.100 | 2025-10-29 13:24:58 -07:00
lucidrains | 35b87c4fa1 | oops 0.0.99 | 2025-10-29 13:04:02 -07:00
lucidrains | c4a3cb09d5 | swap for discrete kl div, thanks to Dirk for pointing this out on the discord 0.0.98 | 2025-10-29 11:54:18 -07:00
lucidrains | cb54121ace | sim trainer needs to take care of agent embedding and old actions 0.0.96 | 2025-10-29 11:15:11 -07:00
lucidrains | 586379f2c8 | sum the kl div loss across number of actions by default for action embedder .kl_div 0.0.95 | 2025-10-29 10:46:42 -07:00
lucidrains | a358a44a53 | always store old agent embeds and old action parameters when possible 0.0.94 | 2025-10-29 10:39:15 -07:00
lucidrains | 3547344312 | take care of storing the old action logits and mean log var, and calculate kl div for pmpo based off that during learn from experience 0.0.93 | 2025-10-29 10:31:32 -07:00
lucidrains | 691d9ca007 | add kl div on action embedder, working way towards the kl div loss in pmpo 0.0.92 | 2025-10-29 10:02:25 -07:00
lucidrains | 91d697f8ca | fix pmpo 0.0.91 | 2025-10-28 18:55:22 -07:00
lucidrains | 7acaa764f6 | evolutionary policy optimization on dreams will be interesting 0.0.90 | 2025-10-28 10:17:01 -07:00
lucidrains | c0450359f3 | allow for evolutionary policy optimization 0.0.89 | 2025-10-28 10:11:13 -07:00
lucidrains | 46f86cd247 | fix storing of agent embedding 0.0.88 | 2025-10-28 09:36:58 -07:00
lucidrains | 903c43b770 | use the agent embeds off the stored experience if available 0.0.87 | 2025-10-28 09:14:02 -07:00
lucidrains | d476fa7b14 | able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads) 0.0.85 | 2025-10-28 09:02:26 -07:00
lucidrains | 789f091c63 | redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse same logic 0.0.83 | 2025-10-28 08:04:48 -07:00
lucidrains | 41ab83f691 | fix mock | 2025-10-27 10:47:24 -07:00
lucidrains | 995b1f64e5 | handle environments that return a terminate flag, also make sure episode lens are logged in vectorized env 0.0.82 | 2025-10-27 10:14:28 -07:00
lucidrains | fd1e87983b | quantile filter | 2025-10-27 09:08:26 -07:00
lucidrains | fe79bfa951 | optionally keep track of returns statistics and normalize with them before advantage 0.0.81 | 2025-10-27 09:02:08 -07:00
lucidrains | f808b1c1d2 | oops 0.0.80 | 2025-10-27 08:34:22 -07:00
lucidrains | 349a03acd7 | redo so lens is always the episode length, including the bootstrap value timestep, and use is_truncated to mask out the bootstrap node from being learned on 0.0.79 | 2025-10-27 08:06:21 -07:00
lucidrains | 59c458aea3 | introduce an is_truncated field on Experience, and mask out rewards and values before calculating gae appropriately 0.0.78 | 2025-10-27 07:55:00 -07:00
lucidrains | fbfd59e42f | handle variable lengthed experiences when doing policy optimization 0.0.77 | 2025-10-27 06:09:09 -07:00
lucidrains | 46432aee9b | fix an issue with bc | 2025-10-25 12:30:08 -07:00
lucidrains | f97d9adc97 | oops, forgot to add the view embedding for robotics 0.0.75 | 2025-10-25 11:39:06 -07:00
lucidrains | 32cf142b4d | take another step for variable len experiences 0.0.74 | 2025-10-25 11:31:41 -07:00
lucidrains | 1ed6a15cb0 | fix tests | 2025-10-25 11:13:22 -07:00
lucidrains | 4d8f5613cc | start storing the experience lens 0.0.73 | 2025-10-25 10:55:47 -07:00
lucidrains | 3d5617d769 | take a step towards variable lengthed experiences during training 0.0.72 | 2025-10-25 10:45:34 -07:00
lucidrains | 77a40e8701 | validate that we can generate multiple video streams for robotics use-case | 2025-10-25 09:23:07 -07:00
lucidrains | 4ce82f34df | given the VAT paper, add multiple video streams (third person, wrist camera, etc), geared for robotics. need to manage an extra dimension for multiple viewpoints 0.0.71 | 2025-10-25 09:20:55 -07:00
lucidrains | a9b728c611 | incorporate proprioception into the dynamics world model 0.0.70 | 2025-10-24 11:24:22 -07:00
lucidrains | 35c1db4c7d | sketch of training from sim env 0.0.69 | 2025-10-24 09:13:09 -07:00
lucidrains | 27ac05efb0 | function for combining experiences 0.0.67 | 2025-10-24 08:00:10 -07:00
lucidrains | d0ffc6bfed | with or without signed advantage 0.0.66 | 2025-10-23 16:24:29 -07:00
lucidrains | fb3e026fe0 | handle vectorized env 0.0.65 | 2025-10-22 11:19:44 -07:00
lucidrains | 7ecc5d03e8 | wire up the time kv cache when interacting with sim / env 0.0.62 | 2025-10-22 08:39:11 -07:00
lucidrains | d82debb7a6 | first pass through gathering experience with a mock env for online rl 0.0.61 | 2025-10-22 08:32:46 -07:00
lucidrains | 03b16a48f2 | sketch out the dream trainer, seems like they only fine tune the heads 0.0.60 | 2025-10-22 06:41:10 -07:00
lucidrains | 6f1a7a24ed | try to fix ci | 2025-10-21 11:47:39 -07:00
lucidrains | e316499047 | naming | 2025-10-21 10:57:55 -07:00
lucidrains | 40da985c6b | tweak bc trainer 0.0.59 | 2025-10-21 10:55:24 -07:00
lucidrains | 2fc3b17149 | take a gradient step with behavioral clone trainer, make sure it works with and without actions and rewards 0.0.57 | 2025-10-21 10:20:08 -07:00