200 Commits

Author SHA1 Message Date
lucidrains
fb6d69f43a complete the latent autoregressive prediction, to use the log variance as a state entropy bonus 0.1.21 2025-12-03 06:40:19 -08:00
lucidrains
125693ce1c add a separate state prediction head for the state entropy 2025-12-02 15:58:25 -08:00
lucidrains
2e7f406d49 allow for the combining of experiences from environment and dream 0.1.20 2025-11-13 16:37:35 -08:00
lucidrains
690ecf07dc fix the rnn time caching issue 0.1.19 2025-11-11 17:04:02 -08:00
lucidrains
ac1c12f743 disable until rnn hiddens are handled properly 0.1.18 2025-11-10 15:52:43 -08:00
lucidrains
3c84b404a8 rnn layer needs to be hyper connected too 0.1.17 2025-11-10 15:51:33 -08:00
lucidrains
d5b70e2b86 allow for adding an RNN before time attention, but need to handle caching still 0.1.16 2025-11-10 11:42:20 -08:00
lucidrains
c3532fa797 add learned value residual 0.1.15 2025-11-10 09:33:58 -08:00
lucidrains
73029635fe last commit for the day 0.1.12 2025-11-09 11:12:37 -08:00
lucidrains
e1c41f4371 decorrelation loss for spatial attention as well 0.1.10 2025-11-09 10:41:58 -08:00
Phil Wang
f55c61c6cf cleanup 2025-11-09 10:22:47 -08:00
lucidrains
051d4d6ee2 oops 0.1.8 2025-11-09 10:12:51 -08:00
lucidrains
ef3a5552e7 eventually video tokenizer may need to be trained on single frames 0.1.7 2025-11-09 10:11:56 -08:00
lucidrains
0c4224da18 add a decorrelation loss for temporal attention in encoder of video tokenizer 0.1.6 2025-11-09 09:47:47 -08:00
Phil Wang
256a81f658 Merge pull request #5 from Cycl0/patch-1: Update Discord channel link in README to use permanent link 2025-11-09 08:17:41 -08:00
lucidrains
cfd34f1eba able to move the experience to cpu easily, and automatically move it to the device of the dynamics world model when learning from it 0.1.5 2025-11-09 16:16:13 +00:00
Lucas Kenzo Cyra
4ffbe37873 Update Discord channel link in README to use permanent link. Updated Discord channel link for collaboration. 2025-11-09 10:12:45 -03:00
lucidrains
24ef72d528 0.1.4 0.1.4 2025-11-04 15:21:20 -08:00
Phil Wang
a4afcb22a6 Merge pull request #4 from dirkmcpherson/bugfix: fix a few typo bugs. Support info in return signature of environment … 2025-11-04 15:19:25 -08:00
j
b0f6b8583d fix a few typo bugs. Support info in return signature of environment step. Temporarily turn off flex attention when the kv_cache is used to avoid a bug. 2025-11-04 17:29:12 -05:00
lucidrains
38cba80068 readme 2025-11-04 06:05:11 -08:00
lucidrains
c0a6cd56a1 link to new discord 2025-10-31 09:06:44 -07:00
lucidrains
d756d1bb8c addressing issues raised by an independent researcher with llm assistance 0.1.2 2025-10-31 08:37:39 -07:00
lucidrains
60681fce1d fix generation so that one more step is taken to decode agent embeds off the final cleaned set of latents, update readme 2025-10-31 06:48:49 -07:00
Phil Wang
6870294d95 no longer needed 2025-10-30 09:23:27 -07:00
lucidrains
3beae186da some more control over whether to normalize advantages 0.0.102 2025-10-30 08:46:03 -07:00
lucidrains
0904e224ab make the reverse kl optional 0.0.101 2025-10-30 08:22:50 -07:00
lucidrains
767789d0ca they decided on 0.3 for the behavioral prior loss weight 0.0.100 2025-10-29 13:24:58 -07:00
lucidrains
35b87c4fa1 oops 0.0.99 2025-10-29 13:04:02 -07:00
lucidrains
c4a3cb09d5 swap for discrete kl div, thanks to Dirk for pointing this out on the discord 0.0.98 2025-10-29 11:54:18 -07:00
lucidrains
cb54121ace sim trainer needs to take care of agent embedding and old actions 0.0.96 2025-10-29 11:15:11 -07:00
lucidrains
586379f2c8 sum the kl div loss across the number of actions by default for the action embedder's .kl_div 0.0.95 2025-10-29 10:46:42 -07:00
lucidrains
a358a44a53 always store old agent embeds and old action parameters when possible 0.0.94 2025-10-29 10:39:15 -07:00
lucidrains
3547344312 take care of storing the old action logits and mean log var, and calculate kl div for pmpo based off that during learn from experience 0.0.93 2025-10-29 10:31:32 -07:00
lucidrains
691d9ca007 add kl div on action embedder, working way towards the kl div loss in pmpo 0.0.92 2025-10-29 10:02:25 -07:00
lucidrains
91d697f8ca fix pmpo 0.0.91 2025-10-28 18:55:22 -07:00
lucidrains
7acaa764f6 evolutionary policy optimization on dreams will be interesting 0.0.90 2025-10-28 10:17:01 -07:00
lucidrains
c0450359f3 allow for evolutionary policy optimization 0.0.89 2025-10-28 10:11:13 -07:00
lucidrains
46f86cd247 fix storing of agent embedding 0.0.88 2025-10-28 09:36:58 -07:00
lucidrains
903c43b770 use the agent embeds off the stored experience if available 0.0.87 2025-10-28 09:14:02 -07:00
lucidrains
d476fa7b14 able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads) 0.0.85 2025-10-28 09:02:26 -07:00
lucidrains
789f091c63 redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse the same logic 0.0.83 2025-10-28 08:04:48 -07:00
lucidrains
41ab83f691 fix mock 2025-10-27 10:47:24 -07:00
lucidrains
995b1f64e5 handle environments that return a terminate flag, also make sure episode lens are logged in vectorized env 0.0.82 2025-10-27 10:14:28 -07:00
lucidrains
fd1e87983b quantile filter 2025-10-27 09:08:26 -07:00
lucidrains
fe79bfa951 optionally keep track of returns statistics and normalize with them before advantage 0.0.81 2025-10-27 09:02:08 -07:00
lucidrains
f808b1c1d2 oops 0.0.80 2025-10-27 08:34:22 -07:00
lucidrains
349a03acd7 redo so lens is always the episode length, including the bootstrap value timestep, and use is_truncated to mask out the bootstrap node from being learned on 0.0.79 2025-10-27 08:06:21 -07:00
lucidrains
59c458aea3 introduce an is_truncated field on Experience, and mask out rewards and values before calculating gae appropriately 0.0.78 2025-10-27 07:55:00 -07:00
lucidrains
fbfd59e42f handle variable-length experiences when doing policy optimization 0.0.77 2025-10-27 06:09:09 -07:00