lucidrains | fb6d69f43a | complete the latent autoregressive prediction, to use the log variance as a state entropy bonus | 0.1.21 | 2025-12-03 06:40:19 -08:00
lucidrains | 125693ce1c | add a separate state prediction head for the state entropy | 2025-12-02 15:58:25 -08:00
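The two commits above describe the state entropy bonus: a separate state prediction head models the next latent, and its predicted log variance is summed into a per-timestep bonus. Below is a minimal sketch of that idea, assuming a diagonal Gaussian prediction head; `state_pred_head`, `entropy_bonus_weight`, and the reward shaping are illustrative assumptions, not the repository's actual API.

```python
import torch
from torch import nn

dim, latent_dim = 512, 64
# hypothetical head predicting mean and log variance of the next latent
state_pred_head = nn.Linear(dim, 2 * latent_dim)

def state_entropy_bonus(agent_embeds, entropy_bonus_weight = 0.05):
    mean, log_var = state_pred_head(agent_embeds).chunk(2, dim = -1)
    # entropy of a diagonal Gaussian is 0.5 * sum(log_var) + const,
    # so the summed log variance serves as the state entropy signal
    entropy = 0.5 * log_var.sum(dim = -1)
    return entropy_bonus_weight * entropy

agent_embeds = torch.randn(2, 16, dim)            # (batch, time, dim)
bonus = state_entropy_bonus(agent_embeds)          # (batch, time), added to rewards
```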
lucidrains | 2e7f406d49 | allow for the combining of experiences from environment and dream | 0.1.20 | 2025-11-13 16:37:35 -08:00
lucidrains | 690ecf07dc | fix the rnn time caching issue | 0.1.19 | 2025-11-11 17:04:02 -08:00
lucidrains | ac1c12f743 | disable until rnn hiddens are handled properly | 0.1.18 | 2025-11-10 15:52:43 -08:00
lucidrains | 3c84b404a8 | rnn layer needs to be hyper connected too | 0.1.17 | 2025-11-10 15:51:33 -08:00
lucidrains | d5b70e2b86 | allow for adding an RNN before time attention, but need to handle caching still | 0.1.16 | 2025-11-10 11:42:20 -08:00
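The RNN-before-time-attention commits above revolve around carrying the recurrent hidden state across cached decoding steps. A minimal sketch of that caching pattern follows, using a GRU; the module and argument names are illustrative, not the repository's actual interface.

```python
import torch
from torch import nn

class RNNBeforeTimeAttn(nn.Module):
    # hypothetical wrapper: an RNN applied to the token stream before time attention
    def __init__(self, dim):
        super().__init__()
        self.rnn = nn.GRU(dim, dim, batch_first = True)

    def forward(self, tokens, cached_hidden = None):
        # tokens: (batch, time, dim); cached_hidden: (1, batch, dim) from prior steps
        out, hidden = self.rnn(tokens, cached_hidden)
        return out, hidden  # hidden is returned so the caller can cache it

layer = RNNBeforeTimeAttn(512)
step = torch.randn(2, 1, 512)                         # one new timestep during rollout
out, hidden = layer(step)                             # prime the cache
out, hidden = layer(torch.randn(2, 1, 512), hidden)   # reuse the cached hidden next step
```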
lucidrains | c3532fa797 | add learned value residual | 0.1.15 | 2025-11-10 09:33:58 -08:00
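The learned value residual commit refers to mixing each attention layer's values with the values computed at the first layer, through a learned gate. The sketch below is one simple form of that mixing, with a static per-head gate; the exact gating used in the repository may differ, and the names here are illustrative.

```python
import torch
from torch import nn

class LearnedValueResidualMix(nn.Module):
    # hypothetical per-head learned gate between current and first-layer values
    def __init__(self, heads):
        super().__init__()
        self.to_mix = nn.Parameter(torch.zeros(heads, 1, 1))

    def forward(self, values, first_layer_values):
        # values, first_layer_values: (batch, heads, seq, dim_head)
        mix = self.to_mix.sigmoid()
        return values * mix + first_layer_values * (1. - mix)

mixer = LearnedValueResidualMix(heads = 8)
v = torch.randn(2, 8, 16, 64)
v_first = torch.randn(2, 8, 16, 64)
mixed = mixer(v, v_first)
```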
lucidrains | 73029635fe | last commit for the day | 0.1.12 | 2025-11-09 11:12:37 -08:00
lucidrains | e1c41f4371 | decorrelation loss for spatial attention as well | 0.1.10 | 2025-11-09 10:41:58 -08:00
Phil Wang | f55c61c6cf | cleanup | 2025-11-09 10:22:47 -08:00
lucidrains | 051d4d6ee2 | oops | 0.1.8 | 2025-11-09 10:12:51 -08:00
lucidrains | ef3a5552e7 | eventually video tokenizer may need to be trained on single frames | 0.1.7 | 2025-11-09 10:11:56 -08:00
lucidrains | 0c4224da18 | add a decorrelation loss for temporal attention in encoder of video tokenizer | 0.1.6 | 2025-11-09 09:47:47 -08:00
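The decorrelation loss commits (for temporal and then spatial attention in the video tokenizer encoder) penalize correlation between feature dimensions. A minimal sketch of one common form, penalizing the off-diagonal entries of the feature correlation matrix, is shown below; this is an assumed formulation for illustration, not necessarily the one used in the repository.

```python
import torch

def decorrelation_loss(feats, eps = 1e-5):
    # feats: (batch, time, dim) features out of an attention layer in the encoder
    feats = feats.reshape(-1, feats.shape[-1])
    feats = (feats - feats.mean(dim = 0)) / (feats.std(dim = 0) + eps)
    corr = (feats.t() @ feats) / feats.shape[0]          # (dim, dim) correlation estimate
    off_diag = corr - torch.diag_embed(corr.diagonal())  # zero out the diagonal
    return off_diag.pow(2).mean()

loss = decorrelation_loss(torch.randn(2, 16, 64))
```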
Phil Wang | 256a81f658 | Merge pull request #5 from Cycl0/patch-1: Update Discord channel link in README to use permanent link | 2025-11-09 08:17:41 -08:00
lucidrains | cfd34f1eba | able to move the experience to cpu easily, and automatically move it to the device of the dynamics world model when learning from it | 0.1.5 | 2025-11-09 16:16:13 +00:00
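The commit above concerns moving a stored experience to CPU and back to the device the dynamics world model lives on before learning. A minimal sketch of that kind of device-moving container follows; the `Experience` fields and `.to` helper here are illustrative, not the repository's actual data structure.

```python
import torch
from dataclasses import dataclass, fields

@dataclass
class Experience:
    # hypothetical subset of fields, for illustration
    states: torch.Tensor
    actions: torch.Tensor
    rewards: torch.Tensor

    def to(self, device):
        # move every tensor field to the given device
        return Experience(**{f.name: getattr(self, f.name).to(device) for f in fields(self)})

exp = Experience(torch.randn(8, 4), torch.randint(0, 3, (8,)), torch.randn(8)).to('cpu')

# before learning, move to the device of the dynamics world model
world_model_device = torch.device('cpu')   # stand-in for next(world_model.parameters()).device
exp = exp.to(world_model_device)
```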
Lucas Kenzo Cyra | 4ffbe37873 | Update Discord channel link in README to use permanent link. Updated Discord channel link for collaboration. | 2025-11-09 10:12:45 -03:00
lucidrains | 24ef72d528 | 0.1.4 | 0.1.4 | 2025-11-04 15:21:20 -08:00
Phil Wang | a4afcb22a6 | Merge pull request #4 from dirkmcpherson/bugfix: fix a few typo bugs. Support info in return signature of environment … | 2025-11-04 15:19:25 -08:00
j | b0f6b8583d | fix a few typo bugs. Support info in return signature of environment step. Temporarily turn off flex attention when the kv_cache is used to avoid bug. | 2025-11-04 17:29:12 -05:00
lucidrains | 38cba80068 | readme | 2025-11-04 06:05:11 -08:00
lucidrains | c0a6cd56a1 | link to new discord | 2025-10-31 09:06:44 -07:00
lucidrains | d756d1bb8c | addressing issues raised by an independent researcher with llm assistance | 0.1.2 | 2025-10-31 08:37:39 -07:00
lucidrains | 60681fce1d | fix generation so that one more step is taken to decode agent embeds off the final cleaned set of latents, update readme | 2025-10-31 06:48:49 -07:00
Phil Wang | 6870294d95 | no longer needed | 2025-10-30 09:23:27 -07:00
lucidrains | 3beae186da | some more control over whether to normalize advantages | 0.0.102 | 2025-10-30 08:46:03 -07:00
lucidrains | 0904e224ab | make the reverse kl optional | 0.0.101 | 2025-10-30 08:22:50 -07:00
lucidrains | 767789d0ca | they decided on 0.3 for the behavioral prior loss weight | 0.0.100 | 2025-10-29 13:24:58 -07:00
lucidrains | 35b87c4fa1 | oops | 0.0.99 | 2025-10-29 13:04:02 -07:00
lucidrains | c4a3cb09d5 | swap for discrete kl div, thanks to Dirk for pointing this out on the discord | 0.0.98 | 2025-10-29 11:54:18 -07:00
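The "swap for discrete kl div" commit refers to computing the KL divergence between the stored (old) and current categorical action distributions directly from logits. A minimal sketch of that calculation, with illustrative names, is shown below.

```python
import torch
import torch.nn.functional as F

def discrete_kl(old_logits, new_logits):
    old_log_probs = F.log_softmax(old_logits, dim = -1)
    new_log_probs = F.log_softmax(new_logits, dim = -1)
    # KL(old || new) = sum p_old * (log p_old - log p_new), over the action dimension
    return (old_log_probs.exp() * (old_log_probs - new_log_probs)).sum(dim = -1)

old_logits = torch.randn(2, 16, 6)   # (batch, time, num discrete actions)
new_logits = torch.randn(2, 16, 6)
kl = discrete_kl(old_logits, new_logits)   # (batch, time)
```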
lucidrains | cb54121ace | sim trainer needs to take care of agent embedding and old actions | 0.0.96 | 2025-10-29 11:15:11 -07:00
lucidrains | 586379f2c8 | sum the kl div loss across number of actions by default for action embedder .kl_div | 0.0.95 | 2025-10-29 10:46:42 -07:00
lucidrains | a358a44a53 | always store old agent embeds and old action parameters when possible | 0.0.94 | 2025-10-29 10:39:15 -07:00
lucidrains | 3547344312 | take care of storing the old action logits and mean log var, and calculate kl div for pmpo based off that during learn from experience | 0.0.93 | 2025-10-29 10:31:32 -07:00
lucidrains | 691d9ca007 | add kl div on action embedder, working toward the kl div loss in pmpo | 0.0.92 | 2025-10-29 10:02:25 -07:00
lucidrains | 91d697f8ca | fix pmpo | 0.0.91 | 2025-10-28 18:55:22 -07:00
lucidrains | 7acaa764f6 | evolutionary policy optimization on dreams will be interesting | 0.0.90 | 2025-10-28 10:17:01 -07:00
lucidrains | c0450359f3 | allow for evolutionary policy optimization | 0.0.89 | 2025-10-28 10:11:13 -07:00
lucidrains | 46f86cd247 | fix storing of agent embedding | 0.0.88 | 2025-10-28 09:36:58 -07:00
lucidrains | 903c43b770 | use the agent embeds off the stored experience if available | 0.0.87 | 2025-10-28 09:14:02 -07:00
lucidrains | d476fa7b14 | able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads) | 0.0.85 | 2025-10-28 09:02:26 -07:00
lucidrains | 789f091c63 | redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse same logic | 0.0.83 | 2025-10-28 08:04:48 -07:00
lucidrains | 41ab83f691 | fix mock | 2025-10-27 10:47:24 -07:00
lucidrains | 995b1f64e5 | handle environments that return a terminate flag, also make sure episode lens are logged in vectorized env | 0.0.82 | 2025-10-27 10:14:28 -07:00
lucidrains | fd1e87983b | quantile filter | 2025-10-27 09:08:26 -07:00
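The terse "quantile filter" commit most plausibly refers to keeping only the samples whose advantage (or return) falls above a chosen quantile of the batch. The sketch below shows that reading; the filter's exact role in the repository may differ, and the function name is an assumption.

```python
import torch

def quantile_filter(advantages, q = 0.75):
    # keep only timesteps whose advantage lies at or above the q-th quantile
    threshold = torch.quantile(advantages, q)
    return advantages >= threshold   # boolean mask over the batch

adv = torch.randn(128)
mask = quantile_filter(adv)
filtered = adv[mask]
```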
lucidrains | fe79bfa951 | optionally keep track of returns statistics and normalize with them before advantage | 0.0.81 | 2025-10-27 09:02:08 -07:00
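The commit above tracks running statistics of the returns and normalizes with them before the advantage is computed. A minimal sketch follows, using a simple exponential moving average for the running mean and variance; the update rule and class name are assumptions for illustration.

```python
import torch

class ReturnsNorm:
    # hypothetical running-statistics normalizer for returns
    def __init__(self, momentum = 0.99, eps = 1e-5):
        self.momentum = momentum
        self.eps = eps
        self.mean = 0.
        self.var = 1.

    def update_and_normalize(self, returns):
        self.mean = self.momentum * self.mean + (1. - self.momentum) * returns.mean().item()
        self.var = self.momentum * self.var + (1. - self.momentum) * returns.var().item()
        return (returns - self.mean) / (self.var ** 0.5 + self.eps)

norm = ReturnsNorm()
normalized = norm.update_and_normalize(torch.randn(64))  # fed into the advantage calculation
```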
lucidrains | f808b1c1d2 | oops | 0.0.80 | 2025-10-27 08:34:22 -07:00
lucidrains | 349a03acd7 | redo so lens is always the episode length, including the bootstrap value timestep, and use is_truncated to mask out the bootstrap node from being learned on | 0.0.79 | 2025-10-27 08:06:21 -07:00
lucidrains | 59c458aea3 | introduce an is_truncated field on Experience, and mask out rewards and values before calculating gae appropriately | 0.0.78 | 2025-10-27 07:55:00 -07:00
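The two commits above mask rewards and values past the valid (possibly truncated) portion of each episode before computing generalized advantage estimation, with the bootstrap value kept as the final timestep. The sketch below shows one way to apply such a mask inside GAE; the masking convention and function name are assumptions for illustration.

```python
import torch

def gae_with_truncation_mask(rewards, values, mask, gamma = 0.99, lam = 0.95):
    # rewards, mask: (batch, time); values: (batch, time + 1) with the bootstrap value last
    rewards = rewards * mask
    values = values * torch.cat((mask, torch.ones_like(mask[:, :1])), dim = -1)

    advantages = torch.zeros_like(rewards)
    running = torch.zeros(rewards.shape[0])

    for t in reversed(range(rewards.shape[1])):
        delta = rewards[:, t] + gamma * values[:, t + 1] - values[:, t]
        running = delta + gamma * lam * mask[:, t] * running
        advantages[:, t] = running

    return advantages * mask   # masked timesteps contribute nothing to the policy loss

rewards = torch.randn(2, 16)
values = torch.randn(2, 17)
mask = (torch.arange(16) < 12).float().expand(2, -1)   # e.g. episodes truncated after 12 steps
adv = gae_with_truncation_mask(rewards, values, mask)
```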
lucidrains | fbfd59e42f | handle variable-length experiences when doing policy optimization | 0.0.77 | 2025-10-27 06:09:09 -07:00
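Handling variable-length experiences during policy optimization typically means padding episodes to a common length and masking out the padded timesteps. A minimal sketch of turning per-episode lengths into such a mask is shown below; the helper name is illustrative.

```python
import torch

def lens_to_mask(lens, max_len):
    # lens: (batch,) episode lengths; returns a (batch, max_len) boolean validity mask
    return torch.arange(max_len, device = lens.device)[None, :] < lens[:, None]

lens = torch.tensor([5, 3, 7])
mask = lens_to_mask(lens, max_len = 8)
# losses are then averaged only over valid timesteps: (loss * mask).sum() / mask.sum()
```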