author | hash | message | tag | date
lucidrains | 55574c054e | assert | 0.0.46 | 2025-10-19 09:59:42 -07:00
lucidrains | 27ed6d0ba5 | fix time kv cache | 0.0.45 | 2025-10-19 09:16:06 -07:00
lucidrains | 4930002e99 | bit of progress on time kv cache | 0.0.44 | 2025-10-19 09:04:26 -07:00
lucidrains | ecbe13efe8 | allow for setting different loss weights for each MTP head (perhaps more weight on the next vs. some far-out prediction) | 0.0.43 | 2025-10-19 08:37:56 -07:00
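The per-head weighting in ecbe13efe8 is easy to picture. A minimal sketch, assuming a list of per-horizon MTP losses; the function name and defaults below are illustrative, not the repository's actual API:

```python
import torch

def weighted_mtp_loss(head_losses, weights = None):
    # head_losses: one scalar loss per MTP head, for predicting t+1, t+2, ...
    head_losses = torch.stack(list(head_losses))

    if weights is None:
        # default to uniform weighting across prediction horizons
        weights = torch.ones_like(head_losses)

    # normalizing is a design choice, so total loss scale stays comparable
    weights = weights / weights.sum()
    return (head_losses * weights).sum()

# example: emphasize the immediate next prediction over far-out ones
loss = weighted_mtp_loss(
    [torch.tensor(1.2), torch.tensor(1.5), torch.tensor(1.9)],
    weights = torch.tensor([4., 2., 1.])
)
```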
lucidrains | f651d779e3 | able to control the update of the loss EMA from the dynamics model forward | 0.0.42 | 2025-10-19 08:25:50 -07:00
lucidrains | 374667d8a9 | take care of the loss normalization mentioned at the end of the first paragraph of section 3 | 0.0.41 | 2025-10-19 08:24:41 -07:00
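Together, 374667d8a9 and f651d779e3 suggest a loss normalizer that divides each loss by a running EMA of its magnitude, with a flag to suppress the EMA update during certain dynamics-model forward passes. A hedged sketch, with all names assumed:

```python
import torch
from torch import nn

class LossEMANormalizer(nn.Module):
    def __init__(self, decay = 0.99, eps = 1e-8):
        super().__init__()
        self.decay = decay
        self.eps = eps
        self.register_buffer('ema', torch.tensor(1.))

    def forward(self, loss, update_ema = True):
        # the caller (e.g. the dynamics model forward) can freeze the EMA
        if update_ema and self.training:
            with torch.no_grad():
                # ema <- decay * ema + (1 - decay) * loss
                self.ema.lerp_(loss.detach(), 1. - self.decay)

        # divide out the running magnitude so losses live on a similar scale
        return loss / self.ema.clamp(min = self.eps)
```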
lucidrains | 79a1b1c46e | oops | 0.0.40 | 2025-10-18 10:31:48 -07:00
lucidrains | b6aa19f31e | complete multi-token prediction for actions, tackle loss balancing another day | 0.0.38 | 2025-10-18 10:23:14 -07:00
lucidrains | bc629d78b1 | inverse norm for continuous actions when sampling | 0.0.37 | 2025-10-18 08:55:04 -07:00
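For bc629d78b1: if continuous actions are normalized for training, samples must be pushed back through the inverse transform before reaching the environment. A tiny sketch, with the mean/std stats assumed to come from the dataset:

```python
import torch

def inverse_norm(sampled_actions, mean, std):
    # undo the (x - mean) / std normalization applied at training time
    return sampled_actions * std + mean
```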
lucidrains | 0ee475d2df | oops | 0.0.36 | 2025-10-18 08:50:53 -07:00
lucidrains | 8c88a33d3b | complete multi-token prediction for the reward head | 0.0.35 | 2025-10-18 08:33:06 -07:00
lucidrains | 911a1a8434 | oops | 0.0.34 | 2025-10-18 08:07:06 -07:00
lucidrains | 5fc0022bbf | the function for generating the MTP targets, as well as the mask for the losses | - | 2025-10-18 08:04:51 -07:00
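5fc0022bbf plausibly shifts the token sequence once per prediction head and masks positions whose targets run past the end. One possible sketch, not the repository's actual function:

```python
import torch
import torch.nn.functional as F

def mtp_targets_and_mask(tokens, num_preds, pad_id = -1):
    # tokens: (batch, seq) of token ids
    targets = []
    for shift in range(1, num_preds + 1):
        # position i gets token i + shift; right-pad so the shape is preserved
        shifted = F.pad(tokens, (0, shift), value = pad_id)[:, shift:]
        targets.append(shifted)

    targets = torch.stack(targets, dim = 1)   # (batch, num_preds, seq)
    mask = targets != pad_id                  # False where the target ran off the end
    return targets, mask
```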
lucidrains | 83cfd2cd1b | task conditioning when dreaming | 0.0.33 | 2025-10-18 07:47:13 -07:00
lucidrains | 22e13c45fc | rename | 0.0.32 | 2025-10-17 14:44:25 -07:00
lucidrains | c967404471 | 0.0.31 | 0.0.31 | 2025-10-17 08:55:42 -07:00
lucidrains | 0c1b067f97 | if an optimizer is passed into the learn-from-dreams function, take the optimizer steps; otherwise let the researcher handle it externally. also ready muon | - | 2025-10-17 08:55:20 -07:00
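The optional-optimizer pattern in 0c1b067f97 might look like the following; function name and signature are assumptions:

```python
def learn_from_dreams(model, batch, optimizer = None):
    loss = model(batch)

    if optimizer is None:
        return loss        # researcher backprops and steps externally

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss
```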
lucidrains | cb416c0d44 | handle the entropies during policy optimization | 0.0.30 | 2025-10-17 08:47:26 -07:00
lucidrains | 61773c8219 | eventually we will need to learn from the outside stream of experience | 0.0.29 | 2025-10-17 08:06:24 -07:00
lucidrains | 0dba734280 | start the learning-in-dreams portion | 0.0.27 | 2025-10-17 08:00:47 -07:00
lucidrains | a0161760a0 | extract the log probs and predicted values (symexp two-hot encoded) for the phase 3 RL training | 0.0.26 | 2025-10-16 10:40:59 -07:00
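a0161760a0 references symexp two-hot encoded values, the discrete-regression trick from dreamer v3 and the "stop regressing" line of work: the value head classifies over bins laid out in symlog space, and decoding maps the expectation back through symexp. A self-contained sketch; the bin range and count are illustrative:

```python
import torch

def symlog(x):
    return torch.sign(x) * torch.log1p(torch.abs(x))

def symexp(x):
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1.)

def two_hot(value, bins):
    # encode a scalar target as weights on the two nearest symlog-space bins
    value = symlog(value).clamp(bins[0], bins[-1])
    idx = torch.searchsorted(bins, value).clamp(1, len(bins) - 1)
    lo, hi = bins[idx - 1], bins[idx]
    weight_hi = (value - lo) / (hi - lo)
    encoding = torch.zeros(len(bins))
    encoding[idx - 1] = 1. - weight_hi
    encoding[idx] = weight_hi
    return encoding

def decode_value(logits, bins):
    # expectation over bins under the predicted distribution, mapped back
    # through symexp to recover an unbounded scalar value
    return symexp((logits.softmax(dim = -1) * bins).sum(dim = -1))

# e.g. 255 bins spanning symlog space, as in dreamer v3
bins = torch.linspace(-20., 20., 255)
```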
lucidrains | 2d20d0a6c1 | able to roll out actions from one agent within the dreams of a world model | 0.0.25 | 2025-10-16 10:15:43 -07:00
lucidrains | d74f09f0b3 | a researcher on discord pointed out that the tokenizer also uses the axial space-time transformer. redo without the 3d rotary and block-causal masking, greatly simplifying the implementation | 0.0.24 | 2025-10-16 09:40:14 -07:00
lucidrains | 2ccb290e26 | pass the attend kwargs for the block-causal masking in the tokenizer | 0.0.23 | 2025-10-16 08:33:26 -07:00
lucidrains | 517ef6b94b | oops | 0.0.22 | 2025-10-16 07:03:51 -07:00
lucidrains | ec18bc0fa4 | cleanup | - | 2025-10-16 06:44:28 -07:00
lucidrains | 2a902eaaf7 | allow reward tokens to be optionally attended to as state, DT-esque. figure out the multi-agent scenario once i get around to it | 0.0.21 | 2025-10-16 06:41:02 -07:00
lucidrains | d28251e9f9 | another consideration before knocking out the RL logic | 0.0.20 | 2025-10-14 11:10:26 -07:00
lucidrains | ff81dd761b | separate action and agent embeds | 0.0.19 | 2025-10-13 11:36:21 -07:00
lucidrains | 6dbdc3d7d8 | correct a misunderstanding: past actions get a separate action token, while the agent token is used for predicting the next action, rewards, and values | 0.0.18 | 2025-10-12 16:16:18 -07:00
lucidrains | 9c78962736 | sampling actions | 0.0.17 | 2025-10-12 11:27:12 -07:00
lucidrains | c5e64ff4ce | separate out the key from the value projections in attention for muon | 0.0.16 | 2025-10-12 09:42:22 -07:00
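c5e64ff4ce makes sense in light of how Muon works: it orthogonalizes the update of each 2-D weight matrix, so a fused key/value projection is better kept as two separate matrices. A sketch of the resulting attention layout, with assumed dimensions:

```python
import torch
from torch import nn
import torch.nn.functional as F

class Attention(nn.Module):
    def __init__(self, dim, dim_head = 64, heads = 8):
        super().__init__()
        self.heads = heads
        dim_inner = dim_head * heads
        # separate 2-D weight matrices rather than one fused to_kv projection,
        # so a Muon-style optimizer can orthogonalize each one independently
        self.to_q = nn.Linear(dim, dim_inner, bias = False)
        self.to_k = nn.Linear(dim, dim_inner, bias = False)
        self.to_v = nn.Linear(dim, dim_inner, bias = False)
        self.to_out = nn.Linear(dim_inner, dim, bias = False)

    def forward(self, x):
        b, n, _ = x.shape
        q, k, v = (proj(x).view(b, n, self.heads, -1).transpose(1, 2)
                   for proj in (self.to_q, self.to_k, self.to_v))
        out = F.scaled_dot_product_attention(q, k, v)
        return self.to_out(out.transpose(1, 2).reshape(b, n, -1))
```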
lucidrains | ab5de6795f | bring in muon | - | 2025-10-12 09:35:06 -07:00
lucidrains | 8a73a27fc7 | add nested tensor way for getting log prob of multiple discrete actions | 0.0.15 | 2025-10-11 10:53:24 -07:00
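For 8a73a27fc7, a nested (ragged) tensor lets each discrete action head keep its own bin count without padding to a common size. An illustrative sketch; the repository's actual API may differ:

```python
import torch
import torch.nn.functional as F

def multi_discrete_log_prob(logits_per_action, actions):
    # logits_per_action: list of 1-D tensors, one per head, with differing bin counts
    # actions: (num_heads,) long tensor of chosen bin indices
    nested_logits = torch.nested.nested_tensor(logits_per_action)

    log_probs = []
    for logits, action in zip(nested_logits.unbind(), actions):
        log_probs.append(F.log_softmax(logits, dim = -1)[action])

    # joint log prob of the independent discrete actions is the sum
    return torch.stack(log_probs).sum()

# example with three heads of 5, 3, and 10 bins
log_prob = multi_discrete_log_prob(
    [torch.randn(5), torch.randn(3), torch.randn(10)],
    torch.tensor([2, 0, 7])
)
```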
lucidrains | 01bf70e18a | 0.0.14 | 0.0.14 | 2025-10-11 09:24:58 -07:00
lucidrains | b2725d9b6e | complete behavior cloning for one agent | - | 2025-10-11 09:24:49 -07:00
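b2725d9b6e's behavior cloning for a single agent reduces to supervised losses on expert actions. A minimal sketch with hypothetical names, covering both action types the repo supports:

```python
import torch.nn.functional as F

def behavior_clone_loss(discrete_logits, discrete_targets, continuous_pred, continuous_targets):
    # cross entropy on expert discrete actions, mse on normalized continuous ones
    return (
        F.cross_entropy(discrete_logits, discrete_targets) +
        F.mse_loss(continuous_pred, continuous_targets)
    )
```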
lucidrains | 02558d1f08 | will organize the unembedding parameters under the actor optimizer | - | 2025-10-11 06:55:57 -07:00
lucidrains | 563b269f8a | bring in hyper connections | 0.0.12 | 2025-10-11 06:52:57 -07:00
lucidrains | 5df3e69583 | last commit for the day | 0.0.11 | 2025-10-10 11:59:18 -07:00
lucidrains | 9230267d34 | handle subset of discrete action unembedding | 0.0.10 | 2025-10-10 11:27:05 -07:00
lucidrains | c68942b026 | cleanup | - | 2025-10-10 10:42:54 -07:00
lucidrains | 32aa355e37 | prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL | 0.0.9 | 2025-10-10 10:41:48 -07:00
lucidrains | 9101a49cdd | handle continuous value normalization if stats are passed in | - | 2025-10-09 08:59:54 -07:00
lucidrains | 31f4363be7 | must be able to do phase 1 and phase 2 training | - | 2025-10-09 08:04:36 -07:00
lucidrains | e2d86a4543 | add a complete action embedder that can accept any number of discrete actions with variable bins, as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous hack in dreamer v3 with the sticky action) | 0.0.8 | 2025-10-09 07:53:42 -07:00
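e2d86a4543 is the most descriptive commit here: per-action embedding tables for discrete actions with variable bin counts, a projection for continuous actions, and a pooled result added to the agent token. A rough sketch under those assumptions; all names and shapes are hypothetical:

```python
import torch
from torch import nn

class ActionEmbedder(nn.Module):
    def __init__(self, dim, num_discrete_bins = (), num_continuous = 0):
        super().__init__()
        # one embedding table per discrete action, sized to its bin count
        self.discrete_embeds = nn.ModuleList(
            [nn.Embedding(bins, dim) for bins in num_discrete_bins])
        # one shared projection lifting each continuous scalar to dim
        self.continuous_proj = nn.Linear(1, dim) if num_continuous > 0 else None

    def forward(self, discrete = None, continuous = None):
        # assumes at least one of the two action types is provided
        embeds = []

        if discrete is not None:   # (batch, num_discrete) of bin indices
            for i, embed in enumerate(self.discrete_embeds):
                embeds.append(embed(discrete[:, i]))

        if continuous is not None: # (batch, num_continuous) of floats
            cont = self.continuous_proj(continuous.unsqueeze(-1))
            embeds.extend(cont.unbind(dim = 1))

        # mean-pool all action embeddings into one vector, to be added
        # to the agent token
        return torch.stack(embeds, dim = 1).mean(dim = 1)
```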
lucidrains | b62c08be65 | fix task embed in the presence of multiple agent tokens | - | 2025-10-08 08:42:25 -07:00
lucidrains | 4c2ed100a3 | fix masking for multiple agent tokens | 0.0.7 | 2025-10-08 08:26:44 -07:00
lucidrains | ed0918c974 | prepare for evolution within dreams | - | 2025-10-08 08:13:16 -07:00
lucidrains | 892654d442 | multiple agent tokens sharing the same state | - | 2025-10-08 08:06:13 -07:00
lucidrains | c4e0f46528 | for the value head, we will go for symexp encoding as well (following the "stop regressing" paper from Farebrother et al.), and also use a layernormed MLP given recent papers | - | 2025-10-08 07:37:34 -07:00
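c4e0f46528 combines two recent findings: classify values over symexp bins rather than regressing a scalar (Farebrother et al., "Stop Regressing"), and use a layernormed MLP for the head. The symexp two-hot encoding itself is sketched above under a0161760a0; a sketch of the head, with illustrative sizes:

```python
from torch import nn

def ValueHead(dim, num_bins = 255, expansion = 4):
    dim_hidden = dim * expansion
    return nn.Sequential(
        nn.LayerNorm(dim),
        nn.Linear(dim, dim_hidden),
        nn.SiLU(),
        nn.LayerNorm(dim_hidden),
        nn.Linear(dim_hidden, num_bins)  # logits over symexp-spaced value bins
    )
```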