author | commit | message | tag | date

lucidrains | 7ba3988fb9 | prepare a mock for interacting with online env | - | 2025-10-21 09:03:20 -07:00
lucidrains | ea13d4fcab | take a gradient step with video tokenizer trainer | 0.0.54 | 2025-10-21 08:52:22 -07:00
lucidrains | 15876d34cf | more muon prep | 0.0.53 | 2025-10-21 08:23:59 -07:00
lucidrains | b4763caff9 | fix rotary embeddings in presence of kv caching | - | 2025-10-21 07:10:21 -07:00
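For context on the rotary fix above: rotary embeddings must be applied at each token's absolute position, and with a kv cache the newly decoded token sits at an offset equal to the cache length, not at position zero. A minimal sketch of that offset handling, assuming a GPT-NeoX style rotary; the function names are illustrative, not the repository's actual API:

```python
import torch

def rotary_freqs(seq_len, dim, offset = 0, theta = 10000.):
    # frequencies for absolute positions [offset, offset + seq_len)
    inv_freq = 1. / (theta ** (torch.arange(0, dim, 2).float() / dim))
    pos = torch.arange(offset, offset + seq_len).float()
    freqs = torch.einsum('i, j -> i j', pos, inv_freq)
    return torch.cat((freqs, freqs), dim = -1)

def apply_rotary(t, freqs):
    # t: (..., seq, dim) - rotate-half convention
    x1, x2 = t.chunk(2, dim = -1)
    rotated = torch.cat((-x2, x1), dim = -1)
    return t * freqs.cos() + rotated * freqs.sin()

# during cached decoding, the fix is to offset by the cached length
cache_len = 12
q = torch.randn(1, 8, 1, 64)  # (batch, heads, 1 new token, dim_head)
freqs = rotary_freqs(seq_len = 1, dim = 64, offset = cache_len)
q = apply_rotary(q, freqs)    # rotated at position 12, not position 0
```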
lucidrains | 7195bbb196 | oops | 0.0.50 | 2025-10-20 12:42:27 -07:00
lucidrains | ca244a290c | first pass through the kv cache for the time block in the dynamics model | 0.0.49 | 2025-10-20 12:25:50 -07:00
lucidrains | a7e0c395c3 | allow for only rmsnorm for keys in attention | 0.0.48 | 2025-10-20 11:20:49 -07:00
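A minimal sketch of what key-only RMSNorm in attention could look like: normalize k per head before the dot product, a common trick for attention-logit stability. The flag names and module layout are assumptions, not the repo's actual code:

```python
import torch
from torch import nn
import torch.nn.functional as F

class Attention(nn.Module):
    def __init__(self, dim, heads = 8, dim_head = 64, norm_k_only = True):
        super().__init__()
        inner = heads * dim_head
        self.heads = heads
        self.to_q = nn.Linear(dim, inner, bias = False)
        self.to_k = nn.Linear(dim, inner, bias = False)
        self.to_v = nn.Linear(dim, inner, bias = False)
        self.to_out = nn.Linear(inner, dim, bias = False)
        # nn.RMSNorm requires torch >= 2.4; normalize only keys when asked
        self.q_norm = nn.Identity() if norm_k_only else nn.RMSNorm(dim_head)
        self.k_norm = nn.RMSNorm(dim_head)

    def forward(self, x):
        b, n, _ = x.shape
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))
        q, k = self.q_norm(q), self.k_norm(k)   # rmsnorm on keys (and optionally queries)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.to_out(out.transpose(1, 2).reshape(b, n, -1))
```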
lucidrains | 1345326656 | another measure for the attending to nothing issue | 0.0.47 | 2025-10-20 10:32:31 -07:00
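One common measure for the "attending to nothing" issue is appending a learned null key/value pair, so softmax always has a harmless sink to place probability mass on when no real token is relevant. Whether this commit uses exactly that trick is an assumption; a sketch:

```python
import torch
from torch import nn
import torch.nn.functional as F

class NullKVAttention(nn.Module):
    def __init__(self, heads, dim_head):
        super().__init__()
        # one learned "nothing" slot per head (zeros init is illustrative)
        self.null_kv = nn.Parameter(torch.zeros(2, heads, 1, dim_head))

    def forward(self, q, k, v):
        # q, k, v: (batch, heads, seq, dim_head)
        nk, nv = self.null_kv
        nk, nv = (t.expand(q.shape[0], -1, -1, -1) for t in (nk, nv))
        k = torch.cat((nk, k), dim = -2)   # prepend the null slot
        v = torch.cat((nv, v), dim = -2)
        return F.scaled_dot_product_attention(q, k, v)
```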
lucidrains | 55574c054e | assert | 0.0.46 | 2025-10-19 09:59:42 -07:00
lucidrains | 27ed6d0ba5 | fix time kv cache | 0.0.45 | 2025-10-19 09:16:06 -07:00
lucidrains | 4930002e99 | bit of progress on time kv cache | 0.0.44 | 2025-10-19 09:04:26 -07:00
lucidrains | ecbe13efe8 | allow for setting different loss weights for each MTP head (perhaps more weight on the next vs some far out prediction) | 0.0.43 | 2025-10-19 08:37:56 -07:00
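A sketch of per-head MTP loss weighting as the message suggests, with more weight on the next-step head than on far-out heads; the geometric decay is an illustrative choice, not the repository's:

```python
import torch

num_mtp_heads = 4
# geometric decay: the k-step-ahead head gets weight 0.5^k, next step weighted most
loss_weights = torch.tensor([0.5 ** i for i in range(num_mtp_heads)])
loss_weights = loss_weights / loss_weights.sum()   # normalize to sum to 1

head_losses = torch.randn(num_mtp_heads).abs()     # stand-in per-head losses
total_loss = (head_losses * loss_weights).sum()
```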
lucidrains | f651d779e3 | able to control the update of the loss ema from dynamics model forward | 0.0.42 | 2025-10-19 08:25:50 -07:00
lucidrains | 374667d8a9 | take care of the loss normalization mentioned at the end of the first paragraph of section 3 | 0.0.41 | 2025-10-19 08:24:41 -07:00
lucidrains | 79a1b1c46e | oops | 0.0.40 | 2025-10-18 10:31:48 -07:00
lucidrains | b6aa19f31e | complete multi-token prediction for actions, tackle loss balancing another day | 0.0.38 | 2025-10-18 10:23:14 -07:00
lucidrains | bc629d78b1 | inverse norm for continuous actions when sampling | 0.0.37 | 2025-10-18 08:55:04 -07:00
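A sketch of the inverse-norm idea: if continuous actions are normalized (say to zero mean, unit variance) before training, samples from the policy live in normalized space and must be mapped back to environment units when sampling. The statistics below are placeholders:

```python
import torch

action_mean = torch.tensor([0.0, 1.5])    # assumed statistics gathered from data
action_std  = torch.tensor([1.0, 0.25])

def norm_action(a):
    return (a - action_mean) / action_std

def inverse_norm_action(a_normed):
    return a_normed * action_std + action_mean   # applied when sampling

sampled = torch.randn(2)                   # policy output in normalized space
env_action = inverse_norm_action(sampled)  # back to environment units
```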
lucidrains | 0ee475d2df | oops | 0.0.36 | 2025-10-18 08:50:53 -07:00
lucidrains | 8c88a33d3b | complete multi-token prediction for the reward head | 0.0.35 | 2025-10-18 08:33:06 -07:00
lucidrains | 911a1a8434 | oops | 0.0.34 | 2025-10-18 08:07:06 -07:00
lucidrains | 5fc0022bbf | the function for generating the MTP targets, as well as the mask for the losses | - | 2025-10-18 08:04:51 -07:00
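A sketch of how MTP targets and their loss mask can be generated: the k-th target at position t is the token at t + k + 1, and positions that run past the end of the sequence are masked out of the loss. The unfold-based approach is an assumption about the implementation:

```python
import torch
import torch.nn.functional as F

def mtp_targets(tokens, num_heads):
    # tokens: (batch, seq) of discrete ids, all >= 0
    b, n = tokens.shape
    pad_value = -1
    padded = F.pad(tokens, (0, num_heads), value = pad_value)
    # targets[:, t, k] = token at position t + k + 1
    targets = padded.unfold(-1, num_heads, 1)[:, 1:n + 1]   # (batch, seq, num_heads)
    mask = targets != pad_value                             # False past the sequence end
    return targets, mask

tokens = torch.arange(6).unsqueeze(0)            # (1, 6)
targets, mask = mtp_targets(tokens, num_heads = 2)
# targets[0, 4] == tensor([5, -1]), mask[0, 4] == tensor([True, False])
```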
lucidrains | 83cfd2cd1b | task conditioning when dreaming | 0.0.33 | 2025-10-18 07:47:13 -07:00
lucidrains | 22e13c45fc | rename | 0.0.32 | 2025-10-17 14:44:25 -07:00
lucidrains | c967404471 | 0.0.31 | 0.0.31 | 2025-10-17 08:55:42 -07:00
lucidrains | 0c1b067f97 | if optimizer is passed into the learn from dreams function, take the optimizer steps, otherwise let the researcher handle it externally. also ready muon | - | 2025-10-17 08:55:20 -07:00
lucidrains | cb416c0d44 | handle the entropies during policy optimization | 0.0.30 | 2025-10-17 08:47:26 -07:00
lucidrains | 61773c8219 | eventually we will need to learn from the outside stream of experience | 0.0.29 | 2025-10-17 08:06:24 -07:00
lucidrains | 0dba734280 | start the learning in dreams portion | 0.0.27 | 2025-10-17 08:00:47 -07:00
lucidrains | a0161760a0 | extract the log probs and predicted values (symexp two hot encoded) for the phase 3 RL training | 0.0.26 | 2025-10-16 10:40:59 -07:00
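For the symexp two-hot values mentioned above: in the DreamerV3-style scheme, the value head emits logits over symlog-spaced bins, and the scalar value is recovered as the softmax-weighted bin average passed through symexp. A sketch, with the bin count and range as assumptions:

```python
import torch

def symexp(x):
    # inverse of symlog(x) = sign(x) * log(1 + |x|)
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1.)

num_bins = 41
bins = torch.linspace(-20., 20., num_bins)   # bin centers in symlog space

def decode_value(logits):
    # logits: (..., num_bins) from the value head
    probs = logits.softmax(dim = -1)
    return symexp((probs * bins).sum(dim = -1))   # expected value, un-symlogged

logits = torch.randn(3, num_bins)
values = decode_value(logits)   # (3,) scalar value predictions
```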
lucidrains | 2d20d0a6c1 | able to roll out actions from one agent within the dreams of a world model | 0.0.25 | 2025-10-16 10:15:43 -07:00
lucidrains | d74f09f0b3 | a researcher in discord pointed out that the tokenizer also uses the axial space time transformer. redo without the 3d rotary and block causal, greatly simplifying the implementation | 0.0.24 | 2025-10-16 09:40:14 -07:00
lucidrains | 2ccb290e26 | pass the attend kwargs for the block causal masking in tokenizer | 0.0.23 | 2025-10-16 08:33:26 -07:00
lucidrains | 517ef6b94b | oops | 0.0.22 | 2025-10-16 07:03:51 -07:00
lucidrains | ec18bc0fa4 | cleanup | - | 2025-10-16 06:44:28 -07:00
lucidrains | 2a902eaaf7 | allow reward tokens to be attended to as state optionally, DT-esque. figure out multi-agent scenario once i get around to it | 0.0.21 | 2025-10-16 06:41:02 -07:00
lucidrains | d28251e9f9 | another consideration before knocking out the RL logic | 0.0.20 | 2025-10-14 11:10:26 -07:00
lucidrains | ff81dd761b | separate action and agent embeds | 0.0.19 | 2025-10-13 11:36:21 -07:00
lucidrains | 6dbdc3d7d8 | correct a misunderstanding: past actions get a separate action token, while the agent token is used for predicting the next action, rewards, and values | 0.0.18 | 2025-10-12 16:16:18 -07:00
lucidrains | 9c78962736 | sampling actions | 0.0.17 | 2025-10-12 11:27:12 -07:00
lucidrains | c5e64ff4ce | separate out the key from the value projections in attention for muon | 0.0.16 | 2025-10-12 09:42:22 -07:00
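Background on the key/value split: Muon orthogonalizes each 2D weight matrix independently (via Newton-Schulz iterations), so a fused projection would be treated as one tall matrix; separate nn.Linear modules give the optimizer one well-shaped matrix per projection. A sketch of the contrast:

```python
from torch import nn

dim, inner = 512, 512

# fused: Muon would orthogonalize one (3 * inner, dim) matrix as a whole
to_qkv = nn.Linear(dim, inner * 3, bias = False)

# separated: one (inner, dim) matrix per projection, friendlier to Muon
to_q = nn.Linear(dim, inner, bias = False)
to_k = nn.Linear(dim, inner, bias = False)
to_v = nn.Linear(dim, inner, bias = False)
```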
lucidrains | ab5de6795f | bring in muon | - | 2025-10-12 09:35:06 -07:00
lucidrains | 8a73a27fc7 | add nested tensor way for getting log prob of multiple discrete actions | 0.0.15 | 2025-10-11 10:53:24 -07:00
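The motivation for nested tensors here: an agent can have several discrete action dimensions with different numbers of choices, so the per-action logits are ragged. A sketch of gathering a joint log prob over such ragged actions, shown with a plain list of tensors (torch.nested being the ragged-storage variant of the same idea); all shapes are assumptions:

```python
import torch
import torch.nn.functional as F

# three discrete actions with 4, 6 and 2 choices respectively (ragged logits)
logits_per_action = [torch.randn(4), torch.randn(6), torch.randn(2)]
chosen = torch.tensor([1, 5, 0])   # one sampled index per action dimension

log_probs = torch.stack([
    F.log_softmax(logits, dim = -1)[idx]
    for logits, idx in zip(logits_per_action, chosen)
])
# log_probs: (3,) - one log prob per discrete action; sum for the joint
joint_log_prob = log_probs.sum()
```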
lucidrains | 01bf70e18a | 0.0.14 | 0.0.14 | 2025-10-11 09:24:58 -07:00
lucidrains | b2725d9b6e | complete behavior cloning for one agent | - | 2025-10-11 09:24:49 -07:00
lucidrains | 02558d1f08 | will organize the unembedding parameters under the actor optimizer | - | 2025-10-11 06:55:57 -07:00
lucidrains | 563b269f8a | bring in hyper connections | 0.0.12 | 2025-10-11 06:52:57 -07:00
lucidrains | 5df3e69583 | last commit for the day | 0.0.11 | 2025-10-10 11:59:18 -07:00
lucidrains | 9230267d34 | handle subset of discrete action unembedding | 0.0.10 | 2025-10-10 11:27:05 -07:00
lucidrains | c68942b026 | cleanup | - | 2025-10-10 10:42:54 -07:00
lucidrains | 32aa355e37 | prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL | 0.0.9 | 2025-10-10 10:41:48 -07:00