lucidrains | cb54121ace | sim trainer needs to take care of agent embedding and old actions (0.0.96) | 2025-10-29 11:15:11 -07:00
lucidrains | 586379f2c8 | sum the kl div loss across number of actions by default for action embedder .kl_div (0.0.95) | 2025-10-29 10:46:42 -07:00
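The commit above refers to the action embedder's `.kl_div` summing across the number of actions by default. A minimal sketch of what that could look like for a diagonal-Gaussian action parameterization; the (mean, log-variance) inputs follow the surrounding commits, while the function name, shapes, and reduction flag are assumptions:

```python
import torch

def kl_div(mean, log_var, old_mean, old_log_var, sum_actions = True):
    # KL( N(old_mean, old_var) || N(mean, var) ), computed per action dimension
    var, old_var = log_var.exp(), old_log_var.exp()
    kl = 0.5 * (log_var - old_log_var + (old_var + (old_mean - mean) ** 2) / var - 1.)

    # sum across the number of actions by default
    return kl.sum(dim = -1) if sum_actions else kl

# shapes: (batch, num_actions) -> (batch,)
kl = kl_div(torch.zeros(2, 6), torch.zeros(2, 6), torch.randn(2, 6), torch.zeros(2, 6))
```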
lucidrains | a358a44a53 | always store old agent embeds and old action parameters when possible (0.0.94) | 2025-10-29 10:39:15 -07:00
lucidrains | 3547344312 | take care of storing the old action logits and mean log var, and calculate kl div for pmpo based off that during learn from experience (0.0.93) | 2025-10-29 10:31:32 -07:00
lucidrains | 691d9ca007 | add kl div on action embedder, working way towards the kl div loss in pmpo (0.0.92) | 2025-10-29 10:02:25 -07:00
lucidrains | 91d697f8ca | fix pmpo (0.0.91) | 2025-10-28 18:55:22 -07:00
lucidrains | 7acaa764f6 | evolutionary policy optimization on dreams will be interesting (0.0.90) | 2025-10-28 10:17:01 -07:00
lucidrains | c0450359f3 | allow for evolutionary policy optimization (0.0.89) | 2025-10-28 10:11:13 -07:00
lucidrains | 46f86cd247 | fix storing of agent embedding (0.0.88) | 2025-10-28 09:36:58 -07:00
lucidrains | 903c43b770 | use the agent embeds off the stored experience if available (0.0.87) | 2025-10-28 09:14:02 -07:00
lucidrains | d476fa7b14 | able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads) (0.0.85) | 2025-10-28 09:02:26 -07:00
lucidrains | 789f091c63 | redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse same logic (0.0.83) | 2025-10-28 08:04:48 -07:00
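A rough sketch of the truncation logic described in the commit above, under the assumption that rollouts are step-limited and the environment may also report its own truncation flag; all names are illustrative, not the repo's actual ones:

```python
def step_flags(step_index, max_timesteps, env_terminated, env_truncated):
    # hitting the step budget counts as a truncation at the last timestep,
    # merged with the environment's own truncation signal so both cases
    # reuse the same bootstrapping logic downstream
    # (flags may be Python bools or boolean tensors in a vectorized env)
    hit_limit = (step_index + 1) >= max_timesteps
    is_truncated = env_truncated | hit_limit
    is_done = env_terminated | is_truncated
    return is_truncated, is_done
```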
lucidrains | 41ab83f691 | fix mock | 2025-10-27 10:47:24 -07:00
lucidrains | 995b1f64e5 | handle environments that return a terminate flag, also make sure episode lens are logged in vectorized env (0.0.82) | 2025-10-27 10:14:28 -07:00
lucidrains | fd1e87983b | quantile filter | 2025-10-27 09:08:26 -07:00
lucidrains | fe79bfa951 | optionally keep track of returns statistics and normalize with them before advantage (0.0.81) | 2025-10-27 09:02:08 -07:00
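Presumably something along the lines of a running-statistics normalizer applied to returns before the advantage is computed; a sketch under that assumption, with EMA statistics standing in for whatever the repo actually tracks:

```python
import torch
from torch import nn

class ReturnsNorm(nn.Module):
    # keeps running statistics of returns and normalizes with them
    def __init__(self, eps = 1e-5, momentum = 0.01):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        self.register_buffer('mean', torch.zeros(1))
        self.register_buffer('var', torch.ones(1))

    @torch.no_grad()
    def update(self, returns):
        # exponential moving average of the mean and variance of returns
        self.mean.lerp_(returns.mean(), self.momentum)
        self.var.lerp_(returns.var(unbiased = False), self.momentum)

    def forward(self, returns):
        return (returns - self.mean) / (self.var + self.eps).sqrt()
```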
lucidrains | f808b1c1d2 | oops (0.0.80) | 2025-10-27 08:34:22 -07:00
lucidrains | 349a03acd7 | redo so lens is always the episode length, including the bootstrap value timestep, and use is_truncated to mask out the bootstrap node from being learned on (0.0.79) | 2025-10-27 08:06:21 -07:00
lucidrains | 59c458aea3 | introduce an is_truncated field on Experience, and mask out rewards and values before calculating gae appropriately (0.0.78) | 2025-10-27 07:55:00 -07:00
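Read together with 0.0.79 above, the idea seems to be: the stored trajectory carries one extra timestep whose value serves only as a bootstrap when the episode was truncated, and `is_truncated` keeps that node out of the losses. A sketch of GAE under those assumptions, with all names illustrative:

```python
import torch

def gae(rewards, values, mask, gamma = 0.99, lam = 0.95):
    # rewards: (t,)     per-timestep rewards
    # values:  (t + 1,) value estimates, the last entry is the bootstrap node
    # mask:    (t,)     1. while the episode is alive, 0. once it terminated
    advantages = torch.zeros_like(rewards)
    running = 0.

    for t in reversed(range(rewards.shape[0])):
        delta = rewards[t] + gamma * values[t + 1] * mask[t] - values[t]
        running = delta + gamma * lam * mask[t] * running
        advantages[t] = running

    return advantages
```

On truncation the bootstrap value stays in `values[-1]` so the tail return is still estimated, while (per the commits) the `is_truncated` flag is used downstream to exclude that final node from the policy and value losses.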
lucidrains | fbfd59e42f | handle variable-length experiences when doing policy optimization (0.0.77) | 2025-10-27 06:09:09 -07:00
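One plausible way to batch the variable-length experiences mentioned above is to pad to the longest episode and carry a mask alongside the stored lengths; a sketch, with field names assumed:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def batch_experiences(rewards_list):
    # rewards_list: list of (t_i,) tensors with differing episode lengths
    lens = torch.tensor([r.shape[0] for r in rewards_list])
    rewards = pad_sequence(rewards_list, batch_first = True)   # (batch, max_t)

    # mask padded timesteps so they never contribute to the loss
    mask = torch.arange(rewards.shape[1])[None, :] < lens[:, None]
    return rewards, lens, mask
```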
lucidrains | 46432aee9b | fix an issue with bc | 2025-10-25 12:30:08 -07:00
lucidrains | f97d9adc97 | oops, forgot to add the view embedding for robotics (0.0.75) | 2025-10-25 11:39:06 -07:00
lucidrains | 32cf142b4d | take another step for variable-length experiences (0.0.74) | 2025-10-25 11:31:41 -07:00
lucidrains | 1ed6a15cb0 | fix tests | 2025-10-25 11:13:22 -07:00
lucidrains | 4d8f5613cc | start storing the experience lens (0.0.73) | 2025-10-25 10:55:47 -07:00
lucidrains | 3d5617d769 | take a step towards variable-length experiences during training (0.0.72) | 2025-10-25 10:45:34 -07:00
lucidrains | 77a40e8701 | validate that we can generate multiple video streams for robotics use-case | 2025-10-25 09:23:07 -07:00
lucidrains | 4ce82f34df | given the VAT paper, add multiple video streams (third person, wrist camera, etc), geared for robotics. need to manage an extra dimension for multiple viewpoints (0.0.71) | 2025-10-25 09:20:55 -07:00
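The extra viewpoint dimension, together with the view embedding mentioned a few commits up (0.0.75), could look roughly like the module below; the shapes and names are assumptions:

```python
import torch
from torch import nn

class ViewEmbed(nn.Module):
    # adds a learned per-viewpoint embedding to multi-camera video features
    # shaped (batch, views, time, dim) - e.g. third person + wrist camera
    def __init__(self, num_views, dim):
        super().__init__()
        self.emb = nn.Parameter(torch.zeros(num_views, dim))

    def forward(self, feats):
        views = feats.shape[1]
        return feats + self.emb[:views][None, :, None, :]
```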
lucidrains | a9b728c611 | incorporate proprioception into the dynamics world model (0.0.70) | 2025-10-24 11:24:22 -07:00
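How proprioception might be folded into the dynamics model, sketched here as one extra projected token per timestep; the actual wiring in the repo may well differ:

```python
import torch
from torch import nn

class ProprioTokens(nn.Module):
    # projects proprioceptive state into the model dimension and appends it
    # as an extra token per timestep next to the video tokens
    def __init__(self, proprio_dim, dim):
        super().__init__()
        self.proj = nn.Linear(proprio_dim, dim)

    def forward(self, video_tokens, proprio):
        # video_tokens: (batch, time, tokens, dim), proprio: (batch, time, proprio_dim)
        proprio_token = self.proj(proprio).unsqueeze(-2)
        return torch.cat((video_tokens, proprio_token), dim = -2)
```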
lucidrains | 35c1db4c7d | sketch of training from sim env (0.0.69) | 2025-10-24 09:13:09 -07:00
lucidrains | 27ac05efb0 | function for combining experiences (0.0.67) | 2025-10-24 08:00:10 -07:00
lucidrains | d0ffc6bfed | with or without signed advantage (0.0.66) | 2025-10-23 16:24:29 -07:00
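"With or without signed advantage" presumably toggles between using the raw advantage and only its sign; a one-liner sketch under that reading:

```python
import torch

def maybe_signed(advantages, signed = True):
    # with the flag on, only the sign of the advantage is kept,
    # so every transition contributes with equal magnitude
    return advantages.sign() if signed else advantages
```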
lucidrains | fb3e026fe0 | handle vectorized env (0.0.65) | 2025-10-22 11:19:44 -07:00
lucidrains | 7ecc5d03e8 | wire up the time kv cache when interacting with sim / env (0.0.62) | 2025-10-22 08:39:11 -07:00
lucidrains | d82debb7a6 | first pass through gathering experience with a mock env for online rl (0.0.61) | 2025-10-22 08:32:46 -07:00
lucidrains | 03b16a48f2 | sketch out the dream trainer, seems like they only fine tune the heads (0.0.60) | 2025-10-22 06:41:10 -07:00
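If the dream trainer really only fine-tunes the heads, the optimizer setup might be as simple as freezing the world model and collecting head parameters; a sketch with assumed names:

```python
from torch.optim import Adam

def build_dream_optimizer(world_model, actor_head, critic_head, lr = 3e-4):
    # freeze the world model - only the actor / critic heads get gradients
    world_model.requires_grad_(False)

    head_params = [*actor_head.parameters(), *critic_head.parameters()]
    return Adam(head_params, lr = lr)
```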
lucidrains | 6f1a7a24ed | try to fix ci | 2025-10-21 11:47:39 -07:00
lucidrains | e316499047 | naming | 2025-10-21 10:57:55 -07:00
lucidrains | 40da985c6b | tweak bc trainer (0.0.59) | 2025-10-21 10:55:24 -07:00
lucidrains | 2fc3b17149 | take a gradient step with behavioral cloning trainer, make sure it works with and without actions and rewards (0.0.57) | 2025-10-21 10:20:08 -07:00
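A sketch of what "works with and without actions and rewards" might mean for the behavioral cloning loss: the optional terms are simply skipped when the data lacks them. The names and the choice of MSE are assumptions:

```python
import torch.nn.functional as F

def bc_loss(pred_video, video, pred_actions = None, actions = None, pred_rewards = None, rewards = None):
    # the video prediction term is always present
    loss = F.mse_loss(pred_video, video)

    # action and reward terms are only added when that data exists
    if actions is not None:
        loss = loss + F.mse_loss(pred_actions, actions)

    if rewards is not None:
        loss = loss + F.mse_loss(pred_rewards, rewards)

    return loss
```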
lucidrains | 283d59d75a | oops | 2025-10-21 09:50:07 -07:00
lucidrains | 4a5465eeb6 | fix ci | 2025-10-21 09:17:53 -07:00
lucidrains | b34128d3d0 | make sure time kv cache can be passed back in during generation (0.0.55) | 2025-10-21 09:15:32 -07:00
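Passing the time kv cache back in during generation likely amounts to threading it through an autoregressive loop; a sketch with an assumed model signature (`cache` / `return_cache` are illustrative keyword names):

```python
import torch

@torch.no_grad()
def rollout(model, state, num_steps):
    # thread the time kv cache through each generation step so earlier
    # timesteps are not recomputed
    cache = None
    states = []

    for _ in range(num_steps):
        state, cache = model(state, cache = cache, return_cache = True)
        states.append(state)

    return torch.stack(states, dim = 1), cache
```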
lucidrains | 7ba3988fb9 | prepare a mock for interacting with online env | 2025-10-21 09:03:20 -07:00
lucidrains | ea13d4fcab | take a gradient step with video tokenizer trainer (0.0.54) | 2025-10-21 08:52:22 -07:00
lucidrains | 15876d34cf | more muon prep (0.0.53) | 2025-10-21 08:23:59 -07:00
lucidrains | b4763caff9 | fix rotary embeddings in presence of kv caching | 2025-10-21 07:10:21 -07:00
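The usual bug when mixing rotary embeddings with kv caching is forgetting to offset the positions of the newly decoded tokens by the cache length; a sketch of that fix, independent of how this repo actually implements rotary:

```python
import torch

def apply_rotary(x, cache_len = 0, theta = 10000.):
    # x: (..., seq, dim) - only the *new* tokens are passed in when caching,
    # so their positions must start at cache_len rather than 0
    seq, dim = x.shape[-2], x.shape[-1]
    positions = torch.arange(cache_len, cache_len + seq, device = x.device).float()

    freqs = theta ** -(torch.arange(0, dim, 2, device = x.device).float() / dim)
    angles = positions[:, None] * freqs[None, :]               # (seq, dim / 2)
    cos, sin = angles.cos(), angles.sin()

    # rotate interleaved pairs of channels
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim = -1)
    return rotated.flatten(-2)
```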
lucidrains | 7195bbb196 | oops (0.0.50) | 2025-10-20 12:42:27 -07:00
lucidrains | ca244a290c | first pass through the kv cache for the time block in the dynamics model (0.0.49) | 2025-10-20 12:25:50 -07:00
lucidrains | a7e0c395c3 | allow for only rmsnorm for keys in attention (0.0.48) | 2025-10-20 11:20:49 -07:00
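A sketch of what "only rmsnorm for keys" could mean: the usual qk-norm, with the query branch optionally left untouched. The module below is illustrative, not the repo's actual attention class, and `nn.RMSNorm` requires a recent PyTorch:

```python
import torch
from torch import nn
import torch.nn.functional as F

class Attention(nn.Module):
    def __init__(self, dim, heads = 8, dim_head = 64, norm_keys_only = True):
        super().__init__()
        dim_inner = heads * dim_head
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim_inner * 3, bias = False)
        self.to_out = nn.Linear(dim_inner, dim, bias = False)

        # rmsnorm the keys always, the queries only if not restricted to keys
        self.q_norm = nn.Identity() if norm_keys_only else nn.RMSNorm(dim_head)
        self.k_norm = nn.RMSNorm(dim_head)

    def forward(self, x):
        b, n, _ = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim = -1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))

        q, k = self.q_norm(q), self.k_norm(k)

        out = F.scaled_dot_product_attention(q, k, v)
        return self.to_out(out.transpose(1, 2).reshape(b, n, -1))
```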