Merge pull request #5 from Cycl0/patch-1
Phil Wang
2025-11-09 08:17:41 -08:00
cfd34f1eba able to move the experience to cpu easily, and automatically move it to the device of the dynamics world model when learning from it
0.1.5
lucidrains
2025-11-09 16:16:13 +00:00
Merge pull request #4 from dirkmcpherson/bugfix
Phil Wang
2025-11-04 15:19:25 -08:00
b0f6b8583d fix a few typo bugs. Support info in return signature of environment step. Temporarily turn off flex attention when the kv_cache is used to avoid bug.
j
2025-11-04 17:29:12 -05:00
c0a6cd56a1 link to new discord
lucidrains
2025-10-31 09:06:44 -07:00
d756d1bb8c addressing issues raised by an independent researcher with llm assistance
0.1.2
lucidrains
2025-10-31 08:37:39 -07:00
ef367969f8 addressing issues raised by an independent researcher with llm assistance
0.1.1
lucidrains
2025-10-31 08:26:33 -07:00
60681fce1d fix generation so that one more step is taken to decode agent embeds off the final cleaned set of latents, update readme
lucidrains
2025-10-31 06:48:49 -07:00
a0bda62989 fix generation so that one more step is taken to decode agent embeds off the final cleaned set of latents, update readme
0.1.0
lucidrains
2025-10-31 06:47:35 -07:00
c4a3cb09d5 swap for discrete kl div, thanks to Dirk for pointing this out on the discord
0.0.98
lucidrains
2025-10-29 11:54:18 -07:00
4b06615018 swap src and tgt for kl div
0.0.97
lucidrains
2025-10-29 11:23:34 -07:00
cb54121ace sim trainer needs to take care of agent embedding and old actions
0.0.96
lucidrains
2025-10-29 11:15:11 -07:00
586379f2c8 sum the kl div loss across number of actions by default for action embedder .kl_div
0.0.95
lucidrains
2025-10-29 10:46:42 -07:00
a358a44a53 always store old agent embeds and old action parameters when possible
0.0.94
lucidrains
2025-10-29 10:39:15 -07:00
3547344312 take care of storing the old action logits and mean log var, and calculate kl div for pmpo based off that during learn from experience
0.0.93
lucidrains
2025-10-29 10:31:32 -07:00
691d9ca007 add kl div on action embedder, working way towards the kl div loss in pmpo
0.0.92
lucidrains
2025-10-29 10:02:25 -07:00
7acaa764f6 evolutionary policy optimization on dreams will be interesting
0.0.90
lucidrains
2025-10-28 10:17:01 -07:00
c0450359f3 allow for evolutionary policy optimization
0.0.89
lucidrains
2025-10-28 10:11:13 -07:00
46f86cd247 fix storing of agent embedding
0.0.88
lucidrains
2025-10-28 09:36:58 -07:00
903c43b770 use the agent embeds off the stored experience if available
0.0.87
lucidrains
2025-10-28 09:14:02 -07:00
d476fa7b14 able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads)
0.0.85
lucidrains
2025-10-28 09:02:26 -07:00
b02abc7a8a able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads)
0.0.84
lucidrains
2025-10-28 09:01:29 -07:00
789f091c63 redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse same logic
0.0.83
lucidrains
2025-10-28 08:04:48 -07:00
995b1f64e5 handle environments that return a terminate flag, also make sure episode lens are logged in vectorized env
0.0.82
lucidrains
2025-10-27 10:14:28 -07:00
349a03acd7 redo so lens is always the episode length, including the bootstrap value timestep, and use is_truncated to mask out the bootstrap node from being learned on
0.0.79
lucidrains
2025-10-27 08:06:21 -07:00
59c458aea3 introduce an is_truncated field on Experience, and mask out rewards and values before calculating gae appropriately
0.0.78
lucidrains
2025-10-27 07:55:00 -07:00
4d8f5613cc start storing the experience lens
0.0.73
lucidrains
2025-10-25 10:55:47 -07:00
3d5617d769 take a step towards variable-length experiences during training
0.0.72
lucidrains
2025-10-25 10:45:34 -07:00
77a40e8701 validate that we can generate multiple video streams for robotics use-case
lucidrains
2025-10-25 09:23:07 -07:00
4ce82f34df given the VAT paper, add multiple video streams (third person, wrist camera, etc), geared for robotics. need to manage an extra dimension for multiple viewpoints
0.0.71
lucidrains
2025-10-25 09:20:55 -07:00
a9b728c611 incorporate proprioception into the dynamics world model
0.0.70
lucidrains
2025-10-24 11:24:22 -07:00
35c1db4c7d sketch of training from sim env
0.0.69
lucidrains
2025-10-24 09:13:09 -07:00
8526347316 sketch of training from sim env
0.0.68
lucidrains
2025-10-24 08:56:51 -07:00
27ac05efb0 function for combining experiences
0.0.67
lucidrains
2025-10-24 08:00:10 -07:00
d0ffc6bfed with or without signed advantage
0.0.66
lucidrains
2025-10-23 16:24:29 -07:00
2fc3b17149 take a gradient step with behavioral clone trainer, make sure it works with and without actions and rewards
0.0.57
lucidrains
2025-10-21 10:20:08 -07:00