lucidrains | cb54121ace | sim trainer needs to take care of agent embedding and old actions (0.0.96) | 2025-10-29 11:15:11 -07:00
lucidrains | 586379f2c8 | sum the kl div loss across number of actions by default for action embedder .kl_div (0.0.95) | 2025-10-29 10:46:42 -07:00
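The commit above refers to the action embedder's `.kl_div` summing across the number of actions by default. A minimal sketch of what that could look like for a diagonal-Gaussian action parameterization; the (mean, log-variance) inputs follow the surrounding commits, while the function name, shapes, and reduction flag are assumptions:

```python
import torch

def kl_div(mean, log_var, old_mean, old_log_var, sum_actions = True):
    # KL( N(old_mean, old_var) || N(mean, var) ), computed per action dimension
    var, old_var = log_var.exp(), old_log_var.exp()
    kl = 0.5 * (log_var - old_log_var + (old_var + (old_mean - mean) ** 2) / var - 1.)

    # sum across the number of actions by default
    return kl.sum(dim = -1) if sum_actions else kl

# shapes: (batch, num_actions) -> (batch,)
kl = kl_div(torch.zeros(2, 6), torch.zeros(2, 6), torch.randn(2, 6), torch.zeros(2, 6))
```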
lucidrains | a358a44a53 | always store old agent embeds and old action parameters when possible (0.0.94) | 2025-10-29 10:39:15 -07:00
lucidrains | 3547344312 | take care of storing the old action logits and mean log var, and calculate kl div for pmpo based off that during learn from experience (0.0.93) | 2025-10-29 10:31:32 -07:00
lucidrains | 691d9ca007 | add kl div on action embedder, working way towards the kl div loss in pmpo (0.0.92) | 2025-10-29 10:02:25 -07:00
lucidrains | 91d697f8ca | fix pmpo (0.0.91) | 2025-10-28 18:55:22 -07:00
lucidrains | 7acaa764f6 | evolutionary policy optimization on dreams will be interesting (0.0.90) | 2025-10-28 10:17:01 -07:00
lucidrains | c0450359f3 | allow for evolutionary policy optimization (0.0.89) | 2025-10-28 10:11:13 -07:00
lucidrains | 46f86cd247 | fix storing of agent embedding (0.0.88) | 2025-10-28 09:36:58 -07:00
lucidrains | 903c43b770 | use the agent embeds off the stored experience if available (0.0.87) | 2025-10-28 09:14:02 -07:00
lucidrains | d476fa7b14 | able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads) (0.0.85) | 2025-10-28 09:02:26 -07:00
lucidrains | 789f091c63 | redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse same logic (0.0.83) | 2025-10-28 08:04:48 -07:00
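A rough sketch of the truncation logic described in the commit above, under the assumption that rollouts are step-limited and the environment may also report its own truncation flag; all names are illustrative, not the repo's actual ones:

```python
def step_flags(step_index, max_timesteps, env_terminated, env_truncated):
    # hitting the step budget counts as a truncation at the last timestep,
    # merged with the environment's own truncation signal so both cases
    # reuse the same bootstrapping logic downstream
    # (flags may be Python bools or boolean tensors in a vectorized env)
    hit_limit = (step_index + 1) >= max_timesteps
    is_truncated = env_truncated | hit_limit
    is_done = env_terminated | is_truncated
    return is_truncated, is_done
```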
lucidrains | 41ab83f691 | fix mock | 2025-10-27 10:47:24 -07:00
lucidrains | 995b1f64e5 | handle environments that return a terminate flag, also make sure episode lens are logged in vectorized env (0.0.82) | 2025-10-27 10:14:28 -07:00
lucidrains | fd1e87983b | quantile filter | 2025-10-27 09:08:26 -07:00
lucidrains | fe79bfa951 | optionally keep track of returns statistics and normalize with them before advantage (0.0.81) | 2025-10-27 09:02:08 -07:00
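Presumably something along the lines of a running-statistics normalizer applied to returns before the advantage is computed; a sketch under that assumption, with EMA statistics standing in for whatever the repo actually tracks:

```python
import torch
from torch import nn

class ReturnsNorm(nn.Module):
    # keeps running statistics of returns and normalizes with them
    def __init__(self, eps = 1e-5, momentum = 0.01):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        self.register_buffer('mean', torch.zeros(1))
        self.register_buffer('var', torch.ones(1))

    @torch.no_grad()
    def update(self, returns):
        # exponential moving average of the mean and variance of returns
        self.mean.lerp_(returns.mean(), self.momentum)
        self.var.lerp_(returns.var(unbiased = False), self.momentum)

    def forward(self, returns):
        return (returns - self.mean) / (self.var + self.eps).sqrt()
```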
lucidrains | f808b1c1d2 | oops (0.0.80) | 2025-10-27 08:34:22 -07:00
lucidrains | 349a03acd7 | redo so lens is always the episode length, including the bootstrap value timestep, and use is_truncated to mask out the bootstrap node from being learned on (0.0.79) | 2025-10-27 08:06:21 -07:00
lucidrains | 59c458aea3 | introduce an is_truncated field on Experience, and mask out rewards and values before calculating gae appropriately (0.0.78) | 2025-10-27 07:55:00 -07:00
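Read together with 0.0.79 above, the idea seems to be: the stored trajectory carries one extra timestep whose value serves only as a bootstrap when the episode was truncated, and `is_truncated` keeps that node out of the losses. A sketch of GAE under those assumptions, with all names illustrative:

```python
import torch

def gae(rewards, values, mask, gamma = 0.99, lam = 0.95):
    # rewards: (t,)     per-timestep rewards
    # values:  (t + 1,) value estimates, the last entry is the bootstrap node
    # mask:    (t,)     1. while the episode is alive, 0. once it terminated
    advantages = torch.zeros_like(rewards)
    running = 0.

    for t in reversed(range(rewards.shape[0])):
        delta = rewards[t] + gamma * values[t + 1] * mask[t] - values[t]
        running = delta + gamma * lam * mask[t] * running
        advantages[t] = running

    return advantages
```

On truncation the bootstrap value stays in `values[-1]` so the tail return is still estimated, while (per the commits) the `is_truncated` flag is used downstream to exclude that final node from the policy and value losses.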
lucidrains | fbfd59e42f | handle variable-length experiences when doing policy optimization (0.0.77) | 2025-10-27 06:09:09 -07:00
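One plausible way to batch the variable-length experiences mentioned above is to pad to the longest episode and carry a mask alongside the stored lengths; a sketch, with field names assumed:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def batch_experiences(rewards_list):
    # rewards_list: list of (t_i,) tensors with differing episode lengths
    lens = torch.tensor([r.shape[0] for r in rewards_list])
    rewards = pad_sequence(rewards_list, batch_first = True)   # (batch, max_t)

    # mask padded timesteps so they never contribute to the loss
    mask = torch.arange(rewards.shape[1])[None, :] < lens[:, None]
    return rewards, lens, mask
```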
lucidrains | 46432aee9b | fix an issue with bc | 2025-10-25 12:30:08 -07:00
lucidrains | f97d9adc97 | oops, forgot to add the view embedding for robotics (0.0.75) | 2025-10-25 11:39:06 -07:00
lucidrains | 32cf142b4d | take another step for variable-length experiences (0.0.74) | 2025-10-25 11:31:41 -07:00
lucidrains | 1ed6a15cb0 | fix tests | 2025-10-25 11:13:22 -07:00
lucidrains | 4d8f5613cc | start storing the experience lens (0.0.73) | 2025-10-25 10:55:47 -07:00
lucidrains | 3d5617d769 | take a step towards variable-length experiences during training (0.0.72) | 2025-10-25 10:45:34 -07:00
lucidrains | 77a40e8701 | validate that we can generate multiple video streams for robotics use-case | 2025-10-25 09:23:07 -07:00
lucidrains | 4ce82f34df | given the VAT paper, add multiple video streams (third person, wrist camera, etc), geared for robotics. need to manage an extra dimension for multiple viewpoints (0.0.71) | 2025-10-25 09:20:55 -07:00
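The extra viewpoint dimension, together with the view embedding mentioned a few commits up (0.0.75), could look roughly like the module below; the shapes and names are assumptions:

```python
import torch
from torch import nn

class ViewEmbed(nn.Module):
    # adds a learned per-viewpoint embedding to multi-camera video features
    # shaped (batch, views, time, dim) - e.g. third person + wrist camera
    def __init__(self, num_views, dim):
        super().__init__()
        self.emb = nn.Parameter(torch.zeros(num_views, dim))

    def forward(self, feats):
        views = feats.shape[1]
        return feats + self.emb[:views][None, :, None, :]
```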
lucidrains | a9b728c611 | incorporate proprioception into the dynamics world model (0.0.70) | 2025-10-24 11:24:22 -07:00
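How proprioception might be folded into the dynamics model, sketched here as one extra projected token per timestep; the actual wiring in the repo may well differ:

```python
import torch
from torch import nn

class ProprioTokens(nn.Module):
    # projects proprioceptive state into the model dimension and appends it
    # as an extra token per timestep next to the video tokens
    def __init__(self, proprio_dim, dim):
        super().__init__()
        self.proj = nn.Linear(proprio_dim, dim)

    def forward(self, video_tokens, proprio):
        # video_tokens: (batch, time, tokens, dim), proprio: (batch, time, proprio_dim)
        proprio_token = self.proj(proprio).unsqueeze(-2)
        return torch.cat((video_tokens, proprio_token), dim = -2)
```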
lucidrains | 35c1db4c7d | sketch of training from sim env (0.0.69) | 2025-10-24 09:13:09 -07:00
lucidrains | 27ac05efb0 | function for combining experiences (0.0.67) | 2025-10-24 08:00:10 -07:00
lucidrains | d0ffc6bfed | with or without signed advantage (0.0.66) | 2025-10-23 16:24:29 -07:00
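"With or without signed advantage" presumably toggles between using the raw advantage and only its sign; a one-liner sketch under that reading:

```python
import torch

def maybe_signed(advantages, signed = True):
    # with the flag on, only the sign of the advantage is kept,
    # so every transition contributes with equal magnitude
    return advantages.sign() if signed else advantages
```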
lucidrains | fb3e026fe0 | handle vectorized env (0.0.65) | 2025-10-22 11:19:44 -07:00
lucidrains | 7ecc5d03e8 | wire up the time kv cache when interacting with sim / env (0.0.62) | 2025-10-22 08:39:11 -07:00
lucidrains | d82debb7a6 | first pass through gathering experience with a mock env for online rl (0.0.61) | 2025-10-22 08:32:46 -07:00
lucidrains | 03b16a48f2 | sketch out the dream trainer, seems like they only fine tune the heads (0.0.60) | 2025-10-22 06:41:10 -07:00
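If the dream trainer really only fine-tunes the heads, the optimizer setup might be as simple as freezing the world model and collecting head parameters; a sketch with assumed names:

```python
from torch.optim import Adam

def build_dream_optimizer(world_model, actor_head, critic_head, lr = 3e-4):
    # freeze the world model - only the actor / critic heads get gradients
    world_model.requires_grad_(False)

    head_params = [*actor_head.parameters(), *critic_head.parameters()]
    return Adam(head_params, lr = lr)
```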
lucidrains | 6f1a7a24ed | try to fix ci | 2025-10-21 11:47:39 -07:00
lucidrains | e316499047 | naming | 2025-10-21 10:57:55 -07:00
lucidrains | 40da985c6b | tweak bc trainer (0.0.59) | 2025-10-21 10:55:24 -07:00
lucidrains | 2fc3b17149 | take a gradient step with behavioral cloning trainer, make sure it works with and without actions and rewards (0.0.57) | 2025-10-21 10:20:08 -07:00
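A sketch of what "works with and without actions and rewards" might mean for the behavioral cloning loss: the optional terms are simply skipped when the data lacks them. The names and the choice of MSE are assumptions:

```python
import torch.nn.functional as F

def bc_loss(pred_video, video, pred_actions = None, actions = None, pred_rewards = None, rewards = None):
    # the video prediction term is always present
    loss = F.mse_loss(pred_video, video)

    # action and reward terms are only added when that data exists
    if actions is not None:
        loss = loss + F.mse_loss(pred_actions, actions)

    if rewards is not None:
        loss = loss + F.mse_loss(pred_rewards, rewards)

    return loss
```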
lucidrains | 283d59d75a | oops | 2025-10-21 09:50:07 -07:00
lucidrains | 4a5465eeb6 | fix ci | 2025-10-21 09:17:53 -07:00
lucidrains | b34128d3d0 | make sure time kv cache can be passed back in during generation (0.0.55) | 2025-10-21 09:15:32 -07:00
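Passing the time kv cache back in during generation likely amounts to threading it through an autoregressive loop; a sketch with an assumed model signature (`cache` / `return_cache` are illustrative keyword names):

```python
import torch

@torch.no_grad()
def rollout(model, state, num_steps):
    # thread the time kv cache through each generation step so earlier
    # timesteps are not recomputed
    cache = None
    states = []

    for _ in range(num_steps):
        state, cache = model(state, cache = cache, return_cache = True)
        states.append(state)

    return torch.stack(states, dim = 1), cache
```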
lucidrains | 7ba3988fb9 | prepare a mock for interacting with online env | 2025-10-21 09:03:20 -07:00
lucidrains | ea13d4fcab | take a gradient step with video tokenizer trainer (0.0.54) | 2025-10-21 08:52:22 -07:00
lucidrains | 15876d34cf | more muon prep (0.0.53) | 2025-10-21 08:23:59 -07:00
lucidrains | b4763caff9 | fix rotary embeddings in presence of kv caching | 2025-10-21 07:10:21 -07:00
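The usual bug when mixing rotary embeddings with kv caching is forgetting to offset the positions of the newly decoded tokens by the cache length; a sketch of that fix, independent of how this repo actually implements rotary:

```python
import torch

def apply_rotary(x, cache_len = 0, theta = 10000.):
    # x: (..., seq, dim) - only the *new* tokens are passed in when caching,
    # so their positions must start at cache_len rather than 0
    seq, dim = x.shape[-2], x.shape[-1]
    positions = torch.arange(cache_len, cache_len + seq, device = x.device).float()

    freqs = theta ** -(torch.arange(0, dim, 2, device = x.device).float() / dim)
    angles = positions[:, None] * freqs[None, :]               # (seq, dim / 2)
    cos, sin = angles.cos(), angles.sin()

    # rotate interleaved pairs of channels
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim = -1)
    return rotated.flatten(-2)
```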
lucidrains | 7195bbb196 | oops (0.0.50) | 2025-10-20 12:42:27 -07:00
lucidrains | ca244a290c | first pass through the kv cache for the time block in the dynamics model (0.0.49) | 2025-10-20 12:25:50 -07:00
lucidrains | a7e0c395c3 | allow for only rmsnorm for keys in attention (0.0.48) | 2025-10-20 11:20:49 -07:00
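A sketch of what "only rmsnorm for keys" could mean: the usual qk-norm, with the query branch optionally left untouched. The module below is illustrative, not the repo's actual attention class, and `nn.RMSNorm` requires a recent PyTorch:

```python
import torch
from torch import nn
import torch.nn.functional as F

class Attention(nn.Module):
    def __init__(self, dim, heads = 8, dim_head = 64, norm_keys_only = True):
        super().__init__()
        dim_inner = heads * dim_head
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim_inner * 3, bias = False)
        self.to_out = nn.Linear(dim_inner, dim, bias = False)

        # rmsnorm the keys always, the queries only if not restricted to keys
        self.q_norm = nn.Identity() if norm_keys_only else nn.RMSNorm(dim_head)
        self.k_norm = nn.RMSNorm(dim_head)

    def forward(self, x):
        b, n, _ = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim = -1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))

        q, k = self.q_norm(q), self.k_norm(k)

        out = F.scaled_dot_product_attention(q, k, v)
        return self.to_out(out.transpose(1, 2).reshape(b, n, -1))
```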