180 Commits

Author | SHA1 | Message | Date
j | b0f6b8583d | fix a few typo bugs. Support info in return signature of environment step. Temporarily turn off flex attention when the kv_cache is used to avoid bug. | 2025-11-04 17:29:12 -05:00
lucidrains | c0a6cd56a1 | link to new discord | 2025-10-31 09:06:44 -07:00
lucidrains | d756d1bb8c | addressing issues raised by an independent researcher with llm assistance 0.1.2 | 2025-10-31 08:37:39 -07:00
lucidrains | 60681fce1d | fix generation so that one more step is taken to decode agent embeds off the final cleaned set of latents, update readme | 2025-10-31 06:48:49 -07:00
Phil Wang | 6870294d95 | no longer needed | 2025-10-30 09:23:27 -07:00
lucidrains | 3beae186da | some more control over whether to normalize advantages 0.0.102 | 2025-10-30 08:46:03 -07:00
lucidrains | 0904e224ab | make the reverse kl optional 0.0.101 | 2025-10-30 08:22:50 -07:00
lucidrains | 767789d0ca | they decided on 0.3 for the behavioral prior loss weight 0.0.100 | 2025-10-29 13:24:58 -07:00
lucidrains | 35b87c4fa1 | oops 0.0.99 | 2025-10-29 13:04:02 -07:00
lucidrains | c4a3cb09d5 | swap for discrete kl div, thanks to Dirk for pointing this out on the discord 0.0.98 | 2025-10-29 11:54:18 -07:00
lucidrains | cb54121ace | sim trainer needs to take care of agent embedding and old actions 0.0.96 | 2025-10-29 11:15:11 -07:00
lucidrains | 586379f2c8 | sum the kl div loss across number of actions by default for action embedder .kl_div 0.0.95 | 2025-10-29 10:46:42 -07:00
lucidrains | a358a44a53 | always store old agent embeds and old action parameters when possible 0.0.94 | 2025-10-29 10:39:15 -07:00
lucidrains | 3547344312 | take care of storing the old action logits and mean log var, and calculate kl div for pmpo based off that during learn from experience 0.0.93 | 2025-10-29 10:31:32 -07:00
lucidrains | 691d9ca007 | add kl div on action embedder, working way towards the kl div loss in pmpo 0.0.92 | 2025-10-29 10:02:25 -07:00
lucidrains | 91d697f8ca | fix pmpo 0.0.91 | 2025-10-28 18:55:22 -07:00
lucidrains | 7acaa764f6 | evolutionary policy optimization on dreams will be interesting 0.0.90 | 2025-10-28 10:17:01 -07:00
lucidrains | c0450359f3 | allow for evolutionary policy optimization 0.0.89 | 2025-10-28 10:11:13 -07:00
lucidrains | 46f86cd247 | fix storing of agent embedding 0.0.88 | 2025-10-28 09:36:58 -07:00
lucidrains | 903c43b770 | use the agent embeds off the stored experience if available 0.0.87 | 2025-10-28 09:14:02 -07:00
lucidrains | d476fa7b14 | able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads) 0.0.85 | 2025-10-28 09:02:26 -07:00
lucidrains | 789f091c63 | redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse same logic 0.0.83 | 2025-10-28 08:04:48 -07:00
lucidrains | 41ab83f691 | fix mock | 2025-10-27 10:47:24 -07:00
lucidrains | 995b1f64e5 | handle environments that return a terminate flag, also make sure episode lens are logged in vectorized env 0.0.82 | 2025-10-27 10:14:28 -07:00
lucidrains | fd1e87983b | quantile filter | 2025-10-27 09:08:26 -07:00
lucidrains | fe79bfa951 | optionally keep track of returns statistics and normalize with them before advantage 0.0.81 | 2025-10-27 09:02:08 -07:00
lucidrains | f808b1c1d2 | oops 0.0.80 | 2025-10-27 08:34:22 -07:00
lucidrains | 349a03acd7 | redo so lens is always the episode length, including the bootstrap value timestep, and use is_truncated to mask out the bootstrap node from being learned on 0.0.79 | 2025-10-27 08:06:21 -07:00
lucidrains | 59c458aea3 | introduce an is_truncated field on Experience, and mask out rewards and values before calculating gae appropriately 0.0.78 | 2025-10-27 07:55:00 -07:00
lucidrains | fbfd59e42f | handle variable lengthed experiences when doing policy optimization 0.0.77 | 2025-10-27 06:09:09 -07:00
lucidrains | 46432aee9b | fix an issue with bc | 2025-10-25 12:30:08 -07:00
lucidrains | f97d9adc97 | oops, forgot to add the view embedding for robotics 0.0.75 | 2025-10-25 11:39:06 -07:00
lucidrains | 32cf142b4d | take another step for variable len experiences 0.0.74 | 2025-10-25 11:31:41 -07:00
lucidrains | 1ed6a15cb0 | fix tests | 2025-10-25 11:13:22 -07:00
lucidrains | 4d8f5613cc | start storing the experience lens 0.0.73 | 2025-10-25 10:55:47 -07:00
lucidrains | 3d5617d769 | take a step towards variable lengthed experiences during training 0.0.72 | 2025-10-25 10:45:34 -07:00
lucidrains | 77a40e8701 | validate that we can generate multiple video streams for robotics use-case | 2025-10-25 09:23:07 -07:00
lucidrains | 4ce82f34df | given the VAT paper, add multiple video streams (third person, wrist camera, etc), geared for robotics. need to manage an extra dimension for multiple viewpoints 0.0.71 | 2025-10-25 09:20:55 -07:00
lucidrains | a9b728c611 | incorporate proprioception into the dynamics world model 0.0.70 | 2025-10-24 11:24:22 -07:00
lucidrains | 35c1db4c7d | sketch of training from sim env 0.0.69 | 2025-10-24 09:13:09 -07:00
lucidrains | 27ac05efb0 | function for combining experiences 0.0.67 | 2025-10-24 08:00:10 -07:00
lucidrains | d0ffc6bfed | with or without signed advantage 0.0.66 | 2025-10-23 16:24:29 -07:00
lucidrains | fb3e026fe0 | handle vectorized env 0.0.65 | 2025-10-22 11:19:44 -07:00
lucidrains | 7ecc5d03e8 | wire up the time kv cache when interacting with sim / env 0.0.62 | 2025-10-22 08:39:11 -07:00
lucidrains | d82debb7a6 | first pass through gathering experience with a mock env for online rl 0.0.61 | 2025-10-22 08:32:46 -07:00
lucidrains | 03b16a48f2 | sketch out the dream trainer, seems like they only fine tune the heads 0.0.60 | 2025-10-22 06:41:10 -07:00
lucidrains | 6f1a7a24ed | try to fix ci | 2025-10-21 11:47:39 -07:00
lucidrains | e316499047 | naming | 2025-10-21 10:57:55 -07:00
lucidrains | 40da985c6b | tweak bc trainer 0.0.59 | 2025-10-21 10:55:24 -07:00
lucidrains | 2fc3b17149 | take a gradient step with behavioral clone trainer, make sure it works with and without actions and rewards 0.0.57 | 2025-10-21 10:20:08 -07:00