dreamer4

Author	SHA1	Message	Date
lucidrains	fb6d69f43a	complete the latent autoregressive prediction, to use the log variance as a state entropy bonus	2025-12-03 06:40:19 -08:00
lucidrains	2e7f406d49	allow for the combining of experiences from environment and dream	2025-11-13 16:37:35 -08:00
lucidrains	690ecf07dc	fix the rnn time caching issue	2025-11-11 17:04:02 -08:00
lucidrains	ac1c12f743	disable until rnn hiddens are handled properly	2025-11-10 15:52:43 -08:00
lucidrains	3c84b404a8	rnn layer needs to be hyper connected too	2025-11-10 15:51:33 -08:00
lucidrains	d5b70e2b86	allow for adding an RNN before time attention, but need to handle caching still	2025-11-10 11:42:20 -08:00
lucidrains	c3532fa797	add learned value residual	2025-11-10 09:33:58 -08:00
lucidrains	73029635fe	last commit for the day	2025-11-09 11:12:37 -08:00
lucidrains	e1c41f4371	decorrelation loss for spatial attention as well	2025-11-09 10:41:58 -08:00
lucidrains	051d4d6ee2	oops	2025-11-09 10:12:51 -08:00
lucidrains	ef3a5552e7	eventually video tokenizer may need to be trained on single frames	2025-11-09 10:11:56 -08:00
lucidrains	0c4224da18	add a decorrelation loss for temporal attention in encoder of video tokenizer	2025-11-09 09:47:47 -08:00
lucidrains	cfd34f1eba	able to move the experience to cpu easily, and automatically move it to the device of the dynamics world model when learning from it	2025-11-09 16:16:13 +00:00
lucidrains	24ef72d528	0.1.4	2025-11-04 15:21:20 -08:00
lucidrains	d756d1bb8c	addressing issues raised by an independent researcher with llm assistance	2025-10-31 08:37:39 -07:00
lucidrains	60681fce1d	fix generation so that one more step is taken to decode agent embeds off the final cleaned set of latents, update readme	2025-10-31 06:48:49 -07:00
lucidrains	3beae186da	some more control over whether to normalize advantages	2025-10-30 08:46:03 -07:00
lucidrains	0904e224ab	make the reverse kl optional	2025-10-30 08:22:50 -07:00
lucidrains	767789d0ca	they decided on 0.3 for the behavioral prior loss weight	2025-10-29 13:24:58 -07:00
lucidrains	35b87c4fa1	oops	2025-10-29 13:04:02 -07:00
lucidrains	c4a3cb09d5	swap for discrete kl div, thanks to Dirk for pointing this out on the discord	2025-10-29 11:54:18 -07:00
lucidrains	cb54121ace	sim trainer needs to take care of agent embedding and old actions	2025-10-29 11:15:11 -07:00
lucidrains	586379f2c8	sum the kl div loss across number of actions by default for action embedder .kl_div	2025-10-29 10:46:42 -07:00
lucidrains	a358a44a53	always store old agent embeds and old action parameters when possible	2025-10-29 10:39:15 -07:00
lucidrains	3547344312	take care of storing the old action logits and mean log var, and calculate kl div for pmpo based off that during learn from experience	2025-10-29 10:31:32 -07:00
lucidrains	691d9ca007	add kl div on action embedder, working way towards the kl div loss in pmpo	2025-10-29 10:02:25 -07:00
lucidrains	91d697f8ca	fix pmpo	2025-10-28 18:55:22 -07:00
lucidrains	7acaa764f6	evolutionary policy optimization on dreams will be interesting	2025-10-28 10:17:01 -07:00
lucidrains	c0450359f3	allow for evolutionary policy optimization	2025-10-28 10:11:13 -07:00
lucidrains	46f86cd247	fix storing of agent embedding	2025-10-28 09:36:58 -07:00
lucidrains	903c43b770	use the agent embeds off the stored experience if available	2025-10-28 09:14:02 -07:00
lucidrains	d476fa7b14	able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads)	2025-10-28 09:02:26 -07:00
lucidrains	789f091c63	redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse same logic	2025-10-28 08:04:48 -07:00
lucidrains	995b1f64e5	handle environments that return a terminate flag, also make sure episode lens are logged in vectorized env	2025-10-27 10:14:28 -07:00
lucidrains	fe79bfa951	optionally keep track of returns statistics and normalize with them before advantage	2025-10-27 09:02:08 -07:00
lucidrains	f808b1c1d2	oops	2025-10-27 08:34:22 -07:00
lucidrains	349a03acd7	redo so `lens` is always the episode length, including the bootstrap value timestep, and use `is_truncated` to mask out the bootstrap node from being learned on	2025-10-27 08:06:21 -07:00
lucidrains	59c458aea3	introduce an `is_truncated` field on Experience, and mask out rewards and values before calculating gae appropriately	2025-10-27 07:55:00 -07:00
lucidrains	fbfd59e42f	handle variable lengthed experiences when doing policy optimization	2025-10-27 06:09:09 -07:00
lucidrains	46432aee9b	fix an issue with bc	2025-10-25 12:30:08 -07:00
lucidrains	f97d9adc97	oops, forgot to add the view embedding for robotics	2025-10-25 11:39:06 -07:00
lucidrains	32cf142b4d	take another step for variable len experiences	2025-10-25 11:31:41 -07:00
lucidrains	4d8f5613cc	start storing the experience lens	2025-10-25 10:55:47 -07:00
lucidrains	3d5617d769	take a step towards variable lengthed experiences during training	2025-10-25 10:45:34 -07:00
lucidrains	4ce82f34df	given the VAT paper, add multiple video streams (third person, wrist camera, etc), geared for robotics. need to manage an extra dimension for multiple viewpoints	2025-10-25 09:20:55 -07:00
lucidrains	a9b728c611	incorporate proprioception into the dynamics world model	2025-10-24 11:24:22 -07:00
lucidrains	35c1db4c7d	sketch of training from sim env	2025-10-24 09:13:09 -07:00
lucidrains	27ac05efb0	function for combining experiences	2025-10-24 08:00:10 -07:00
lucidrains	d0ffc6bfed	with or without signed advantage	2025-10-23 16:24:29 -07:00
lucidrains	fb3e026fe0	handle vectorized env	2025-10-22 11:19:44 -07:00

117 Commits