39 Commits

Author · SHA1 · Message · Date
lucidrains · 22e13c45fc · rename · 2025-10-17 14:44:25 -07:00
lucidrains · c967404471 · 0.0.31 · 2025-10-17 08:55:42 -07:00
lucidrains · cb416c0d44 · handle the entropies during policy optimization · 2025-10-17 08:47:26 -07:00
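Regarding cb416c0d44: a minimal sketch of how an entropy bonus is typically folded into a policy-gradient objective. The function and argument names here (policy_logits, advantages, entropy_weight) are illustrative, not this repo's API.

```python
import torch
import torch.nn.functional as F

def policy_loss_with_entropy(policy_logits, actions, advantages, entropy_weight = 1e-3):
    # log-probabilities over the discrete action bins
    log_probs = F.log_softmax(policy_logits, dim = -1)
    action_log_probs = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)

    # standard policy-gradient term, weighted by (detached) advantages
    pg_loss = -(action_log_probs * advantages.detach()).mean()

    # entropy of the categorical policy, kept high to encourage exploration
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim = -1).mean()

    return pg_loss - entropy_weight * entropy
```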
lucidrains · 61773c8219 · eventually we will need to learn from the outside stream of experience · 2025-10-17 08:06:24 -07:00
lucidrains · 0dba734280 · start the learning in dreams portion · 2025-10-17 08:00:47 -07:00
lucidrains · a0161760a0 · extract the log probs and predicted values (symexp two hot encoded) for the phase 3 RL training · 2025-10-16 10:40:59 -07:00
lucidrains · 2d20d0a6c1 · able to roll out actions from one agent within the dreams of a world model · 2025-10-16 10:15:43 -07:00
lucidrains · d74f09f0b3 · a researcher in discord pointed out that the tokenizer also uses the axial space time transformer. redo without the 3d rotary and block causal, greatly simplifying the implementation · 2025-10-16 09:40:14 -07:00
lucidrains · 2ccb290e26 · pass the attend kwargs for the block causal masking in tokenizer · 2025-10-16 08:33:26 -07:00
lucidrains · 517ef6b94b · oops · 2025-10-16 07:03:51 -07:00
lucidrains · 2a902eaaf7 · allow reward tokens to be attended to as state optionally, DT-esque. figure out multi-agent scenario once i get around to it · 2025-10-16 06:41:02 -07:00
lucidrains · d28251e9f9 · another consideration before knocking out the RL logic · 2025-10-14 11:10:26 -07:00
lucidrains · ff81dd761b · separate action and agent embeds · 2025-10-13 11:36:21 -07:00
lucidrains · 6dbdc3d7d8 · correct a misunderstanding: past actions get a separate action token, while the agent token is used for predicting the next action, rewards, and values · 2025-10-12 16:16:18 -07:00
lucidrains · 9c78962736 · sampling actions · 2025-10-12 11:27:12 -07:00
lucidrains · c5e64ff4ce · separate out the key from the value projections in attention for muon · 2025-10-12 09:42:22 -07:00
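Regarding c5e64ff4ce: Muon orthogonalizes the update of each individual 2D weight matrix, so attention projections are often kept as separate linears rather than one fused qkv projection. A hedged sketch of that layout, with hypothetical names:

```python
import torch
from torch import nn

class QKVProj(nn.Module):
    # instead of a single fused nn.Linear(dim, 3 * dim_inner), keep q, k, v as
    # separate weight matrices so an optimizer like Muon can orthogonalize each
    # projection's update independently
    def __init__(self, dim, dim_inner):
        super().__init__()
        self.to_q = nn.Linear(dim, dim_inner, bias = False)
        self.to_k = nn.Linear(dim, dim_inner, bias = False)
        self.to_v = nn.Linear(dim, dim_inner, bias = False)

    def forward(self, x):
        return self.to_q(x), self.to_k(x), self.to_v(x)
```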
lucidrains · ab5de6795f · bring in muon · 2025-10-12 09:35:06 -07:00
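Regarding ab5de6795f: the core of Muon is a Newton-Schulz iteration that approximately orthogonalizes the momentum-smoothed gradient of each 2D weight. A compact sketch, using the quintic coefficients from Keller Jordan's reference implementation:

```python
import torch

def newton_schulz_orthogonalize(G, steps = 5, eps = 1e-7):
    # approximately maps a 2d matrix to the nearest semi-orthogonal matrix,
    # the core operation inside the Muon optimizer
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)

    # work with the wide orientation so A = X @ X^T is the smaller gram matrix
    transposed = X.shape[-2] > X.shape[-1]
    if transposed:
        X = X.mT

    for _ in range(steps):
        A = X @ X.mT
        X = a * X + (b * A + c * (A @ A)) @ X

    return X.mT if transposed else X
```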
lucidrains · 8a73a27fc7 · add nested tensor way for getting log prob of multiple discrete actions · 2025-10-11 10:53:24 -07:00
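Regarding 8a73a27fc7: the commit uses nested tensors so each discrete action head can have its own bin count; the sketch below computes the same joint log-probability with a plain loop instead, just to show the underlying computation (names are hypothetical):

```python
import torch
import torch.nn.functional as F

def multi_discrete_log_prob(logits_per_action, actions):
    # logits_per_action: list of tensors, one per discrete action head,
    #   each shaped (batch, num_bins_i) where num_bins_i may differ per head
    # actions: (batch, num_action_heads) integer tensor of chosen bins
    # returns the summed log-probability of the factorized joint action
    log_probs = []
    for idx, logits in enumerate(logits_per_action):
        lp = F.log_softmax(logits, dim = -1)
        chosen = actions[:, idx]
        log_probs.append(lp.gather(-1, chosen.unsqueeze(-1)).squeeze(-1))

    return torch.stack(log_probs, dim = -1).sum(dim = -1)
```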
lucidrains · 01bf70e18a · 0.0.14 · 2025-10-11 09:24:58 -07:00
lucidrains · 563b269f8a · bring in hyper connections · 2025-10-11 06:52:57 -07:00
lucidrains · 5df3e69583 · last commit for the day · 2025-10-10 11:59:18 -07:00
lucidrains · 9230267d34 · handle subset of discrete action unembedding · 2025-10-10 11:27:05 -07:00
lucidrains · 32aa355e37 · prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL · 2025-10-10 10:41:48 -07:00
lucidrains · e2d86a4543 · add a complete action embedder that can accept any number of discrete actions with variable bins as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous hack in dreamer v3 with sticky action) · 2025-10-09 07:53:42 -07:00
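Regarding e2d86a4543: a rough sketch of such an action embedder, assuming one embedding table per discrete action (each with its own bin count), a single projection for the continuous actions, and sum pooling onto the agent token. Class and argument names are hypothetical, not the repo's ActionEmbedder API.

```python
import torch
from torch import nn

class SimpleActionEmbedder(nn.Module):
    def __init__(self, dim, num_discrete_bins = (), num_continuous = 0):
        super().__init__()
        # one embedding table per discrete action, each with its own number of bins
        self.discrete_embeds = nn.ModuleList([nn.Embedding(bins, dim) for bins in num_discrete_bins])
        # a single projection shared by all continuous actions
        self.continuous_proj = nn.Linear(num_continuous, dim) if num_continuous > 0 else None

    def forward(self, agent_token, discrete_actions = None, continuous_actions = None):
        # agent_token: (batch, dim), discrete_actions: (batch, num_discrete),
        # continuous_actions: (batch, num_continuous)
        pooled = torch.zeros_like(agent_token)

        if discrete_actions is not None:
            for idx, embed in enumerate(self.discrete_embeds):
                pooled = pooled + embed(discrete_actions[:, idx])

        if continuous_actions is not None and self.continuous_proj is not None:
            pooled = pooled + self.continuous_proj(continuous_actions)

        # the pooled action embedding is added onto the agent token
        return agent_token + pooled
```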
lucidrains · 4c2ed100a3 · fix masking for multiple agent tokens · 2025-10-08 08:26:44 -07:00
lucidrains · 63b63dfedd · add shard · 2025-10-08 06:56:03 -07:00
lucidrains · 187edc1414 · all set for generating the perceived rewards once the RL components fall into place · 2025-10-08 06:33:28 -07:00
lucidrains · c056835aea · address https://github.com/lucidrains/dreamer4/issues/2 · 2025-10-08 05:55:22 -07:00
lucidrains · 0fdb67bafa · add the noising of the latent context during generation, technique i think was from EPFL, or perhaps some google group that built on top of EPFL work · 2025-10-07 09:37:37 -07:00
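Regarding 0fdb67bafa: the idea is to lightly re-noise the already generated context latents before conditioning on them, so inference matches training, where context frames are seen at nonzero noise levels. A minimal sketch assuming a variance-preserving mix; the actual schedule depends on the model's noise parameterization:

```python
import torch

def noise_context_latents(context_latents, signal_level = 0.9):
    # variance-preserving mix of the clean context latents with gaussian noise,
    # so the model conditions on slightly noised context, as it did during training
    noise = torch.randn_like(context_latents)
    return signal_level * context_latents + (1. - signal_level ** 2) ** 0.5 * noise
```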
lucidrains · 36ccb08500 · allow for step_sizes to be passed in, log2 is not that intuitive · 2025-10-07 08:36:46 -07:00
lucidrains · 1176269927 · correct signal levels when doing teacher forcing generation · 2025-10-07 07:41:02 -07:00
lucidrains · 0f4783f23c · use a newly built module from x-mlps for multi token prediction · 2025-10-04 07:56:56 -07:00
lucidrains · 0a26e0f92f · complete the lpips loss used for the video tokenizer · 2025-10-04 07:47:27 -07:00
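Regarding 0a26e0f92f: full LPIPS applies learned per-channel weights on top of pretrained network features; the sketch below is a simplified perceptual loss in the same spirit, built on torchvision's VGG16 features with unit-normalized feature maps.

```python
import torch
import torch.nn.functional as F
from torchvision import models

class SimplePerceptualLoss(torch.nn.Module):
    # simplified LPIPS-style loss: compare unit-normalized VGG feature maps
    # (full LPIPS additionally applies learned per-channel linear weights)
    def __init__(self, layer_indices = (3, 8, 15, 22)):
        super().__init__()
        vgg = models.vgg16(weights = models.VGG16_Weights.DEFAULT).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.layer_indices = set(layer_indices)

    def _features(self, x):
        # x is expected to be imagenet-normalized (batch, 3, h, w)
        feats = []
        for idx, layer in enumerate(self.vgg):
            x = layer(x)
            if idx in self.layer_indices:
                feats.append(F.normalize(x, dim = 1))  # unit-normalize channels
        return feats

    def forward(self, pred, target):
        loss = 0.
        for f_pred, f_target in zip(self._features(pred), self._features(target)):
            loss = loss + (f_pred - f_target).pow(2).mean()
        return loss
```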
lucidrains · 986bf4c529 · allow for the video tokenizer to accept any spatial dimensions by parameterizing the decoder positional embedding with an MLP · 2025-10-03 10:08:05 -07:00
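Regarding 986bf4c529: instead of a learned positional table tied to one training resolution, the decoder positions can be generated by passing normalized (y, x) coordinates through a small MLP, so any spatial size works at inference. A hedged sketch with hypothetical names:

```python
import torch
from torch import nn

class MLPPositionalEmbedding(nn.Module):
    # maps normalized 2d coordinates to positional embeddings, so the decoder
    # is not tied to the spatial resolution seen during training
    def __init__(self, dim, dim_hidden = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, dim_hidden),
            nn.SiLU(),
            nn.Linear(dim_hidden, dim)
        )

    def forward(self, height, width, device = None):
        ys = torch.linspace(-1., 1., height, device = device)
        xs = torch.linspace(-1., 1., width, device = device)
        grid = torch.stack(torch.meshgrid(ys, xs, indexing = 'ij'), dim = -1)  # (h, w, 2)
        return self.mlp(grid)  # (h, w, dim)
```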
lucidrains · 046f8927d1 · complete the symexp two hot proposed by Hafner from the previous versions of Dreamer, but will also bring in hl gauss · 2025-10-03 08:08:44 -07:00
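Regarding 046f8927d1: in the symexp two-hot scheme from earlier Dreamer versions, scalar targets are symlog-transformed, two-hot encoded over equally spaced bins, trained with cross-entropy, and decoded by taking symexp of the expected bin value. A rough sketch of the encode/decode pair (an equivalent formulation places symexp-spaced bins in the original value space):

```python
import torch

def symlog(x):
    return torch.sign(x) * torch.log1p(torch.abs(x))

def symexp(x):
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1.)

def two_hot(target, bins):
    # bins: (num_bins,) equally spaced in symlog space; target: (batch,) raw values
    target = symlog(target).clamp(bins[0], bins[-1])
    idx_hi = torch.searchsorted(bins, target).clamp(1, len(bins) - 1)
    idx_lo = idx_hi - 1
    lo, hi = bins[idx_lo], bins[idx_hi]
    weight_hi = (target - lo) / (hi - lo)
    encoding = torch.zeros(*target.shape, len(bins), device = target.device)
    encoding.scatter_(-1, idx_lo.unsqueeze(-1), (1. - weight_hi).unsqueeze(-1))
    encoding.scatter_(-1, idx_hi.unsqueeze(-1), weight_hi.unsqueeze(-1))
    return encoding

def decode_value(logits, bins):
    # expected bin value under the categorical, mapped back through symexp
    return symexp((logits.softmax(dim = -1) * bins).sum(dim = -1))

# training minimizes cross-entropy: -(two_hot(target, bins) * logits.log_softmax(-1)).sum(-1)
```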
lucidrains · 8b66b703e0 · add the discretized signal level + step size embeddings necessary for diffusion forcing + shortcut · 2025-10-02 07:39:34 -07:00
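Regarding 8b66b703e0: a sketch of the conditioning embeddings described, assuming the continuous signal level is bucketized into a small embedding table and the shortcut step size is indexed by its log2; names and bucket counts are assumptions, not the repo's configuration.

```python
import torch
from torch import nn

class LevelAndStepEmbeds(nn.Module):
    def __init__(self, dim, num_signal_levels = 64, max_log2_step = 7):
        super().__init__()
        self.signal_level_embed = nn.Embedding(num_signal_levels, dim)
        self.step_size_embed = nn.Embedding(max_log2_step + 1, dim)
        self.num_signal_levels = num_signal_levels

    def forward(self, tokens, signal_level, step_size):
        # tokens: (batch, seq, dim), signal_level: (batch,) in [0, 1],
        # step_size: (batch,) integer powers of two
        level_idx = (signal_level * (self.num_signal_levels - 1)).round().long()
        step_idx = torch.log2(step_size.float()).round().long()
        cond = self.signal_level_embed(level_idx) + self.step_size_embed(step_idx)
        return tokens + cond.unsqueeze(1)
```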
lucidrains · e3cbcd94c6 · sketch out top down · 2025-10-01 10:25:56 -07:00
lucidrains · 2e92c0121a · they employ two stability measures, qk rmsnorm and softclamping of attention logits · 2025-10-01 09:40:24 -07:00
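Regarding 2e92c0121a: the two stabilizers are RMS-normalizing queries and keys before the dot product and soft-clamping the attention logits with a scaled tanh. A compact sketch, with the clamp value of 50 as an assumption and the learnable rmsnorm scale omitted for brevity:

```python
import torch
import torch.nn.functional as F

def stable_attention(q, k, v, softclamp_value = 50.):
    # q, k, v: (batch, heads, seq, dim_head)
    # 1. qk rmsnorm (sans learnable scale): l2-normalize along the head dim, times sqrt(dim)
    q = F.normalize(q, dim = -1) * (q.shape[-1] ** 0.5)
    k = F.normalize(k, dim = -1) * (k.shape[-1] ** 0.5)

    sim = q @ k.transpose(-2, -1) * (q.shape[-1] ** -0.5)

    # 2. softclamp the attention logits into (-softclamp_value, softclamp_value)
    sim = softclamp_value * torch.tanh(sim / softclamp_value)

    attn = sim.softmax(dim = -1)
    return attn @ v
```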
lucidrains · bdc7dd30a6 · scaffold · 2025-10-01 07:18:23 -07:00