82 Commits

Author SHA1 Message Date
lucidrains
563b269f8a bring in hyper connections 0.0.12 2025-10-11 06:52:57 -07:00
lucidrains
5df3e69583 last commit for the day 0.0.11 2025-10-10 11:59:18 -07:00
lucidrains
9230267d34 handle subset of discrete action unembedding 0.0.10 2025-10-10 11:27:05 -07:00
lucidrains
c68942b026 cleanup 2025-10-10 10:42:54 -07:00
lucidrains
32aa355e37 prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL 0.0.9 2025-10-10 10:41:48 -07:00
lucidrains
9101a49cdd handle continuous value normalization if stats passed in 2025-10-09 08:59:54 -07:00
lucidrains
31f4363be7 must be able to do phase1 and phase2 training 2025-10-09 08:04:36 -07:00
lucidrains
e2d86a4543 add a complete action embedder that can accept any number of discrete actions with variable bins as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous hack in dreamer v3 with sticky action) 0.0.8 2025-10-09 07:53:42 -07:00
lucidrains
b62c08be65 fix task embed in presence of multiple agent tokens 2025-10-08 08:42:25 -07:00
lucidrains
4c2ed100a3 fix masking for multiple agent tokens 0.0.7 2025-10-08 08:26:44 -07:00
lucidrains
ed0918c974 prepare for evolution within dreams 2025-10-08 08:13:16 -07:00
lucidrains
892654d442 multiple agent tokens sharing the same state 2025-10-08 08:06:13 -07:00
lucidrains
c4e0f46528 for the value head, we will go for symexp encoding as well (following the "stop regressing" paper from Farebrother et al), also use layernormed mlp given recent papers 2025-10-08 07:37:34 -07:00
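The symexp encoding referenced in the commit above is the symlog/symexp transform used for reward and value targets in the Dreamer line of work (and motivated by the "stop regressing" paper). A minimal sketch of the transform pair, for illustration only and not the repository's actual implementation:

```python
import math

def symlog(x: float) -> float:
    # symmetric log compression: sign(x) * ln(1 + |x|)
    # squashes large-magnitude targets while staying linear near zero
    return math.copysign(math.log1p(abs(x)), x)

def symexp(x: float) -> float:
    # exact inverse of symlog: sign(x) * (exp(|x|) - 1)
    return math.copysign(math.expm1(abs(x)), x)
```

The head regresses (or classifies over bins of) symlog-space targets and decodes predictions back through symexp, which keeps the loss well-conditioned across reward scales.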
lucidrains
a50e360502 makes more sense for the noise to be fixed 2025-10-08 07:17:05 -07:00
Phil Wang
9c56ba0c9d Merge pull request #3 from lucidrains/pytest-shard (add pytest shard) 2025-10-08 07:03:11 -07:00
lucidrains
b5744237bf fix 2025-10-08 06:58:46 -07:00
lucidrains
63b63dfedd add shard 2025-10-08 06:56:03 -07:00
lucidrains
612f5f5dd1 a bit of dropout to rewards as state 2025-10-08 06:45:25 -07:00
lucidrains
c8f75caa40 although not in the paper, it would be interesting for each agent (will extend to multi-agent) to consider its own past rewards as state 2025-10-08 06:40:43 -07:00
lucidrains
187edc1414 all set for generating the perceived rewards once the RL components fall into place 2025-10-08 06:33:28 -07:00
lucidrains
f7bdaddbbb one more incision before knocking out reward decoding 2025-10-08 06:11:02 -07:00
lucidrains
c056835aea address https://github.com/lucidrains/dreamer4/issues/2 0.0.5 2025-10-08 05:55:22 -07:00
lucidrains
4de357b6c2 tiny change needed to have the world model produce both the video and predicted rewards (after phase 2 finetuning) 2025-10-08 05:52:13 -07:00
lucidrains
0fdb67bafa add the noising of the latent context during generation, technique i think was from EPFL, or perhaps some google group that built on top of EPFL work 0.0.4 2025-10-07 09:37:37 -07:00
lucidrains
36ccb08500 allow for step_sizes to be passed in, log2 is not that intuitive 0.0.3 2025-10-07 08:36:46 -07:00
lucidrains
a8e14f4b7c oops 2025-10-07 08:09:33 -07:00
lucidrains
1176269927 correct signal levels when doing teacher forcing generation 0.0.2 2025-10-07 07:41:02 -07:00
lucidrains
c6bef85984 generating video with raw teacher forcing 0.0.1 2025-10-07 07:22:57 -07:00
lucidrains
83ba9a285a reorganize tokenizer to generate video from the dynamics model 2025-10-06 11:37:45 -07:00
lucidrains
7180a8cf43 start carving into the reinforcement learning portion, starting with reward prediction head (single for now) 2025-10-06 11:17:25 -07:00
lucidrains
77724049e2 fix latent / modality attention pattern in video tokenizer, thanks to another researcher 2025-10-06 09:44:12 -07:00
lucidrains
25b8de91cc handle spatial tokens less than latent tokens in dynamics model 2025-10-06 09:19:27 -07:00
lucidrains
bfbecb4968 an anonymous researcher pointed out that the video tokenizer may be using multiple latents per frame 2025-10-06 08:16:55 -07:00
lucidrains
338def693d oops 2025-10-05 11:52:54 -07:00
lucidrains
f507afa0d3 last commit for the day - take care of the task embed 2025-10-05 11:40:48 -07:00
lucidrains
fe99efecba make a first pass through the shortcut training logic (Frans et al from Berkeley) maintaining both v-space and x-space 2025-10-05 11:17:36 -07:00
lucidrains
971637673b complete all the types of attention masking patterns as proposed in the paper 2025-10-04 12:45:54 -07:00
lucidrains
5c6be4d979 take care of blocked causal in video tokenizer, still need the special attention pattern for latents to and from though 2025-10-04 12:03:50 -07:00
lucidrains
6c994db341 first nail down the attention masking for the dynamics transformer model using a factory function 2025-10-04 11:20:57 -07:00
lucidrains
ca700ba8e1 prepare for the learning in dreams 2025-10-04 09:44:46 -07:00
lucidrains
e04f9ffec6 for the temporal attention in dynamics model, do rotary the traditional way 2025-10-04 09:41:36 -07:00
lucidrains
1b7f6e787d rotate in the 3d rotary embeddings for the video tokenizer for both encoder / decoder 2025-10-04 09:22:06 -07:00
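The rotary embeddings mentioned in the commit above rotate pairs of feature channels by position-dependent angles before attention. A minimal 1D sketch of the idea (the repository applies an axial/3D variant; this single-axis version is only an assumption-level illustration):

```python
import numpy as np

def rotary_embed(x, positions, base=10000.0):
    # x: (seq, dim) with even dim; pair channel j with channel j + dim//2
    # and rotate each pair by an angle that grows with position
    seq, dim = x.shape
    half = dim // 2
    freqs = positions[:, None] / (base ** (np.arange(half) / half))
    cos, sin = np.cos(freqs), np.sin(freqs)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied channel-pair-wise; norms are preserved
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because each channel pair undergoes a pure rotation, token norms are unchanged and relative position falls out of the query-key dot product.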
lucidrains
93f6738c9c given the special attention patterns, attend function needs to be constructed before traversing the transformer layers 2025-10-04 08:31:51 -07:00
lucidrains
7cac3d28c5 cleanup 2025-10-04 08:04:42 -07:00
lucidrains
0f4783f23c use a newly built module from x-mlps for multi token prediction 2025-10-04 07:56:56 -07:00
lucidrains
0a26e0f92f complete the lpips loss used for the video tokenizer 2025-10-04 07:47:27 -07:00
Phil Wang
92e55a90b4 temporary discord 2025-10-04 07:28:36 -07:00
lucidrains
85eea216fd cleanup 2025-10-04 06:59:09 -07:00
lucidrains
895a867a66 able to accept raw video for dynamics model, if tokenizer passed in 2025-10-04 06:57:54 -07:00
lucidrains
8373cb13ec grouped query attention is necessary 2025-10-04 06:31:32 -07:00
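The grouped query attention from the last commit shares one key/value head across a group of query heads, cutting KV-cache size. A small numpy sketch of the mechanism, as an illustration rather than the repository's implementation:

```python
import numpy as np

def grouped_query_attention(q, k, v):
    # q: (q_heads, seq, dim); k, v: (kv_heads, seq, dim), kv_heads divides q_heads
    q_heads, seq, dim = q.shape
    kv_heads = k.shape[0]
    group = q_heads // kv_heads
    # expand each KV head so its group of query heads attends against it
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dim)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v
```

With kv_heads equal to q_heads this reduces to standard multi-head attention; with kv_heads of 1 it is multi-query attention.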