57 Commits

Author SHA1 Message Date
lucidrains
35c1db4c7d sketch of training from sim env 2025-10-24 09:13:09 -07:00
lucidrains
27ac05efb0 function for combining experiences 2025-10-24 08:00:10 -07:00
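For context on the commit above: combining experiences typically just concatenates the fields of several rollouts along the batch dimension. A minimal sketch with a hypothetical Experience container (the repo's actual structure will differ):

```python
import torch
from collections import namedtuple

# hypothetical container for illustration; the repo's experience structure will differ
Experience = namedtuple('Experience', ('states', 'actions', 'rewards', 'dones'))

def combine_experiences(*experiences):
    # concatenate each field from multiple rollouts along the batch dimension
    return Experience(*(torch.cat(fields, dim = 0) for fields in zip(*experiences)))
```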
lucidrains
d0ffc6bfed with or without signed advantage 2025-10-23 16:24:29 -07:00
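The signed-advantage variant replaces the raw advantage with just its sign, so every transition contributes equally to the policy gradient regardless of advantage magnitude. A sketch of what toggling it could look like (hypothetical names):

```python
import torch

def policy_gradient_loss(log_probs, advantages, signed_advantage = True):
    # optionally collapse advantages to {-1, 0, +1}, discarding magnitude
    if signed_advantage:
        advantages = torch.sign(advantages)
    return -(log_probs * advantages.detach()).mean()
```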
lucidrains
fb3e026fe0 handle vectorized env 2025-10-22 11:19:44 -07:00
lucidrains
d82debb7a6 first pass through gathering experience with a mock env for online RL 2025-10-22 08:32:46 -07:00
lucidrains
03b16a48f2 sketch out the dream trainer, seems like they only fine-tune the heads 2025-10-22 06:41:10 -07:00
lucidrains
6f1a7a24ed try to fix ci 2025-10-21 11:47:39 -07:00
lucidrains
2fc3b17149 take a gradient step with the behavioral cloning trainer, make sure it works with and without actions and rewards 2025-10-21 10:20:08 -07:00
lucidrains
283d59d75a oops 2025-10-21 09:50:07 -07:00
lucidrains
4a5465eeb6 fix ci 2025-10-21 09:17:53 -07:00
lucidrains
b34128d3d0 make sure time kv cache can be passed back in during generation 2025-10-21 09:15:32 -07:00
lucidrains
ea13d4fcab take a gradient step with video tokenizer trainer 2025-10-21 08:52:22 -07:00
lucidrains
ca244a290c first pass through the kv cache for the time block in the dynamics model 2025-10-20 12:25:50 -07:00
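The KV cache pattern behind this commit and the 2025-10-21 one above: during autoregressive rollout, keys and values from past timesteps are stored, only the newest step is computed, and the extended cache is handed back to the caller so it can be passed in again on the next step. A toy illustration, not the repo's API:

```python
import torch
import torch.nn.functional as F

def attend_with_cache(q, k, v, cache = None):
    # q, k, v: (batch, heads, new_steps, dim) for only the newest timestep(s)
    # cache: optional (past_k, past_v) from previous timesteps
    if cache is not None:
        past_k, past_v = cache
        k = torch.cat((past_k, k), dim = -2)
        v = torch.cat((past_v, v), dim = -2)
    out = F.scaled_dot_product_attention(q, k, v)
    return out, (k, v)   # return the extended cache for the next decoding step

cache = None
for _ in range(4):   # decode one timestep at a time
    q, k, v = torch.randn(3, 1, 8, 1, 64).unbind()
    out, cache = attend_with_cache(q, k, v, cache = cache)
```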
lucidrains
374667d8a9 take care of the loss normalization mentioned at the end of the first paragraph of section 3 2025-10-19 08:24:41 -07:00
lucidrains
b6aa19f31e complete multi-token prediction for actions, tackle loss balancing another day 2025-10-18 10:23:14 -07:00
lucidrains
5fc0022bbf the function for generating the MTP targets, as well as the mask for the losses 2025-10-18 08:04:51 -07:00
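Multi-token prediction asks the model to predict the next H tokens at every position, so the targets are shifted copies of the sequence, with a mask zeroing the losses that would read past the end. A hypothetical version of such a helper:

```python
import torch

def mtp_targets_and_mask(tokens, horizon):
    # tokens: (batch, seq) ids -> targets: (batch, seq, horizon)
    # where targets[:, t, h] = tokens[:, t + 1 + h]
    batch, seq = tokens.shape
    pos = torch.arange(seq).unsqueeze(-1) + torch.arange(1, horizon + 1)  # (seq, horizon)
    mask = pos < seq                  # False where the target runs off the sequence
    targets = tokens[:, pos.clamp(max = seq - 1)]
    return targets, mask.expand(batch, -1, -1)

targets, mask = mtp_targets_and_mask(torch.arange(10).unsqueeze(0), horizon = 3)
```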
lucidrains
22e13c45fc rename 2025-10-17 14:44:25 -07:00
lucidrains
cb416c0d44 handle the entropies during policy optimization 2025-10-17 08:47:26 -07:00
lucidrains
0dba734280 start the learning in dreams portion 2025-10-17 08:00:47 -07:00
lucidrains
a0161760a0 extract the log probs and predicted values (symexp two-hot encoded) for the phase 3 RL training 2025-10-16 10:40:59 -07:00
lucidrains
2d20d0a6c1 able to roll out actions from one agent within the dreams of a world model 2025-10-16 10:15:43 -07:00
lucidrains
d28251e9f9 another consideration before knocking out the RL logic 2025-10-14 11:10:26 -07:00
lucidrains
9c78962736 sampling actions 2025-10-12 11:27:12 -07:00
lucidrains
8a73a27fc7 add nested tensor way for getting log prob of multiple discrete actions 2025-10-11 10:53:24 -07:00
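Since each discrete action can have a different number of bins, the per-action logits are ragged; nested tensors are one way to carry them in a single structure. A rough sketch of the idea (the repo's function will differ):

```python
import torch
import torch.nn.functional as F

def multi_discrete_log_probs(logits_per_action, sampled_actions):
    # logits_per_action: list of (batch, bins_i) tensors, one per discrete action,
    # each action free to have a different bin count
    # sampled_actions: (batch, num_actions) long tensor of chosen bin indices
    nested = torch.nested.as_nested_tensor(logits_per_action)  # keeps autograd intact
    log_probs = [
        F.log_softmax(logits, dim = -1).gather(-1, idx.unsqueeze(-1)).squeeze(-1)
        for logits, idx in zip(nested.unbind(), sampled_actions.unbind(dim = -1))
    ]
    return torch.stack(log_probs, dim = -1)   # (batch, num_actions)

logits = [torch.randn(2, 5), torch.randn(2, 9)]   # two actions: 5 and 9 bins
actions = torch.stack((torch.randint(0, 5, (2,)), torch.randint(0, 9, (2,))), dim = -1)
out = multi_discrete_log_probs(logits, actions)
```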
lucidrains
b2725d9b6e complete behavior cloning for one agent 2025-10-11 09:24:49 -07:00
lucidrains
563b269f8a bring in hyper connections 2025-10-11 06:52:57 -07:00
lucidrains
5df3e69583 last commit for the day 2025-10-10 11:59:18 -07:00
lucidrains
9230267d34 handle subset of discrete action unembedding 2025-10-10 11:27:05 -07:00
lucidrains
32aa355e37 prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL 2025-10-10 10:41:48 -07:00
lucidrains
9101a49cdd handle continuous value normalization if stats passed in 2025-10-09 08:59:54 -07:00
lucidrains
31f4363be7 must be able to do phase 1 and phase 2 training 2025-10-09 08:04:36 -07:00
lucidrains
e2d86a4543 add a complete action embedder that can accept any number of discrete actions with variable bins as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous sticky-action hack from Dreamer v3) 2025-10-09 07:53:42 -07:00
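A stripped-down sketch of what such an action embedder could look like: one embedding table per discrete action (each with its own bin count), a projection for continuous actions, everything mean-pooled into a single vector destined for the agent token. Names and the pooling choice are assumptions, not the repo's code:

```python
import torch
from torch import nn

class ActionEmbedder(nn.Module):
    # hypothetical sketch of the module described in the commit above
    def __init__(self, dim, discrete_action_bins = (), num_continuous = 0):
        super().__init__()
        self.discrete_embeds = nn.ModuleList(
            nn.Embedding(bins, dim) for bins in discrete_action_bins
        )
        self.continuous_embed = nn.Linear(1, dim) if num_continuous > 0 else None

    def forward(self, discrete = None, continuous = None):
        # discrete: (batch, num_discrete) long - continuous: (batch, num_continuous) float
        embeds = []
        if discrete is not None:
            embeds.extend(embed(discrete[:, i]) for i, embed in enumerate(self.discrete_embeds))
        if continuous is not None:
            embeds.extend(self.continuous_embed(continuous.unsqueeze(-1)).unbind(dim = 1))
        return torch.stack(embeds, dim = 1).mean(dim = 1)  # pooled, added to agent token

embedder = ActionEmbedder(dim = 64, discrete_action_bins = (4, 11), num_continuous = 2)
pooled = embedder(torch.randint(0, 4, (2, 2)), torch.randn(2, 2))
```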
lucidrains
c4e0f46528 for the value head, we will go for symexp encoding as well (following the "stop regressing" paper from Farebrother et al.), also use a layernormed MLP given recent papers 2025-10-08 07:37:34 -07:00
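The value head therefore outputs logits over symexp two-hot bins rather than a scalar (see the sketch under the 2025-10-03 commit below), and the "layernormed MLP" plausibly looks like the following; the exact layout is an assumption:

```python
from torch import nn

def LayerNormMLP(dim, dim_out, expansion = 2):
    # MLP with a LayerNorm before each linear, reported to stabilize value learning
    return nn.Sequential(
        nn.LayerNorm(dim),
        nn.Linear(dim, dim * expansion),
        nn.SiLU(),
        nn.LayerNorm(dim * expansion),
        nn.Linear(dim * expansion, dim_out)
    )

value_head = LayerNormMLP(dim = 512, dim_out = 255)  # 255 symexp two-hot bins
```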
lucidrains
187edc1414 all set for generating the perceived rewards once the RL components fall into place 2025-10-08 06:33:28 -07:00
lucidrains
36ccb08500 allow for step_sizes to be passed in, log2 is not that intuitive 2025-10-07 08:36:46 -07:00
lucidrains
a8e14f4b7c oops 2025-10-07 08:09:33 -07:00
lucidrains
c6bef85984 generating video with raw teacher forcing 2025-10-07 07:22:57 -07:00
lucidrains
83ba9a285a reorganize tokenizer to generate video from the dynamics model 2025-10-06 11:37:45 -07:00
lucidrains
7180a8cf43 start carving into the reinforcement learning portion, starting with reward prediction head (single for now) 2025-10-06 11:17:25 -07:00
lucidrains
25b8de91cc handle fewer spatial tokens than latent tokens in the dynamics model 2025-10-06 09:19:27 -07:00
lucidrains
f507afa0d3 last commit for the day - take care of the task embed 2025-10-05 11:40:48 -07:00
lucidrains
fe99efecba make a first pass through the shortcut training logic (Frans et al. from Berkeley), maintaining both v-space and x-space 2025-10-05 11:17:36 -07:00
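Shortcut models (Frans et al., "One Step Diffusion via Shortcut Models") condition the velocity prediction on a step size d and add a self-consistency loss: one step of size 2d should match two chained steps of size d. A compressed sketch of that objective, omitting the v-space/x-space bookkeeping this commit mentions:

```python
import torch
import torch.nn.functional as F

def shortcut_loss(model, x0, x1, d):
    # model(x, t, d) -> predicted velocity; x0: noise, x1: data
    # d: step size tensor broadcastable to t (the paper samples t, d with t + 2d <= 1)
    t = torch.rand(x1.shape[0], *((1,) * (x1.ndim - 1)))
    xt = (1. - t) * x0 + t * x1                          # linear flow-matching path

    # flow matching term at step size zero
    flow = F.mse_loss(model(xt, t, torch.zeros_like(t)), x1 - x0)

    # self-consistency: two half steps as the (stop-grad) target for one full step
    with torch.no_grad():
        v1 = model(xt, t, d)
        v2 = model(xt + d * v1, t + d, d)
        target = (v1 + v2) / 2
    consistency = F.mse_loss(model(xt, t, 2 * d), target)
    return flow + consistency
```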
lucidrains
971637673b complete all the types of attention masking patterns as proposed in the paper 2025-10-04 12:45:54 -07:00
lucidrains
5c6be4d979 take care of block-causal attention in the video tokenizer, still need the special attention pattern to and from the latents though 2025-10-04 12:03:50 -07:00
lucidrains
6c994db341 first nail down the attention masking for the dynamics transformer model using a factory function 2025-10-04 11:20:57 -07:00
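The three commits above cover the attention masking patterns; as a flavor of the factory-function approach, here is a generic block-causal mask builder (tokens within a timestep attend freely, timesteps attend causally). The paper's special latent patterns are not reproduced here:

```python
import torch

def block_causal_mask_factory(tokens_per_timestep):
    # returns a function: sequence length -> boolean mask (True = may attend)
    def create_mask(seq_len):
        block = torch.arange(seq_len) // tokens_per_timestep
        return block.unsqueeze(-1) >= block.unsqueeze(0)   # (seq_len, seq_len)
    return create_mask

mask = block_causal_mask_factory(tokens_per_timestep = 4)(12)
```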
lucidrains
895a867a66 able to accept raw video for dynamics model, if tokenizer passed in 2025-10-04 06:57:54 -07:00
lucidrains
8373cb13ec grouped query attention is necessary 2025-10-04 06:31:32 -07:00
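Grouped query attention shares each key/value head across a group of query heads, cutting KV-cache memory during rollout. The core trick, sketched (the repo's attention module will differ; newer PyTorch also exposes an enable_gqa flag on scaled_dot_product_attention):

```python
import torch
import torch.nn.functional as F

def gqa(q, k, v, queries_per_group):
    # q: (batch, q_heads, seq, dim) - k, v: (batch, kv_heads, seq, dim)
    # expand each kv head to serve its whole group of query heads
    k = k.repeat_interleave(queries_per_group, dim = 1)
    v = v.repeat_interleave(queries_per_group, dim = 1)
    return F.scaled_dot_product_attention(q, k, v, is_causal = True)

q = torch.randn(2, 8, 16, 64)   # 8 query heads
k = torch.randn(2, 2, 16, 64)   # 2 kv heads -> groups of 4
v = torch.randn(2, 2, 16, 64)
out = gqa(q, k, v, queries_per_group = 4)
```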
lucidrains
046f8927d1 complete the symexp two-hot proposed by Hafner in previous versions of Dreamer, but will also bring in HL-Gauss 2025-10-03 08:08:44 -07:00
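Symexp two-hot (Hafner et al., Dreamer v3) regresses a scalar by compressing it with symlog, spreading it over the two nearest bins of a fixed grid, and training with cross entropy; decoding takes the expectation over bins and maps back through symexp. A minimal sketch, not this repo's implementation:

```python
import torch
import torch.nn.functional as F

def symlog(x):
    # compress magnitudes symmetrically: sign(x) * log(1 + |x|)
    return torch.sign(x) * torch.log1p(torch.abs(x))

def symexp(x):
    # inverse of symlog
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1.)

def two_hot(values, bins):
    # bins: (num_bins,) increasing bin positions in symlog space
    values = symlog(values).clamp(bins[0], bins[-1])
    idx_hi = torch.searchsorted(bins, values).clamp(1, len(bins) - 1)
    idx_lo = idx_hi - 1
    weight_hi = (values - bins[idx_lo]) / (bins[idx_hi] - bins[idx_lo])
    target = torch.zeros(*values.shape, len(bins))
    target.scatter_(-1, idx_lo.unsqueeze(-1), (1. - weight_hi).unsqueeze(-1))
    target.scatter_(-1, idx_hi.unsqueeze(-1), weight_hi.unsqueeze(-1))
    return target

bins = torch.linspace(-20., 20., 255)
logits = torch.randn(4, 255)                     # from a reward / value head
loss = F.cross_entropy(logits, two_hot(torch.randn(4) * 100., bins))
pred = symexp((logits.softmax(dim = -1) * bins).sum(dim = -1))  # decode to scalar
```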
lucidrains
8d1cd311bb Revert "address https://github.com/lucidrains/dreamer4/issues/1"
This reverts commit e23a5294ec2f49d58d3ccb936c498eb86939fa96.
2025-10-02 12:25:05 -07:00
lucidrains
e23a5294ec address https://github.com/lucidrains/dreamer4/issues/1 2025-10-02 11:49:22 -07:00