dreamer4

Author	SHA1	Message	Date
lucidrains	59c458aea3	introduce an `is_truncated` field on Experience, and mask out rewards and values before calculating gae appropriately 0.0.78	2025-10-27 07:55:00 -07:00
lucidrains	fbfd59e42f	handle variable lengthed experiences when doing policy optimization 0.0.77	2025-10-27 06:09:09 -07:00
lucidrains	46432aee9b	fix an issue with bc	2025-10-25 12:30:08 -07:00
lucidrains	f97d9adc97	oops, forgot to add the view embedding for robotics 0.0.75	2025-10-25 11:39:06 -07:00
lucidrains	32cf142b4d	take another step for variable len experiences 0.0.74	2025-10-25 11:31:41 -07:00
lucidrains	1ed6a15cb0	fix tests	2025-10-25 11:13:22 -07:00
lucidrains	4d8f5613cc	start storing the experience lens 0.0.73	2025-10-25 10:55:47 -07:00
lucidrains	3d5617d769	take a step towards variable lengthed experiences during training 0.0.72	2025-10-25 10:45:34 -07:00
lucidrains	77a40e8701	validate that we can generate multiple video streams for robotics use-case	2025-10-25 09:23:07 -07:00
lucidrains	4ce82f34df	given the VAT paper, add multiple video streams (third person, wrist camera, etc), geared for robotics. need to manage an extra dimension for multiple viewpoints 0.0.71	2025-10-25 09:20:55 -07:00
lucidrains	a9b728c611	incorporate proprioception into the dynamics world model 0.0.70	2025-10-24 11:24:22 -07:00
lucidrains	35c1db4c7d	sketch of training from sim env 0.0.69	2025-10-24 09:13:09 -07:00
lucidrains	27ac05efb0	function for combining experiences 0.0.67	2025-10-24 08:00:10 -07:00
lucidrains	d0ffc6bfed	with or without signed advantage 0.0.66	2025-10-23 16:24:29 -07:00
lucidrains	fb3e026fe0	handle vectorized env 0.0.65	2025-10-22 11:19:44 -07:00
lucidrains	7ecc5d03e8	wire up the time kv cache when interacting with sim / env 0.0.62	2025-10-22 08:39:11 -07:00
lucidrains	d82debb7a6	first pass through gathering experience with a mock env for online rl 0.0.61	2025-10-22 08:32:46 -07:00
lucidrains	03b16a48f2	sketch out the dream trainer, seems like they only fine tune the heads 0.0.60	2025-10-22 06:41:10 -07:00
lucidrains	6f1a7a24ed	try to fix ci	2025-10-21 11:47:39 -07:00
lucidrains	e316499047	naming	2025-10-21 10:57:55 -07:00
lucidrains	40da985c6b	tweak bc trainer 0.0.59	2025-10-21 10:55:24 -07:00
lucidrains	2fc3b17149	take a gradient step with behavioral clone trainer, make sure it works with and without actions and rewards 0.0.57	2025-10-21 10:20:08 -07:00
lucidrains	283d59d75a	oops	2025-10-21 09:50:07 -07:00
lucidrains	4a5465eeb6	fix ci	2025-10-21 09:17:53 -07:00
lucidrains	b34128d3d0	make sure time kv cache can be passed back in during generation 0.0.55	2025-10-21 09:15:32 -07:00
lucidrains	7ba3988fb9	prepare a mock for interacting with online env	2025-10-21 09:03:20 -07:00
lucidrains	ea13d4fcab	take a gradient step with video tokenizer trainer 0.0.54	2025-10-21 08:52:22 -07:00
lucidrains	15876d34cf	more muon prep 0.0.53	2025-10-21 08:23:59 -07:00
lucidrains	b4763caff9	fix rotary embeddings in presence of kv caching	2025-10-21 07:10:21 -07:00
lucidrains	7195bbb196	oops 0.0.50	2025-10-20 12:42:27 -07:00
lucidrains	ca244a290c	first pass through the kv cache for the time block in the dynamics model 0.0.49	2025-10-20 12:25:50 -07:00
lucidrains	a7e0c395c3	allow for only rmsnorm for keys in attention 0.0.48	2025-10-20 11:20:49 -07:00
lucidrains	1345326656	another measure for the attending to nothing issue 0.0.47	2025-10-20 10:32:31 -07:00
lucidrains	55574c054e	assert 0.0.46	2025-10-19 09:59:42 -07:00
lucidrains	27ed6d0ba5	fix time kv cache 0.0.45	2025-10-19 09:16:06 -07:00
lucidrains	4930002e99	bit of progress on time kv cache 0.0.44	2025-10-19 09:04:26 -07:00
lucidrains	ecbe13efe8	allow for setting different loss weights for each MTP head (perhaps more weight on the next vs some far out prediction) 0.0.43	2025-10-19 08:37:56 -07:00
lucidrains	f651d779e3	able to control the update of the loss ema from dynamics model forward 0.0.42	2025-10-19 08:25:50 -07:00
lucidrains	374667d8a9	take care of the loss normalization mentioned at the end of the first paragraph of section 3 0.0.41	2025-10-19 08:24:41 -07:00
lucidrains	79a1b1c46e	oops 0.0.40	2025-10-18 10:31:48 -07:00
lucidrains	b6aa19f31e	complete multi-token prediction for actions, tackle loss balancing another day 0.0.38	2025-10-18 10:23:14 -07:00
lucidrains	bc629d78b1	inverse norm for continuous actions when sampling 0.0.37	2025-10-18 08:55:04 -07:00
lucidrains	0ee475d2df	oops 0.0.36	2025-10-18 08:50:53 -07:00
lucidrains	8c88a33d3b	complete multi token prediction for the reward head 0.0.35	2025-10-18 08:33:06 -07:00
lucidrains	911a1a8434	oops 0.0.34	2025-10-18 08:07:06 -07:00
lucidrains	5fc0022bbf	the function for generating the MTP targets, as well as the mask for the losses	2025-10-18 08:04:51 -07:00
lucidrains	83cfd2cd1b	task conditioning when dreaming 0.0.33	2025-10-18 07:47:13 -07:00
lucidrains	22e13c45fc	rename 0.0.32	2025-10-17 14:44:25 -07:00
lucidrains	c967404471	0.0.31 0.0.31	2025-10-17 08:55:42 -07:00
lucidrains	0c1b067f97	if optimizer is passed into the learn from dreams function, take the optimizer steps, otherwise let the researcher handle it externally. also ready muon	2025-10-17 08:55:20 -07:00

152 Commits