author | commit | message | tag | date

lucidrains | 7ba3988fb9 | prepare a mock for interacting with online env | - | 2025-10-21 09:03:20 -07:00
lucidrains | ea13d4fcab | take a gradient step with video tokenizer trainer | 0.0.54 | 2025-10-21 08:52:22 -07:00
lucidrains | 15876d34cf | more muon prep | 0.0.53 | 2025-10-21 08:23:59 -07:00
lucidrains | b4763caff9 | fix rotary embeddings in presence of kv caching | - | 2025-10-21 07:10:21 -07:00
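For context on the rotary fix above: rotary embeddings must be applied at each token's absolute position, and with a kv cache the newly decoded token sits at an offset equal to the cache length, not at position zero. A minimal sketch of that offset handling, assuming a GPT-NeoX style rotary; the function names are illustrative, not the repository's actual API:

```python
import torch

def rotary_freqs(seq_len, dim, offset = 0, theta = 10000.):
    # frequencies for absolute positions [offset, offset + seq_len)
    inv_freq = 1. / (theta ** (torch.arange(0, dim, 2).float() / dim))
    pos = torch.arange(offset, offset + seq_len).float()
    freqs = torch.einsum('i, j -> i j', pos, inv_freq)
    return torch.cat((freqs, freqs), dim = -1)

def apply_rotary(t, freqs):
    # t: (..., seq, dim) - rotate-half convention
    x1, x2 = t.chunk(2, dim = -1)
    rotated = torch.cat((-x2, x1), dim = -1)
    return t * freqs.cos() + rotated * freqs.sin()

# during cached decoding, the fix is to offset by the cached length
cache_len = 12
q = torch.randn(1, 8, 1, 64)  # (batch, heads, 1 new token, dim_head)
freqs = rotary_freqs(seq_len = 1, dim = 64, offset = cache_len)
q = apply_rotary(q, freqs)    # rotated at position 12, not position 0
```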
lucidrains | 7195bbb196 | oops | 0.0.50 | 2025-10-20 12:42:27 -07:00
lucidrains | ca244a290c | first pass through the kv cache for the time block in the dynamics model | 0.0.49 | 2025-10-20 12:25:50 -07:00
lucidrains | a7e0c395c3 | allow for only rmsnorm for keys in attention | 0.0.48 | 2025-10-20 11:20:49 -07:00
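A minimal sketch of what key-only RMSNorm in attention could look like: normalize k per head before the dot product, a common trick for attention-logit stability. The flag names and module layout are assumptions, not the repo's actual code:

```python
import torch
from torch import nn
import torch.nn.functional as F

class Attention(nn.Module):
    def __init__(self, dim, heads = 8, dim_head = 64, norm_k_only = True):
        super().__init__()
        inner = heads * dim_head
        self.heads = heads
        self.to_q = nn.Linear(dim, inner, bias = False)
        self.to_k = nn.Linear(dim, inner, bias = False)
        self.to_v = nn.Linear(dim, inner, bias = False)
        self.to_out = nn.Linear(inner, dim, bias = False)
        # nn.RMSNorm requires torch >= 2.4; normalize only keys when asked
        self.q_norm = nn.Identity() if norm_k_only else nn.RMSNorm(dim_head)
        self.k_norm = nn.RMSNorm(dim_head)

    def forward(self, x):
        b, n, _ = x.shape
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))
        q, k = self.q_norm(q), self.k_norm(k)   # rmsnorm on keys (and optionally queries)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.to_out(out.transpose(1, 2).reshape(b, n, -1))
```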
lucidrains | 1345326656 | another measure for the attending to nothing issue | 0.0.47 | 2025-10-20 10:32:31 -07:00
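One common measure for the "attending to nothing" issue is appending a learned null key/value pair, so softmax always has a harmless sink to place probability mass on when no real token is relevant. Whether this commit uses exactly that trick is an assumption; a sketch:

```python
import torch
from torch import nn
import torch.nn.functional as F

class NullKVAttention(nn.Module):
    def __init__(self, heads, dim_head):
        super().__init__()
        # one learned "nothing" slot per head (zeros init is illustrative)
        self.null_kv = nn.Parameter(torch.zeros(2, heads, 1, dim_head))

    def forward(self, q, k, v):
        # q, k, v: (batch, heads, seq, dim_head)
        nk, nv = self.null_kv
        nk, nv = (t.expand(q.shape[0], -1, -1, -1) for t in (nk, nv))
        k = torch.cat((nk, k), dim = -2)   # prepend the null slot
        v = torch.cat((nv, v), dim = -2)
        return F.scaled_dot_product_attention(q, k, v)
```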
lucidrains | 55574c054e | assert | 0.0.46 | 2025-10-19 09:59:42 -07:00
lucidrains | 27ed6d0ba5 | fix time kv cache | 0.0.45 | 2025-10-19 09:16:06 -07:00
lucidrains | 4930002e99 | bit of progress on time kv cache | 0.0.44 | 2025-10-19 09:04:26 -07:00
lucidrains | ecbe13efe8 | allow for setting different loss weights for each MTP head (perhaps more weight on the next vs some far out prediction) | 0.0.43 | 2025-10-19 08:37:56 -07:00
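A sketch of per-head MTP loss weighting as the message suggests, with more weight on the next-step head than on far-out heads; the geometric decay is an illustrative choice, not the repository's:

```python
import torch

num_mtp_heads = 4
# geometric decay: the k-step-ahead head gets weight 0.5^k, next step weighted most
loss_weights = torch.tensor([0.5 ** i for i in range(num_mtp_heads)])
loss_weights = loss_weights / loss_weights.sum()   # normalize to sum to 1

head_losses = torch.randn(num_mtp_heads).abs()     # stand-in per-head losses
total_loss = (head_losses * loss_weights).sum()
```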
lucidrains | f651d779e3 | able to control the update of the loss ema from dynamics model forward | 0.0.42 | 2025-10-19 08:25:50 -07:00
lucidrains | 374667d8a9 | take care of the loss normalization mentioned at the end of the first paragraph of section 3 | 0.0.41 | 2025-10-19 08:24:41 -07:00
lucidrains | 79a1b1c46e | oops | 0.0.40 | 2025-10-18 10:31:48 -07:00
lucidrains | b6aa19f31e | complete multi-token prediction for actions, tackle loss balancing another day | 0.0.38 | 2025-10-18 10:23:14 -07:00
lucidrains | bc629d78b1 | inverse norm for continuous actions when sampling | 0.0.37 | 2025-10-18 08:55:04 -07:00
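A sketch of the inverse-norm idea: if continuous actions are normalized (say to zero mean, unit variance) before training, samples from the policy live in normalized space and must be mapped back to environment units when sampling. The statistics below are placeholders:

```python
import torch

action_mean = torch.tensor([0.0, 1.5])    # assumed statistics gathered from data
action_std  = torch.tensor([1.0, 0.25])

def norm_action(a):
    return (a - action_mean) / action_std

def inverse_norm_action(a_normed):
    return a_normed * action_std + action_mean   # applied when sampling

sampled = torch.randn(2)                   # policy output in normalized space
env_action = inverse_norm_action(sampled)  # back to environment units
```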
lucidrains | 0ee475d2df | oops | 0.0.36 | 2025-10-18 08:50:53 -07:00
lucidrains | 8c88a33d3b | complete multi-token prediction for the reward head | 0.0.35 | 2025-10-18 08:33:06 -07:00
lucidrains | 911a1a8434 | oops | 0.0.34 | 2025-10-18 08:07:06 -07:00
lucidrains | 5fc0022bbf | the function for generating the MTP targets, as well as the mask for the losses | - | 2025-10-18 08:04:51 -07:00
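A sketch of how MTP targets and their loss mask can be generated: the k-th target at position t is the token at t + k + 1, and positions that run past the end of the sequence are masked out of the loss. The unfold-based approach is an assumption about the implementation:

```python
import torch
import torch.nn.functional as F

def mtp_targets(tokens, num_heads):
    # tokens: (batch, seq) of discrete ids, all >= 0
    b, n = tokens.shape
    pad_value = -1
    padded = F.pad(tokens, (0, num_heads), value = pad_value)
    # targets[:, t, k] = token at position t + k + 1
    targets = padded.unfold(-1, num_heads, 1)[:, 1:n + 1]   # (batch, seq, num_heads)
    mask = targets != pad_value                             # False past the sequence end
    return targets, mask

tokens = torch.arange(6).unsqueeze(0)            # (1, 6)
targets, mask = mtp_targets(tokens, num_heads = 2)
# targets[0, 4] == tensor([5, -1]), mask[0, 4] == tensor([True, False])
```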
lucidrains | 83cfd2cd1b | task conditioning when dreaming | 0.0.33 | 2025-10-18 07:47:13 -07:00
lucidrains | 22e13c45fc | rename | 0.0.32 | 2025-10-17 14:44:25 -07:00
lucidrains | c967404471 | 0.0.31 | 0.0.31 | 2025-10-17 08:55:42 -07:00
lucidrains | 0c1b067f97 | if optimizer is passed into the learn from dreams function, take the optimizer steps, otherwise let the researcher handle it externally. also ready muon | - | 2025-10-17 08:55:20 -07:00
lucidrains | cb416c0d44 | handle the entropies during policy optimization | 0.0.30 | 2025-10-17 08:47:26 -07:00
lucidrains | 61773c8219 | eventually we will need to learn from the outside stream of experience | 0.0.29 | 2025-10-17 08:06:24 -07:00
lucidrains | 0dba734280 | start the learning in dreams portion | 0.0.27 | 2025-10-17 08:00:47 -07:00
lucidrains | a0161760a0 | extract the log probs and predicted values (symexp two hot encoded) for the phase 3 RL training | 0.0.26 | 2025-10-16 10:40:59 -07:00
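For the symexp two-hot values mentioned above: in the DreamerV3-style scheme, the value head emits logits over symlog-spaced bins, and the scalar value is recovered as the softmax-weighted bin average passed through symexp. A sketch, with the bin count and range as assumptions:

```python
import torch

def symexp(x):
    # inverse of symlog(x) = sign(x) * log(1 + |x|)
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1.)

num_bins = 41
bins = torch.linspace(-20., 20., num_bins)   # bin centers in symlog space

def decode_value(logits):
    # logits: (..., num_bins) from the value head
    probs = logits.softmax(dim = -1)
    return symexp((probs * bins).sum(dim = -1))   # expected value, un-symlogged

logits = torch.randn(3, num_bins)
values = decode_value(logits)   # (3,) scalar value predictions
```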
lucidrains | 2d20d0a6c1 | able to roll out actions from one agent within the dreams of a world model | 0.0.25 | 2025-10-16 10:15:43 -07:00
lucidrains | d74f09f0b3 | a researcher in discord pointed out that the tokenizer also uses the axial space time transformer. redo without the 3d rotary and block causal, greatly simplifying the implementation | 0.0.24 | 2025-10-16 09:40:14 -07:00
lucidrains | 2ccb290e26 | pass the attend kwargs for the block causal masking in tokenizer | 0.0.23 | 2025-10-16 08:33:26 -07:00
lucidrains | 517ef6b94b | oops | 0.0.22 | 2025-10-16 07:03:51 -07:00
lucidrains | ec18bc0fa4 | cleanup | - | 2025-10-16 06:44:28 -07:00
lucidrains | 2a902eaaf7 | allow reward tokens to be attended to as state optionally, DT-esque. figure out multi-agent scenario once i get around to it | 0.0.21 | 2025-10-16 06:41:02 -07:00
lucidrains | d28251e9f9 | another consideration before knocking out the RL logic | 0.0.20 | 2025-10-14 11:10:26 -07:00
lucidrains | ff81dd761b | separate action and agent embeds | 0.0.19 | 2025-10-13 11:36:21 -07:00
lucidrains | 6dbdc3d7d8 | correct a misunderstanding: past actions get a separate action token, while the agent token is used for predicting the next action, rewards, and values | 0.0.18 | 2025-10-12 16:16:18 -07:00
lucidrains | 9c78962736 | sampling actions | 0.0.17 | 2025-10-12 11:27:12 -07:00
lucidrains | c5e64ff4ce | separate out the key from the value projections in attention for muon | 0.0.16 | 2025-10-12 09:42:22 -07:00
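Background on the key/value split: Muon orthogonalizes each 2D weight matrix independently (via Newton-Schulz iterations), so a fused projection would be treated as one tall matrix; separate nn.Linear modules give the optimizer one well-shaped matrix per projection. A sketch of the contrast:

```python
from torch import nn

dim, inner = 512, 512

# fused: Muon would orthogonalize one (3 * inner, dim) matrix as a whole
to_qkv = nn.Linear(dim, inner * 3, bias = False)

# separated: one (inner, dim) matrix per projection, friendlier to Muon
to_q = nn.Linear(dim, inner, bias = False)
to_k = nn.Linear(dim, inner, bias = False)
to_v = nn.Linear(dim, inner, bias = False)
```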
lucidrains | ab5de6795f | bring in muon | - | 2025-10-12 09:35:06 -07:00
lucidrains | 8a73a27fc7 | add nested tensor way for getting log prob of multiple discrete actions | 0.0.15 | 2025-10-11 10:53:24 -07:00
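The motivation for nested tensors here: an agent can have several discrete action dimensions with different numbers of choices, so the per-action logits are ragged. A sketch of gathering a joint log prob over such ragged actions, shown with a plain list of tensors (torch.nested being the ragged-storage variant of the same idea); all shapes are assumptions:

```python
import torch
import torch.nn.functional as F

# three discrete actions with 4, 6 and 2 choices respectively (ragged logits)
logits_per_action = [torch.randn(4), torch.randn(6), torch.randn(2)]
chosen = torch.tensor([1, 5, 0])   # one sampled index per action dimension

log_probs = torch.stack([
    F.log_softmax(logits, dim = -1)[idx]
    for logits, idx in zip(logits_per_action, chosen)
])
# log_probs: (3,) - one log prob per discrete action; sum for the joint
joint_log_prob = log_probs.sum()
```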
lucidrains | 01bf70e18a | 0.0.14 | 0.0.14 | 2025-10-11 09:24:58 -07:00
lucidrains | b2725d9b6e | complete behavior cloning for one agent | - | 2025-10-11 09:24:49 -07:00
lucidrains | 02558d1f08 | will organize the unembedding parameters under the actor optimizer | - | 2025-10-11 06:55:57 -07:00
lucidrains | 563b269f8a | bring in hyper connections | 0.0.12 | 2025-10-11 06:52:57 -07:00
lucidrains | 5df3e69583 | last commit for the day | 0.0.11 | 2025-10-10 11:59:18 -07:00
lucidrains | 9230267d34 | handle subset of discrete action unembedding | 0.0.10 | 2025-10-10 11:27:05 -07:00
lucidrains | c68942b026 | cleanup | - | 2025-10-10 10:42:54 -07:00
lucidrains | 32aa355e37 | prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL | 0.0.9 | 2025-10-10 10:41:48 -07:00