lucidrains | 7ba3988fb9 | prepare a mock for interacting with online env | 2025-10-21 09:03:20 -07:00
lucidrains | ea13d4fcab | take a gradient step with video tokenizer trainer | 2025-10-21 08:52:22 -07:00
lucidrains | 15876d34cf | more muon prep | 2025-10-21 08:23:59 -07:00
lucidrains | b4763caff9 | fix rotary embeddings in presence of kv caching | 2025-10-21 07:10:21 -07:00
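Regarding b4763caff9 above: with a KV cache, freshly decoded tokens have to be rotated at their absolute position (the current cache length), not at position 0, while the cached keys keep the rotations they received when first processed. A minimal sketch of that idea, with illustrative helper names rather than the repository's actual API:

```python
# illustrative sketch: rotary embedding applied with a KV-cache position offset
import torch

def rope_freqs(seq_len, dim, offset = 0, theta = 10000.):
    # rotation angles for absolute positions [offset, offset + seq_len)
    inv_freq = 1. / (theta ** (torch.arange(0, dim, 2).float() / dim))
    pos = torch.arange(offset, offset + seq_len).float()
    freqs = torch.einsum('i, j -> i j', pos, inv_freq)
    return torch.cat((freqs, freqs), dim = -1)

def rotate_half(x):
    x1, x2 = x.chunk(2, dim = -1)
    return torch.cat((-x2, x1), dim = -1)

def apply_rope(x, freqs):
    return x * freqs.cos() + rotate_half(x) * freqs.sin()

# with 12 tokens already cached, the single new query is rotated at position 12
cache_len = 12
q = torch.randn(1, 8, 1, 64)   # (batch, heads, new tokens, head dim)
q = apply_rope(q, rope_freqs(1, 64, offset = cache_len))
```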
lucidrains | 7195bbb196 | oops | 2025-10-20 12:42:27 -07:00
lucidrains | ca244a290c | first pass through the kv cache for the time block in the dynamics model | 2025-10-20 12:25:50 -07:00
lucidrains | a7e0c395c3 | allow for only rmsnorm for keys in attention | 2025-10-20 11:20:49 -07:00
lucidrains | 1345326656 | another measure for the attending to nothing issue | 2025-10-20 10:32:31 -07:00
lucidrains | 55574c054e | assert | 2025-10-19 09:59:42 -07:00
lucidrains | 27ed6d0ba5 | fix time kv cache | 2025-10-19 09:16:06 -07:00
lucidrains | 4930002e99 | bit of progress on time kv cache | 2025-10-19 09:04:26 -07:00
lucidrains | ecbe13efe8 | allow for setting different loss weights for each MTP head (perhaps more weight on the next vs some far out prediction) | 2025-10-19 08:37:56 -07:00
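On ecbe13efe8 above: per-head loss weights for multi-token prediction would typically down-weight the further-out predictions. A rough sketch under that assumption; the head structure and weights here are hypothetical, not the repository's code:

```python
# hypothetical per-head weighting for a multi-token prediction loss
import torch
import torch.nn.functional as F

def weighted_mtp_loss(logits_per_head, targets_per_head, head_weights = (1.0, 0.5, 0.25)):
    # logits_per_head: one (batch, seq, num_tokens) tensor per prediction offset
    # targets_per_head: one (batch, seq) tensor of token ids per offset
    # head_weights: more weight on the immediate next prediction than on far-out ones
    total = 0.
    for logits, target, weight in zip(logits_per_head, targets_per_head, head_weights):
        total = total + weight * F.cross_entropy(logits.transpose(1, 2), target)
    return total / sum(head_weights)
```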
lucidrains | f651d779e3 | able to control the update of the loss ema from dynamics model forward | 2025-10-19 08:25:50 -07:00
lucidrains | 374667d8a9 | take care of the loss normalization mentioned at the end of the first paragraph of section 3 | 2025-10-19 08:24:41 -07:00
lucidrains | 79a1b1c46e | oops | 2025-10-18 10:31:48 -07:00
lucidrains | b6aa19f31e | complete multi-token prediction for actions, tackle loss balancing another day | 2025-10-18 10:23:14 -07:00
lucidrains | bc629d78b1 | inverse norm for continuous actions when sampling | 2025-10-18 08:55:04 -07:00
lucidrains | 0ee475d2df | oops | 2025-10-18 08:50:53 -07:00
lucidrains | 8c88a33d3b | complete multi token prediction for the reward head | 2025-10-18 08:33:06 -07:00
lucidrains | 911a1a8434 | oops | 2025-10-18 08:07:06 -07:00
lucidrains | 83cfd2cd1b | task conditioning when dreaming | 2025-10-18 07:47:13 -07:00
lucidrains | 22e13c45fc | rename | 2025-10-17 14:44:25 -07:00
lucidrains | c967404471 | 0.0.31 | 2025-10-17 08:55:42 -07:00
lucidrains | cb416c0d44 | handle the entropies during policy optimization | 2025-10-17 08:47:26 -07:00
lucidrains | 61773c8219 | eventually we will need to learn from the outside stream of experience | 2025-10-17 08:06:24 -07:00
lucidrains | 0dba734280 | start the learning in dreams portion | 2025-10-17 08:00:47 -07:00
lucidrains | a0161760a0 | extract the log probs and predicted values (symexp two hot encoded) for the phase 3 RL training | 2025-10-16 10:40:59 -07:00
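On a0161760a0 above: "symexp two hot encoded" refers to the DreamerV3-style value head that predicts a categorical distribution over fixed bins laid out in symlog space, decoded back to a scalar by taking the expectation and applying symexp. A small sketch of that decoding; the bin range is an assumption:

```python
# sketch of decoding a symexp two-hot value head (DreamerV3-style)
import torch

def symexp(x):
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1.)

def decode_two_hot_value(value_logits, low = -20., high = 20.):
    # value_logits: (..., num_bins), a categorical over bins placed in symlog space
    num_bins = value_logits.shape[-1]
    bins = torch.linspace(low, high, num_bins)
    probs = value_logits.softmax(dim = -1)
    return (probs * symexp(bins)).sum(dim = -1)   # expected value back in raw scale
```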
lucidrains | 2d20d0a6c1 | able to roll out actions from one agent within the dreams of a world model | 2025-10-16 10:15:43 -07:00
lucidrains | d74f09f0b3 | a researcher in discord pointed out that the tokenizer also uses the axial space time transformer. redo without the 3d rotary and block causal, greatly simplifying the implementation | 2025-10-16 09:40:14 -07:00
lucidrains | 2ccb290e26 | pass the attend kwargs for the block causal masking in tokenizer | 2025-10-16 08:33:26 -07:00
lucidrains | 517ef6b94b | oops | 2025-10-16 07:03:51 -07:00
lucidrains | 2a902eaaf7 | allow reward tokens to be attended to as state optionally, DT-esque. figure out multi-agent scenario once i get around to it | 2025-10-16 06:41:02 -07:00
lucidrains | d28251e9f9 | another consideration before knocking out the RL logic | 2025-10-14 11:10:26 -07:00
lucidrains | ff81dd761b | separate action and agent embeds | 2025-10-13 11:36:21 -07:00
lucidrains | 6dbdc3d7d8 | correct a misunderstanding where past actions is a separate action token, while agent token is used for the prediction of next action, rewards, values | 2025-10-12 16:16:18 -07:00
lucidrains | 9c78962736 | sampling actions | 2025-10-12 11:27:12 -07:00
lucidrains | c5e64ff4ce | separate out the key from the value projections in attention for muon | 2025-10-12 09:42:22 -07:00
lucidrains | ab5de6795f | bring in muon | 2025-10-12 09:35:06 -07:00
lucidrains | 8a73a27fc7 | add nested tensor way for getting log prob of multiple discrete actions | 2025-10-11 10:53:24 -07:00
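On 8a73a27fc7 above: the commit uses nested tensors to batch discrete action heads with different bin counts; the loop below only illustrates the underlying math (per-head log-softmax, gather the chosen bin, sum the log probs), not the nested-tensor implementation itself:

```python
# loop version of the joint log prob across discrete action heads with varying bin counts
import torch

def multi_discrete_log_prob(logits_per_action, actions):
    # logits_per_action: list of (batch, bins_i), bin count varying per action head
    # actions: (batch, num_actions) integer choices, one column per head
    total = 0.
    for i, logits in enumerate(logits_per_action):
        log_probs = logits.log_softmax(dim = -1)
        chosen = actions[:, i:i + 1]                           # (batch, 1)
        total = total + log_probs.gather(-1, chosen).squeeze(-1)
    return total                                               # (batch,) joint log prob
```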
lucidrains | 01bf70e18a | 0.0.14 | 2025-10-11 09:24:58 -07:00
lucidrains | 563b269f8a | bring in hyper connections | 2025-10-11 06:52:57 -07:00
lucidrains | 5df3e69583 | last commit for the day | 2025-10-10 11:59:18 -07:00
lucidrains | 9230267d34 | handle subset of discrete action unembedding | 2025-10-10 11:27:05 -07:00
lucidrains | 32aa355e37 | prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL | 2025-10-10 10:41:48 -07:00
lucidrains | e2d86a4543 | add a complete action embedder that can accept any number of discrete actions with variable bins as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous hack in dreamer v3 with sticky action) | 2025-10-09 07:53:42 -07:00
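On e2d86a4543 above: the described embedder embeds each discrete action (with its own bin count) and each continuous action, then pools them into a single vector added to the agent token. A self-contained sketch of that shape of module, with made-up sizes and no claim to match the repository's ActionEmbedder:

```python
# made-up sketch of pooling mixed discrete / continuous actions into one vector
import torch
from torch import nn

class ActionEmbedderSketch(nn.Module):
    def __init__(self, dim, discrete_bins = (4, 9), num_continuous = 2):
        super().__init__()
        # one embedding table per discrete action, each with its own bin count
        self.discrete_embeds = nn.ModuleList(nn.Embedding(bins, dim) for bins in discrete_bins)
        # continuous actions are projected per scalar with a shared linear layer
        self.continuous_proj = nn.Linear(1, dim)

    def forward(self, discrete_actions, continuous_actions):
        # discrete_actions: (batch, num_discrete) ints
        # continuous_actions: (batch, num_continuous) floats
        embeds = [embed(discrete_actions[:, i]) for i, embed in enumerate(self.discrete_embeds)]
        embeds += self.continuous_proj(continuous_actions.unsqueeze(-1)).unbind(dim = 1)
        return torch.stack(embeds, dim = 1).mean(dim = 1)      # (batch, dim), added to agent token

embedder = ActionEmbedderSketch(dim = 512)
pooled = embedder(torch.randint(0, 4, (2, 2)), torch.randn(2, 2))
```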
lucidrains | 4c2ed100a3 | fix masking for multiple agent tokens | 2025-10-08 08:26:44 -07:00
lucidrains | 63b63dfedd | add shard | 2025-10-08 06:56:03 -07:00
lucidrains | 187edc1414 | all set for generating the perceived rewards once the RL components fall into place | 2025-10-08 06:33:28 -07:00
lucidrains | c056835aea | address https://github.com/lucidrains/dreamer4/issues/2 | 2025-10-08 05:55:22 -07:00
lucidrains | 0fdb67bafa | add the noising of the latent context during generation, technique i think was from EPFL, or perhaps some google group that built on top of EPFL work | 2025-10-07 09:37:37 -07:00
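On 0fdb67bafa above: one plausible reading of "noising of the latent context" is lightly re-corrupting the conditioning latents during rollout so generation conditions on context resembling what the model saw in training, which can limit drift. The helper below is only a guess at such a step; the interpolation form and noise level are assumptions:

```python
# guess at what re-noising the latent context during rollout could look like
import torch

def noise_latent_context(context_latents, noise_level = 0.1):
    # blend a small amount of Gaussian noise into the conditioning latents
    noise = torch.randn_like(context_latents)
    return context_latents * (1. - noise_level) + noise * noise_level
```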