112 Commits

Author SHA1 Message Date
lucidrains
b6aa19f31e complete multi-token prediction for actions, tackle loss balancing another day 0.0.38 2025-10-18 10:23:14 -07:00
lucidrains
bc629d78b1 inverse norm for continuous actions when sampling 0.0.37 2025-10-18 08:55:04 -07:00
lucidrains
0ee475d2df oops 0.0.36 2025-10-18 08:50:53 -07:00
lucidrains
8c88a33d3b complete multi-token prediction for the reward head 0.0.35 2025-10-18 08:33:06 -07:00
lucidrains
911a1a8434 oops 0.0.34 2025-10-18 08:07:06 -07:00
lucidrains
5fc0022bbf the function for generating the MTP targets, as well as the mask for the losses 2025-10-18 08:04:51 -07:00
lucidrains
83cfd2cd1b task conditioning when dreaming 0.0.33 2025-10-18 07:47:13 -07:00
lucidrains
22e13c45fc rename 0.0.32 2025-10-17 14:44:25 -07:00
lucidrains
c967404471 0.0.31 0.0.31 2025-10-17 08:55:42 -07:00
lucidrains
0c1b067f97 if an optimizer is passed into the learn-from-dreams function, take the optimizer steps; otherwise let the researcher handle it externally. also ready muon 2025-10-17 08:55:20 -07:00
lucidrains
cb416c0d44 handle the entropies during policy optimization 0.0.30 2025-10-17 08:47:26 -07:00
lucidrains
61773c8219 eventually we will need to learn from the outside stream of experience 0.0.29 2025-10-17 08:06:24 -07:00
lucidrains
0dba734280 start the learning in dreams portion 0.0.27 2025-10-17 08:00:47 -07:00
lucidrains
a0161760a0 extract the log probs and predicted values (symexp two hot encoded) for the phase 3 RL training 0.0.26 2025-10-16 10:40:59 -07:00
lucidrains
2d20d0a6c1 able to roll out actions from one agent within the dreams of a world model 0.0.25 2025-10-16 10:15:43 -07:00
lucidrains
d74f09f0b3 a researcher in discord pointed out that the tokenizer also uses the axial space time transformer. redo without the 3d rotary and block causal, greatly simplifying the implementation 0.0.24 2025-10-16 09:40:14 -07:00
lucidrains
2ccb290e26 pass the attend kwargs for the block causal masking in tokenizer 0.0.23 2025-10-16 08:33:26 -07:00
lucidrains
517ef6b94b oops 0.0.22 2025-10-16 07:03:51 -07:00
lucidrains
ec18bc0fa4 cleanup 2025-10-16 06:44:28 -07:00
lucidrains
2a902eaaf7 allow reward tokens to be attended to as state optionally, DT-esque. figure out multi-agent scenario once i get around to it 0.0.21 2025-10-16 06:41:02 -07:00
lucidrains
d28251e9f9 another consideration before knocking out the RL logic 0.0.20 2025-10-14 11:10:26 -07:00
lucidrains
ff81dd761b separate action and agent embeds 0.0.19 2025-10-13 11:36:21 -07:00
lucidrains
6dbdc3d7d8 correct a misunderstanding: the past action is a separate action token, while the agent token is used for predicting the next action, rewards, and values 0.0.18 2025-10-12 16:16:18 -07:00
lucidrains
9c78962736 sampling actions 0.0.17 2025-10-12 11:27:12 -07:00
lucidrains
c5e64ff4ce separate out the key from the value projections in attention for muon 0.0.16 2025-10-12 09:42:22 -07:00
lucidrains
ab5de6795f bring in muon 2025-10-12 09:35:06 -07:00
lucidrains
8a73a27fc7 add nested tensor way for getting log prob of multiple discrete actions 0.0.15 2025-10-11 10:53:24 -07:00
lucidrains
01bf70e18a 0.0.14 0.0.14 2025-10-11 09:24:58 -07:00
lucidrains
b2725d9b6e complete behavior cloning for one agent 2025-10-11 09:24:49 -07:00
lucidrains
02558d1f08 will organize the unembedding parameters under the actor optimizer 2025-10-11 06:55:57 -07:00
lucidrains
563b269f8a bring in hyper connections 0.0.12 2025-10-11 06:52:57 -07:00
lucidrains
5df3e69583 last commit for the day 0.0.11 2025-10-10 11:59:18 -07:00
lucidrains
9230267d34 handle subset of discrete action unembedding 0.0.10 2025-10-10 11:27:05 -07:00
lucidrains
c68942b026 cleanup 2025-10-10 10:42:54 -07:00
lucidrains
32aa355e37 prepare unembedding parameters in ActionEmbedder as well as the policy head, to allow for behavioral cloning before RL 0.0.9 2025-10-10 10:41:48 -07:00
lucidrains
9101a49cdd handle continuous value normalization if stats passed in 2025-10-09 08:59:54 -07:00
lucidrains
31f4363be7 must be able to do phase1 and phase2 training 2025-10-09 08:04:36 -07:00
lucidrains
e2d86a4543 add a complete action embedder that can accept any number of discrete actions with variable bins as well as any number of continuous actions, pooled and added to the agent token as described in the paper (seems like they fixed that horrendous hack in dreamer v3 with sticky action) 0.0.8 2025-10-09 07:53:42 -07:00
lucidrains
b62c08be65 fix task embed in presence of multiple agent tokens 2025-10-08 08:42:25 -07:00
lucidrains
4c2ed100a3 fix masking for multiple agent tokens 0.0.7 2025-10-08 08:26:44 -07:00
lucidrains
ed0918c974 prepare for evolution within dreams 2025-10-08 08:13:16 -07:00
lucidrains
892654d442 multiple agent tokens sharing the same state 2025-10-08 08:06:13 -07:00
lucidrains
c4e0f46528 for the value head, we will go for symexp encoding as well (following the "stop regressing" paper from Farebrother et al), also use layernormed mlp given recent papers 2025-10-08 07:37:34 -07:00
lucidrains
a50e360502 makes more sense for the noise to be fixed 2025-10-08 07:17:05 -07:00
Phil Wang
9c56ba0c9d Merge pull request #3 from lucidrains/pytest-shard (add pytest shard) 2025-10-08 07:03:11 -07:00
lucidrains
b5744237bf fix 2025-10-08 06:58:46 -07:00
lucidrains
63b63dfedd add shard 2025-10-08 06:56:03 -07:00
lucidrains
612f5f5dd1 a bit of dropout to rewards as state 2025-10-08 06:45:25 -07:00
lucidrains
c8f75caa40 although not in the paper, it would be interesting for each agent (will extend to multi-agent) to consider its own past rewards as state 2025-10-08 06:40:43 -07:00
lucidrains
187edc1414 all set for generating the perceived rewards once the RL components fall into place 2025-10-08 06:33:28 -07:00