200 Commits

Author SHA1 Message Date
lucidrains
fb6d69f43a complete the latent autoregressive prediction, to use the log variance as a state entropy bonus 0.1.21 2025-12-03 06:40:19 -08:00
lucidrains
125693ce1c add a separate state prediction head for the state entropy 2025-12-02 15:58:25 -08:00
lucidrains
2e7f406d49 allow for the combining of experiences from environment and dream 0.1.20 2025-11-13 16:37:35 -08:00
lucidrains
690ecf07dc fix the rnn time caching issue 0.1.19 2025-11-11 17:04:02 -08:00
lucidrains
ac1c12f743 disable until rnn hiddens are handled properly 0.1.18 2025-11-10 15:52:43 -08:00
lucidrains
3c84b404a8 rnn layer needs to be hyper connected too 0.1.17 2025-11-10 15:51:33 -08:00
lucidrains
d5b70e2b86 allow for adding an RNN before time attention, but need to handle caching still 0.1.16 2025-11-10 11:42:20 -08:00
lucidrains
c3532fa797 add learned value residual 0.1.15 2025-11-10 09:33:58 -08:00
lucidrains
73029635fe last commit for the day 0.1.12 2025-11-09 11:12:37 -08:00
lucidrains
e1c41f4371 decorrelation loss for spatial attention as well 0.1.10 2025-11-09 10:41:58 -08:00
Phil Wang
f55c61c6cf cleanup 2025-11-09 10:22:47 -08:00
lucidrains
051d4d6ee2 oops 0.1.8 2025-11-09 10:12:51 -08:00
lucidrains
ef3a5552e7 eventually video tokenizer may need to be trained on single frames 0.1.7 2025-11-09 10:11:56 -08:00
lucidrains
0c4224da18 add a decorrelation loss for temporal attention in encoder of video tokenizer 0.1.6 2025-11-09 09:47:47 -08:00
Phil Wang
256a81f658 Merge pull request #5 from Cycl0/patch-1: Update Discord channel link in README to use permanent link 2025-11-09 08:17:41 -08:00
lucidrains
cfd34f1eba able to move the experience to cpu easily, and automatically move it to the device of the dynamics world model when learning from it 0.1.5 2025-11-09 16:16:13 +00:00
Lucas Kenzo Cyra
4ffbe37873 Update Discord channel link in README to use permanent link. Updated Discord channel link for collaboration. 2025-11-09 10:12:45 -03:00
lucidrains
24ef72d528 0.1.4 0.1.4 2025-11-04 15:21:20 -08:00
Phil Wang
a4afcb22a6 Merge pull request #4 from dirkmcpherson/bugfix: fix a few typo bugs. Support info in return signature of environment … 2025-11-04 15:19:25 -08:00
j
b0f6b8583d fix a few typo bugs. Support info in return signature of environment step. Temporarily turn off flex attention when the kv_cache is used to avoid a bug. 2025-11-04 17:29:12 -05:00
lucidrains
38cba80068 readme 2025-11-04 06:05:11 -08:00
lucidrains
c0a6cd56a1 link to new discord 2025-10-31 09:06:44 -07:00
lucidrains
d756d1bb8c addressing issues raised by an independent researcher with llm assistance 0.1.2 2025-10-31 08:37:39 -07:00
lucidrains
60681fce1d fix generation so that one more step is taken to decode agent embeds off the final cleaned set of latents, update readme 2025-10-31 06:48:49 -07:00
Phil Wang
6870294d95 no longer needed 2025-10-30 09:23:27 -07:00
lucidrains
3beae186da some more control over whether to normalize advantages 0.0.102 2025-10-30 08:46:03 -07:00
lucidrains
0904e224ab make the reverse kl optional 0.0.101 2025-10-30 08:22:50 -07:00
lucidrains
767789d0ca they decided on 0.3 for the behavioral prior loss weight 0.0.100 2025-10-29 13:24:58 -07:00
lucidrains
35b87c4fa1 oops 0.0.99 2025-10-29 13:04:02 -07:00
lucidrains
c4a3cb09d5 swap for discrete kl div, thanks to Dirk for pointing this out on the discord 0.0.98 2025-10-29 11:54:18 -07:00
lucidrains
cb54121ace sim trainer needs to take care of agent embedding and old actions 0.0.96 2025-10-29 11:15:11 -07:00
lucidrains
586379f2c8 sum the kl div loss across the number of actions by default for the action embedder's .kl_div 0.0.95 2025-10-29 10:46:42 -07:00
lucidrains
a358a44a53 always store old agent embeds and old action parameters when possible 0.0.94 2025-10-29 10:39:15 -07:00
lucidrains
3547344312 take care of storing the old action logits and mean log var, and calculate kl div for pmpo based off that during learn from experience 0.0.93 2025-10-29 10:31:32 -07:00
lucidrains
691d9ca007 add kl div on action embedder, working way towards the kl div loss in pmpo 0.0.92 2025-10-29 10:02:25 -07:00
lucidrains
91d697f8ca fix pmpo 0.0.91 2025-10-28 18:55:22 -07:00
lucidrains
7acaa764f6 evolutionary policy optimization on dreams will be interesting 0.0.90 2025-10-28 10:17:01 -07:00
lucidrains
c0450359f3 allow for evolutionary policy optimization 0.0.89 2025-10-28 10:11:13 -07:00
lucidrains
46f86cd247 fix storing of agent embedding 0.0.88 2025-10-28 09:36:58 -07:00
lucidrains
903c43b770 use the agent embeds off the stored experience if available 0.0.87 2025-10-28 09:14:02 -07:00
lucidrains
d476fa7b14 able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads) 0.0.85 2025-10-28 09:02:26 -07:00
lucidrains
789f091c63 redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse the same logic 0.0.83 2025-10-28 08:04:48 -07:00
lucidrains
41ab83f691 fix mock 2025-10-27 10:47:24 -07:00
lucidrains
995b1f64e5 handle environments that return a terminate flag, also make sure episode lens are logged in vectorized env 0.0.82 2025-10-27 10:14:28 -07:00
lucidrains
fd1e87983b quantile filter 2025-10-27 09:08:26 -07:00
lucidrains
fe79bfa951 optionally keep track of returns statistics and normalize with them before advantage 0.0.81 2025-10-27 09:02:08 -07:00
lucidrains
f808b1c1d2 oops 0.0.80 2025-10-27 08:34:22 -07:00
lucidrains
349a03acd7 redo so lens is always the episode length, including the bootstrap value timestep, and use is_truncated to mask out the bootstrap node from being learned on 0.0.79 2025-10-27 08:06:21 -07:00
lucidrains
59c458aea3 introduce an is_truncated field on Experience, and mask out rewards and values before calculating gae appropriately 0.0.78 2025-10-27 07:55:00 -07:00
lucidrains
fbfd59e42f handle variable-length experiences when doing policy optimization 0.0.77 2025-10-27 06:09:09 -07:00