dreamer4

hongshaorou/dreamer4


Commit Graph


5bb027b386 allow for image pretraining on video tokenizer main 0.1.24 lucidrains 2025-12-04 10:34:15 -08:00
9efe269688 oops 0.1.23 lucidrains 2025-12-03 08:11:47 -08:00
fb8c3793b4 complete the addition of a state entropy bonus lucidrains 2025-12-03 07:52:30 -08:00
eb7a13502e complete the addition of a state entropy bonus 0.1.22 lucidrains 2025-12-03 07:51:04 -08:00
fb6d69f43a complete the latent autoregressive prediction, to use the log variance as a state entropy bonus 0.1.21 lucidrains 2025-12-03 06:40:19 -08:00
125693ce1c add a separate state prediction head for the state entropy lucidrains 2025-12-02 15:58:25 -08:00
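The commits above use the log variance predicted by a separate state head as a state entropy bonus. The messages do not give the exact formula or weighting, so the following is only one plausible reading. The differential entropy of a diagonal Gaussian is 0.5 * sum(1 + log(2*pi) + log_var), and a bonus proportional to it is added to each reward. The helper names and `bonus_weight` are hypothetical.

```python
import math

def gaussian_entropy_from_log_var(log_var):
    """Differential entropy of a diagonal Gaussian, given per-dimension log variances."""
    return 0.5 * sum(1.0 + math.log(2.0 * math.pi) + lv for lv in log_var)

def add_state_entropy_bonus(rewards, log_vars, bonus_weight=0.01):
    """Hypothetical helper: reward_t + bonus_weight * H(predicted state distribution at t)."""
    return [
        r + bonus_weight * gaussian_entropy_from_log_var(lv)
        for r, lv in zip(rewards, log_vars)
    ]

# a wider predicted distribution (larger log variance) earns a larger bonus
print(add_state_entropy_bonus([1.0, 1.0], [[0.0, 0.0], [2.0, 2.0]]))
```

Because entropy grows with log variance, this bonus favors transitions the model is less certain about. That matches the intent of an exploration bonus, although the repository may scale or clip it differently.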
2e7f406d49 allow for the combining of experiences from environment and dream 0.1.20 lucidrains 2025-11-13 16:37:35 -08:00
690ecf07dc fix the rnn time caching issue 0.1.19 lucidrains 2025-11-11 17:04:02 -08:00
ac1c12f743 disable until rnn hiddens are handled properly 0.1.18 lucidrains 2025-11-10 15:52:43 -08:00
3c84b404a8 rnn layer needs to be hyper connected too 0.1.17 lucidrains 2025-11-10 15:51:33 -08:00
d5b70e2b86 allow for adding an RNN before time attention, but need to handle caching still 0.1.16 lucidrains 2025-11-10 11:42:20 -08:00
c3532fa797 add learned value residual 0.1.15 lucidrains 2025-11-10 09:33:58 -08:00
5e75c4029d add learned value residual 0.1.14 lucidrains 2025-11-10 09:16:29 -08:00
73029635fe last commit for the day 0.1.12 lucidrains 2025-11-09 11:12:37 -08:00
dfe15a0605 last commit for the day 0.1.11 lucidrains 2025-11-09 11:03:03 -08:00
e1c41f4371 decorrelation loss for spatial attention as well 0.1.10 lucidrains 2025-11-09 10:41:54 -08:00
f55c61c6cf cleanup Phil Wang 2025-11-09 10:22:47 -08:00
051d4d6ee2 oops 0.1.8 lucidrains 2025-11-09 10:12:51 -08:00
ef3a5552e7 eventually video tokenizer may need to be trained on single frames 0.1.7 lucidrains 2025-11-09 10:11:56 -08:00
0c4224da18 add a decorrelation loss for temporal attention in encoder of video tokenizer 0.1.6 lucidrains 2025-11-09 09:47:33 -08:00
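The commit does not define the decorrelation loss. One common formulation, used here purely as an assumption, is the VICReg-style covariance penalty: the mean squared off-diagonal covariance across feature dimensions. It is zero when the dimensions are uncorrelated across samples. Which tensors are fed in (for example, the temporal attention outputs of the tokenizer encoder) is also assumed.

```python
def decorrelation_loss(features):
    """
    features: N samples, each a list of D floats (N >= 2, D >= 2).
    Returns the mean squared off-diagonal entry of the sample covariance matrix.
    """
    n, d = len(features), len(features[0])
    if n < 2 or d < 2:
        raise ValueError("need at least 2 samples and 2 feature dimensions")
    means = [sum(f[j] for f in features) / n for j in range(d)]
    centered = [[f[j] - means[j] for j in range(d)] for f in features]
    loss = 0.0
    for a in range(d):
        for b in range(d):
            if a == b:
                continue  # variances on the diagonal are not penalized
            cov = sum(c[a] * c[b] for c in centered) / (n - 1)
            loss += cov ** 2
    return loss / (d * (d - 1))
```

Penalizing only off-diagonal terms pushes feature dimensions to carry non-redundant information without constraining their individual scales.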
256a81f658 Merge pull request #5 from Cycl0/patch-1 Phil Wang 2025-11-09 08:17:41 -08:00
cfd34f1eba able to move the experience to cpu easily, and automatically move it to the device of the dynamics world model when learning from it 0.1.5 lucidrains 2025-11-09 16:16:13 +00:00
4ffbe37873 Update Discord channel link in README to use permanent link Lucas Kenzo Cyra 2025-11-09 10:12:45 -03:00
24ef72d528 0.1.4 0.1.4 lucidrains 2025-11-04 15:21:20 -08:00
a4afcb22a6 Merge pull request #4 from dirkmcpherson/bugfix Phil Wang 2025-11-04 15:19:25 -08:00
b0f6b8583d fix a few typo bugs. Support info in return signature of environment step. Temporarily turn off flex attention when the kv_cache is used to avoid bug. j 2025-11-04 17:29:12 -05:00
38cba80068 readme lucidrains 2025-11-04 06:05:11 -08:00
c0a6cd56a1 link to new discord lucidrains 2025-10-31 09:06:44 -07:00
d756d1bb8c addressing issues raised by an independent researcher with llm assistance 0.1.2 lucidrains 2025-10-31 08:37:39 -07:00
ef367969f8 addressing issues raised by an independent researcher with llm assistance 0.1.1 lucidrains 2025-10-31 08:26:33 -07:00
60681fce1d fix generation so that one more step is taken to decode agent embeds off the final cleaned set of latents, update readme lucidrains 2025-10-31 06:48:49 -07:00
a0bda62989 fix generation so that one more step is taken to decode agent embeds off the final cleaned set of latents, update readme 0.1.0 lucidrains 2025-10-31 06:47:35 -07:00
6870294d95 no longer needed Phil Wang 2025-10-30 09:23:27 -07:00
3beae186da some more control over whether to normalize advantages 0.0.102 lucidrains 2025-10-30 08:46:03 -07:00
0904e224ab make the reverse kl optional 0.0.101 lucidrains 2025-10-30 08:22:50 -07:00
767789d0ca they decided on 0.3 for the behavioral prior loss weight 0.0.100 lucidrains 2025-10-29 13:24:58 -07:00
35b87c4fa1 oops 0.0.99 lucidrains 2025-10-29 13:04:02 -07:00
c4a3cb09d5 swap for discrete kl div, thanks to Dirk for pointing this out on the discord 0.0.98 lucidrains 2025-10-29 11:54:18 -07:00
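For discrete actions, the KL divergence between two policies can be computed exactly from their logits. The sketch computes KL(p || q) for each action head. In line with the commit that sums the KL across actions by default, it adds the per-head values. The `reverse` flag mirrors the idea of an optional reverse KL. The function names are hypothetical, and which distribution plays p versus q in the repository is not stated here, so treat the direction as an assumption.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def categorical_kl(p_logits, q_logits):
    """KL(p || q) for two categorical distributions given unnormalized logits."""
    log_p, log_q = log_softmax(p_logits), log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def summed_action_kl(p_logits_per_action, q_logits_per_action, reverse=False):
    """Sum KL over independent discrete action heads; reverse=True computes KL(q || p)."""
    total = 0.0
    for p, q in zip(p_logits_per_action, q_logits_per_action):
        total += categorical_kl(q, p) if reverse else categorical_kl(p, q)
    return total
```

Swapping the arguments is not a cosmetic change. KL is asymmetric, which is why the order of source and target matters.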
4b06615018 swap src and tgt for kl div 0.0.97 lucidrains 2025-10-29 11:23:34 -07:00
cb54121ace sim trainer needs to take care of agent embedding and old actions 0.0.96 lucidrains 2025-10-29 11:15:11 -07:00
586379f2c8 sum the kl div loss across number of actions by default for action embedder .kl_div 0.0.95 lucidrains 2025-10-29 10:46:42 -07:00
a358a44a53 always store old agent embeds and old action parameters when possible 0.0.94 lucidrains 2025-10-29 10:39:15 -07:00
3547344312 take care of storing the old action logits and mean log var, and calculate kl div for pmpo based off that during learn from experience 0.0.93 lucidrains 2025-10-29 10:31:32 -07:00
691d9ca007 add kl div on action embedder, working way towards the kl div loss in pmpo 0.0.92 lucidrains 2025-10-29 10:02:25 -07:00
91d697f8ca fix pmpo 0.0.91 lucidrains 2025-10-28 18:55:22 -07:00
7acaa764f6 evolutionary policy optimization on dreams will be interesting 0.0.90 lucidrains 2025-10-28 10:17:01 -07:00
c0450359f3 allow for evolutionary policy optimization 0.0.89 lucidrains 2025-10-28 10:11:13 -07:00
46f86cd247 fix storing of agent embedding 0.0.88 lucidrains 2025-10-28 09:36:58 -07:00
903c43b770 use the agent embeds off the stored experience if available 0.0.87 lucidrains 2025-10-28 09:14:02 -07:00
d476fa7b14 able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads) 0.0.85 lucidrains 2025-10-28 09:02:26 -07:00
b02abc7a8a able to store the agent embeddings during rollouts with imagination or environment, for efficient policy optimization (but will also allow for finetuning world model for the heads) 0.0.84 lucidrains 2025-10-28 09:01:29 -07:00
789f091c63 redo so that max timesteps is treated as truncation at the last timestep, then allow for accepting the truncation signal from the environment and reuse same logic 0.0.83 lucidrains 2025-10-28 08:04:48 -07:00
41ab83f691 fix mock lucidrains 2025-10-27 10:47:24 -07:00
995b1f64e5 handle environments that return a terminate flag, also make sure episode lens are logged in vectorized env 0.0.82 lucidrains 2025-10-27 10:14:28 -07:00
fd1e87983b quantile filter lucidrains 2025-10-27 09:08:26 -07:00
fe79bfa951 optionally keep track of returns statistics and normalize with them before advantage 0.0.81 lucidrains 2025-10-27 09:02:08 -07:00
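These messages do not describe which return statistics are tracked. As a stand-in, the following sketch uses the DreamerV3-style percentile scale instead. It keeps an exponential moving average of the spread between the 5th and 95th percentiles of returns and divides by max(1, spread), so small returns are never amplified. The class name and defaults are hypothetical, and this may differ from the repository's quantile filter.

```python
def quantile(sorted_xs, q):
    """Linearly interpolated quantile of an already sorted, non-empty list."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] * (1.0 - frac) + sorted_xs[hi] * frac

class ReturnNormalizer:
    """Hypothetical percentile-based return scaler (DreamerV3-style)."""

    def __init__(self, low_q=0.05, high_q=0.95, decay=0.99):
        self.low_q, self.high_q, self.decay = low_q, high_q, decay
        self.scale = None  # EMA of the percentile spread, unset until the first update

    def update(self, returns):
        xs = sorted(returns)
        spread = quantile(xs, self.high_q) - quantile(xs, self.low_q)
        if self.scale is None:
            self.scale = spread
        else:
            self.scale = self.decay * self.scale + (1.0 - self.decay) * spread

    def normalize(self, x):
        if self.scale is None:
            return x  # no statistics yet, pass through unchanged
        # the floor of 1 keeps returns with a small spread from being scaled up
        return x / max(1.0, self.scale)
```

Using percentiles rather than min/max makes the scale robust to a few outlier episodes.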
f808b1c1d2 oops 0.0.80 lucidrains 2025-10-27 08:34:22 -07:00
349a03acd7 redo so lens is always the episode length, including the bootstrap value timestep, and use is_truncated to mask out the bootstrap node from being learned on 0.0.79 lucidrains 2025-10-27 08:06:21 -07:00
59c458aea3 introduce an is_truncated field on Experience, and mask out rewards and values before calculating gae appropriately 0.0.78 lucidrains 2025-10-27 07:55:00 -07:00
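The two commits above keep the bootstrap timestep inside the episode length and mask it out of learning when the episode is truncated. A minimal single-episode sketch of that convention with standard GAE follows. The function name, the plain-list inputs, and the choice that a terminated episode bootstraps from a value of zero are assumptions, not the repository's code.

```python
def gae_with_truncation(rewards, values, is_truncated, gamma=0.99, lam=0.95):
    """
    rewards, values: per-timestep lists of equal length T for one episode.
    If is_truncated, the last timestep is a bootstrap node: its value seeds the
    recursion, its reward is ignored, and it is excluded via learn_mask.
    Otherwise (terminated), the value after the final step is taken as 0.
    Returns (advantages, returns, learn_mask).
    """
    T = len(rewards)
    advantages = [0.0] * T
    learn_mask = [True] * T
    if is_truncated:
        last = T - 1
        learn_mask[last] = False
        next_value = values[last]
    else:
        last = T
        next_value = 0.0
    next_adv = 0.0
    for t in reversed(range(last)):
        delta = rewards[t] + gamma * next_value - values[t]
        next_adv = delta + gamma * lam * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns, learn_mask
```

Treating truncation this way avoids the bias of pretending a time-limited episode ended in a terminal state, while still not training on the bootstrap node itself.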
fbfd59e42f handle variable lengthed experiences when doing policy optimization 0.0.77 lucidrains 2025-10-27 06:09:09 -07:00
46432aee9b fix an issue with bc lucidrains 2025-10-25 12:30:08 -07:00
cf7c237334 fix an issue with bc 0.0.76 lucidrains 2025-10-25 12:27:28 -07:00
f97d9adc97 oops, forgot to add the view embedding for robotics 0.0.75 lucidrains 2025-10-25 11:39:06 -07:00
32cf142b4d take another step for variable len experiences 0.0.74 lucidrains 2025-10-25 11:31:41 -07:00
1ed6a15cb0 fix tests lucidrains 2025-10-25 11:13:22 -07:00
4d8f5613cc start storing the experience lens 0.0.73 lucidrains 2025-10-25 10:55:47 -07:00
3d5617d769 take a step towards variable lengthed experiences during training 0.0.72 lucidrains 2025-10-25 10:45:34 -07:00
77a40e8701 validate that we can generate multiple video streams for robotics use-case lucidrains 2025-10-25 09:23:07 -07:00
4ce82f34df given the VAT paper, add multiple video streams (third person, wrist camera, etc), geared for robotics. need to manage an extra dimension for multiple viewpoints 0.0.71 lucidrains 2025-10-25 09:20:55 -07:00
a9b728c611 incorporate proprioception into the dynamics world model 0.0.70 lucidrains 2025-10-24 11:24:22 -07:00
35c1db4c7d sketch of training from sim env 0.0.69 lucidrains 2025-10-24 09:13:09 -07:00
8526347316 sketch of training from sim env 0.0.68 lucidrains 2025-10-24 08:56:51 -07:00
27ac05efb0 function for combining experiences 0.0.67 lucidrains 2025-10-24 08:00:10 -07:00
d0ffc6bfed with or without signed advantage 0.0.66 lucidrains 2025-10-23 16:24:29 -07:00
fb3e026fe0 handle vectorized env 0.0.65 lucidrains 2025-10-22 11:19:44 -07:00
e4ee4d905a handle vectorized env 0.0.64 lucidrains 2025-10-22 08:52:08 -07:00
7ecc5d03e8 wire up the time kv cache when interacting with sim / env 0.0.62 lucidrains 2025-10-22 08:39:11 -07:00
d82debb7a6 first pass through gathering experience with a mock env for online rl 0.0.61 lucidrains 2025-10-22 08:32:46 -07:00
03b16a48f2 sketch out the dream trainer, seems like they only fine tune the heads 0.0.60 lucidrains 2025-10-22 06:41:10 -07:00
6f1a7a24ed try to fix ci lucidrains 2025-10-21 11:47:39 -07:00
e316499047 naming lucidrains 2025-10-21 10:57:55 -07:00
40da985c6b tweak bc trainer 0.0.59 lucidrains 2025-10-21 10:55:24 -07:00
e280592510 tweak bc trainer 0.0.58 lucidrains 2025-10-21 10:54:47 -07:00
2fc3b17149 take a gradient step with behavioral clone trainer, make sure it works with and without actions and rewards 0.0.57 lucidrains 2025-10-21 10:20:08 -07:00
283d59d75a oops lucidrains 2025-10-21 09:50:07 -07:00
e7d9766608 oops 0.0.56 lucidrains 2025-10-21 09:46:49 -07:00
4a5465eeb6 fix ci lucidrains 2025-10-21 09:17:53 -07:00
b34128d3d0 make sure time kv cache can be passed back in during generation 0.0.55 lucidrains 2025-10-21 09:15:32 -07:00
7ba3988fb9 prepare a mock for interacting with online env lucidrains 2025-10-21 09:03:20 -07:00
ea13d4fcab take a gradient step with video tokenizer trainer 0.0.54 lucidrains 2025-10-21 08:52:22 -07:00
15876d34cf more muon prep 0.0.53 lucidrains 2025-10-21 08:23:59 -07:00
b4763caff9 fix rotary embeddings in presence of kv caching lucidrains 2025-10-21 07:09:26 -07:00
11fd2c477c fix rotary embeddings in presence of kv caching 0.0.52 lucidrains 2025-10-21 07:09:26 -07:00
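When keys from earlier timesteps are cached, rotary embeddings for new tokens have to be applied at their absolute positions, offset by the cache length, rather than restarting from position 0. The sketch below shows that offset with a plain-Python rotary embedding. `rotate` and `rotate_new_tokens` are hypothetical names, and this is not the repository's implementation.

```python
import math

def rotate(vec, position, base=10000.0):
    """Rotary position embedding for an even-length vector at an absolute position."""
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # frequency falls off with the pair index
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out

def rotate_new_tokens(new_vecs, cache_len):
    # new tokens sit at positions cache_len, cache_len + 1, ... when a kv cache is in use
    return [rotate(v, cache_len + i) for i, v in enumerate(new_vecs)]
```

With the offset applied, incremental decoding reproduces exactly the rotations a full-sequence forward pass would use. The query-key dot products then still depend only on relative position.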
7195bbb196 oops 0.0.50 lucidrains 2025-10-20 12:42:27 -07:00
ca244a290c first pass through the kv cache for the time block in the dynamics model 0.0.49 lucidrains 2025-10-20 12:25:50 -07:00
a7e0c395c3 allow for only rmsnorm for keys in attention 0.0.48 lucidrains 2025-10-20 11:20:49 -07:00
1345326656 another measure for the attending to nothing issue 0.0.47 lucidrains 2025-10-20 10:32:31 -07:00
55574c054e assert 0.0.46 lucidrains 2025-10-19 09:59:42 -07:00
27ed6d0ba5 fix time kv cache 0.0.45 lucidrains 2025-10-19 09:16:06 -07:00
4930002e99 bit of progress on time kv cache 0.0.44 lucidrains 2025-10-19 09:04:26 -07:00