
Backfill mistakenly stores blobs in the hot DB #5114

Closed
michaelsproul opened this issue Jan 23, 2024 · 4 comments
Labels
bug Something isn't working database deneb v4.6.0 ETA Q1 2024

Comments

@michaelsproul
Member

michaelsproul commented Jan 23, 2024

Description

Blobs stored during backfill are being written to the hot DB. This is wrong: they should go in the dedicated blobs_db (see #4892). Prior to that change they should have been in the freezer_db, so this seems to have been broken for a while.

The incorrect code is:

// Store the blobs too
if let Some(blobs) = maybe_blobs {
    new_oldest_blob_slot = Some(block.slot());
    self.store
        .blobs_as_kv_store_ops(&block_root, blobs, &mut hot_batch);
}

That batch is committed straight to the hot DB directly here:

self.store.hot_db.do_atomically(hot_batch)?;

There's no usage of do_atomically_with_block_and_blobs_cache, which would have handled the division of ops between the blobs DB and the hot DB.
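For context, here is a minimal sketch of the kind of routing that a combined commit path has to do: blob ops are split out of the batch and sent to the blobs DB, while block ops stay in the hot DB. The types and names below (`StoreOp`, `split_batch`) are illustrative stand-ins, not Lighthouse's actual API.

```rust
use std::collections::HashMap;

// Simplified stand-in for a key-value batch op (not Lighthouse's real enum).
enum StoreOp {
    PutBlock(String, Vec<u8>),
    PutBlobs(String, Vec<u8>),
}

// Divide a mixed batch into hot-DB ops and blobs-DB ops.
fn split_batch(
    batch: Vec<StoreOp>,
) -> (Vec<(String, Vec<u8>)>, Vec<(String, Vec<u8>)>) {
    let mut hot_ops = Vec::new();
    let mut blob_ops = Vec::new();
    for op in batch {
        match op {
            // Blocks belong in the hot DB.
            StoreOp::PutBlock(root, bytes) => hot_ops.push((root, bytes)),
            // Blobs must be routed to the separate blobs DB.
            StoreOp::PutBlobs(root, bytes) => blob_ops.push((root, bytes)),
        }
    }
    (hot_ops, blob_ops)
}

fn main() {
    let batch = vec![
        StoreOp::PutBlock("0xaa".into(), vec![1]),
        StoreOp::PutBlobs("0xaa".into(), vec![2]),
    ];
    let (hot, blobs) = split_batch(batch);
    let hot_db: HashMap<_, _> = hot.into_iter().collect();
    let blobs_db: HashMap<_, _> = blobs.into_iter().collect();
    println!("hot={} blobs={}", hot_db.len(), blobs_db.len());
}
```

The bug in the backfill path is precisely that the blob ops never get split out: everything lands in `hot_batch` and is committed to the hot DB.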

Version

Lighthouse v4.6.0-rc.0

Steps to resolve

This is reasonably involved to fix; we need to:

  • Write the blobs to the blobs DB in import_historical_block_batch.
  • Think about concurrency and crash safety for the three different writes that need to happen in backfill. I think the safest order might be: blobs, hot, cold. Or we could use do_atomically_with_block_and_blobs_cache.
  • Increment the schema version to v19.
  • Write a database migration for v18 -> v19 that copies blobs errantly stored in the hot DB to the blobs DB, and then deletes them from the hot DB.
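A hedged sketch of the migration's core step, using `HashMap`s as stand-ins for the two stores (`migrate_blobs` and the `Db` alias are hypothetical names, not Lighthouse code): copy every blob entry out of the hot DB's blobs column into the blobs DB, then delete the errant copies.

```rust
use std::collections::HashMap;

// Stand-in for a key-value store column (not Lighthouse's real store type).
type Db = HashMap<String, Vec<u8>>;

/// Move every entry in the hot DB's blobs column into the blobs DB,
/// then delete the errant copies from the hot DB. Returns how many
/// entries were moved.
fn migrate_blobs(hot_blobs_column: &mut Db, blobs_db: &mut Db) -> usize {
    let keys: Vec<String> = hot_blobs_column.keys().cloned().collect();
    let mut moved = 0;
    for key in keys {
        if let Some(bytes) = hot_blobs_column.remove(&key) {
            blobs_db.insert(key, bytes);
            moved += 1;
        }
    }
    moved
}

fn main() {
    let mut hot: Db = std::iter::once(("root".to_string(), vec![0u8])).collect();
    let mut blobs = Db::new();
    let moved = migrate_blobs(&mut hot, &mut blobs);
    println!("moved {moved} blob entries; hot now has {}", hot.len());
}
```

In a real migration the copy and delete would be batched atomically per store so a crash mid-migration can be safely re-run.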

The DB migration is required to fix up the databases of nodes on Goerli that were recently checkpoint synced and put the blobs in the wrong DB.

I think we should block the v4.6.0 release for Sepolia/Holesky until we at least have a fix for where the blobs are written. It's probably OK to do the schema migration later, as it will only affect some Goerli nodes. But the sooner we do it the better, as more data will accumulate in the meantime.

@michaelsproul michaelsproul added bug Something isn't working database deneb labels Jan 23, 2024
@michaelsproul michaelsproul changed the title Backfilll blobs mistakenly stored in hot DB Backfilll mistakenly stores blobs in the hot DB Jan 23, 2024
@paulhauner paulhauner added the v4.6.0 ETA Q1 2024 label Jan 23, 2024
@michaelsproul
Member Author

Indeed, running on Goerli with checkpoint sync shows the blobs in the hot DB:

$ lighthouse db inspect --network goerli --column blb
Jan 23 06:30:21.823 INFO Running database manager for goerli network
022ef441b2853b176a82e3b98c46361f3a028381e35f98ec6d4b4784b444168a: 659640 bytes
08395843817fb3c426578fc6262af9e3996a60898daf3c3a23ffd54f384f0c31: 131928 bytes
083f699809407eda903f2b046e813480eca6b404e7c9e105d245b03322b1c00e: 395784 bytes
10f61acdeda287ace13a77bcbec10d19a5e4054b3b9fb9ec9c35a31bad76294d: 131928 bytes
11444491e41ac0d75ace3d454016b0f6dafd7d9b87aa8d272a83a651f9076549: 791568 bytes
188079b99dd73b035137b7957799d3fcfb24c34d20cc8fad5be5992cf1415917: 659640 bytes
19e798f4d30e7bceee373fa82eb247e184c310803c6b1f4fb4e3ad66bfbc47ec: 263856 bytes
1a52339549128630f92d9cfefa5c895befa010ca42e23eef609bdd023de6d804: 263856 bytes
1b9e4a8368d840c21cff0cdb8d515c336f520c982260f818785fc634e145b056: 263856 bytes
1d6a5fe39b1ca795e9cf6235ccc193fdda894e351f00e8ec45b71912b0c7dbc8: 263856 bytes
1ff2a83abad3af432a4b2073cafd89b7e21bf05e0d015ac5437efc3eabc18514: 131928 bytes
251300bfa84254f79935c91ea5dc52424feaaa317e414d2ac01598103edbe755: 659640 bytes
2f707bd0fd4309e4ec514b8ab63c051f66955e390c08438a275888fce4a8ce91: 791568 bytes
35e20178e29f2adcc683c7c120fa57506fa1fdc0be521b3e173f8a673f8ba642: 791568 bytes
37804acbcf1b00a2bfc2ec04374d5e0fc25b4710403d4c6f55b3b6d784348650: 791568 bytes
4348835cf8438d8a3e8528bf901ad15e162ef72da2a5746e698bf665b58165cf: 395784 bytes
43c7f22559c2945e18403c18b51d67551e48af32d9a5104a7f95521a788fe8ba: 791568 bytes
49e22cfca02cd9377a0cbe1e11022a81f3f267edf970585845680bc4ce07b44a: 659640 bytes
4f86adb374625f24fec26faf0959ceeaae9f7bd2fdf9c0e1729c5edfc89e1349: 659640 bytes
509dfe04ccf0323ca0dd3069592078cb594cc725760b06d0447c0e6a1854edeb: 791568 bytes
5281181e777571dd2e734838631113cdca1dfdd77f09a4e2d91be4bd4390a145: 395784 bytes
542f886fe4502e8d8bdd59214f876627f977fd711e7aadb4389fc106e830e87b: 791568 bytes
5bc1aea5f0f33fc2fdec8b0287d856ea617e6872993f09c30b49b9141d8f17c0: 527712 bytes
5d671efbdddc9d0ef19a77cd3c078ced11b6dbbc37663de1693374db0c501136: 791568 bytes
6858168beb80f8f9dbdd699a38c8d0453e164081769ab6149ff57befe3e047ef: 263856 bytes
6a8d2e5d12083d5ba3def03f9be37a45870de9e692176aae016edc1c2e02a60f: 395784 bytes
6d633430dd344534d41fbec41cea535bb920571c2fbc9576ba23745fc946e8bd: 791568 bytes
6dba26468ee5962b7fc26489a4fc6c250fe42fa28db83f1c60074e5ade085ac1: 791568 bytes
71a87ecf57855433904bce43bd95c9a4f80530d68dcb565c7a10668d9f381a62: 791568 bytes
7255311be0ed7e9bf287dcb08c138798cebcd4d96387fff697c1ef6c07c57f7d: 659640 bytes
7a00aebbdedb83596486e1a8d0f80a0c2dace672aede302b5dbafbbfc1b16181: 395784 bytes
7a67abe0c7722636d447335bb5557b6fd74dc8ecad76ecc1bba234e47ce17525: 791568 bytes
7abcbb1fe75d66631a14e4ba902b5aaa497fade1a17ec6391399624446b362df: 791568 bytes
7df05e98f9d0417de2c165d00a8392062b2f123962e4874e25e0af64d8a6d621: 131928 bytes
7f74c717ec86c202b8e5daec0c5d017e03f9de243bea3ea33a8bd2f52823f0a5: 131928 bytes
81f0d4ce180ca0054bdbeb3fa5ac87fcd30c26a2a1fa2fd9a6388c82db0f8a10: 791568 bytes
848f227e7051b814ef5c9ae8a231c335915224628f61daf585c161a35d231a37: 527712 bytes
85414feafa48103132e6262fd124828852745b4c9aca77c1d73fd796999dc334: 659640 bytes
85b893e150cb4878270bf2bd3d0c49487b5af4631a0c905b1b347c9516da54d7: 791568 bytes
8705fa291571089e7a35d219f243ddc8e29185076db0119619b7574148bde141: 527712 bytes
874cd36a4e718d86ccb3760b64c510418df7b44d663847c2975f896a32c63a8f: 395784 bytes
8a0390ec327aead003d410088cb547a13d68cbca135976503277f5b8817766eb: 791568 bytes
9363ed47e60065f81ba7bf31df5a9b349c055d7c41729eb9053a9c925253196e: 791568 bytes
94e47ae27be40b825672fbe7c1074607879ef757fc97e87123a7e9efe9ae4650: 791568 bytes
955bed7b577fdb25a2b27f7380b471c762efd4672a87dc7b4cae9ad852783749: 527712 bytes
96d23ce76001ec7c76f0e90f31c41cf1157097590e882e29c0113648ece955f9: 527712 bytes
975ca68414c754e9adf15bd8c6351cae43cf11ba4a899412e5c0b01c29774fa4: 791568 bytes
9765052e738ea8e729e7044db22b6248328c05226130a14bdd862fbaa00084e4: 791568 bytes
992860356f38658821931ef7c44c8ead2c41df0ef858aa2f97fb98d621a617a3: 527712 bytes
a82e6388425d4eb9a3b472424059a4c513c9d144d2f109cd0214c4b5d606c240: 395784 bytes
aa02d2be2e3a3008d5bcb3aa7704bf2197facb93e3ef069406f7125d4fe02d24: 659640 bytes
ac5d696bd733ed29b07d5fcd691204672d5d8166f16a4983e8cd163381ac402d: 791568 bytes
adb500e3413da8305b505d097cf693a3f2a157a1d9e5ff9ea1ccb917100d6d7b: 395784 bytes
b1faf48974e1d4c810197469105f4fe99eb7191f0fcc8636bc13627e8a14754b: 791568 bytes
b2180822cb5038bc3c4bc0a2a04f56c694536a6b2803fc5e666939ddff305a09: 791568 bytes
b7206011327146090090775fee04e27f04902a92fa2370723f5dbe1968d7bd73: 791568 bytes
b7c682ee200eb7501ba5ecf2a8143f6e5b33df6c7e239b282eed70daf9ae6a9d: 659640 bytes
b7da0e927a2583a22d1b2eaabc62ca09f1d71c7e193957d70b09b8e7057045ca: 527712 bytes
b9df571c2342d34a86a7f786c2b547603f8ed467f961383874ac93bf98982345: 791568 bytes
c18cddebc6da59337df5bc9cf6ea6deb8134bfc6a7d8a13ef3ec3b57a1b4822a: 791568 bytes
c20dd99dd61c4f59d606a6a818366df58e41766e4f2dd451ec55e30c7f2d1d8c: 395784 bytes
c27b66bdc1f23729515cec6a261ace5abe6b80c7ecd082eb5c44157cc81bda23: 791568 bytes
c3eb0ab2eaf9d39dd430b69f742cff59c79a3f64229a3642c66f98b1c8145648: 791568 bytes
c833556449e69e6c7fc124d22efebd36d31d45709eb9c7c4b4f5b6cd6f1c2cfb: 791568 bytes
cb934c7ffb5829f49bb7345268c2b9b92c0b784f27b50d1bb81f04e14c7bed49: 791568 bytes
d32a0f6715568b12d3f125595d3bb3a61c0fb156d2d1121d7e0f3ce3d10426b1: 791568 bytes
dc6d57d029fbfba4dec2069acb0e5558c7622824e2fb1787a10da2af2d18c6ca: 131928 bytes
ddc36fe6b6add943b574fd4849b47ae200c471e5eb8fc91d6bc2ede6be39847f: 395784 bytes
dfc750ebe1dcc9013210c296d078a1e31838ee3ea8ae5dc31094867ce3c2493f: 395784 bytes
e0cbafba445155669b78c4dd84e81a2fafcd6a7bfa3d52cd224eb3083eba70a3: 791568 bytes
e344aa5c65b1121c86bf2ad570d5fcecaeac8f353341cf00e3030d0cc553f0b1: 263856 bytes
e7bef105584b5d16f40de782fb0baeb7253c405092a29aabddbddb6d7e375184: 791568 bytes
e88c248d8ec1f576cad62c8ec751281763d820bf6cd48662ceafae437e48a3b4: 791568 bytes
ea75b07651bab6d6677d40f3dc28de462483671b4d260ab85d37162551afefee: 527712 bytes
ebb596213c7522b145c5ed7df8ac3768a8649fe3f39d57b648ad01c0d312d751: 791568 bytes
f16e69ef325b94b98b81ed606b759fed234e8da64118443991c9dbf34433a666: 131928 bytes
f1a40b803179865cb06ced96f0899dfbb403bd548bcd56243055316cae8a4d26: 395784 bytes
f29ac497aa4efa3e8c5d6af614404a33d0d4717b64f56b7d0d31b4d0719e90cd: 791568 bytes
f988553048a60100806e75e8258793c8b844128a81fba7400e2d6399095eb06b: 395784 bytes
fa5d0c457fd7976df9212c930e19bcfb12831f30a76956759895579ad8f90d87: 791568 bytes
ff822b71cc21ca883a26bad84000648256fd3ceae66a491fdf24d8a74fe29aca: 791568 bytes
Num keys: 81
Total: 47889864 bytes

@michaelsproul michaelsproul changed the title Backfilll mistakenly stores blobs in the hot DB Backfill mistakenly stores blobs in the hot DB Jan 23, 2024
@michaelsproul
Member Author

API queries return empty lists of blobs:

$ curl -s "http://localhost:5052/eth/v1/beacon/blob_sidecars/0x11444491e41ac0d75ace3d454016b0f6dafd7d9b87aa8d272a83a651f9076549" | jq
{
  "data": []
}

@realbigsean
Member

> Think about concurrency and crash safety for the three different writes that need to happen in backfill. I think the safest order might be: blobs, hot, cold. Or we could use do_atomically_with_block_and_blobs_cache.

I went with the first option based on this thinking:

  • If we store blobs first and end up with more blobs than blocks in the DB, I can only see this impacting how we serve blobs-by-range and blocks-by-range. But we already validate by-range requests against BlobInfo, which is stored after, so we wouldn't even respond to a range request while in this state. For by-roots requests we don't check BlobInfo, but I don't see a response in this state as being worse than no response. The same applies to the beacon API response.
  • do_atomically_with_block_and_blobs_cache is nice in that it makes the block+blob storage atomic. But the downsides of using it would be:
    • potential lock contention on the block cache (which is used in gossip block processing)
    • this would wash out the block cache with old blocks + blobs since this is an LRU cache
  • I can't see how making block + blob storage atomic is that beneficial.
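The blobs-then-hot-then-cold ordering above can be sketched as follows; `commit_backfill` and the HashMap-backed `Db` are illustrative stand-ins for Lighthouse's stores, not its real API:

```rust
use std::collections::HashMap;

// Stand-in for a key-value store (not Lighthouse's real store type).
type Db = HashMap<String, Vec<u8>>;

/// Commit backfill writes in the crash-safe order: blobs, then hot, then
/// cold. If the process dies between steps, the DB at worst contains blobs
/// without a corresponding block; per the reasoning above, by-range
/// requests are validated against BlobInfo (stored later), so that state
/// is never served over range requests.
fn commit_backfill(
    blobs_db: &mut Db,
    hot_db: &mut Db,
    cold_db: &mut Db,
    blob_ops: Vec<(String, Vec<u8>)>,
    hot_ops: Vec<(String, Vec<u8>)>,
    cold_ops: Vec<(String, Vec<u8>)>,
) {
    blobs_db.extend(blob_ops); // 1. blobs first: orphaned blobs are harmless
    hot_db.extend(hot_ops);    // 2. then blocks into the hot DB
    cold_db.extend(cold_ops);  // 3. finally the freezer (cold) writes
}

fn main() {
    let (mut blobs_db, mut hot_db, mut cold_db) = (Db::new(), Db::new(), Db::new());
    commit_backfill(
        &mut blobs_db,
        &mut hot_db,
        &mut cold_db,
        vec![("root".into(), vec![1])],
        vec![("root".into(), vec![2])],
        vec![("state".into(), vec![3])],
    );
    println!("blobs={} hot={} cold={}", blobs_db.len(), hot_db.len(), cold_db.len());
}
```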

@michaelsproul
Member Author

Fixed by #5119
