Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove bzip2, gzip, and none from supported compression types #33442

Closed
wants to merge 1 commit into from

Conversation

steviez
Copy link
Contributor

@steviez steviez commented Sep 28, 2023

Problem

These extra compression types are inferior for us and have caused us extra support burden (especially none). Trent mentioned this the other day with general support in Slack, and waking up to several people asking for support with none was the straw that broke the camel's back for me.

Summary of Changes

Rip out archive format types that we do not use. This change does retain the ArchiveFormat::Tar such that we can continue to skip compression for unit tests. However, "tar" has been removed from the possible values array which means passing "--snapshot-archive-format tar" will error out.

@steviez steviez force-pushed the rm_bad_snap_archive_fmt branch 2 times, most recently from 01c32a7 to bd3d1fa Compare September 28, 2023 12:46
These extra compression types are inferior for us and have caused us
extra support burden (especially none).

This change does retain the ArchiveFormat::Tar such that we can continue
to skip compression for unit tests. However, "tar" has been removed from
the possible values array which means passing "--snapshot-archive-format
tar" will error out.
ArchiveFormat::TarGzip,
ArchiveFormat::TarBzip2,
ArchiveFormat::TarLz4,
ArchiveFormat::Tar, // `solana-test-validator` creates uncompressed snapshots
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not true - test-validator uses the default value in SnapshotConfig which is ArchiveFormat::TarZstd.

rpc/src/rpc_service.rs Show resolved Hide resolved
runtime/src/snapshot_utils/archive_format.rs Show resolved Hide resolved
Copy link
Member

@mvines mvines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there's a slight risk of ecosystem trauma here, as we're rugging some snapshot forms entirely. A more gentle approach would seem to be:

  1. Remove the ability to create bz2, gz, and none snapshots in v1.17. Fortunately the 1.17 train is about to leave the station since we just hit supermajority on 1.16!
  2. Once 1.17 forks off master, land the rest of this PR.

But also, I think some of the initial ledger snapshots from single digit epochs were in .gz format. If so then I feel we want to retain the ability to load gzip snapshots forever. I'm unsure if we ever used bzip2 snapshots in the ledger archives. Safest might be to retain the ability to load both bzip2 and gzip snapshots forever..

@steviez
Copy link
Contributor Author

steviez commented Sep 28, 2023

So there's a slight risk of ecosystem trauma here, as we're rugging some snapshot forms entirely.

That's a fair point. It also occurred to me that someone using --snapshot-compression none would get a hard error at startup if this PR shipped as-is.

  • One one hand, it would be gentler to detect "none" (and the other soon-to-be-deprecated formats) and then set "zstd" + emit a warning.
  • Then again, lots of the reports we've seen with --snapshot-compression none have seemed to be copy/paste startup scripts with other footgun args. So, I went with hard-error in the hopes of catching folks attention. But, maybe gentler would be better on this aspect as well

Remove the ability to create bz2, gz, and none snapshots in v1.17. Fortunately the 1.17 train is about to leave the station since we just hit supermajority on 1.16!

That seems reasonable, and wouldn't be too hard to do. As-is, I believe the logic that finds snapshots to load from searches the snapshot directory and determines the type based on filename. The CLI specified format only influences the format for snapshots that weill be created

Once 1.17 forks off master, land the rest of this PR.

👍

@steviez
Copy link
Contributor Author

steviez commented Sep 28, 2023

@brooksprumo - You were going to be a reviewer on this anyways; does the plan mvines laid out above seem reasonable to you? That is, stop the ability to create gzip/bzip2/non-compressed snapshots in v1.17, and cut them from v1.18 completely?

@codecov
Copy link

codecov bot commented Sep 28, 2023

Codecov Report

Merging #33442 (2ede3e2) into master (fa168e3) will decrease coverage by 0.1%.
Report is 6 commits behind head on master.
The diff coverage is 85.7%.

@@            Coverage Diff            @@
##           master   #33442     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%     
=========================================
  Files         798      798             
  Lines      217100   217049     -51     
=========================================
- Hits       177967   177915     -52     
- Misses      39133    39134      +1     

@brooksprumo brooksprumo self-requested a review September 28, 2023 15:10
Copy link
Contributor

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brooksprumo - You were going to be a reviewer on this anyways; does the plan mvines laid out above seem reasonable to you? That is, stop the ability to create gzip/bzip2/non-compressed snapshots in v1.17, and cut them from v1.18 completely?

Yeah, I think changing the cli args for solana-validator and solana-ledger-tool should be the only changes for v1.17.

For v1.18+, I think we may want to keep the ability to read the other snapshot formats indefinitely... Or at least two versions. So maybe v1.19 can remove some more stuff. (Basically I want to ensure master can always open any snapshot generated by the stable version.)

Thanks for working on this, though! Sorry for any wasted effort though.

@steviez
Copy link
Contributor Author

steviez commented Sep 29, 2023

But also, I think some of the initial ledger snapshots from single digit epochs were in .gz format. If so then I feel we want to retain the ability to load gzip snapshots forever. I'm unsure if we ever used bzip2 snapshots in the ledger archives. Safest might be to retain the ability to load both bzip2 and gzip snapshots forever..

Regardless of compression format, are we certain that current code (tip of master) could open a snapshot from a single digit epoch? I feel like I remember there being some breaking changes along the way. I guess I could try this out pretty quickly with ledger-tool

@brooksprumo
Copy link
Contributor

brooksprumo commented Sep 29, 2023

Regardless of compression format, are we certain that current code (tip of master) could open a snapshot from a single digit epoch?

I would not be surprised if old snapshots require an older version of ledger-tool/etc to open them. It's very easy to break compatibility, based on the way we handle snapshot versioning.

@mvines
Copy link
Member

mvines commented Sep 29, 2023

Regardless of compression format, are we certain that current code (tip of master) could open a snapshot from a single digit epoch?

I would not be surprised if old snapshots require an older version of ledger-tool/etc to open them. It's very easy to break compatibility, based on the way we handle snapshot versioning.

Yeah you're probably very right. That ship has likely already sailed given we don't really have explicit unit tests for legacy snapshots

@steviez
Copy link
Contributor Author

steviez commented Sep 29, 2023

I was curious so I downloaded some snapshots from the Google Bucket and tried to open with solana-ledger-tool:

// Slot 104612 (.bz2)
[2023-09-29T21:02:11.264444629Z WARN  solana_runtime::snapshot_utils] Snapshot Error: "unsupported snapshot version: 1.0.0"
[2023-09-29T21:02:11.264591629Z ERROR solana_ledger::bank_forks_utils] Failed to load bank from snapshot archives: I/O error: unsupported snapshot version: 1.0.0

// Slot 19872075 (.bz2)
[2023-09-29T21:08:02.607092780Z WARN  solana_runtime::snapshot_utils] Snapshot Error: "unsupported snapshot version: 1.1.0"
thread 'solUnpkSnpsht01' panicked at 'called `Result::unwrap()` on an `Err` value: Io(Os { code: 2, kind: NotFound, message: "No such file or directory" })', runtime/src/snapshot_utils.rs:1899:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[2023-09-29T21:08:02.608292203Z ERROR solana_ledger::bank_forks_utils] Failed to load bank from snapshot archives: I/O error: unsupported snapshot version: 1.1.0

// Slot 29808033 (.bz2)
thread 'main' panicked at 'Can't change frozen bank by adding not-existing new builtin program (solana_bpf_loader_program, BPFLoader2111111111111111111111111111111111). Maybe, inconsistent program activation is detected on snapshot restore?', runtime/src/bank.rs:3247:9

// Slot 34991671 (.bz2)
thread 'main' panicked at 'Can't change frozen bank by adding not-existing new builtin program (solana_bpf_loader_program, BPFLoader2111111111111111111111111111111111). Maybe, inconsistent program activation is detected on snapshot restore?', runtime/src/bank.rs:3247:9

// Slot 35424025 (.zstd) - first directory on mainnet-beta-ledger-us-ny5 that uses zstd
thread 'main' panicked at 'Can't change frozen bank by adding not-existing new builtin program (solana_bpf_loader_program, BPFLoader2111111111111111111111111111111111). Maybe, inconsistent program activation is detected on snapshot restore?', runtime/src/bank.rs:3247:9

So, we seemingly cannot open any .bz2 snapshot with current ledger-tool. And I didn't see any .gz in the bucket. Based on this, I'd be inclined to say that ripping out bz2 and gz is fair game in master (unless we intend to patch things up in master to restore compat with old snapshot versions - would be cool but doubt we'd consider this a priority)

@steviez
Copy link
Contributor Author

steviez commented Oct 2, 2023

Given Brook's comment above,

For v1.18+, I think we may want to keep the ability to read the other snapshot formats indefinitely... Or at least two versions. So maybe v1.19 can remove some more stuff. (Basically I want to ensure master can always open any snapshot generated by the stable version.)

I'm going to close this PR for now in favor of #33484 which is along the lines of the gentler approach that mvines mentioned.

@steviez steviez closed this Oct 2, 2023
@steviez steviez deleted the rm_bad_snap_archive_fmt branch October 18, 2023 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants