pageserver: don't squash all image layer generation errors into anyhow::Error #7943

Merged
merged 1 commit into main from jcsp/issue-7861-compaction-errs
Jun 3, 2024

Conversation

jcsp
Collaborator

@jcsp jcsp commented Jun 3, 2024

Problem

CreateImageLayersError and CompactionError had proper From implementations, but compact_legacy was explicitly squashing all image layer errors into an anyhow::Error anyway.

This led to errors like the following, where a timeline shutdown surfaces as an InternalServerError with a backtrace:

 Error processing HTTP request: InternalServerError(timeline shutting down

Stack backtrace:
   0: <<anyhow::Error as core::convert::From<pageserver::tenant::timeline::CreateImageLayersError>>::from as core::ops::function::FnOnce<(pageserver::tenant::timeline::CreateImageLayersError,)>>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
   1: <core::result::Result<alloc::vec::Vec<pageserver::tenant::storage_layer::layer::ResidentLayer>, pageserver::tenant::timeline::CreateImageLayersError>>::map_err::<anyhow::Error, <anyhow::Error as core::convert::From<pageserver::tenant::timeline::CreateImageLayersError>>::from>
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/result.rs:829:27
   2: <pageserver::tenant::timeline::Timeline>::compact_legacy::{closure#0}
             at pageserver/src/tenant/timeline/compaction.rs:125:36
   3: <pageserver::tenant::timeline::Timeline>::compact::{closure#0}
             at pageserver/src/tenant/timeline.rs:1719:84
   4: pageserver::http::routes::timeline_checkpoint_handler::{closure#0}::{closure#0}
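
To illustrate what the trace above shows, here is a minimal, self-contained sketch of the difference between squashing and typed propagation. The enums below are simplified, hypothetical stand-ins for the pageserver types (the variant names are assumptions, and `String` stands in for `anyhow::Error`); this is not the actual change made in the PR.

```rust
use std::fmt;

// Hypothetical, simplified stand-in for the image layer creation error.
#[derive(Debug)]
enum CreateImageLayersError {
    Cancelled,
    Other(String),
}

impl fmt::Display for CreateImageLayersError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CreateImageLayersError::Cancelled => write!(f, "timeline shutting down"),
            CreateImageLayersError::Other(msg) => write!(f, "{msg}"),
        }
    }
}

// Hypothetical, simplified stand-in for the compaction error.
// `Other(String)` plays the role of an opaque anyhow::Error here.
#[derive(Debug, PartialEq)]
enum CompactionError {
    ShuttingDown,
    Other(String),
}

// A typed conversion keeps cancellation distinguishable from real failures.
impl From<CreateImageLayersError> for CompactionError {
    fn from(e: CreateImageLayersError) -> Self {
        match e {
            CreateImageLayersError::Cancelled => CompactionError::ShuttingDown,
            other => CompactionError::Other(other.to_string()),
        }
    }
}

// Stand-in for image layer creation: fails with the given error, if any.
fn create_image_layers(
    fail_with: Option<CreateImageLayersError>,
) -> Result<Vec<u32>, CreateImageLayersError> {
    match fail_with {
        Some(e) => Err(e),
        None => Ok(vec![1, 2, 3]),
    }
}

// Squashing: every error becomes an opaque `Other`, so callers can no longer
// tell a shutdown apart from an internal error.
fn compact_squashed(
    fail_with: Option<CreateImageLayersError>,
) -> Result<usize, CompactionError> {
    let layers = create_image_layers(fail_with)
        .map_err(|e| CompactionError::Other(e.to_string()))?;
    Ok(layers.len())
}

// Typed propagation: `?` uses the From impl, so `Cancelled` maps to `ShuttingDown`.
fn compact_typed(
    fail_with: Option<CreateImageLayersError>,
) -> Result<usize, CompactionError> {
    let layers = create_image_layers(fail_with)?;
    Ok(layers.len())
}
```

With the typed version, a caller such as an HTTP handler could match on `ShuttingDown` and respond accordingly, rather than reporting an internal error with a stack trace.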

Closes: #7861

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp changed the title from "Jcsp/issue 7861 compaction errs" to "pageserver: don't squash all image layer generation errors into anyhow::Error" Jun 3, 2024
@jcsp jcsp force-pushed the jcsp/issue-7861-compaction-errs branch from 2605108 to 15b0d16 Compare June 3, 2024 12:43
@jcsp jcsp added the c/storage/pageserver (Component: storage: pageserver) and a/tech_debt (Area: related to tech debt) labels Jun 3, 2024

github-actions bot commented Jun 3, 2024

3156 tests run: 3017 passed, 0 failed, 139 skipped (full report)


Flaky tests (2)

Postgres 16

  • test_pageserver_restarts_under_worload: release

Postgres 14

  • test_pageserver_restarts_under_worload: release

Code coverage* (full report)

  • functions: 31.4% (6530 of 20798 functions)
  • lines: 48.3% (50421 of 104318 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
15b0d16 at 2024-06-03T13:30:20.053Z

@jcsp jcsp marked this pull request as ready for review June 3, 2024 15:13
@jcsp jcsp requested a review from a team as a code owner June 3, 2024 15:13
@jcsp jcsp requested a review from koivunej June 3, 2024 15:13
@jcsp jcsp enabled auto-merge (squash) June 3, 2024 16:24
@jcsp jcsp requested a review from arpad-m June 3, 2024 17:30
@jcsp jcsp merged commit 11bb265 into main Jun 3, 2024
70 of 71 checks passed
@jcsp jcsp deleted the jcsp/issue-7861-compaction-errs branch June 3, 2024 20:10
Labels
a/tech_debt Area: related to tech debt c/storage/pageserver Component: storage: pageserver
Development

Successfully merging this pull request may close these issues.

test_timeline_deletion_with_files_stuck_in_upload_queue is flaky
2 participants