Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reimpl Layer, remove remote layer, trait Layer, trait PersistentLayer #4938

Merged
merged 30 commits into from
Oct 26, 2023

Conversation

koivunej
Copy link
Member

@koivunej koivunej commented Aug 9, 2023

Implement a new struct Layer abstraction which manages downloadness internally, requiring no LayerMap locking or rewriting to download or evict providing a property "you have a layer, you can read it". The new struct Layer provides ability to keep the file resident via a RAII structure for new layers which still need to be uploaded. Previous solution solved this RemoteTimelineClient::wait_completion which lead to bugs like #5639. Evicting or the final local deletion after garbage collection is done using Arc'd value Drop.

With a single struct Layer the closed open ended trait Layer, trait PersistentLayer and struct RemoteLayer are removed following noting that compaction could be simplified by simply not using any of the traits in between: #4839.

The new struct Layer is a preliminary to remove Timeline::layer_removal_cs documented in #4745.

Preliminaries: #4936, #4937, #5013, #5014, #5022, #5033, #5044, #5058, #5059, #5061, #5074, #5103, epic #5172, #5645, #5649. Related split off: #5057, #5134.

@github-actions
Copy link

github-actions bot commented Aug 9, 2023

2340 tests run: 2225 passed, 0 failed, 115 skipped (full report)


Flaky tests (2)

Postgres 16

  • test_pageserver_with_empty_tenants[real_s3]: debug

Postgres 15

  • test_competing_branchings_from_loading_race_to_ok_or_err: release

Code coverage (full report)

  • functions: 53.4% (8603 of 16101 functions)
  • lines: 80.7% (49900 of 61806 lines)

The comment gets automatically updated with the latest test results
6778c54 at 2023-10-26T09:09:59.921Z :recycle:

@koivunej
Copy link
Member Author

koivunej commented Aug 9, 2023

Summarizing discussion with @jcsp and @LizardWizzard regarding "gc soft-deletes" (private slack link):

  • a separate delete queue for Move to use batch operations to delete files from S3 #4325
    • identified issues with local-only (missing from index_part.json) which need to be dealt with
  • soft delete and removal from index_part.json issued from gc/compaction
  • real local delete + schedule remote delete issued from LayerE::drop
    • this maintains "you have Arc you can read it"
  • compaction uploads of new files need to keep old files not deleted until upload completes, uses barriers, but how do barriers work with lazier delete?
    • this requirement could be handled with guards like ResidentDeltaLayer
    • in compaction case each deletion keeping a Arc<[ProhibitRemoteDelete]> could work
  • uploading "local-only" files need a guard like ResidentDeltaLayer
    • this requires a different kind of guard which just prohibits eviction but does not init the inner yet? Maybe move the OnceCell a bit lower, around LayerKind?

Related: #4326

@koivunej koivunej force-pushed the remove_remote_layer_ongoing branch 2 times, most recently from 6307a36 to 6fa0be8 Compare August 16, 2023 14:11
@koivunej koivunej force-pushed the remove_remote_layer_ongoing branch from cb77f57 to b54a9b3 Compare August 16, 2023 18:21
@koivunej koivunej force-pushed the remove_remote_layer_ongoing branch from b54a9b3 to 9a96033 Compare August 16, 2023 20:38
@koivunej
Copy link
Member Author

I will be rebasing and collecting "inspired" work to the top to split it off, but any comments are of course welcome, esp. if they relate to LayerE internals, which I am quite happy with.

@koivunej koivunej force-pushed the remove_remote_layer_ongoing branch from 9a96033 to 1b3a04e Compare August 16, 2023 20:42
koivunej added a commit that referenced this pull request Aug 17, 2023
Restores #4937 work relating to the ability to use `ResidentDeltaLayer`
(which is an Arc wrapper) in #4938 for the ValueRef's by removing the
borrow from `ValueRef` and providing it from an upper layer.

This should not have any functional changes, most importantly, the
`main` will continue to use the borrowed `DeltaLayerInner`. It might be
that I can change #4938 to be like this. If that is so, I'll gladly rip
out the `Ref` and move the borrow back. But I'll first want to look at
the current test failures.
pageserver/src/tenant/storage_layer/delta_layer.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/timeline.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/storage_layer.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/storage_layer.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/storage_layer.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/storage_layer.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/timeline.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/storage_layer.rs Outdated Show resolved Hide resolved
koivunej added a commit that referenced this pull request Aug 17, 2023
I will have to change these as I change remote_timeline_client api in
#4938. So a bit of cleanup, handle my comments which were just resolved
during initial review.

Cleanup:
- use unwrap in tests instead of mixed `?` and `unwrap`
- use `Handle` instead of `&'static Reactor` to make the
RemoteTimelineClient more natural
- use arrays in tests
- use plain `#[tokio::test]`
@koivunej koivunej force-pushed the remove_remote_layer_ongoing branch 4 times, most recently from b4a8642 to 374a537 Compare August 21, 2023 06:49
@koivunej
Copy link
Member Author

koivunej commented Aug 21, 2023

All of the test failures should now be linked to metrics. Upon closer look no, it's related to layermap init and how I added "faster access" to the internals for eviction, fixing. This still leaves the metric problems.

Test report

1620 tests run: 1524 passed, 21 failed, 75 skipped (full report)

Failures on Posgres 15

  • test_statvfs_pressure_usage: release, debug

  • test_statvfs_pressure_min_avail_bytes: debug, release

  • test_gc_of_remote_layers: release, debug

  • test_delete_tenant_exercise_crash_safety_failpoints[Check.RETRY_WITH_RESTART-mock_s3-timeline-delete-before-index-delete-True]: debug

  • test_delete_timeline_exercise_crash_safety_failpoints[Check.RETRY_WITH_RESTART-mock_s3-timeline-delete-before-index-deleted-at]: release, debug

  • test_delete_timeline_exercise_crash_safety_failpoints[Check.RETRY_WITH_RESTART-real_s3-timeline-delete-before-index-deleted-at]: release, debug

Failures on Posgres 14

  • test_fully_custom_config: debug

  • test_statvfs_pressure_usage: release, debug

  • test_statvfs_pressure_min_avail_bytes: release, debug

  • test_gc_of_remote_layers: release, debug

  • test_delete_timeline_exercise_crash_safety_failpoints[Check.RETRY_WITH_RESTART-real_s3-timeline-delete-before-index-deleted-at]: release, debug

  • test_delete_timeline_exercise_crash_safety_failpoints[Check.RETRY_WITH_RESTART-mock_s3-timeline-delete-before-index-deleted-at]: debug

@koivunej
Copy link
Member Author

Oki, now with the latest force push we are finally at main after #5649.

Of course it will be difficult for you to see what I did. I ended up renaming the method added in #5645 the RemoteTimelineClient::schedule_unlinking_from_index_part as schedule_gc_update, which is much more fitting in c2e0a12.

@koivunej
Copy link
Member Author

Failures on Posgres 14

  • test_ondemand_download_large_rel[local_fs]: release

This now keeps failing because we never actually stop the remote timeline client, and deletes happen late.

Fixing this is rather involved, I think we need a new queue operation which we would call from

client.wait_completion().await
like, wait_for_completion_and_stop. If anything is even needed because of the #4960. I think I'll just leave it not done, and we'll need to remember this if some flakyness arises.

it was originally in the Layer::finish_creating, however this would had
created a new metrics bug in case create_image_layers or
compact_level0_phase1 would fail not on the first layer but any
following.

this solution does lose on the speed metrics are updated, but that is as
has been.
sadly, no test for this yet.
I don't see how else get any insight what'll happen in staging.
@koivunej koivunej force-pushed the remove_remote_layer_ongoing branch from 370af00 to 36ba3ba Compare October 26, 2023 01:48
@koivunej
Copy link
Member Author

Adding these FIXME's:

pageserver/src/disk_usage_eviction_task.rs:349:          // FIXME: batching makes no sense anymore because of no layermap locking, should just spawn
pageserver/src/tenant/timeline.rs:744:                   // FIXME: the match should only cover repartitioning, not the next steps

Intend to change these with #5331.

pageserver/src/tenant/storage_layer/image_layer.rs:523:  // FIXME: why not carry the virtualfile here, it supports renaming?

We have other virtual file related fixmes. During this PR, I think virtualfile learned how to support this use case as well, so it might be something interesting to look into.

pageserver/src/tenant/timeline.rs:3167:                  // FIXME: should spawn_blocking the rest of this function
pageserver/src/tenant/timeline.rs:3417:                  // FIXME: spawn_blocking above for this

This is not trivial to implement because now the DeltaEntry reads are "async". I was thinking of "async" feeder to spawn_blocking task. This might be good. Latter is about needing to fix the par_fsync.

pageserver/src/tenant/timeline.rs:3109:                  // FIXME: downloading while holding layer_removal_cs is not great, but we will remove that

#5108.

pageserver/src/tenant/timeline.rs:2473:                  // FIXME: even though we have a single image and single delta layer assumption
pageserver/src/tenant/timeline.rs:2867:                  // FIXME: we could add the images to be uploaded *before* returning from here, but right
pageserver/src/tenant/timeline.rs:2632:                  // FIXME: the writer already fsyncs all data, only rename needs to be fsynced here
pageserver/src/tenant/timeline.rs:3408:                  // FIXME: the writer already fsyncs all data, only rename needs to be fsynced here

These are more misc observations. The structuring is not great with regards to how new layers are added.

@koivunej koivunej merged commit c508d3b into main Oct 26, 2023
34 checks passed
@koivunej koivunej deleted the remove_remote_layer_ongoing branch October 26, 2023 09:36
koivunej added a commit that referenced this pull request Oct 26, 2023
this was the plan, but I forgot it from #4938.
koivunej added a commit that referenced this pull request Oct 27, 2023
#5649 added the concept of dangling layers which #4938 uses but only
partially. I forgot to change `schedule_compaction_update` to not
schedule deletions to uphold the "have a layer, you can read it".

With the now remembered fix, I don't think these checks should ever fail
except for a mistake I already did. These changes might be useful for
protecting future changes, even though the Layer carrying the generation
AND the `schedule_(gc|compaction)_update` require strong arcs.

Rationale for keeping the `#[cfg(feature = "testing")]` is worsening any
leak situation which might come up.
koivunej added a commit that referenced this pull request Nov 2, 2023
With the layer implementation as was done in #4938, it is possible via
cancellation to cause two concurrent downloads on the same path, due to
how `RemoteTimelineClient::download_remote_layer` does tempfiles. Thread
the init semaphore through the spawned task of downloading to make this
impossible to happen.
koivunej added a commit that referenced this pull request Nov 2, 2023
Some of the log messages were lost with the #4938. This PR adds some of
them back, most notably:

- starting to on-demand download
- successful completion of on-demand download
- ability to see when there were many waiters for the layer download
- "unexpectedly on-demand downloading ..." is now `info!`

Additionally some rare events are logged as error, which should never
happen.
koivunej added a commit that referenced this pull request Nov 21, 2023
While reviewing code noticed a scary `layer_paths.pop().unwrap()` then
realized this should be further asyncified, something I forgot to do
when I switched the `compact_level0_phase1` back to async in #4938.

This keeps the double-fsync for new deltas as #4749 is still unsolved.
koivunej added a commit that referenced this pull request Nov 28, 2023
Quest: #4745. Follow-up to
#4938.

- add in locks for compaction and gc, so we don't have multiple
executions at the same time in tests
- remove layer_removal_cs
- remove waiting for uploads in eviction/gc/compaction
    - #4938 will keep the file resident until upload completes

Co-authored-by: Christian Schwarz <christian@neon.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants