new compaction: Fix failure in staircasing test #7244

Closed
Tracked by #7554
jcsp opened this issue Mar 26, 2024 · 3 comments · Fixed by #7282
Labels: c/storage/pageserver (Component: storage: pageserver), t/bug (Issue Type: Bug)

Comments

@jcsp
Collaborator

jcsp commented Mar 26, 2024

No description provided.

@jcsp jcsp added t/bug Issue Type: Bug c/storage/pageserver Component: storage: pageserver labels Mar 26, 2024
@jcsp jcsp changed the title Fix failure in staircasing test new compaction: Fix failure in staircasing test Mar 26, 2024
@arpad-m
Member

arpad-m commented Mar 26, 2024

slack link

Error msg:

ERROR layer_delete{tenant_id=61e66dc7e243450d90a8ed5624ad9d26 shard_id=0000 timeline_id=df0ea8a3bad847e88802a4b6a809c533}: 000000000000000000000000000000000000-000000067F00000001000004E70000000016__000000000DBC0841 was unlinked but was not dangling

@koivunej
Member

@arpad-m
Member

arpad-m commented Mar 29, 2024

I've investigated this today.

  • The affected files are all image layers
  • Before the mentioned error message, the files produce log lines like:
INFO request{method=PUT path=/v1/tenant/b4ed2fb5d2020a52930b09b5fdefb8e4/timeline/3a3a25d547a6b9a801621ff43c090713/do_gc request_id=678ea518-ebb7-477c-a9f9-fdd0d9e145d4}:manual_gc{tenant_id=b4ed2fb5d2020a52930b09b5fdefb8e4 shard_id=0000 timeline_id=3a3a25d547a6b9a801621ff43c090713}:gc_timeline{timeline_id=3a3a25d547a6b9a801621ff43c090713 cutoff=0/21853860}: Deleting layer 000000067F00000001000004E70000000016-000000067F0000000100000A280300000000__000000000DBBCAE1 not found in latest_files list, never uploaded?
  • By timestamp, those unlinked errors show up before the Physical storage size log entry (for step=2), so most of the test's steps are not needed to reproduce this issue, which gives a shorter iteration time
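
A quick way to check the "all image layers" observation against a log file is to classify the layer names that appear in it. The sketch below assumes the layer naming seen in the lines above: a key range, a double underscore, then either a single LSN (treated here as an image layer) or an LSN range (treated as a delta layer). The function name `classify_layers` and the regex are illustrative, not part of the pageserver.

```python
import re

# Assumption: layer names look like <key_start>-<key_end>__<lsn> for image
# layers and <key_start>-<key_end>__<lsn_start>-<lsn_end> for delta layers,
# with upper-case hex fields as in the log lines quoted in this issue.
LAYER_RE = re.compile(
    r"(?P<key_start>[0-9A-F]+)-(?P<key_end>[0-9A-F]+)"
    r"__(?P<lsn_start>[0-9A-F]+)(?:-(?P<lsn_end>[0-9A-F]+))?"
)


def classify_layers(log_text):
    """Return a list of (layer_name, kind) for every layer name in log_text."""
    results = []
    for match in LAYER_RE.finditer(log_text):
        # A second LSN after the double underscore means an LSN range.
        kind = "delta" if match.group("lsn_end") else "image"
        results.append((match.group(0), kind))
    return results
```

Feeding it the `was unlinked but was not dangling` lines and checking that every result has kind `image` would confirm the first bullet, under the naming assumption stated above.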

arpad-m added a commit that referenced this issue Apr 3, 2024
Tiered compaction did not schedule the upload of image layers. In the
`test_gc_feedback.py` test, running with tiered compaction, this caused
warnings like:

```
INFO request[...] Deleting layer [...] not found in latest_files list, never uploaded?
```

These in turn led to errors like:

```
ERROR layer_delete[...] was unlinked but was not dangling
```

Fixes #7244
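
The sequence described in this commit message can be illustrated with a toy model. Everything below is hypothetical: `ToyRemoteClient`, `compact`, and `gc` are stand-ins invented for illustration and do not mirror the pageserver's real types or control flow. The only things taken from the thread are the idea of a `latest_files` list and the warning text.

```python
class ToyRemoteClient:
    """Hypothetical bookkeeping of which layers were scheduled for upload."""

    def __init__(self):
        self.latest_files = set()
        self.warnings = []

    def schedule_upload(self, layer):
        self.latest_files.add(layer)

    def schedule_deletion(self, layer):
        if layer not in self.latest_files:
            # Mirrors the warning quoted in this issue.
            self.warnings.append(
                f"Deleting layer {layer} not found in latest_files list, never uploaded?"
            )
            return
        self.latest_files.remove(layer)


def compact(remote, new_image_layers, schedule_uploads):
    """Create image layers; schedule_uploads=False models the reported bug."""
    for layer in new_image_layers:
        if schedule_uploads:
            remote.schedule_upload(layer)
    return list(new_image_layers)


def gc(remote, layers):
    """Delete the given layers, as a later GC pass would."""
    for layer in layers:
        remote.schedule_deletion(layer)
```

With `schedule_uploads=False`, a later `gc` call records one warning per image layer; with `schedule_uploads=True`, it records none. That is the difference the fix is described as making.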
arpad-m added a commit that referenced this issue May 15, 2024
Adds a test that is a reproducer for many tiered compaction bugs,
both ones that have since been fixed as well as still unfixed ones:
* (now fixed) #7296 
* #7707 
* #7759
* Likely also #7244 but I haven't tried that.

The key ordering bug can be reproduced by switching to
`merge_delta_keys` instead of `merge_delta_keys_buffered`, so reverting
a big part of #7661, although it only sometimes reproduces (30-50% of
cases).

part of #7554
a-masterov pushed a commit that referenced this issue May 20, 2024
3 participants