-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kill segments by versions #15994
Kill segments by versions #15994
Conversation
...xing-service/src/main/java/org/apache/druid/indexing/common/task/KillUnusedSegmentsTask.java
Outdated
Show resolved
Hide resolved
Kill tasks by default kill all versions of unused segments in the specified interval. Users wanting to delete specific versions (for example, data compliance reasons) and keep rest of the versions can specify the optional version in the kill task payload.
5c22821
to
b49d7f9
Compare
Sort of like method-level parameterized tests.
...vice/src/test/java/org/apache/druid/indexing/test/TestIndexerMetadataStorageCoordinator.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/indexing/overlord/IndexerMetadataStorageCoordinator.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/indexing/overlord/IndexerMetadataStorageCoordinator.java
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinatorTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. On the whole, I would prefer that we override the methods where possible to add a new flavour that accepts versions
. Otherwise, we have to pass a lot of nulls around which is a little confusing.
...ice/src/main/java/org/apache/druid/indexing/common/actions/RetrieveUnusedSegmentsAction.java
Outdated
Show resolved
Hide resolved
...xing-service/src/main/java/org/apache/druid/indexing/common/task/KillUnusedSegmentsTask.java
Outdated
Show resolved
Hide resolved
...xing-service/src/main/java/org/apache/druid/indexing/common/task/KillUnusedSegmentsTask.java
Outdated
Show resolved
Hide resolved
...rvice/src/main/java/org/apache/druid/indexing/common/actions/RetrieveUsedSegmentsAction.java
Outdated
Show resolved
Hide resolved
...xing-service/src/main/java/org/apache/druid/indexing/common/task/KillUnusedSegmentsTask.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/indexing/overlord/IndexerMetadataStorageCoordinator.java
Outdated
Show resolved
Hide resolved
Retain the old interface method and make it default and route it to the method with nullable versions variant. Update usages to use the default method where versions doesn't matter.
d364931
to
4341955
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple of final comments, otherwise all looks good.
Haven't gone through the tests in detail. But should we also add a test to verify that we don't kill a segment whose load spec is currently being used by some other version segment?
...ice/src/main/java/org/apache/druid/indexing/common/actions/RetrieveUnusedSegmentsAction.java
Outdated
Show resolved
Hide resolved
...vice/src/test/java/org/apache/druid/indexing/common/actions/RetrieveSegmentsActionsTest.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/client/indexing/ClientKillUnusedSegmentsTaskQuery.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/client/indexing/ClientKillUnusedSegmentsTaskQuery.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
…sed load spec. Checkpoint.
final String versionsStr = versions.stream() | ||
.map(version -> "'" + version + "'") | ||
.collect(Collectors.joining(",")); | ||
sb.append(StringUtils.format(" AND version IN (%s)", versionsStr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we enforce some limit on the number of versions here? I think in practice most users wouldnt be specifying too many versions at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. However, we don't enforce an upper bound on some related parameters like limit
and batchSize
, so I'm unsure if we want to do it for versions
size. Also, the maximum size for IN
would also depend on the underlying metadata store itself, so my suggestion would be to roll this out without any size restriction right out of the bat, and revisit this later if needed
final DateTime maxUsedStatusLastUpdatedTime1 = DateTimes.nowUtc(); | ||
|
||
// Delay for 1s, mark the segments as unused and then capture the last updated time cutoff again | ||
Thread.sleep(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can actually remove the sleep here and in the other existing tests by directly marking the segments as unused using the test connector. That way, the test would have control over the last updated time and we can set it to whatever time. I can clean up this pattern along with other miscellaneous testing stuff in a follow-up patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes, @abhishekrb19 !
The test improvements can be done in a follow up.
Out of curiosity, do you plan to make similar changes for markUsed
/ markUnused
APIs too?
Intervals.of("2019/2020"), | ||
null, | ||
null, | ||
null | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation seems off when looking at the args in the previous lines.
final DateTime now = DateTimes.nowUtc(); | ||
final String v1 = now.toString(); | ||
final String v2 = now.minusHours(2).toString(); | ||
final String v3 = now.minusHours(3).toString(); | ||
|
||
final DataSegment segment1 = newSegment(Intervals.of("2019-01-01/2019-02-01"), v1, ImmutableMap.of("foo", "1")); | ||
final DataSegment segment2 = newSegment(Intervals.of("2019-02-01/2019-03-01"), v2, ImmutableMap.of("foo", "1")); | ||
final DataSegment segment3 = newSegment(Intervals.of("2019-03-01/2019-04-01"), v3, ImmutableMap.of("foo", "1")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many of the tests seem to be using similar segments, versions, etc. Do you think some of this can go into the setup
method?
DATA_SOURCE, | ||
umbrellaInterval, | ||
null, | ||
null | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation seems off. Closing brace should have smaller indentation than the preceding args.
Thanks for the thorough reviews, @kfaraz!
Yes, I'm going to do similar changes to the markUsed / markUnused APIs next. I will also follow up on the test improvements in a separate patch. |
Summary:
Prior to this patch, kill tasks will delete all versions of unused segments within the specified interval. With this patch, users can now delete specific versions of unused data, while retaining the rest by specifying an optional list of
versions
in the kill task payload. If left unspecified, the default behavior remains unchanged, i.e., delete all versions of unused segments in the interval.Motivation:
Note that adding an optional list of
versions
support to/markAsUsed
and/markAsUnused
segment management APIs would be a complementary addition. I didn't make those changes in this PR to keep it simple to review - I will follow up on that later.Release note
Kill task accepts an optional list of unused segment
versions
to delete.Key changed/added classes in this PR
KillUnusedSegmentsTask.java
IndexerSQLMetadataStorageCoordinator.java
RetrieveUnusedSegmentsAction.java
KillUnusedSegmentsTaskTest.java
IndexerSQLMetadataStorageCoordinatorTest.java
RetrieveSegmentsActionsTest.java
This PR has: