Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability for MSQ tasks to query realtime tasks #15024

Merged
merged 41 commits into from
Oct 9, 2023

Conversation

adarshsanjeev
Copy link
Contributor

@adarshsanjeev adarshsanjeev commented Sep 22, 2023

Currently, MSQ is unable to include realtime results from realtime queries.

This problem is present both in the controller, which only fetches metadata from the metadata store so that only published segments would be read and the worker, which reads segments from deep storage and lacks the ability to communicate with data servers.

This PR aims to add the capabilities to:

  • Fetch the realtime segment metadata from the coordinator server view,
  • Adds the ability for workers to query indexers, similar to how brokers do the same for native queries.

Release Notes

  • Adds a context parameter includeSegmentSource which can be either NONE(default) or REALTIME. If this parameter is set to REALTIME, MSQ will include real time segments in the query results.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Sep 22, 2023
@adarshsanjeev adarshsanjeev marked this pull request as ready for review September 27, 2023 16:11
Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the overall design. Left some comments.
Please add UT's for this.

Copy link
Contributor

@LakshSingla LakshSingla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since numRetriesOnMissingSegments is not added in the documentation, perhaps we should leave it out of the release notes as well.

Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Classes left are
BaseLeafFrameProcesor
DataServerClient
LoadedSegmentDataProvider
GroupingEngine

@Override
public ClientResponse<AppendableByteArrayInputStream> handleResponse(HttpResponse response, TrafficCop trafficCop)
{
log.debug("Received response status[%s] for queryId[%s]", response.getStatus(), queryId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check for timeout?
Shoudnt traffic cop be used for backpressure?
We can do these things in a follow up PR as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about metrics like bytes scanned and stuff ?

StandardRetryPolicy.noRetries()
);
this.objectMapper = objectMapper;
this.queryCancellationExecutor = Execs.scheduledSingleThreaded("query-cancellation-executor");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make a pool for this. For each segment you are creating a new executor service which seems wasteful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who closes this executor service.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to a common threadpool and added a wait to close it.

try {
statusSequencePair = RetryUtils.retry(
() -> {
Sequence<QueryType> sequence = dataServerClient.run(preparedQuery, responseContext, queryResultType, closer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happen's if sequence is interrupted or closed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be thrown outside this class, which would cause the query to fail

} else {
Boolean wasHandedOff = checkSegmentHandoff(coordinatorClient, dataSource, segmentDescriptor);
if (Boolean.TRUE.equals(wasHandedOff)) {
log.debug("Segment[%s] was handed off.", segmentDescriptor);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we log the counts in the stage somewhere ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add this as part of the counters

Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments.
LGTM overall.

docs/multi-stage-query/reference.md Outdated Show resolved Hide resolved
statusSequencePair = RetryUtils.retry(
() -> {
ServiceLocation serviceLocation = Iterables.getOnlyElement(fixedSetServiceLocator.locate().get().getLocations());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Use collectionUtils.getOnlyElement() so that a nicer error message is thrown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

*/
public enum SegmentSource
{
NONE(ImmutableSet.of()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIt: Lets add that None also includes deep storage segments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments here

Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change LGTM.

@cryptoe cryptoe merged commit 7a35ce8 into apache:master Oct 9, 2023
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
ektravel pushed a commit to ektravel/druid that referenced this pull request Oct 16, 2023
This PR aims to add the capabilities to:
1. Fetch the realtime segment metadata from the coordinator server view,
2. Adds the ability for workers to query indexers, similar to how brokers do the same for native queries.
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
This PR aims to add the capabilities to:
1. Fetch the realtime segment metadata from the coordinator server view,
2. Adds the ability for workers to query indexers, similar to how brokers do the same for native queries.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - Batch Ingestion Area - Documentation Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Release Notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants