Use the Pinot Grpc Endpoint for Streaming Server Queries #12332

Merged
merged 4 commits into from
Jun 22, 2022

Conversation

elonazoulay
Member

@elonazoulay elonazoulay commented May 10, 2022

Description

Add support for querying Pinot via the more efficient gRPC endpoint.

Is this change a fix, improvement, new feature, refactoring, or other?

New feature

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Pinot connector

Related issues, pull requests, and links

Documentation

(x) Documentation issue #12944 is filed, and can be handled later.

Release notes

(x) Release notes entries required with the following suggested text:

# Pinot
* Add support for querying Pinot via the gRPC endpoint. ({issue}`9296`)

@cla-bot cla-bot bot added the cla-signed label May 10, 2022
@elonazoulay elonazoulay requested review from hashhar and ebyhr May 10, 2022 23:04
@ebyhr ebyhr removed their request for review May 10, 2022 23:57
Member

@hashhar hashhar left a comment

Skimmed; leaving some comments about commit boundaries as well (they seem too fine-grained).

I'll need to look at the new PageSource in more detail

@hashhar
Member

hashhar commented May 11, 2022

For gRPC streaming, is the only retained memory the byte buffers used to retrieve the gRPC responses? Can we account for them in the PageSource? Is there anything else that needs to be accounted for (DataTable?)?

@elonazoulay
Member Author

Reordered the commits, but had to keep `Implement PinotDataFetcher` separate since it depends on the `Use maxPageSize in PinotSegmentPageSource` commit. If needed, I can squash those and move the `Make PinotDataTableWithSize public` commit before the `Add Support for GRPC` commit.

re: memory size: the entire payload in bytes from the response is now added to the memory usage.
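One way to picture the accounting described here is the minimal Java sketch below. It assumes each gRPC response arrives as a raw byte array before being decoded into a DataTable; `ResponseMemoryTracker`, `nextPayload`, and `getMemoryUsage` are hypothetical names, not the connector's actual PageSource code. It simply adds the full size of every payload it hands out to a running total.

```java
import java.util.Iterator;

// Hypothetical sketch, not the connector's PageSource: hands out raw gRPC
// response payloads and adds each payload's full size to a running total.
final class ResponseMemoryTracker
{
    private final Iterator<byte[]> responses;
    private long memoryUsageBytes;

    ResponseMemoryTracker(Iterator<byte[]> responses)
    {
        this.responses = responses;
    }

    // Returns the next payload, or null once the stream is exhausted.
    byte[] nextPayload()
    {
        if (!responses.hasNext()) {
            return null;
        }
        byte[] payload = responses.next();
        // Count the entire payload in bytes.
        memoryUsageBytes += payload.length;
        return payload;
    }

    // Total bytes of all payloads handed out so far.
    long getMemoryUsage()
    {
        return memoryUsageBytes;
    }

    // True when no more responses remain in the stream.
    boolean isFinished()
    {
        return !responses.hasNext();
    }
}
```

Note that `isFinished` here delegates to `hasNext()`, which on a real blocking gRPC response iterator may wait until the next response arrives.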

Member

@hashhar hashhar left a comment

Some comments about commits but looks good overall.

@Praveen2112 Can you please take a look at last commit to see if the PageSource implementation looks correct (in terms of memory accounting, isFinished and getReadTimeNanos).

Contributor

@xiangfu0 xiangfu0 left a comment

lgtm, thanks for adding this support @elonazoulay

@@ -58,6 +60,9 @@
private int maxRowsForBrokerQueries = 50_000;
private boolean aggregationPushdownEnabled = true;
private boolean countDistinctPushdownEnabled = true;
private boolean grpcEnabled;
Member

Should this be enabled by default?

Member Author

Yep: it puts less load on the Pinot servers and allows for larger result sets.

Member Author

@elonazoulay elonazoulay May 28, 2022

Pinot servers will have to enable the endpoint in their configuration, though; it's not enabled by default. I can add a note to the documentation.

Contributor

I recently turned the gRPC flag on by default in Pinot. I think we can enable this by default after one or two versions.

public Iterator<Server.ServerResponse> submitQuery(String query, String serverHost, List<String> segments)
{
HostAndPort mappedHostAndPort = pinotHostMapper.getServerGrpcHostAndPort(serverHost, grpcPort);
GrpcQueryClient client = clientCache.computeIfAbsent(mappedHostAndPort, k -> new GrpcQueryClient(mappedHostAndPort.getHost(), mappedHostAndPort.getPort()));
Member

The lambda should use `k`. Also, give it a more meaningful name: at a minimum `key`, but `hostAndPort` would make it more obvious and readable.
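A small sketch of the suggested change, with hypothetical stand-ins so it compiles on its own: `ServerAddress` replaces Guava's `HostAndPort`, and `GrpcQueryClient` is reduced to a plain holder class. The point is only that the mapping function reads its own parameter, named `hostAndPort`, instead of a captured variable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the PR's GrpcQueryClient; only the constructor shape matters here.
final class GrpcQueryClient
{
    final String host;
    final int port;

    GrpcQueryClient(String host, int port)
    {
        this.host = host;
        this.port = port;
    }
}

// Hypothetical key type standing in for Guava's HostAndPort.
record ServerAddress(String host, int port) {}

final class GrpcClientCache
{
    private final Map<ServerAddress, GrpcQueryClient> clientCache = new ConcurrentHashMap<>();

    GrpcQueryClient getClient(ServerAddress mappedAddress)
    {
        // The lambda builds the client from its own argument, so each client is
        // always created from the exact key it is cached under.
        return clientCache.computeIfAbsent(mappedAddress, hostAndPort -> new GrpcQueryClient(hostAndPort.host(), hostAndPort.port()));
    }
}
```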

Member

Who manages the lifecycle of these clients? Do they need to be torn down? Can they go into an "invalid" state (in which case, they'd need to be refreshed)?

Member Author

@elonazoulay elonazoulay May 28, 2022

gRPC clients are long-lived and use internal threads to manage connections.
By default, the ManagedChannel closes idle connections after 30 minutes.
I added lifecycle management and a comment in the code.

  • DONE: Manage lifecycle - close grpc clients on shutdown
  • DONE: Separate configs and modules for grpc and legacy query clients
  • DONE: Remove unused configs (separate commit)
  • TODO: extract to separate commits
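A minimal sketch of the "close grpc clients on shutdown" item, assuming a hypothetical `ClosableQueryClient` interface and `ClientRegistry` cache (neither is from the PR). The registry is the single owner of the long-lived clients, so one `close()` call at shutdown tears them all down. In an Airlift/Guice-based connector, a `@PreDestroy` method on the owning class is one common place to trigger it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical minimal client abstraction: anything that owns connections or threads.
interface ClosableQueryClient extends AutoCloseable
{
    @Override
    void close();
}

// Hypothetical cache that owns the long-lived clients and closes them all on shutdown.
final class ClientRegistry<K> implements AutoCloseable
{
    private final Map<K, ClosableQueryClient> clients = new ConcurrentHashMap<>();
    private final Function<K, ClosableQueryClient> clientFactory;

    ClientRegistry(Function<K, ClosableQueryClient> clientFactory)
    {
        this.clientFactory = clientFactory;
    }

    // Returns the cached client for this key, creating it on first use.
    ClosableQueryClient get(K key)
    {
        return clients.computeIfAbsent(key, clientFactory);
    }

    // Called once when the connector shuts down; closes every cached client.
    @Override
    public void close()
    {
        clients.values().forEach(ClosableQueryClient::close);
        clients.clear();
    }
}
```

A production version would likely keep closing the remaining clients even if one `close()` throws; the sketch omits that for brevity.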

Comment on lines 76 to 77
PinotQueryClient pinotQueryClient,
PinotGrpcClient pinotGrpcClient)
Member

Given that the gRPC setting is global to the server, instead of passing both, it'd be better for these to implement a common interface and to pass the appropriate one depending on how the connector was initialized.
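A sketch of that suggestion, under the assumption that both transports can return the same result type (simplified here to `String`). `ServerQueryClient`, `LegacyServerQueryClient`, `GrpcServerQueryClient`, and `ServerQueryClients.forConfig` are hypothetical names, and the stub implementations only return a marker value. The choice is made once, when the connector is initialized, so downstream code receives a single client and never checks the flag itself.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical common abstraction over the two transports; the element type is
// simplified to String so the sketch compiles on its own.
interface ServerQueryClient
{
    Iterator<String> query(String query, String serverHost, List<String> segments);
}

// Hypothetical stub for the existing (non-gRPC) server query path.
final class LegacyServerQueryClient implements ServerQueryClient
{
    @Override
    public Iterator<String> query(String query, String serverHost, List<String> segments)
    {
        return List.of("legacy:" + serverHost).iterator();
    }
}

// Hypothetical stub for the gRPC streaming path.
final class GrpcServerQueryClient implements ServerQueryClient
{
    @Override
    public Iterator<String> query(String query, String serverHost, List<String> segments)
    {
        return List.of("grpc:" + serverHost).iterator();
    }
}

final class ServerQueryClients
{
    private ServerQueryClients() {}

    // Selected once from the connector configuration.
    static ServerQueryClient forConfig(boolean grpcEnabled)
    {
        return grpcEnabled ? new GrpcServerQueryClient() : new LegacyServerQueryClient();
    }
}
```

In a Guice-based connector, the same selection could instead live in the module binding, which fits the "separate configs and modules" item above.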

@elonazoulay elonazoulay force-pushed the pinot-grpc branch 2 times, most recently from 68a14b4 to b068fbb May 31, 2022 06:09
public HostAndPort getServerGrpcHostAndPort(String serverHost, int grpcPort)
{
ServerInstance serverInstance = getServerInstance(serverHost);
return HostAndPort.fromParts(serverInstance.getHostname(), grpcPort);
Contributor

Actually ServerInstance has an API to get grpcPort, so you don't need to pass it.
https://github.com/apache/pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/transport/ServerInstance.java#L93

This API can just be: public HostAndPort getServerGrpcHostAndPort(String serverHost)

In that case, you don't even need the grpcPort config; just make this a best-effort attempt. If the gRPC port is -1, return null here and the query will use the Netty query endpoint. Once the server has gRPC enabled, the query will use the gRPC query endpoint.

Member Author

I tried this against a real cluster and the server instance returned -1 for the gRPC port, so it did not work. Is it OK to keep the port explicitly configured? Let me know what you think, @xiangfu0.

Contributor

You mean gRPC is enabled but serverInstance reports a gRPC port of -1? If that is the case, please keep this grpcPort config as a fallback, and use the gRPC port from the ServerInstance when it is > 0.
So the logic is: best effort from ServerInstance if possible, otherwise fall back to the configured grpcPort.
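The fallback logic can be sketched as follows. This is an illustration, not the PR's code: `instanceGrpcPort` stands for the value returned by `ServerInstance`'s gRPC port accessor (reported as -1 when unavailable, per the discussion above), `configuredGrpcPort` stands for the connector's `grpcPort` setting, and `InetSocketAddress` is used in place of Guava's `HostAndPort` to keep the example dependency-free. `GrpcEndpointResolver` is a hypothetical name.

```java
import java.net.InetSocketAddress;

final class GrpcEndpointResolver
{
    private GrpcEndpointResolver() {}

    // Prefer the gRPC port reported by the server instance; if it is not positive
    // (e.g. -1 when unknown), fall back to the configured port.
    static InetSocketAddress resolve(String hostname, int instanceGrpcPort, int configuredGrpcPort)
    {
        int port = instanceGrpcPort > 0 ? instanceGrpcPort : configuredGrpcPort;
        // createUnresolved avoids a DNS lookup; the client resolves the host later.
        return InetSocketAddress.createUnresolved(hostname, port);
    }
}
```

Usage: `GrpcEndpointResolver.resolve("server-1", -1, 8090)` yields port 8090, while a positive instance port such as 8091 takes precedence over the configured value.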

public class PinotGrpcServerQueryClientConfig
{
private int maxRowsPerSplitForSegmentQueries = Integer.MAX_VALUE - 1;
private int grpcPort = 8090;
Contributor

I don't think we need this grpcPort; just get it directly if gRPC is enabled.

@xiangfu0 xiangfu0 requested a review from martint June 4, 2022 02:59
@xiangfu0
Contributor

xiangfu0 commented Jun 6, 2022

@martint can you review this again?

{
Map<String, String> metadata = dataTable.getMetadata();
List<String> exceptions = new ArrayList<>();
metadata.forEach((k, v) -> {
Member

Use more descriptive names for k and v. It's not clear what they represent.
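For illustration, a version of this loop with descriptive lambda parameters. The filtering rule (keeping values whose key starts with `Exception`) is an assumption made for the sketch; the actual key format used in Pinot's DataTable metadata is not shown in this thread. `MetadataExceptions` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MetadataExceptions
{
    // Assumption for illustration only: exception entries are identified by this key prefix.
    static final String EXCEPTION_KEY_PREFIX = "Exception";

    private MetadataExceptions() {}

    static List<String> collect(Map<String, String> metadata)
    {
        List<String> exceptions = new ArrayList<>();
        // Descriptive parameter names instead of (k, v).
        metadata.forEach((metadataKey, metadataValue) -> {
            if (metadataKey.startsWith(EXCEPTION_KEY_PREFIX)) {
                exceptions.add(metadataValue);
            }
        });
        return exceptions;
    }
}
```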

Member Author

Fixed, was from legacy code. Also thanks to @xiangfu0 for the commit!

Comment on lines 80 to 87
interface Factory
{
PinotDataFetcher create(ConnectorSession session, String query, PinotSplit split);
}

interface PinotServerQueryClient
{
Iterator<PinotDataTableWithSize> queryPinot(ConnectorSession session, String query, String serverHost, List<String> segments);
Member

Why do we need both a Factory and a PinotServerQueryClient interface? I'm not sure I understand the relationship between the two. Why do we need different implementations of the Factory, given that the query client is already abstracted to support both underlying protocols?

Contributor

Makes sense. I think we just need the Factory; there's no need for PinotServerQueryClient.

Member Author

Sounds good! Updating.

Member Author

Updated. Apologies, I should have removed that earlier.

@elonazoulay
Member Author

TODO: extract into separate commits; let me know if that would make it easier to review.

@xiangfu0 xiangfu0 requested a review from martint June 16, 2022 18:45
@xiangfu0
Contributor

I think it's good to keep this as one PR for the complete feature. It's also simpler for future reference.

Contributor

@xiangfu0 xiangfu0 left a comment

Thanks for putting this up!

GrpcQueryClient create(HostAndPort hostAndPort);
}

private static void addIfNotNull(ImmutableMap.Builder<String, Object> propertiesBuilder, String key, Object value)
Member

This is no longer used

@martint martint merged commit ce0bef3 into trinodb:master Jun 22, 2022
@github-actions github-actions bot added this to the 387 milestone Jun 22, 2022
@xiangfu0
Contributor

Thanks @martint and @elonazoulay for getting this PR in! 🥂

Development

Successfully merging this pull request may close these issues.

[Pinot connector] Support gRPC connection to Pinot Broker
4 participants