Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Bug with ElasticSearchQueryBuilder, Column Name are not preserving case sensivity #12791

Closed
wants to merge 1 commit into from

Conversation

bjet007
Copy link

@bjet007 bjet007 commented May 10, 2019

Hi, I was testing presto agains an ElasticSearch cluster for a internal project at my company and after having an hard time to retrieve data, I found an issue in the ElasticSearchQueryBuilder.

The FetchSource field in the SearchQueryBuilder object must preserve the case the field, otherwise the SearchHit.getSourceAsMap() function won't contains any values.

@facebook-github-bot
Copy link
Collaborator

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@tdcmeehan
Copy link
Contributor

CC: @zhenxiao

@facebook-github-bot
Copy link
Collaborator

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

Copy link
Collaborator

@zhenxiao zhenxiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bjet007 thanks for working on it
could you please add testcases with case sensitive fields? BTW, maybe you could take a look at the toLowerCase() call, and see whether they are used appropriate

@bjet007
Copy link
Author

bjet007 commented May 11, 2019

I did try to add a test before submitting my pull request, but I was not to sure where to add it. The presto-elasticsearch module doesn't have a lot of test in it. Is there a specific module that is use to test query agains data connetor?

@zhenxiao
Copy link
Collaborator

you could take a look at:
TestElasticsearchIntegrationSmokeTest

Or, add a new testcase under:
presto-elasticsearch/src/test/java/com/facebook/presto/elasticsearch

BTW, for the fix, could you please take a look at the toLowerCase() in:
ElasticsearchTableHandle and ElasticsearchUtils

@bjet007
Copy link
Author

bjet007 commented May 15, 2019

I've added a test case in TestElasticsearchIntegrationSmokeTest.

For the toLowerCase() method, I believe the one used for table name in ElasticsearchTableHandle are ok as index name in elasticsearch must be lower cased. For the one in the ElasticSearchUtils, I'm not too sure, it seem to work, but I don't have a deep knowledge of presto and the impact of removing it.

Copy link
Collaborator

@zhenxiao zhenxiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, @bjet007
The fix is good. I left some comments for testcases.
As this is simple, how about merge the 3 commits into 1?

{
String indexName = "machine";

Map<String, Object> docAsSource = new HashMap<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/docAsSource/source/g

assertEquals(actualResult, expectedColumns, format("%s != %s", actualResult, expectedColumns));
}

private void addElasticDocumentToIndex(String indexName, Map<String, Object> docAsSource)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we inline this function?

addElasticDocumentToIndex(indexName, docAsSource);
embeddedElasticsearchNode.getClient().admin().indices().refresh(refreshRequest(indexName)).actionGet();

MaterializedResult actualColumns = computeActual("SELECT name, ProductionCount FROM telemetry.Machine").toTestTypes();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define a string for "telemetry.Machine"
BTW, I think we could have a better table name and field name

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for a better names, but I had no idea. Do you have any suggestion?

@zhenxiao
Copy link
Collaborator

Seems the root reason is, ColumnMetadata is lower case column name:

58     public ColumnMetadata(String name, Type type, boolean nullable, String comment, String extraInfo, boolean hidden, Map<String, Object> properties)
59     {
60         checkNotEmpty(name, "name");
61         requireNonNull(type, "type is null");
62         requireNonNull(properties, "properties is null");
63 
64         this.name = name.toLowerCase(ENGLISH);
65         this.type = type;
66         this.comment = comment;
67         this.extraInfo = extraInfo; 

I am OK with this fix. But not sure whether we should support case sensitivity in column names.
@nezihyigitbasi what do you think?

@bjet007 bjet007 force-pushed the master branch 2 times, most recently from e2fd16f to d0ee712 Compare May 28, 2019 14:24
@zhenxiao
Copy link
Collaborator

@bjet007 looks good to me. So the Travis failures are not related to ur changes, right?
Let's wait for @nezihyigitbasi 's idea of whether we should support case sensitivity for Elasticsearch connector column. As ColumnMetadata is lowercase all column names.

@bjet007
Copy link
Author

bjet007 commented May 29, 2019 via email

…eld are not preserving ElasticSearch field case sensitivity.
@emhlbmc
Copy link
Contributor

emhlbmc commented May 31, 2019

Hi @zhenxiao I meet this issue too, but I think building elasticsearch query requests with "ColumnJsonPath" is not a good idea, though it works fine now.(And I bulit my own version by this way, too). It's better to support case sensitivity in further days.

@zhenxiao
Copy link
Collaborator

thanks @Lunatictwo
Yep, using JsonPath seems not an idea way, while, I could not have a better idea of supporting column case sensitivity for Elasticsearch Connector, as column name already lower cased by ColumnMetadata. Do you have any idea?

Or, we could explicitly saying column sensitivity is not supported in Elasticsearch Connextor. waiting for @nezihyigitbasi to comment

@nezihyigitbasi
Copy link
Contributor

But not sure whether we should support case sensitivity in column names.

@zhenxiao The Presto engine currently doesn't support case sensitive identifiers, we have a long standing issue for that: #2863. So, since the engine doesn't support case sensitivity, the Elasticsearch connector also doesn't support that.

Using getColumnJsonPath instead of getColumnName to work around this limitation is hacky and confusing. Therefore, I recommend making a change on the user side to not use case insensitive column names.

@bjet007
Copy link
Author

bjet007 commented May 31, 2019

@nezihyigitbasi I agree that using the getColumnJsonPath instead of getColumnName is not the ideal solution, but i don't understand your recommendation. What do you mean by user side, rename all field in elasticsearch to be lower case?

As it's stand today, the current code doest not work at all if your ElasticSearch contain fields with upper cased values, because the query that is pushed down to the cluster specify requested columns all in lower case. And this is different that the filter that are correctly pushed down. The result of ElasticSearch is a query that contains the right amount of rows, but with no column (empty json map).

I don't know which one is the worst, but from a user point of view, asking me to rename all fields in my ElasticSearch cluster to be able to use presto is a no go.

@nezihyigitbasi
Copy link
Contributor

What do you mean by user side, rename all field in elasticsearch to be lower case?

The problem of case sensitivity came up many times with other connectors too, and unfortunately since Presto engine doesn't support case sensitivity we always had to ask users to rename their fields. I totally understand that this is not ideal for the users, but at the same time changing a particular connector in a hacky way to work around this limitation of the engine is also not the way to go.

@bjet007
Copy link
Author

bjet007 commented Jun 3, 2019

I probably misunderstood the meaning of getColumnJsonPath, but the only thing I did is replicate the same behavior as line 134 of ElasticsearchQueryBuilder. So to me it's just fixing an inconsistency in the query builder.

For your position in asking user to modify their storage to fit prestodb, I don't think it's the right approach for a software that is a middle layer between multiple databases and user, but I haven't deeply use presto or read all discussion that lead to that position.

@zhenxiao
Copy link
Collaborator

Hi @nezihyigitbasi as ElasticsearchQueryBuilder is already using JsonPath to construct Elasticsearch request, I am inclined to accept this PR, so that column case sensitivity could luckily be supported for Elasticsearch. What do you think?

@stale
Copy link

stale bot commented Dec 8, 2019

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!

@stale stale bot added the stale label Dec 8, 2019
@stale stale bot closed this Dec 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants