-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Bug with ElasticSearchQueryBuilder, Column Name are not preserving case sensivity #12791
Conversation
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks! |
CC: @zhenxiao |
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @bjet007 thanks for working on it
could you please add testcases with case sensitive fields? BTW, maybe you could take a look at the toLowerCase() call, and see whether they are used appropriate
I did try to add a test before submitting my pull request, but I was not to sure where to add it. The presto-elasticsearch module doesn't have a lot of test in it. Is there a specific module that is use to test query agains data connetor? |
you could take a look at: Or, add a new testcase under: BTW, for the fix, could you please take a look at the toLowerCase() in: |
I've added a test case in TestElasticsearchIntegrationSmokeTest. For the toLowerCase() method, I believe the one used for table name in ElasticsearchTableHandle are ok as index name in elasticsearch must be lower cased. For the one in the ElasticSearchUtils, I'm not too sure, it seem to work, but I don't have a deep knowledge of presto and the impact of removing it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you, @bjet007
The fix is good. I left some comments for testcases.
As this is simple, how about merge the 3 commits into 1?
{ | ||
String indexName = "machine"; | ||
|
||
Map<String, Object> docAsSource = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/docAsSource/source/g
assertEquals(actualResult, expectedColumns, format("%s != %s", actualResult, expectedColumns)); | ||
} | ||
|
||
private void addElasticDocumentToIndex(String indexName, Map<String, Object> docAsSource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we inline this function?
addElasticDocumentToIndex(indexName, docAsSource); | ||
embeddedElasticsearchNode.getClient().admin().indices().refresh(refreshRequest(indexName)).actionGet(); | ||
|
||
MaterializedResult actualColumns = computeActual("SELECT name, ProductionCount FROM telemetry.Machine").toTestTypes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define a string for "telemetry.Machine"
BTW, I think we could have a better table name and field name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all for a better names, but I had no idea. Do you have any suggestion?
Seems the root reason is, ColumnMetadata is lower case column name:
I am OK with this fix. But not sure whether we should support case sensitivity in column names. |
e2fd16f
to
d0ee712
Compare
@bjet007 looks good to me. So the Travis failures are not related to ur changes, right? |
@zhenxiao <https://github.com/zhenxiao>, no It's my latest commit that had
a checkstyle error, sorry about that.
…On Wed, May 29, 2019 at 2:32 AM Zhenxiao Luo ***@***.***> wrote:
@bjet007 <https://github.com/bjet007> looks good to me. So the Travis
failures are not related to ur changes, right?
Let's wait for @nezihyigitbasi <https://github.com/nezihyigitbasi> 's
idea of whether we should support case sensitivity for Elasticsearch
connector column. As ColumnMetadata is lowercase all column names.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#12791?email_source=notifications&email_token=AEB5ZDREOGPMVQ7B3XUREJTPXYPQTA5CNFSM4HMGH6JKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWOKAOQ#issuecomment-496803898>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEB5ZDQRT7OMJRFZF2YSROTPXYPQTANCNFSM4HMGH6JA>
.
--
Benoit Jean
|
…eld are not preserving ElasticSearch field case sensitivity.
Hi @zhenxiao I meet this issue too, but I think building elasticsearch query requests with "ColumnJsonPath" is not a good idea, though it works fine now.(And I bulit my own version by this way, too). It's better to support case sensitivity in further days. |
thanks @Lunatictwo Or, we could explicitly saying column sensitivity is not supported in Elasticsearch Connextor. waiting for @nezihyigitbasi to comment |
@zhenxiao The Presto engine currently doesn't support case sensitive identifiers, we have a long standing issue for that: #2863. So, since the engine doesn't support case sensitivity, the Elasticsearch connector also doesn't support that. Using |
@nezihyigitbasi I agree that using the getColumnJsonPath instead of getColumnName is not the ideal solution, but i don't understand your recommendation. What do you mean by user side, rename all field in elasticsearch to be lower case? As it's stand today, the current code doest not work at all if your ElasticSearch contain fields with upper cased values, because the query that is pushed down to the cluster specify requested columns all in lower case. And this is different that the filter that are correctly pushed down. The result of ElasticSearch is a query that contains the right amount of rows, but with no column (empty json map). I don't know which one is the worst, but from a user point of view, asking me to rename all fields in my ElasticSearch cluster to be able to use presto is a no go. |
The problem of case sensitivity came up many times with other connectors too, and unfortunately since Presto engine doesn't support case sensitivity we always had to ask users to rename their fields. I totally understand that this is not ideal for the users, but at the same time changing a particular connector in a hacky way to work around this limitation of the engine is also not the way to go. |
I probably misunderstood the meaning of getColumnJsonPath, but the only thing I did is replicate the same behavior as line 134 of ElasticsearchQueryBuilder. So to me it's just fixing an inconsistency in the query builder. For your position in asking user to modify their storage to fit prestodb, I don't think it's the right approach for a software that is a middle layer between multiple databases and user, but I haven't deeply use presto or read all discussion that lead to that position. |
Hi @nezihyigitbasi as ElasticsearchQueryBuilder is already using JsonPath to construct Elasticsearch request, I am inclined to accept this PR, so that column case sensitivity could luckily be supported for Elasticsearch. What do you think? |
This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions! |
Hi, I was testing presto agains an ElasticSearch cluster for a internal project at my company and after having an hard time to retrieve data, I found an issue in the ElasticSearchQueryBuilder.
The FetchSource field in the SearchQueryBuilder object must preserve the case the field, otherwise the SearchHit.getSourceAsMap() function won't contains any values.