Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Makes request headers lowercase in English locale. #29419

Closed
wants to merge 2 commits into from
Closed

Makes request headers lowercase in English locale. #29419

wants to merge 2 commits into from

Conversation

adityasrini
Copy link

@adityasrini adityasrini commented Apr 7, 2018

Converts requestHeader names to lowercase in the English Locale. Makes overriding default headers in RestClient possible. Converted the requestNames.contains... == false statement to !requestNames.contains.... Closes #22623

@adityasrini adityasrini changed the title Makes request headers lowercase in English locale. Closes #22623 Makes request headers lowercase in English locale. Apr 7, 2018
@adityasrini
Copy link
Author

Wait why isn’t @elasticmachine here?!

@jasontedor
Copy link
Member

We have disabled the auto-pings from @elasticmachine; due to having multiple PR CI jobs that can be triggered by @elasticmachine, it was leading to multiple pings from the bot reminding us on community PRs. These days, we are all in the right mindset on how to handle community PRs so we felt the value of the pings has diminished over time, and removing them was better than subjecting everyone to multiple pings.

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment. This does not constitute a full review.

}
for (Header defaultHeader : defaultHeaders) {
if (requestNames.contains(defaultHeader.getName()) == false) {
if (!requestNames.contains(defaultHeader.getName().toLowerCase(Locale.ENGLISH))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like it or not, our style is to use == false instead of !.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll change it back then!

@javanna javanna added the :Clients/Java Low Level REST Client Minimal dependencies Java Client for Elasticsearch label Apr 9, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@javanna javanna added the >bug label Apr 9, 2018
Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @adityasrini thanks for your PR. Could you please add a test that fails against current master, and is successful with your fix?

@adityasrini
Copy link
Author

@javanna On it!

}
for (Header defaultHeader : defaultHeaders) {
if (requestNames.contains(defaultHeader.getName()) == false) {
if (requestNames.contains(defaultHeader.getName().toLowerCase(Locale.ENGLISH)) == false) {
httpRequest.addHeader(defaultHeader);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adityasrini
Rather than doing the two String.toLowerCase(Locale.ENGLISH) which requires 2 changes, you should replace the new HashMap() with a new TreeMap(String.CASE_INSENSITIVE_ORDER).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't necessarily agree on this. We don't need a sorted map here, we just need to make sure that keys are case insensitive, which is one of the properties of treemap when used in this way, but we don't need all of its other properties which also affect that data structure that's internally used to store entries. Makes sense?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javanna

Sorry my suggestion should have been to replace the new HashSet() with new TreeSet(String.CASE_INSENSITIVE_ORDER) at line 433 , and remove the 2x toLowerCase(Locale.ENGLISH) additions, obviously mentioning TreeMap was nonsense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't necessarily agree with this either, we would be introducing a sorted data structure where we don't need ordering, but only case-insensitive lookups. Not sure it is a good trade-off.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javanna

we would be introducing a sorted data structure where we don't need ordering, but only case-insensitive lookups.

The internal structure of treeset doesnt matter, any more than the internals of hashset matter.

Hashsets also sort/arrange their keys, it might not be alphabetically, but the entire buckets thing also has its own system sorting system when it allocates keys into a chain of buckets, but who cares.

All that matters is the *set allows us to determine if some key already exists in it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in this case it won't make a huge difference, but HashSet is backed by a HashMap while TreeSet is back by a TreeMap. TreeMap is implemented using a red-black tree, which is very different compared to HashMap, the reason being that the former will have to re-balance the tree to make it possible to iterate through entries in their natural ordering. We are using a set here though just to quickly check if it already contains something, and we never iterate through it. Conceptually, I still like more the two calls to lowercase than using a red-black tree just because treeset allows to make strings case-insensitive. Hopefully I explained what I meant. It would be interesting to measure which of the two solutions is faster, not sure it's worth it in this case given that this is probably not a hotspot.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TreeMap is implemented using a red-black tree, which is very different compared to HashMap,

Today TreeMap is a red black tree, tomorrow it might change, it doesnt matter, all we care about is that it keeps the contract that allows case insensitive look ups of keys. We can be sure they are close enough to the same speed, the jdk collection guys know this important they arent going to make one 10x slower than the other.

A HashMap turns into a tree when it gets really full anyway, which basicaly means the jdk guys dont care why should you ?

https://stackoverflow.com/questions/30164087/how-does-java-8s-hashmap-degenerate-to-balanced-trees-when-many-keys-have-the-s

The implementation notes comment in HashMap is a better description of HashMap's operation than I could write myself. The relevant parts for understanding the tree nodes and their ordering are:

This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap. [...] Bins of TreeNodes may be traversed and used like any others, but additionally support faster lookup when overpopulated. [...]

which is very different compared to HashMap, the reason being that the former will have to re-balance the tree to make it possible to iterate through entries in their natural ordering

And a hashmap also has to rebalance when it hits some threshold. To solve that we simply call the ctor that takes initial size so in both case the rebalance never happens.

We are using a set here though just to quickly check if it already contains something, and we never iterate through it.

Thats right so why talk about something that never happens.

It would be interesting to measure which of the two solutions is faster, not sure it's worth it in this case given that this is probably not a hotspot

Except that enough of this is, because HM and TM are used everywhere and will be compiled by hotspot. If one is compiled so will be the other and vice versa.

Why ? This path takes 100s/100s of cycles vs billions for a complete request, it doesnt matter. Even if one is 2x or 10x sloewr the total request time will be basically the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to keeping a hash set. Moving to a tree set would make operations perform in O(log(size)) rather than constant time. Even if this set would usually be small so that it wouldn't matter, we have seen in the past that users are sometimes very creative when it comes to pushing the system to its boundaries.

@rjernst
Copy link
Member

rjernst commented Apr 16, 2018

If there is any lower casing done here, it should be with the root locale, not English.

@adityasrini
Copy link
Author

Hi all, I really apologize for not having got back to this in a bit! I’ve been swamped by another database/API related project! Rest assured that I haven’t abandoned this PR and will be working on test cases this weekend!

@adityasrini
Copy link
Author

adityasrini commented May 19, 2018

Hey everyone I'm finding an error with most tests in org.elasticsearch.client.RestClientTests class. For some reason the ResponseListener.onSuccess() is not called when the restClient.performRequestAsync() method is run. I found this out when I ran the testPerformAsyncWithUnsupportedMethod() with GET as an argument in the restClient.performRequestAsync(... method and got a never ending test. I did the same for a few others to make sure that they had successful HTTP responses when they weren't supposed to have them, and no exceptions were thrown. Even though there was code to explicitly throw exceptions - fail()! I even wrote a custom test to throw a RuntimeException() in an onSuccess implementation and the test still passed!

However when I call responseListener.onSuccess(null); explicitly in a try...catch block of performRequestAsync(), tests seem to fail or pass as intended. Which means that for some reason along the call stack, it is not being called. Perhaps this should be raised as an issue? I hope I was able to convey my point. If not, I can make a commit/gist to show what I mean.

@javanna
Copy link
Member

javanna commented May 22, 2018

hi @adityasrini thanks for raising this, I opened #30776 to remove endless waits and improve RestClientTests. That said it is expected that onSuccess is never called on that test, given that the mock of the http client that we are using doesn't do anything, hence it cannot return successful http responses. If you need to test http requests you should rather look at RestClientSingleHostIntegTests, there are already tests for headers in there.

@adityasrini
Copy link
Author

@javanna Thanks for raising that issue! I'll use the other integration test class.

@javanna
Copy link
Member

javanna commented Jul 16, 2018

given that we have decided (see #30616) to remove support for default headers, as they are already supported by the apache http client which we use to perform requests, I am going to close this PR. Thanks anyway @adityasrini !

@javanna javanna closed this Jul 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Clients/Java Low Level REST Client Minimal dependencies Java Client for Elasticsearch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants