Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Website documentation recommending GET with body to do search is invalid according to http 1.1 spec #16024

Closed
et304383 opened this issue Jan 15, 2016 · 11 comments
Labels
:Core/Infra/REST API REST infrastructure and utilities discuss

Comments

@et304383
Copy link

I'm not sure where else to report this...

All throughout the website, you give examples of how to perform various operations related to search but the examples are GET requests with a data body.

This is wrong according to the Http 1.1 spec that dictates the body of a GET request should not change the response contents. In other words, you should be able to ignore the request body and get the same result.

As an alternative, you should document the query string parameters better. For example, it's not documented how to do highlighting with query string parameters, only a GET payload.

Ultimately, calling GET with a payload should be deprecated and removed from future releases of ES. This is not recommended by anyone and I can find no support for doing things this way.

@jasontedor jasontedor added discuss :Core/Infra/REST API REST infrastructure and utilities labels Jan 15, 2016
@clintongormley
Copy link
Contributor

This has been raised many times.

This is wrong according to the Http 1.1 spec that dictates the body of a GET request should not change the response contents. In other words, you should be able to ignore the request body and get the same result.

The RFC for HTTP 1.1 RFC2616 says this:

A server SHOULD
read and forward a message-body on any request; if the request method
does not include defined semantics for an entity-body, then the
message-body SHOULD be ignored when handling the request.

So yes, the recommendation (note: "SHOULD") is that a GET body shouldn't have any effect on server processing.

But GET is a better semantic fit for search than POST:

In particular, the convention has been established that the GET and
HEAD methods SHOULD NOT have the significance of taking an action
other than retrieval. These methods ought to be considered "safe".
This allows user agents to represent other methods, such as POST, PUT
and DELETE, in a special way, so that the user is made aware of the
fact that a possibly unsafe action is being requested.

Also mentioned in the spec:

A cache or origin server receiving a conditional request, other than
a full-body GET request, MUST use the strong comparison function to
evaluate the condition.

Users often allow just GET requests to their servers in order to allow a range of non-destructive actions. The problem is that a search request (with the query DSL, aggs, etc) can easily overflow the max URL length, so we allow sending it in the body instead. Of course, some clients don't allow GET requests with a body, so we support two other ways of sending a search request:

  • GET _search?source=......
  • POST _search

As an alternative, you should document the query string parameters better. For example, it's not documented how to do highlighting with query string parameters, only a GET payload.

The Query DSL has so many options, restricting yourself to just the basic query string parameters would be a waste of time. For instance, you wouldn't be able to use aggregations.

If you don't like using GET _search with a body, then use POST instead, but we're not planning on changing this.

@et304383
Copy link
Author

The "source" parameter needs to be documented. We only discovered this on some google search on one forum mentioned by someone way down the line as a suggestion. I can't even find that link right now it was so obscure and difficult to find.

Once we discovered the source parameter existed and could accept our json payload this was quite easier to resolve.

Regardless of limitations on query string parameters, how to send all options as as query string parameters should be documented, not just a select few.

Using a POST to do a search "feels" wrong and would be unacceptable regardless since we'd then have to open our Elasticsearch endpoint (Amazon ES) up to allow post to the public and would thus require additional logic to prevent destructive operations as you mentioned earlier such as additional authentication (not sure we can do this in Amazon ES) or a proxy server.

TLDR: more documentation is never a bad thing.

Thanks!

@clintongormley
Copy link
Contributor

The "source" parameter needs to be documented.

It is documented under API conventions > Common options, along with other request parameters that apply across all or most APIs: https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#_request_body_in_query_string

Once we discovered the source parameter existed and could accept our json payload this was quite easier to resolve.

You are very likely to run into problems with having your search request truncated - it is too easy to run into server URL limits.

how to send all options as as query string parameters should be documented, not just a select few.

I'm not 100% certain, but I don't believe that highlighting is supported via query string parameters (other than encoding the body in source).

The query DSL (plus the rest of the search options) is just too rich to replicate purely with query string params.

@et304383
Copy link
Author

I understand where you're coming from. Thanks for that link but that definitely needs to be placed somewhere in the SEARCH API docs. Google isn't able to index properly when one searches for "elasticsearch uri search source" or similar searches because the content isn't falling on one page.

You may or may not care, but the majority of people don't have time to read all documentation for a piece of software - they have to rely on Google to index it properly so they can find exactly what they need quickly.

At the very least, the source parameter should be listed here (please):
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html

Edit: please understand that we've placed our Elasticsearch service behind API Gateway (may or may not be the best decision) and it does NOT support a GET request with a request body. It is outright refused and the request is flagged as invalid. That's why our hands are somewhat tied without redesigning / rebuilding our infrastructure.

dadoonet referenced this issue in elastic/elasticsearch-definitive-guide Jan 26, 2016
@asbjornu
Copy link

@clintongormley, I completely agree with @eric-tucker. So does Roy Fielding:

Server semantics for GET, however, are restricted such that a body, if any, has no semantic meaning to the request. The requirements on parsing are separate from the requirements on method semantics.

So, yes, you can send a body with GET, and no, it is never useful to do so.

This is part of the layered design of HTTP/1.1 that will become clear again once the spec is partitioned (work in progress).

....Roy

I think it is a clear violation of the Layered System constraint of REST, as well as section 4.3 of RFC 2616:

[...] if the request method does not include defined semantics for an entity-body, then the message-body SHOULD be ignored when handling the request.

On top of that, section 4.3.1 of RFC 7231 states:

A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.

The response to a GET request is cacheable; a cache MAY use it to satisfy subsequent GET and HEAD requests unless otherwise indicated by the Cache-Control header field (Section 5.2 of [RFC7234]).

So I don't think there's anything to argue; this is not a recommended practice. If I put a Varnish server in front of Elasticsearch, I should expect it to cache the response of all GET requests sharing the same Vary semantics. With Elasticsearch's use of a message body, that will at best lead to different GET requests being fulfilled by the same cached response, and at worst, break badly.

I would actually rather use the drafted SEARCH method than GET for these requests. The method is already standardized in WebDAV (RFC 5323) and I don't think @reschke and @jasnell's draft requires too much work to be published. They could perhaps weigh in with their thoughts?

@et304383
Copy link
Author

This is a very good point regarding cache of GETs in Varnish. You've now made a GET behave differently based on a payload but anything obeying RFC standards is completely valid in ignoring that payload and treating two (supposedly) different GETs as the same request.

Just because this is the way ES was setup and you don't want to change it doesn't mean it's correct and doesn't need to change long term.

@asbjornu
Copy link

asbjornu commented Aug 8, 2016

@clintongormley If @jasnell finishes his Internet Draft of the SEARCH method, would you consider adding support for it and in time phase out the current "GET with HTTP body" practice?

@jasnell
Copy link

jasnell commented Aug 8, 2016

I will look at getting this moved forward. I had left it in the hands of a couple others but it looks like the interest / momentum faded out. If there's interest in implementing this, I can definitely work on getting it advanced.

@glennblock
Copy link

+1

@clintongormley
Copy link
Contributor

@asbjornu sure - makes sense

@piotr-dobrogost
Copy link

@jasnell

I will look at getting this moved forward. I had left it in the hands of a couple others but it looks like the interest / momentum faded out. If there's interest in implementing this, I can definitely work on getting it advanced.

What's the status?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/REST API REST infrastructure and utilities discuss
Projects
None yet
Development

No branches or pull requests

7 participants