Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java high-level REST client completeness #27205

Closed
76 of 80 tasks
javanna opened this issue Nov 1, 2017 · 72 comments
Closed
76 of 80 tasks

Java high-level REST client completeness #27205

javanna opened this issue Nov 1, 2017 · 72 comments

Comments

@javanna
Copy link
Member

javanna commented Nov 1, 2017

This is a meta issue to track completeness of the Java REST high-level Client in terms of supported API. The following list includes all the REST API that Elasticsearch exposes to date, and that are also exposed by the Transport Client. The ones marked as done are already supported by the high-level REST client, while the others need to be added. Every group is sorted based on an estimation around how important the API is, from more important to less important. Each API is also assigned a rank (easy, medium, hard) that expresses how difficult adding support for it is expected to be.

The API listed as "Not Required" won't need to be supported before the transport client is removed from the master branch (next major version). Such API are mainly administrative API that are not likely to be used from a Java application. They generally return heavy responses and make it hard to reuse response objects from the transport client as they expose internal objects that in some cases cannot even be parsed back entirely based on the information returned at REST. We considered returning those as maps of maps but that’s also easy to achieve using the low-level REST client hence we decided to not implement them for the time being.

Top-level APIs

Indices API

Not required

  • shard stores (medium)
  • upgrade (easy) (to be removed?)
  • upgrade status (easy) (to be removed?)
  • segments (hard) (exposes ShardRouting)
  • recoveries (hard) (exposes ShardRouting, DiscoveryNode)
  • indices stats (hard) (exposes ShardRouting and a lot of other objects)

Snapshot API

Ingest API

Tasks API

Cluster API

Not required

  • search shards (medium) (exposes ShardRouting, DiscoveryNode and requires parsing back QueryBuilder)
  • pending cluster tasks (easy)
  • allocation explain (hard) (exposes ShardRouting)
  • cluster state (hard) (exposes ClusterState)
  • reroute (easy if done after cluster state API, returns the entire cluster state)
  • nodes info (hard) (exposes DiscoveryNode and a lot of other objects)
  • nodes stats (hard) (exposes ShardRouting and a lot of other objects)
  • cluster stats (hard) (exposes DiscoveryNode, requires nodes info and nodes stats)
  • hot threads (easy) (exposes DiscoveryNode)
  • nodes usage (medium) (exposes DiscoveryNode)

REST only API

There are a number of API that are exposed via REST but not via the Transport Client. They don't necessarily have to be implemented if the goal is feature parity with the Transport Client, yet we should probably have a look at why they were not added to the Transport Client and whether it makes sense to add their support to the high-level REST Client or not. I don't think it makes sense to add support for cat API and ingest processor grok, hence I took them out already.

How to add support for a new API

Look at some of the already supported API and existing PRs that have been merged:

The common tasks in each of the above PRs are:

  • add fromXContent method to existing response class currently used by transport client and corresponding unit tests that make use of fields shuffling as well as random fields insertion (in order to test forward compatibility). That usually means adding a test for the response object that extends AbstractXContentTestCase where supportsUnknownFields() returns true as well as assertToXContentEquivalence. There are cases where we can't insert random fields everywhere, which then require to also override the getRandomFieldsExcludeFilter() method which returns path that should be excluded when injecting random fields. Given the randomizations applied, it makes sense to run this type of test locally with -Dtests.iters=50 argument just to make sure that it is consistently green.
  • add new method to Request class which translates the input request into the internal REST request representation that holds method, url, endpoint, params etc. and add corresponding tests to RequestTests
  • add new method to RestHighLevelClient, possibly also its async variant when it makes sense, we may not want to add async variants to every single method, so we decide case by case. The name of the new method must match what is defined in our REST spec including the namespace.
  • add integration test that extends ESRestHighLevelClientTestCase that tests the new method end-to-end by sending REST requests to an external cluster.
  • add docs page. To check how docs are rendered and whether the links between docs pages and docs snippets work ok, run the following command from the root of your local checkout of the Elasticsearch repository: /path/to/elastic/docs/build_docs.pl --doc docs/java-rest/index.asciidoc --chunk 1 --out ~/temp/asciidoc --open . This requires also a local checkout of the docs repository, where the perl script is located.

Relates #29827

@javanna
Copy link
Member Author

javanna commented Nov 7, 2017

I updated the description of the issue by assigning each API a rank from 1 to 3 based on how difficult it should be to add support for it to the high-level REST client. Criterias were mainly how big the request is to serialize and how big the response is to parse back.

@clintongormley
Copy link
Contributor

thanks @javanna - I've separated the APIs into "important" and "optional" lists, where optional APIs are ones that will seldom be used from applications other than monitoring applications or tests. If anybody disagrees with my selection, feel free to mention which APIs should be marked as important.

@slovdahl
Copy link

slovdahl commented Nov 9, 2017

Not using the high-level REST client yet, but I would really have expected that multi-get was supported.

@hariso
Copy link
Contributor

hariso commented Nov 10, 2017

@javanna This might be a bloody stupid question, but: In which way does someone pick up an API and starts working on it? Without risking that someone did the same.: )

@nik9000
Copy link
Member

nik9000 commented Nov 10, 2017

@javanna This might be a bloody stupid question, but: In which way does someone pick up an API and starts working on it? Without risking that someone did the same.: )

You add a comment here saying you are working on it.

@hariso
Copy link
Contributor

hariso commented Nov 10, 2017

Thanks @nik9000 !

@catalin-ursachi
Copy link
Contributor

I've picked up Create Index.

@hariso
Copy link
Contributor

hariso commented Nov 14, 2017

I have picked up " indices exist".

@hariso
Copy link
Contributor

hariso commented Nov 14, 2017

For questions related to code (how to run a test, which tests are (not) needed, do we need the async version of a method etc.) do I ask here, in a separate issue or the forums? Or something else? : )

@javanna
Copy link
Member Author

javanna commented Nov 14, 2017

hi @hariso it depends on the question :) Probably better to open a PR even though it is work in progress, so we can discuss your questions there. Would that work for you?

@hariso
Copy link
Contributor

hariso commented Nov 14, 2017

It definitely would. Thanks for the answer!

@markharwood
Copy link
Contributor

markharwood commented Nov 2, 2018

Perhaps another general "java convention" to consider @hub-cap.

How do we map potentially long-running wait_for_completion=false style REST APIs to our notion of sync and async Java calls?

I hit this trying to find a long-running task that could be used in my getTask tests. It looks like HLRC's reindex has been written without any support for returning task IDs. This means reindex and getTask can't be practically used together in HLRC. Reindex needs to find a way to offer more of the async features.
In discussions with @pgomulka we came up with this candidate general convention for mapping async REST apis to Java:

  • Foo syncFoo(...) and void asyncFoo(..., listener) would map to REST calls without wait_for_completion params (the majority of our existing APIs)
  • FooTask submitFooTask(...) would map to the REST equivalents with wait_for_completion set to false. It's a synchronous call to a REST api with async features.

Does this make sense? It probably applies to more than reindex

hub-cap pushed a commit that referenced this issue Nov 2, 2018
Add `count()` api method, `CountRequest` and `CountResponse` classes to HLRC. Code in server module is unchanged.

Relates to #27205
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Nov 2, 2018
Extend High Level Rest Client Reindex API to support requests with
wait_for_completion=false. This method will return a TaskID and results
can be queried with Task API

refers: elastic#27205
hub-cap pushed a commit that referenced this issue Nov 2, 2018
Add `count()` api method, `CountRequest` and `CountResponse` classes to HLRC. Code in server module is unchanged.

Relates to #27205
@hub-cap
Copy link
Contributor

hub-cap commented Nov 5, 2018

@markharwood I think as mentioned above, I dont think throwing exceptions is the way to go, so I dont think we should be removing it from ignores. Just to reiterate the work I saw in your other review, #35166, I think the use of Optional works well here.

Also, I agree with @pgomulka and your assessment of sync/async/submit, :shipit:

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Nov 5, 2018
pgomulka added a commit that referenced this issue Nov 8, 2018
Extend High Level Rest Client Reindex API to support requests with
wait_for_completion=false. This method will return a TaskSubmissionResult with task identifier as string and results can be queried with Task API

refers: #27205
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Nov 13, 2018
Extend High Level Rest Client Reindex API to support requests with
wait_for_completion=false. This method will return a TaskSubmissionResult with task identifier as string and results can be queried with Task API

refers: elastic#27205
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Nov 14, 2018
Extend High Level Rest Client Reindex API to support requests with
wait_for_completion=false. This method will return a TaskSubmissionResult with task identifier as string and results can be queried with Task API

refers: elastic#27205
pgomulka added a commit that referenced this issue Nov 14, 2018
Extend High Level Rest Client Reindex API to support requests with
wait_for_completion=false. This method will return a TaskSubmissionResult with task identifier as string and results can be queried with Task API

refers: #27205
Original PR against master #35202 
Back Port PR #35527
mayya-sharipova added a commit that referenced this issue Nov 19, 2018
mayya-sharipova added a commit that referenced this issue Nov 19, 2018
dnhatn added a commit that referenced this issue Dec 7, 2018
dnhatn added a commit that referenced this issue Dec 7, 2018
@utkarsh4G

This comment has been minimized.

@tvernum
Copy link
Contributor

tvernum commented Dec 11, 2018

@utkarsh4G Could you please ask this question on our discuss forum
Elastic uses GitHub issues for tracking work that needs to be undertaken, such as bugs and feature requests, and we use the forums for questions such as yours.

@hub-cap
Copy link
Contributor

hub-cap commented Oct 7, 2019

Closing in favor of #47679

@hub-cap hub-cap closed this as completed Oct 7, 2019
@hub-cap
Copy link
Contributor

hub-cap commented Oct 7, 2019

Just kidding, closing in favor of #47678

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests