Document bulk behavior with multiple clients/documents #1666

pquentin · 2023-02-07T17:38:20Z

As suggested by @dliappis in elastic/rally-tracks#378 (comment).

dliappis

Thanks for adding this. Basically LGTM, left a few clarification questions.

dliappis · 2023-02-08T07:59:45Z

docs/track.rst

@@ -758,6 +758,12 @@ Properties
 * ``detailed-results`` (optional, defaults to ``false``): Records more detailed meta-data for bulk requests. As it analyzes the corresponding bulk response in more detail, this might incur additional overhead which can skew measurement results. See the section below for the meta-data that are returned. This property must be set to ``true`` for individual bulk request failures to be logged by Rally.
 * ``timeout`` (optional, defaults to ``1m``): Defines the `time period that Elasticsearch will wait per action <https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#docs-bulk-api-query-params>`_ until it has finished processing the following operations: automatic index creation, dynamic mapping updates, waiting for active shards.

+With multiple ``clients``, Rally will split each document from ``corpora`` using as many splits as there are ``clients``. This ensures that the bulk index operations are efficiently parallelized but has the drawback that the ingestion is not done in the order of each document. For example, if ``clients`` is set to 2, one client will index the document starting from the beginning, while the other will index starting from the middle.
+
+Additionally, if there are multiple documents or corpora, Rally will try to index all documents in parallel in two ways:


if there are multiple documents or corpora

Isn't this the same thing?

A track can have only one corpora section, but I guess you aren't referring to the section here but rather to the plural of corpus, i.e. more than one file with documents for Rally to index?

It's not the same thing. You can have multiple corpora, and multiple documents in each corpus. You can see this clearly when looking at the JSON structure of, say, geonames:

    "corpora": [
      {
        "name": "geonames",
        "base-url": "https://rally-tracks.elastic.co/geonames",
        "documents": [
          {
            "source-file": "documents-2.json.bz2",
            "document-count": 11396503,
            "compressed-bytes": 265208777,
            "uncompressed-bytes": 3547613828
          }
        ]
      }
    ],

I know the elastic/logs track takes advantage of it: https://github.com/elastic/rally-tracks/blob/7df06b34daa50fe4d583d4bfbaea9336ff8cd278/elastic/logs/track.json#L394-L642.

So maybe I should update the doc you linked to as well!

You can have multiple corpora, and multiple documents in each corpus.

🤦 ofc I totally forgot about that. I think we are good with the docs as you have them then.
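
To make the corpora/documents distinction from this thread concrete, here is a minimal Python sketch that walks a track-like structure and lists every document file per corpus. The corpus names, file names, and the `list_document_files` helper are hypothetical and only for illustration; this is not Rally's own code, and only the `corpora`, `name`, `documents`, and `source-file` keys are taken from the geonames snippet above.

```python
# Hypothetical track fragment: two corpora, one of which has two document files.
track = {
    "corpora": [
        {
            "name": "corpus-a",
            "documents": [
                {"source-file": "a-1.json.bz2"},
                {"source-file": "a-2.json.bz2"},
            ],
        },
        {
            "name": "corpus-b",
            "documents": [
                {"source-file": "b-1.json.bz2"},
            ],
        },
    ]
}


def list_document_files(track):
    """Return (corpus name, source file) pairs for every document in every corpus."""
    return [
        (corpus["name"], doc["source-file"])
        for corpus in track["corpora"]
        for doc in corpus["documents"]
    ]


if __name__ == "__main__":
    for corpus_name, source_file in list_document_files(track):
        print(corpus_name, source_file)
```

"Multiple corpora" corresponds to the outer list and "multiple documents" to each inner `documents` list, so the two are independent of each other.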

dliappis

This LGTM based on the latest discussion. Further improvements are welcome, but it is already an incremental improvement over the current docs and effectively describes this hard-to-grasp concept.
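
For readers trying to picture the per-client split described in the docs change, the following is a simplified sketch, not Rally's actual implementation. It assumes each document file is divided into contiguous ranges, one per client; the `client_bounds` helper and its remainder handling are assumptions made for illustration.

```python
def client_bounds(total_docs, num_clients):
    """Return a (start_offset, doc_count) pair for each client.

    Hypothetical helper: the file is cut into contiguous ranges, and any
    remainder is spread over the first clients so that all docs are covered.
    """
    base, remainder = divmod(total_docs, num_clients)
    bounds = []
    offset = 0
    for client_index in range(num_clients):
        count = base + (1 if client_index < remainder else 0)
        bounds.append((offset, count))
        offset += count
    return bounds


if __name__ == "__main__":
    # With 2 clients, one starts at the beginning and the other at the middle.
    print(client_bounds(10, 2))  # [(0, 5), (5, 5)]
```

Because every client works through its own range concurrently, documents from the middle of the file are indexed at the same time as documents from the start, which is why ingestion does not follow the order of the file.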

Document bulk behavior with multiple clients/documents

542722f

pquentin added the :Docs Changes to the documentation label Feb 7, 2023

pquentin added this to the 2.7.1 milestone Feb 7, 2023

pquentin requested a review from dliappis February 7, 2023 17:38

pquentin self-assigned this Feb 7, 2023

pquentin mentioned this pull request Feb 7, 2023

Allow indexing data in order with multiple indexing clients #1650

Closed

dliappis reviewed Feb 8, 2023

dliappis self-requested a review February 9, 2023 12:49

dliappis approved these changes Feb 9, 2023

Update docs/track.rst

a310785

pquentin merged commit 590dad1 into elastic:master Feb 13, 2023

pquentin deleted the document-bulk-splits branch February 13, 2023 16:00
