Data partition algorithm failing with parallel indexing tasks #334
Labels
bug
:Track Management
Rally version: esrally 0.7.2
Invoked command:
esrally --track=logs --offline --target-hosts=127.0.0.1:9200 --pipeline=benchmark-only --team-repository=private --challenge=index-logs
Configuration file (located in ~/.rally/rally.ini):
JVM version:
OS version: Ubuntu 16.04.2 LTS
Description of the problem including expected versus actual behavior:
When several index operations run in parallel, esrally fails to partition the data correctly between the clients, so documents are not indexed for the second and subsequent indices.
Steps to reproduce:
Launch the benchmark:
esrally --track=logs --offline --target-hosts=127.0.0.1:9200 --pipeline=benchmark-only --team-repository=private --challenge=index-logs
In the output, we can see that Rally uses the wrong offset for index logs2.
It should start at offset 0 for both indices.
From what I've seen in the code, the partition algorithm uses client_id and total_clients to partition the data. This doesn't work for parallel index tasks because it does not take into account that each task reads from a different data source.
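To illustrate the difference, here is a minimal sketch in Python. It is not Rally's code: `bounds`, `partition_globally`, `partition_per_task`, the `TASKS` mapping and the document counts are all hypothetical, chosen only to model the behavior described in this issue. `partition_globally` models client ids and the client total being counted across all parallel tasks; `partition_per_task` models each task splitting its own file among its own clients.

```python
from typing import Dict, List, Tuple

# Hypothetical input: two index tasks run in parallel, and each task reads
# its own data file (index name -> number of documents in that file).
TASKS: Dict[str, int] = {"logs1": 1000, "logs2": 1000}


def bounds(total_docs: int, client_index: int, num_clients: int) -> Tuple[int, int]:
    """Hypothetical helper: (offset, count) of the slice of one file given to a client.

    Splits total_docs as evenly as possible; the first
    total_docs % num_clients clients receive one extra document.
    """
    base, remainder = divmod(total_docs, num_clients)
    offset = client_index * base + min(client_index, remainder)
    count = base + (1 if client_index < remainder else 0)
    return offset, count


def partition_globally(tasks: Dict[str, int], clients_per_task: int) -> Dict[str, List[Tuple[int, int]]]:
    """Models the problem as described: client ids and total_clients span all
    parallel tasks, so each task's file is cut as if it were shared with them."""
    total_clients = len(tasks) * clients_per_task
    plan: Dict[str, List[Tuple[int, int]]] = {}
    client_id = 0
    for index, docs in tasks.items():
        plan[index] = []
        for _ in range(clients_per_task):
            plan[index].append(bounds(docs, client_id, total_clients))
            client_id += 1
    return plan


def partition_per_task(tasks: Dict[str, int], clients_per_task: int) -> Dict[str, List[Tuple[int, int]]]:
    """Models the expected behavior: each task splits its own file among its own
    clients, with a client index that restarts at 0 for every task."""
    return {
        index: [bounds(docs, i, clients_per_task) for i in range(clients_per_task)]
        for index, docs in tasks.items()
    }


if __name__ == "__main__":
    print(partition_globally(TASKS, 1))  # {'logs1': [(0, 500)], 'logs2': [(500, 500)]}
    print(partition_per_task(TASKS, 1))  # {'logs1': [(0, 1000)], 'logs2': [(0, 1000)]}
```

With one client per task, the global scheme starts logs2 at offset 500 and gives the logs1 client only the first half of its file. The per-task scheme starts both indices at offset 0 and covers each file completely.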