Allow to use significantly more clients #944
Conversation
@elasticmachine run tests
This is a surprisingly smooth addition leveraging the actor system to allow us to increase the number of clients!

Unrelated to this specific PR: since #935, where we introduced the MmapSource, it appears that with ~4000 clients (on a single machine) the reported memory usage (from top and pmap) is ~1.1TB, so it's likely we need to look at this in more detail in a follow-up.
```diff
@@ -431,12 +434,19 @@ def prepare_benchmark(self, t):
         self.prepare_telemetry(es_clients)
         self.target.on_cluster_details_retrieved(self.retrieve_cluster_info(es_clients))
         for host in self.config.opts("driver", "load_driver_hosts"):
             host_config = {
                 # for simplicity we assume that all benchmark machines have the same specs
```
👍
```python
self.config, self.client_id, task, schedule, self.sampler, self.cancel, self.complete, self.abort_on_error)
self.logger.info("Worker[%d] is executing tasks at index [%d].", self.worker_id, self.current_task_index)
# allow to buffer more events than by default as we expect to have way more clients.
self.sampler = Sampler(start_timestamp=time.perf_counter(), buffer_size=65536)
```
Does it make sense to declare static values like 65536 via a module-level variable?
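For illustration, a minimal sketch of that suggestion; the constant name `SAMPLE_QUEUE_SIZE` is hypothetical and `Sampler` is the class used in this PR:

```python
import time

# Hypothetical module-level constant: buffer more events than the default
# because each worker now simulates many clients.
SAMPLE_QUEUE_SIZE = 65536

# ... later, inside the worker (mirrors the snippet above):
self.sampler = Sampler(start_timestamp=time.perf_counter(), buffer_size=SAMPLE_QUEUE_SIZE)
```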
```python
self.client_id, self.sub_task, self.schedule, es, self.sampler, self.cancel, self.complete, self.abort_on_error)
final_executor = AsyncProfiler(async_executor) if self.profiling_enabled else async_executor

aws = []
```
nit: it's not easy to understand what this variable holds judging by its name (plus the name inadvertently conjures thoughts about cloud)
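For example, assuming the list collects awaitable coroutines (as the surrounding code suggests), a more descriptive, hypothetical name could be:

```python
# Hypothetical rename: states what the list holds and avoids any
# association with Amazon Web Services.
awaitables = []
```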
@elasticmachine test this please
With this commit we provide a URL object when issuing a request instead of a string representation to avoid expensive string parsing in aiohttp. In our tests this has reduced the client-side overhead by about one millisecond, which is important when benchmarking queries that finish within single-digit milliseconds. Relates #944
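A sketch of the underlying idea with a plain aiohttp client (the endpoint is hypothetical; this is not the PR's actual diff): aiohttp accepts a pre-parsed `yarl.URL`, so the parsing cost is paid once instead of on every request.

```python
import asyncio

import aiohttp
from yarl import URL


async def main():
    # Parse the URL once up front; aiohttp accepts yarl.URL objects directly,
    # so it does not have to re-parse a string on every request.
    search_url = URL("http://localhost:9200/_search")  # hypothetical endpoint
    async with aiohttp.ClientSession() as session:
        async with session.get(search_url) as response:
            print(response.status)


asyncio.run(main())
```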
With this commit we properly determine the most recent sample per client by looping over all clients instead of extracting only one sample. The reason this worked before is that each process simulated only one client, but with #944 each worker can simulate multiple clients, so we need to check all samples to determine the most recent one.
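A minimal sketch of the described fix, with hypothetical names (the actual data structures live in Rally's driver code): keep, per client, the sample with the latest timestamp instead of taking the first one found.

```python
# Scan every sample and remember, per client, the one with the latest
# timestamp; assuming the first sample found is the newest is only valid
# when each worker simulates a single client.
most_recent_per_client = {}
for sample in raw_samples:
    latest = most_recent_per_client.get(sample.client_id)
    if latest is None or sample.absolute_time > latest.absolute_time:
        most_recent_per_client[sample.client_id] = sample
```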
#963 addresses this by sharing the memory-mapped file within each process. This means we memory-map a data file only once per worker (process) instead of once per client.
With this commit we change how clients are assigned to worker processes in the
load generator. While historically Rally has assigned one worker (process) to
one client, with the changes done in #935 we can now assign multiple clients to a
single worker process and run them in an asyncio event loop. This allows us to
simulate many more clients than is possible with the process-based approach,
which is very heavy-weight.

By default Rally will only create as many workers as there are CPUs available on
the system. This choice can be overridden in Rally's config file with the
configuration setting `available.cores` in the section `system` to allocate more
or fewer workers. If multiple load generator machines are used, note that we
assume all of them have the same hardware specifications and only take the
coordinating machine's CPU count into account.
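For illustration, a hypothetical override of this setting (Rally's config file typically lives at `~/.rally/rally.ini`; check your installation):

```ini
[system]
# assume 16 usable cores on this machine, so Rally will create
# up to 16 worker processes
available.cores = 16
```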