Add an asyncio-based load generator #916

Merged 33 commits on Mar 8, 2020

@danielmitterdorfer danielmitterdorfer commented Feb 21, 2020

With this commit we add a new experimental subcommand race-async to Rally. It
allows specifying significantly more clients than the current race subcommand.
The reason for this is that under the hood, race-async uses asyncio and runs
all clients in a single event loop. Contrary to that, race uses an actor
system under the hood and maps each client to one process.
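
As a rough illustration of the single-event-loop model (not Rally's actual implementation), the following sketch runs many simulated clients as coroutines in one process. The names `simulated_client` and `run_clients` are hypothetical, and `asyncio.sleep` stands in for a network request. It uses `asyncio.run`, which requires Python 3.7 or later.

```python
import asyncio
import time


async def simulated_client(client_id, requests, latency=0.01):
    """Hypothetical client: each 'request' is a sleep standing in for network I/O."""
    completed = 0
    for _ in range(requests):
        # While this client waits, the event loop runs the other clients.
        await asyncio.sleep(latency)
        completed += 1
    return client_id, completed


async def run_clients(num_clients, requests_per_client):
    # All clients are coroutines on the same event loop (one process, one thread).
    clients = [simulated_client(i, requests_per_client) for i in range(num_clients)]
    # gather() returns the results in the order the coroutines were passed in.
    return await asyncio.gather(*clients)


def main(num_clients=1000, requests_per_client=5):
    start = time.perf_counter()
    results = asyncio.run(run_clients(num_clients, requests_per_client))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = main()
    print(f"{len(results)} clients finished in {elapsed:.2f}s")
```

Because every client spends most of its time waiting, a single loop can interleave a large number of them without one operating system process per client. The trade-off is that all clients share one CPU core.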

As the new subcommand is very experimental and not yet meant to be used broadly,
there is no accompanying user documentation in this PR. Instead, we plan to
build on top of this PR and expand the load generator to take advantage of
multiple cores before we consider this usable in production (it will likely keep
its experimental status though).

In this PR we also implement a compatibility layer into the current load
generator so both work internally now with asyncio. Consequently, we have
already adapted all Rally tracks with a backwards-compatibility layer (see
elastic/rally-tracks#97 and rally-eventdata-track#80).

Closes #852

With this commit we add an async load generator implementation. This
implementation is work in progress, extremely incomplete and hacky. We
also implement an async compatibility layer into the previous load
generator which allows us to compare both load generator implementations
in realistic scenarios.

With this commit we bump the minimum required Python version to Python
3.6 (thus dropping support for Python 3.5). Python 3.5 will be end of
life on September 13, 2020 (source: [1]). We also intend to use several
features that require at least Python 3.6 in future versions of Rally
thus we drop support for Python 3.5 now.

[1] https://devguide.python.org/#status-of-python-branches

With this commit we change Rally's internal implementation to always use
the async code path so runner implementations stay the same.
@danielmitterdorfer danielmitterdorfer added labels Feb 21, 2020: enhancement (Improves the status quo), :Load Driver (Changes that affect the core of the load driver such as scheduling, the measurement approach etc.), highlight (A substantial improvement that is worth mentioning separately in release notes)
@danielmitterdorfer danielmitterdorfer self-assigned this Feb 21, 2020
danielmitterdorfer added a commit to elastic/rally-eventdata-track that referenced this pull request Feb 21, 2020
Due to elastic/rally#852 we will implement a compatibility layer in the
current load generator that will also use the asyncio API and thus
requires custom runners to be registered differently (by specifying
`async_runner=True`). Rally's runner registry will also expose a new
attribute `async_runner` that is set to `True` if Rally requires runners
to be registered as described above.

With this commit we introduce a (temporary) compatibility layer for all
custom runners that allows older Rally versions to work with the classic
runners and newer Rally versions with the async runners.

Relates elastic/rally#852
Relates elastic/rally#916
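
A minimal sketch of what such a compatibility layer in a track plugin could look like, assuming the registry exposes `async_runner` as a plain attribute as described in the commit message. `FakeRegistry`, the runner name `count-docs` and the runner bodies are hypothetical and exist only to make the example self-contained. The real track changes are in the referenced commits.

```python
class FakeRegistry:
    """Hypothetical stand-in for Rally's runner registry, for illustration only."""

    def __init__(self, async_runner):
        # Newer Rally versions are described as exposing async_runner = True;
        # older versions do not have the attribute at all.
        if async_runner:
            self.async_runner = True
        self.runners = {}

    def register_runner(self, name, runner, **kwargs):
        self.runners[name] = (runner, kwargs)


def count_docs(es, params):
    """Classic (synchronous) runner for older Rally versions."""
    return es.count(index=params["index"])


async def count_docs_async(es, params):
    """Async runner for newer Rally versions; the client call is awaited."""
    return await es.count(index=params["index"])


def register(registry):
    # Older registries lack the attribute, so default to the classic path.
    if getattr(registry, "async_runner", False):
        registry.register_runner("count-docs", count_docs_async, async_runner=True)
    else:
        registry.register_runner("count-docs", count_docs)
```

With this shape the same track works on both sides of the change: older Rally versions never see the `async_runner` keyword, and newer versions only receive coroutine-based runners.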
danielmitterdorfer added a commit to elastic/rally-tracks that referenced this pull request Feb 21, 2020
Due to elastic/rally#852 we will implement a compatibility layer in the
current load generator that will also use the asyncio API and thus
requires custom runners to be registered differently (by specifying
`async_runner=True`). Rally's runner registry will also expose a new
attribute `async_runner` that is set to `True` if Rally requires runners
to be registered as described above.

With this commit we introduce a (temporary) compatibility layer for all
custom runners that allows older Rally versions to work with the classic
runners and newer Rally versions with the async runners.

Relates elastic/rally#852
Relates elastic/rally#916
@danielmitterdorfer danielmitterdorfer changed the title from "Add async load generator" to "Add an asyncio-based load generator" Feb 25, 2020

danielmitterdorfer commented Feb 25, 2020

Notes to reviewers

General

This PR already includes #905 to drop Python 3.5 support. As soon as the other PR is merged to master I'll resolve conflicts but please ignore any diffs that are due to that PR.

Planned follow-up work

I plan to work on the following items after this PR has been merged:

  • Refactoring command line parsing: Currently we have only added the race-async subcommand parser to all places where we specify the race parser. However, this adds unneeded and unused options to its command line interface. Refactoring this would have been possible in this PR but it would have made it even larger. Therefore, we'll tackle this in a dedicated follow-up PR that will be much easier to review.
  • User documentation: At this point we only include documentation for the changes that are visible to users (i.e. changes to the runner API) but we don't explain the command line interface of the new experimental subcommand.
  • Integrating multiprocessing so we can take advantage of multi-core CPUs: This is another significant change which will turn race-async into a command that can be considered usable (the single-core version here is more to demonstrate that this is feasible at all).

Usage

The new subcommand only supports benchmarking. To manage the respective Elasticsearch nodes, use the install / start / stop subcommands.

Here are some example invocations:

# benchmark a local cluster in test mode
esrally race-async --track=geonames --on-error=abort --target-host="127.0.0.1:29200" --test-mode

# benchmark a local cluster with security enabled
esrally race-async --track=geonames --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'rally-password'" --on-error=abort --target-host="127.0.0.1:29200"

Note that the current load generator has also changed (internally).

danielmitterdorfer commented

The PR build (Python 3.7) just failed with:

08:18:36 2020-02-25 08:18:36,185 -not-actor-/PID:56312 esrally.rally ERROR A fatal error occurred while running subcommand [stop].
08:18:36 Traceback (most recent call last):
08:18:36   File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py37/lib/python3.7/site-packages/esrally/rally.py", line 713, in dispatch_sub_command
08:18:36     mechanic.stop(cfg)
08:18:36   File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py37/lib/python3.7/site-packages/esrally/mechanic/mechanic.py", line 131, in stop
08:18:36     node_launcher.stop(nodes, metrics_store)
08:18:36   File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py37/lib/python3.7/site-packages/esrally/mechanic/launcher.py", line 91, in stop
08:18:36     telemetry.add_metadata_for_node(metrics_store, node.node_name, node.host_name)
08:18:36   File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py37/lib/python3.7/site-packages/esrally/telemetry.py", line 882, in add_metadata_for_node
08:18:36     metrics_store.add_meta_info(metrics.MetaInfoScope.node, node_name, "os_name", sysstats.os_name())
08:18:36 AttributeError: 'NoneType' object has no attribute 'add_meta_info'

which is rather odd given that we actually have a guard that checks that before invoking the function:

if metrics_store:
    telemetry.add_metadata_for_node(metrics_store, node_name, node.host_name)

Stopping a node is also single-threaded so I'm a bit at a loss why the metrics_store reference should be None but I'm documenting this so we have a reference that this has happened in case we see this again.

@elasticmachine test this please

@danielmitterdorfer danielmitterdorfer marked this pull request as ready for review February 25, 2020 09:12
danielmitterdorfer commented

This did indeed uncover a real issue that only occurs when we run against a Docker container with an in-memory metrics store. I have pushed #918 to address this.

dliappis commented

Additional note to reviewers: on an existing Rally installation, it's advisable to delete your .venv and recreate it with make prereq && make install before reviewing this PR.

dliappis commented

While testing against an Elasticsearch launched with an increased write thread pool

thread_pool:
  write:
    size: 6
    queue_size: 10000

I invoked Rally using

esrally race-async --track=http_logs --on-error=abort --target-host="127.0.0.1:39200" --challenge=append-no-conflicts-index-only --track-params="bulk_indexing_clients:500"

and hit an error at around 3%:

[ERROR] Cannot race-async. ('Request returned an error. Error type: transport, Description: ()', None)

Looking at the logs, there was a request timeout, so Rally probably did what it should given --on-error=abort, but the CLI output could be improved.

Rally stack trace
2020-02-27 09:50:28,123 -not-actor-/PID:197 elasticsearch WARNING POST http://127.0.0.1:39200/_bulk [status:N/A request:60.009s]
Traceback (most recent call last):
  File "/Users/dl/source/elastic/rally/esrally/async_connection.py", line 105, in perform_request
    response = yield from self.session.request(method, url, data=body, headers=headers)
  File "/Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/aiohttp/client.py", line 504, in _request
    await resp.start(conn)
  File "/Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 847, in start
    message, payload = await self._protocol.read()  # type: ignore  # noqa
  File "/Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/aiohttp/streams.py", line 591, in read
    await self._waiter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dl/source/elastic/rally/esrally/async_connection.py", line 106, in perform_request
    raw_data = yield from response.text()
  File "/Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/async_timeout/__init__.py", line 45, in __exit__
    self._do_exit(exc_type)
  File "/Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/async_timeout/__init__.py", line 92, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
2020-02-27 09:50:28,128 -not-actor-/PID:197 esrally.driver.driver ERROR Could not execute schedule
Traceback (most recent call last):
  File "/Users/dl/source/elastic/rally/esrally/driver/driver.py", line 1143, in __call__
    total_ops, total_ops_unit, request_meta_data = await execute_single(runner, self.es, params, self.abort_on_error)
  File "/Users/dl/source/elastic/rally/esrally/driver/driver.py", line 1218, in execute_single
    raise exceptions.RallyAssertionError(msg)
esrally.exceptions.RallyAssertionError: ('Request returned an error. Error type: transport, Description:  ()', None)
2020-02-27 09:50:28,654 -not-actor-/PID:197 esrally.metrics INFO Closing metrics store.
2020-02-27 09:50:28,655 -not-actor-/PID:197 esrally.rally ERROR Cannot run subcommand [race-async].
Traceback (most recent call last):
  File "/Users/dl/source/elastic/rally/esrally/racecontrol.py", line 406, in run_async
    new_metrics = race_driver.run()
  File "/Users/dl/source/elastic/rally/esrally/driver/async_driver.py", line 210, in run
    loop.run_until_complete(benchmark_runner())
  File "/Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "/Users/dl/source/elastic/rally/esrally/driver/async_driver.py", line 251, in _run_benchmark
    _ = await asyncio.gather(*aws)
  File "/Users/dl/source/elastic/rally/esrally/driver/driver.py", line 1143, in __call__
    total_ops, total_ops_unit, request_meta_data = await execute_single(runner, self.es, params, self.abort_on_error)
  File "/Users/dl/source/elastic/rally/esrally/driver/driver.py", line 1218, in execute_single
    raise exceptions.RallyAssertionError(msg)
esrally.exceptions.RallyAssertionError: ('Request returned an error. Error type: transport, Description:  ()', None)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dl/source/elastic/rally/esrally/rally.py", line 718, in dispatch_sub_command
    racecontrol.run_async(cfg)
  File "/Users/dl/source/elastic/rally/esrally/racecontrol.py", line 412, in run_async
    raise exceptions.RallyError(str(e)).with_traceback(tb)
  File "/Users/dl/source/elastic/rally/esrally/racecontrol.py", line 406, in run_async
    new_metrics = race_driver.run()
  File "/Users/dl/source/elastic/rally/esrally/driver/async_driver.py", line 210, in run
    loop.run_until_complete(benchmark_runner())
  File "/Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "/Users/dl/source/elastic/rally/esrally/driver/async_driver.py", line 251, in _run_benchmark
    _ = await asyncio.gather(*aws)
  File "/Users/dl/source/elastic/rally/esrally/driver/driver.py", line 1143, in __call__
    total_ops, total_ops_unit, request_meta_data = await execute_single(runner, self.es, params, self.abort_on_error)
  File "/Users/dl/source/elastic/rally/esrally/driver/driver.py", line 1218, in execute_single
    raise exceptions.RallyAssertionError(msg)
esrally.exceptions.RallyError: ("('Request returned an error. Error type: transport, Description:  ()', None)", None)
2020-02-27 09:50:28,742 -not-actor-/PID:197 esrally.driver.async_driver ERROR Uncaught exception in event loop: {'message': 'Task was destroyed but it is pending!', 'task': <Task pending name='Task-5039' coro=<AsyncTransport.main_loop() running at /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py:149> wait_for=<Future finished exception=ServerDisconnectedError() created at /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py:418> cb=[<TaskWakeupMethWrapper object at 0x106f1b1c0>()] created at /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py:214>, 'source_traceback': [<FrameSummary file /Users/dl/source/elastic/rally/.venv/bin/esrally, line 11 in <module>>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/rally.py, line 912 in main>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/rally.py, line 718 in dispatch_sub_command>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/racecontrol.py, line 406 in run_async>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/async_driver.py, line 210 in run>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 595 in run_until_complete>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 563 in run_forever>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 1836 in _run_once>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/events.py, line 81 in _run>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/driver.py, line 1143 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/driver.py, line 1180 in execute_single>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, line 193 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, 
line 241 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, line 452 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch/client/utils.py, line 84 in _wrapped>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch/client/__init__.py, line 1478 in bulk>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py, line 214 in perform_request>]}
2020-02-27 09:50:28,746 -not-actor-/PID:197 esrally.driver.async_driver ERROR Uncaught exception in event loop: {'message': 'Task was destroyed but it is pending!', 'task': <Task pending name='Task-5041' coro=<AsyncTransport.main_loop() running at /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py:149> wait_for=<Future finished exception=ServerDisconnectedError() created at /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py:418> cb=[<TaskWakeupMethWrapper object at 0x105da0c40>()] created at /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py:214>, 'source_traceback': [<FrameSummary file /Users/dl/source/elastic/rally/.venv/bin/esrally, line 11 in <module>>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/rally.py, line 912 in main>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/rally.py, line 718 in dispatch_sub_command>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/racecontrol.py, line 406 in run_async>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/async_driver.py, line 210 in run>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 595 in run_until_complete>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 563 in run_forever>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 1836 in _run_once>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/events.py, line 81 in _run>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/driver.py, line 1143 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/driver.py, line 1180 in execute_single>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, line 193 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, 
line 241 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, line 452 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch/client/utils.py, line 84 in _wrapped>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch/client/__init__.py, line 1478 in bulk>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py, line 214 in perform_request>]}
2020-02-27 09:50:28,753 -not-actor-/PID:197 esrally.driver.async_driver ERROR Uncaught exception in event loop: {'message': 'Task was destroyed but it is pending!', 'task': <Task pending name='Task-5043' coro=<AsyncTransport.main_loop() running at /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py:149> wait_for=<Future finished exception=ServerDisconnectedError() created at /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py:418> cb=[<TaskWakeupMethWrapper object at 0x1068f1b50>()] created at /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py:214>, 'source_traceback': [<FrameSummary file /Users/dl/source/elastic/rally/.venv/bin/esrally, line 11 in <module>>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/rally.py, line 912 in main>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/rally.py, line 718 in dispatch_sub_command>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/racecontrol.py, line 406 in run_async>
, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/async_driver.py, line 210 in run>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 595 in run_until_complete>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 563 in run_forever>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 1836 in _run_once>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/events.py, line 81 in _run>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/driver.py, line 1143 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/driver.py, line 1180 in execute_single>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, line 193 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, line 241 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, line 452 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch/client/utils.py, line 84 in _wrapped>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch/client/__init__.py, line 1478 in bulk>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py, line 214 in perform_request>]}
2020-02-27 09:50:28,759 -not-actor-/PID:197 esrally.driver.async_driver ERROR Uncaught exception in event loop: {'message': 'Task was destroyed but it is pending!', 'task': <Task pending name='Task-5045' coro=<AsyncTransport.main_loop() running at /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py:149> wait_for=<Future finished exception=ServerDisconnectedError() created at /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py:418> cb=[<TaskWakeupMethWrapper object at 0x106561220>()] created at /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py:214>, 'source_traceback': [<FrameSummary file /Users/dl/source/elastic/rally/.venv/bin/esrally, line 11 in <module>>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/rally.py, line 912 in main>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/rally.py, line 718 in dispatch_sub_command>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/racecontrol.py, line 406 in run_async>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/async_driver.py, line 210 in run>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 595 in run_until_complete>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 563 in run_forever>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 1836 in _run_once>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/events.py, line 81 in _run>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/driver.py, line 1143 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/driver.py, line 1180 in execute_single>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, line 193 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, 
line 241 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, line 452 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch/client/utils.py, line 84 in _wrapped>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch/client/__init__.py, line 1478 in bulk>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py, line 214 in perform_request>]}
2020-02-27 09:50:28,765 -not-actor-/PID:197 esrally.driver.async_driver ERROR Uncaught exception in event loop: {'message': 'Task was destroyed but it is pending!', 'task': <Task pending name='Task-5047' coro=<AsyncTransport.main_loop() running at /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py:149> wait_for=<Future finished exception=ServerDisconnectedError() created at /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py:418> cb=[<TaskWakeupMethWrapper object at 0x106992b50>()] created at /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py:214>, 'source_traceback': [<FrameSummary file /Users/dl/source/elastic/rally/.venv/bin/esrally, line 11 in <module>>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/rally.py, line 912 in main>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/rally.py, line 718 in dispatch_sub_command>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/racecontrol.py, line 406 in run_async>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/async_driver.py, line 210 in run>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 595 in run_until_complete>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 563 in run_forever>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py, line 1836 in _run_once>, <FrameSummary file /Users/dl/.pyenv/versions/3.8.0/lib/python3.8/asyncio/events.py, line 81 in _run>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/driver.py, line 1143 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/driver.py, line 1180 in execute_single>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, line 193 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, 
line 241 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/esrally/driver/runner.py, line 452 in __call__>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch/client/utils.py, line 84 in _wrapped>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch/client/__init__.py, line 1478 in bulk>, <FrameSummary file /Users/dl/source/elastic/rally/.venv/lib/python3.8/site-packages/elasticsearch_async/transport.py, line 214 in perform_request>]}
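
For context on the log above: the traceback shows the failure surfacing through `asyncio.gather`, followed by "Task was destroyed but it is pending!" messages. One common way such messages arise is when tasks are still pending at the time the event loop is torn down. The sketch below is not what Rally does and may not be the root cause here; it only shows, with hypothetical names, how the remaining tasks can be cancelled and awaited after the first error.

```python
import asyncio


async def client(client_id, fail_after=None):
    """Hypothetical client that loops until it is cancelled or fails."""
    iterations = 0
    while True:
        await asyncio.sleep(0.01)
        iterations += 1
        if fail_after is not None and iterations >= fail_after:
            raise RuntimeError(f"client {client_id}: request timed out")


async def run_until_first_error(num_clients):
    # Only client 0 is set up to fail; all others would run forever.
    tasks = [
        asyncio.ensure_future(client(i, fail_after=3 if i == 0 else None))
        for i in range(num_clients)
    ]
    try:
        await asyncio.gather(*tasks)
    except RuntimeError as e:
        # gather() propagates the first exception but leaves the other tasks running.
        for task in tasks:
            task.cancel()
        # Wait until every cancellation has actually been processed.
        await asyncio.gather(*tasks, return_exceptions=True)
        return str(e), sum(task.cancelled() for task in tasks)
```

Calling `asyncio.run(run_until_first_error(5))` returns the error message of the failing client together with the number of tasks that were cancelled cleanly.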

@dliappis dliappis left a comment

Massive work, thank you!

I took an initial pass (admittedly not a very deep one) over the code, except for the tests, and so far everything seems sane to me.

I left some comments about observations re: the failure situations.

I'll take another pass at a later stage.

@dliappis dliappis self-requested a review March 3, 2020 17:23
@dliappis dliappis left a comment

I torture-tested backwards compatibility in a Vagrant environment
and things went fine.

From my perspective we are good to merge.

danielmitterdorfer commented

Thanks for your review and also for finding the reporting issue with the network timeout. I dug into it and it turns out that when a ConnectionTimeout is raised by the client, the exception message is not helpful, so I've implemented special handling for this case in 92bee31.
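
To illustrate the kind of special handling meant here (a sketch under assumptions, not the code from 92bee31): the `ConnectionTimeout` class below is a local stand-in for the client's exception, and `describe_error` as well as the exact wording of the message are hypothetical.

```python
import asyncio


class ConnectionTimeout(Exception):
    """Hypothetical stand-in for a client timeout error that carries no message."""


async def slow_request():
    # Stands in for a request that takes longer than the allowed timeout.
    await asyncio.sleep(1)


async def perform_request(timeout):
    try:
        await asyncio.wait_for(slow_request(), timeout=timeout)
    except asyncio.TimeoutError as e:
        # The timeout is re-raised without any descriptive text.
        raise ConnectionTimeout() from e


def describe_error(e):
    """Build the user-facing message; special-case timeouts with an empty message."""
    prefix = "Request returned an error. Error type: transport, Description: "
    if isinstance(e, ConnectionTimeout):
        return prefix + "network connection timed out"
    return prefix + str(e)


async def main():
    try:
        await perform_request(timeout=0.01)
    except Exception as e:
        return describe_error(e)
```

Without the special case, the description would be empty, which matches the unhelpful "Description: ()" output reported earlier in this conversation.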

@danielmitterdorfer danielmitterdorfer merged commit c8f2de7 into elastic:master Mar 8, 2020
danielmitterdorfer added a commit that referenced this pull request Mar 9, 2020
danielmitterdorfer added a commit that referenced this pull request Mar 29, 2020
With this commit we add a new experimental subcommand `race-async` to Rally. It
allows specifying significantly more clients than the current `race` subcommand.
The reason for this is that under the hood, `race-async` uses `asyncio` and runs
all clients in a single event loop. Contrary to that, `race` uses an actor
system under the hood and maps each client to one process.

As the new subcommand is very experimental and not yet meant to be used broadly,
there is no accompanying user documentation in this PR. Instead, we plan to
build on top of this PR and expand the load generator to take advantage of
multiple cores before we consider this usable in production (it will likely keep
its experimental status though).

In this PR we also implement a compatibility layer into the current load
generator so both work internally now with `asyncio`. Consequently, we have
already adapted all Rally tracks with a backwards-compatibility layer (see
elastic/rally-tracks#97 and elastic/rally-eventdata-track#80).

Closes #852
Relates #916
@danielmitterdorfer danielmitterdorfer removed this from the 1.5.0 milestone Apr 3, 2020
Development

Successfully merging this pull request may close these issues.

An actorless load generator
2 participants