
async support #1714

Merged
merged 27 commits into from
Apr 3, 2024


@miguelgrinberg (Collaborator) commented Mar 8, 2024

Async support. Tasks:

  • async connection management
  • unasync Search
  • check for invalid _sync directory in lint step
  • unasync Search unit tests
  • unasync Document
  • unasync Search integration tests
  • unasync Document tests
  • unasync Index
  • unasync Document integration tests
  • unasync update_by_query
  • unasync mappings
  • unasync faceted_search
  • unasync analyzer
  • Examples
  • Example integration tests
  • Documentation updates

Closes #1480, #1672

@miguelgrinberg marked this pull request as draft on March 8, 2024 20:58

@pquentin (Member) left a comment:

This is great!

  • Focusing on Search first is a good idea
  • I like that most methods are not doing I/O and can be in BaseSearch (which should be named SearchBase, following the naming convention in this project)
  • Since this code is not generated like in elasticsearch-py, lint should check that unasync(_async/search.py) == _sync/search.py. The alternative is to not store _sync/search.py on disk and use unasync's setuptools integration.
  • We need to use unasync on tests too, and the lint/docs builds need to work.
  • Should we keep search.py so that from elasticsearch_dsl.search import Search still works? It's unfortunately widely used.
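A minimal sketch of the split suggested in the second point, using hypothetical stand-in classes rather than the real elasticsearch_dsl ones: the I/O-free query-building methods live in SearchBase, and only execute() differs between the sync and async subclasses. FakeClient, FakeAsyncClient and the search(index=..., body=...) signature are made up for illustration and are not the elasticsearch-py API.

```python
import asyncio
import copy
from typing import Any, Dict, List, TypeVar

T = TypeVar("T", bound="SearchBase")


class SearchBase:
    """Query building that never touches the network (hypothetical stand-in)."""

    def __init__(self, client: Any, index: str) -> None:
        self._client = client
        self._index = index
        self._filters: List[Dict[str, Any]] = []

    def filter(self: T, **term: Any) -> T:
        # return a modified shallow copy so the original search is left untouched
        clone = copy.copy(self)
        clone._filters = self._filters + [{"term": term}]
        return clone

    def to_dict(self) -> Dict[str, Any]:
        return {"query": {"bool": {"filter": self._filters}}}


class Search(SearchBase):
    def execute(self) -> Dict[str, Any]:
        # the only method that performs (blocking) I/O
        return self._client.search(index=self._index, body=self.to_dict())


class AsyncSearch(SearchBase):
    async def execute(self) -> Dict[str, Any]:
        # same request, but the client call is awaited
        return await self._client.search(index=self._index, body=self.to_dict())


class FakeClient:
    def search(self, index: str, body: Dict[str, Any]) -> Dict[str, Any]:
        return {"index": index, "body": body}


class FakeAsyncClient:
    async def search(self, index: str, body: Dict[str, Any]) -> Dict[str, Any]:
        return {"index": index, "body": body}


if __name__ == "__main__":
    print(Search(FakeClient(), "blog").filter(tag="python").execute())
    print(asyncio.run(AsyncSearch(FakeAsyncClient(), "blog").filter(tag="python").execute()))
```

With this layout only the small execute() methods need an unasync conversion; everything else is written once.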

@miguelgrinberg (Collaborator, Author):

Since this code is not generated like in elasticsearch-py, lint should check that unasync(_async/search.py) == _sync/search.py. The alternative is to not store _sync/search.py on disk and use unasync's setuptools integration.

Do you mean to add this as a check, to make sure the code in _sync has not been hand edited? Like running unasync and then making sure git status is clean?
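For context on what "running unasync" produces: the tool rewrites async source into sync source by replacing tokens. The sketch below is a deliberately simplified, regex-based imitation, not unasync's actual implementation or rule table; the REPLACEMENTS mapping and toy_unasync() name are hypothetical.

```python
import re

# Hypothetical replacement table, in the spirit of unasync's token rules
REPLACEMENTS = {
    "AsyncSearch": "Search",
    "AsyncElasticsearch": "Elasticsearch",
    "_async": "_sync",
    "__aiter__": "__iter__",
    "__anext__": "__next__",
}


def toy_unasync(source: str) -> str:
    """Very rough async -> sync conversion, for illustration only."""
    # "async def/for/with" become their plain counterparts
    source = re.sub(r"\basync\s+(def|for|with)\b", r"\1", source)
    # drop "await " in front of expressions
    source = re.sub(r"\bawait\s+", "", source)
    # rename whole identifiers only, so e.g. "AsyncSearchBase" is not touched
    for old, new in REPLACEMENTS.items():
        source = re.sub(rf"\b{re.escape(old)}\b", new, source)
    return source


print(toy_unasync("async def count(s: AsyncSearch) -> int:\n    return await s.count()\n"))
```

A lint check then only has to confirm that the committed _sync files equal the tool's output.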

We need to use unasync on tests too, and the lint/docs builds need to work.

Yes, I will look at that next.

Should we keep search.py so that from elasticsearch_dsl.search import Search still works? It's unfortunately widely used

Right, I noticed that the search.py module is often referenced explicitly, so I thought it should be an alias for the generated _sync/search.py. We could also import the async classes in this module, so that people don't have to use _async.search in their imports.

@pquentin (Member):

Do you mean to add this as a check, to make sure the code in _sync has not been hand edited? Like running unasync and then making sure git status is clean?

Yes, exactly: we want to make sure in nox -s lint that (1) _sync was not hand-edited and (2) _async was not modified without running unasync. The problem with git status or git diff is that they exit with status zero even when there are differences. So maybe we should use Python and difflib from the standard library.
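A minimal sketch of such a difflib-based check, assuming the lint step first regenerates the sync code into a separate directory (how that happens is not shown; the directory layout and the diff_trees()/check() names are hypothetical). difflib.unified_diff is the standard-library function; check() returns 1 when anything differs, so a nox session could fail on a non-zero result.

```python
from __future__ import annotations

import difflib
import sys
from pathlib import Path


def diff_trees(committed: Path, generated: Path) -> list[str]:
    """Unified diff of every .py file that differs between the two trees."""
    rel_paths = sorted(
        {p.relative_to(committed) for p in committed.rglob("*.py")}
        | {p.relative_to(generated) for p in generated.rglob("*.py")}
    )
    out: list[str] = []
    for rel in rel_paths:
        a, b = committed / rel, generated / rel
        # a file missing on one side is treated as empty, so it shows up in the diff
        a_lines = a.read_text().splitlines(keepends=True) if a.exists() else []
        b_lines = b.read_text().splitlines(keepends=True) if b.exists() else []
        out.extend(
            difflib.unified_diff(a_lines, b_lines, fromfile=str(a), tofile=str(b))
        )
    return out


def check(committed: str, generated: str) -> int:
    """Print any differences and return a non-zero status if there are any."""
    diff = diff_trees(Path(committed), Path(generated))
    sys.stdout.writelines(diff)
    return 1 if diff else 0
```

Unlike git status, this fails loudly and also shows exactly which generated lines were hand-edited.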

Right, I noticed that the search.py module is often referenced explicitly, so I thought it should be an alias for the generated _sync/search.py. We could also import the async classes in this module, so that people don't have to use _async.search in their imports.

Most of those results are from forks of discontinued Mozilla products (zamboni and kuma), but who knows how widely used it is in private code. That said, I don't want to encourage this usage. I'm not sure if we can emit a warning on from elasticsearch_dsl.search import Search, but I don't think it's necessary to have AsyncSearch there.
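On whether a warning is possible: Python 3.7+ supports a module-level __getattr__ (PEP 562), which is only called for names that are not found in the module's namespace. The sketch below builds a throwaway in-memory module named legacy_search to demonstrate the mechanism; in a real compatibility module the function would simply be defined at module level, and Search must then not be imported into that module's namespace directly, or the hook would never run. All names here are hypothetical.

```python
import sys
import types
import warnings


class Search:
    """Stand-in for the generated sync Search class (hypothetical)."""


def _make_compat_module(name: str) -> types.ModuleType:
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        # only reached because "Search" is not in the module's own namespace
        if attr == "Search":
            warnings.warn(
                f"importing Search from {name} is deprecated",
                DeprecationWarning,
                stacklevel=2,
            )
            return Search
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    module.__getattr__ = __getattr__  # PEP 562 hook, looked up in the module dict
    return module


sys.modules["legacy_search"] = _make_compat_module("legacy_search")
```

Note that DeprecationWarning is hidden by default outside __main__ and test runners, so this nudges rather than breaks existing code.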

@miguelgrinberg force-pushed the async-support branch 8 times, most recently from e89e2a3 to 0f11586 on March 13, 2024 17:14

@miguelgrinberg (Collaborator, Author) commented Mar 13, 2024

@pquentin Making good progress on this. See the top comment for the completed tasks. I will try to get through the document tests and see if that is a good stopping point for this PR, depending on how much is left.

@pquentin (Member) left a comment:

This looks good, thanks! I have reviewed the general approach so far, not yet the nitty-gritty details of what is sync, async or shared. But I'm not expecting any big issues there.

The next steps:

  • opening a pull request to drop Python 3.7 support (I can do that if you'd prefer)
  • add an async page to the docs, probably close to https://elasticsearch-py.readthedocs.io/en/v8.12.1/async.html (with general explanations around installation and usage and a complete API Reference at the end)
  • explain how to contribute code in CONTRIBUTING.rst

Resolved review thread: .github/workflows/ci.yml (outdated)
@@ -11,3 +11,4 @@ filterwarnings =
error
ignore:Legacy index templates are deprecated in favor of composable templates.:elasticsearch.exceptions.ElasticsearchWarning
ignore:datetime.datetime.utcfromtimestamp\(\) is deprecated and scheduled for removal in a future version..*:DeprecationWarning
asyncio_mode = auto

@pquentin (Member):

Please keep the default (strict mode) as this is more explicit and will help to add Trio support in the future.

@miguelgrinberg (Collaborator, Author) replied Mar 19, 2024:

I'm not sure how that would work with unasync. This is one example where the code would need to be different: async tests need a decorator to declare them as async, but that decorator needs to be removed in the sync conversion.

Example:

_async version:

@pytest.mark.asyncio
async def test_some_code():
    res = await library.do_something()
    assert b"expected result" == res

_sync version:

def test_some_code():
    res = library.do_something()
    assert b"expected result" == res

How do I tell unasync to remove the decorator when doing this conversion? The auto mode solves this, because neither async nor sync need a decorator.

@pquentin (Member):

Right, unasync only replaces tokens, and @pytest.mark.asyncio is tokenized as three OP tokens (@, . and .) and three NAME tokens (pytest, mark, asyncio). Asking unasync to look ahead sounds overly complex for this use case.
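The tokenization described above can be checked with the standard library's tokenize module; this small snippet (the helper name is ours) keeps only the OP and NAME tokens produced for the decorator line.

```python
import io
import tokenize
from typing import List, Tuple


def op_and_name_tokens(source: str) -> List[Tuple[str, str]]:
    """Return (token type name, string) pairs for OP and NAME tokens only."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokens
        if tok.type in (tokenize.OP, tokenize.NAME)
    ]


print(op_and_name_tokens("@pytest.mark.asyncio\n"))
# → [('OP', '@'), ('NAME', 'pytest'), ('OP', '.'), ('NAME', 'mark'), ('OP', '.'), ('NAME', 'asyncio')]
```

A token-by-token replacer sees six independent tokens here, which is why dropping the whole decorator would need look-ahead.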

We could add our own marker (sync) that would not do anything, but it would be confusing and would prevent us from even importing the asyncio module. Maybe that's good if we do want to add Trio support, because anything fancy would require AnyIO anyway.

We could add the marker using pytestmark = pytest.mark.asyncio at the top of the file and then modify run-unasync.py to remove that line for sync tests, and ask unasync to translate it to pytest.mark.trio for Trio tests.

In any case, that's too complicated for the current pull request, and can be left for future work. Resolving this, thanks.

Resolved review threads: setup.py (outdated), elasticsearch_dsl/document.py (outdated), noxfile.py

@miguelgrinberg (Collaborator, Author) commented Mar 19, 2024

@pquentin Some more updates:

  • Added update_by_query.py and mapping.py. The only remaining module that does I/O is analysis.py, but I'm not sure a conversion is necessary, because the only I/O happens in a CustomAnalyzer class that is not publicly exported. Do we want an AsyncCustomAnalyzer as well?
  • Removed the star imports and added what I thought were sensible things to import in document.py, search.py, index.py, update_by_query.py and mapping.py. I still have an open question about also importing the async classes into these files, so that they can be imported more easily.
  • The faceted_search.py module references the Search class, but does not perform any I/O with it. Do we want to unasync it, or would it be better to pass the search class to use as an argument (defaulting to Search)?
  • All tests that use the unasynced classes were also unasynced, including both regular and integration tests. It would probably be good for you to have a look and make sure I did not leave anything out. Coverage is at 93% (I think it is 94% for the currently released version, so this is a very minor change, likely caused by unasync duplicating code that isn't covered). No tests were changed, added or removed.

Will work on the docs next!

@pquentin (Member):

Added update_by_query.py and mapping.py. The only remaining module that does I/O is analysis.py, but I'm not sure a conversion is necessary, because the only I/O happens in a CustomAnalyzer class that is not publicly exported. Do we want an AsyncCustomAnalyzer as well?

There's some metaclass magic that I don't understand, but this is exposed through analyzer(); e.g. see the custom analyzer for names in examples/completion.py. Does the example work when converted to asyncio? (We should consider leaving this for future work though, to avoid making this giant pull request even bigger.)

The faceted_search.py module references the Search class, but does not perform any I/O with it. Do we want to unasync it, or would it be better to pass the search class to use as an argument (defaulting to Search)?

The second option seems better. Is there any disadvantage I'm missing?

All tests that use the unasynced classes were also unasynced, including both regular and integration tests. It would probably be good for you to have a look and make sure I did not leave anything out. Coverage is at 93% (I think it is 94% for the currently released version, so this is a very minor change, likely caused by unasync duplicating code that isn't covered). No tests were changed, added or removed.

Right, I do plan to read everything at least once, but I'm bound to miss things given the size of the pull request. Which is why I'm procrastinating :)

@miguelgrinberg (Collaborator, Author) commented Mar 20, 2024

I have yet to convert the examples, so I'll test how analysis is used and determine if we need to do a conversion. None of the tests seemed to need an async analysis.py, but like you, I don't fully understand what's going on with this module.

For the faceted search class I agree with you that unasync is overkill. We just need to add the search class as an argument, and then we can create an AsyncFacetedSearch class that uses AsyncSearch.
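A minimal sketch of the argument-passing approach, using hypothetical stand-in classes (the real FacetedSearch has a much richer constructor, and the real AsyncSearch does not subclass Search). It only works while the faceted search class does no I/O of its own.

```python
from typing import Optional, Type


class Search:
    """Stand-in for the sync Search class (hypothetical)."""

    def __init__(self, index: Optional[str] = None) -> None:
        self.index = index


class AsyncSearch(Search):
    """Stand-in for the async Search class (hypothetical)."""


class FacetedSearch:
    def __init__(
        self, index: Optional[str] = None, search_class: Type[Search] = Search
    ) -> None:
        # the search class is injected instead of being hard-coded
        self._s = search_class(index=index)


class AsyncFacetedSearch(FacetedSearch):
    def __init__(self, index: Optional[str] = None) -> None:
        # same behaviour, but every query is built on the async search class
        super().__init__(index=index, search_class=AsyncSearch)
```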

Update on the faceted search: I missed that there is an execute() method in this class, so for consistency I have converted it to unasync like the others.

@miguelgrinberg marked this pull request as ready for review on March 22, 2024 17:33

@miguelgrinberg (Collaborator, Author):

@pquentin This PR is ready for review.

@pquentin (Member) commented Mar 26, 2024

I'm going to review in the same order as you did your work, and the checklist will be in this comment.

  • General infrastructure (ci.yml, all __init__.py files in elasticsearch_dsl and tests, noxfile.py, setup.cfg, setup.py, tests/conftest.py, utils/run-unasync.py)
  • async connection management - async_connections.py and connections.py
  • unasync Search - _async/search.py and _sync/search.py (plus search.py and search_base.py)
  • unasync Search unit tests - tests/_async/test_search.py and tests/_sync/test_search.py
  • unasync Document - _async/document.py and _sync/document.py (plus document.py and document_base.py)
  • unasync Search integration tests - test_integration/_async/test_search.py and test_integration/_sync/test_search.py
  • unasync Document tests - _async/test_document.py and _sync/test_document.py
  • unasync Index - _async/index.py and _sync/index.py (plus index.py and index_base.py and tests)
  • unasync Document integration tests - test_integration/_async/test_document.py and test_integration/_sync/test_document.py
  • unasync update_by_query - _async/update_by_query.py and _sync/update_by_query.py (plus update_by_query.py and update_by_query_base.py and tests)
  • unasync mappings - _async/mapping.py and _sync/mapping.py (plus mapping.py and mapping_base.py and tests)
  • unasync faceted_search - _async/faceted_search.py and _sync/faceted_search.py (plus faceted_search.py and faceted_search_base.py, and tests)
  • unasync analyzer (analysis.py and tests)
  • Examples
  • Example integration tests
  • Documentation updates (docs/ and CONTRIBUTING.rst)

Resolved review threads: CONTRIBUTING.rst (outdated), tests/conftest.py, utils/run-unasync.py (two threads, one outdated), elasticsearch_dsl/search.py
- remove unused iter/aiter in unasync conversion table
- improve contributing documentation wrt _async subdirectories
- remove some code duplication in async test fixtures

@pquentin (Member) left a comment:

I've reviewed a third of the files now.

Resolved review threads: tests/_async/test_document.py (outdated), elasticsearch_dsl/_async/index.py (outdated), tests/_sync/test_index.py

@pquentin (Member) left a comment:

I've reviewed another third of the files: docs, examples and their tests are left.

Resolved review threads: tests/_async/test_mapping.py (outdated), tests/_async/test_search.py (outdated), elasticsearch_dsl/_async/index.py (outdated)

@pquentin (Member) left a comment:

I've reviewed everything now! Thanks again for this massive body of work.

Examples

I believe examples should be using unasync too. I'm already seeing divergences (such as yield from, moving away from @property or closing connections) that should not be here, and they will only grow in the future. This would require:

  1. having a script (run-unasync.py or a wrapper) to replace asyncio.run(main()) with main() and remove import asyncio, since unasync can't do it itself.
  2. checking in CI that the sync examples can be generated from the async examples, just like the rest of the files

Docs

Regarding documentation, we should mention async/await support in the introduction and have an async example in index.rst.

Resolved review threads: tests/_async/test_mapping.py (outdated), docs/asyncio.rst (two threads, one outdated), docs/api.rst (outdated)

@miguelgrinberg (Collaborator, Author):

@pquentin I think I have addressed everything except the unasync for the examples. Please have a look at the new structure for the docs with a separate reference page for the async classes.

I will look at the examples next week. I would like to avoid changing the locations of the sync examples, so I'm going to need a special unasync rule to map examples/async/x.py to examples/x.py. I don't think we want the ugly _sync and _async directories for these.
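The intended path mapping can be described with a small hypothetical helper. In practice this would more likely be expressed through unasync's own rule configuration (its Rule objects take a source and destination directory), so treat the function below only as a statement of the desired mapping, not as the rule that was implemented.

```python
from pathlib import PurePosixPath


def sync_example_path(async_path: str) -> str:
    """Map examples/async/<name>.py to examples/<name>.py; leave other paths alone."""
    parts = PurePosixPath(async_path).parts
    if len(parts) >= 3 and parts[0] == "examples" and parts[1] == "async":
        # drop the "async" directory component directly under examples/
        return str(PurePosixPath(parts[0], *parts[2:]))
    return async_path


print(sync_example_path("examples/async/completion.py"))  # examples/completion.py
```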

@pquentin (Member) commented Apr 2, 2024

The current navigation feels off:

[Screenshot of the current documentation navigation]

We have two pages about asyncio, one with text and the other with the reference API. And the async reference API has a different name from the sync reference API. Maybe we should have the following navigation instead?

  • Tutorials
    • Quickstart (with the current content of the index page?)
  • How-to guides
    • Search DSL
    • Persistence
    • Faceted Search
    • Update by Query
    • Using asyncio with Elasticsearch DSL
  • Reference
    • API Documentation
    • Async API Documentation

The section names come from https://diataxis.fr/. Here's an example of Sphinx documentation using sections in the navigation: https://esrally.readthedocs.io/en/stable/. Since the links don't reference the section, I believe all existing links will continue to work.

Regarding documentation, we should mention async/await support in the introduction and have an async example in index.rst

I don't believe this was addressed yet.

@miguelgrinberg (Collaborator, Author):

I like the two-level heading organization much better. And you are correct that I forgot to add the async mention and example in the introduction. Will do that as well.

@pquentin (Member) left a comment:

The new doc navigation is amazing, thanks! And thanks for working on the examples too.

Looks good to me.

subprocess.check_call(["black", "--target-version=py38", output_dir])
subprocess.check_call(["isort", output_dir])
for file in glob("*.py", root_dir=dir[0]):
    # remove asyncio from sync files

@pquentin (Member):

I have a few doubts about (1) doing this for all files and not only the examples and (2) using a sed command carefully crafted to work on both Linux and macOS. This seems fragile, but it works, so let's do it.

@miguelgrinberg (Collaborator, Author) replied Apr 3, 2024:

Yes, I agree 100% on the fragility. But I don't think doing this for all files is a problem, since what we are doing is the correct thing to do in the general case. I could actually make the regex in the sed command a bit smarter and capture the function name that is passed to asyncio.run() instead of having it hardcoded. I think this type of transformation would make sense to add to unasync (although not with sed!).
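A minimal pure-Python sketch of the smarter transformation described above, capturing whatever function name is passed to asyncio.run() instead of hard-coding main. remove_asyncio_run() is a hypothetical helper, not code from this pull request, and it only handles argument-less calls that sit on a line of their own.

```python
import re

# "asyncio.run(<name>())" on its own line, capturing indentation and function name
_RUN_CALL = re.compile(r"^([ \t]*)asyncio\.run\((\w+)\(\)\)[ \t]*$", re.MULTILINE)
_IMPORT_ASYNCIO = re.compile(r"^import asyncio[ \t]*\n", re.MULTILINE)


def remove_asyncio_run(source: str) -> str:
    """Turn asyncio.run(fn()) into fn() and drop 'import asyncio' if now unused."""
    source = _RUN_CALL.sub(r"\1\2()", source)
    # only remove the import when nothing else in the file still uses asyncio
    if "asyncio." not in source:
        source = _IMPORT_ASYNCIO.sub("", source)
    return source
```

Doing this in Python rather than sed also sidesteps the Linux/macOS differences in sed's in-place editing flags.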

@pquentin merged commit b50d538 into elastic:main on Apr 3, 2024 (15 checks passed)
@pquentin mentioned this pull request on Apr 3, 2024
@miguelgrinberg deleted the async-support branch on April 3, 2024 08:48
@miguelgrinberg mentioned this pull request on Apr 3, 2024
miguelgrinberg added a commit to miguelgrinberg/elasticsearch-dsl-py that referenced this pull request Apr 3, 2024
All public classes have async versions now. The documentation navigation
was also updated to follow the Diátaxis documentation framework.
@miguelgrinberg added the "backport 8.x" label (Backport to 8.x) on Apr 3, 2024
github-actions bot pushed a commit that referenced this pull request Apr 3, 2024
All public classes have async versions now. The documentation navigation
was also updated to follow the Diátaxis documentation framework.

(cherry picked from commit b50d538)
miguelgrinberg added a commit that referenced this pull request Apr 3, 2024
All public classes have async versions now. The documentation navigation
was also updated to follow the Diátaxis documentation framework.

(cherry picked from commit b50d538)

Co-authored-by: Miguel Grinberg <miguel.grinberg@gmail.com>
2 participants