Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove backend registration system #561

Closed
evansd opened this issue Jun 22, 2022 · 0 comments · Fixed by #576
Closed

Remove backend registration system #561

evansd opened this issue Jun 22, 2022 · 0 comments · Fixed by #576

Comments

@evansd
Copy link
Contributor

evansd commented Jun 22, 2022

At present, backends define a backend_id and register themselves in a global dictionary so that they can be referenced by this ID in environment variables.

However, this requires importing all the backends so they can register themselves. And this causes circular import problems e.g.

$ python -c 'from databuilder.query_engines.sqlite import SQLiteQueryEngine'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/dave/projects/ebmdatalab/databuilder/databuilder/query_engines/sqlite.py", line 5, in <module>
    from databuilder.query_engines.base_sql import BaseSQLQueryEngine
  File "/home/dave/projects/ebmdatalab/databuilder/databuilder/query_engines/base_sql.py", line 8, in <module>
    from databuilder.backends.base import DefaultBackend
  File "/home/dave/projects/ebmdatalab/databuilder/databuilder/backends/__init__.py", line 3, in <module>
    from .databricks import DatabricksBackend
  File "/home/dave/projects/ebmdatalab/databuilder/databuilder/backends/databricks.py", line 2, in <module>
    from ..query_engines.spark import SparkQueryEngine
  File "/home/dave/projects/ebmdatalab/databuilder/databuilder/query_engines/spark.py", line 4, in <module>
    from databuilder.query_engines.base_sql import BaseSQLQueryEngine
ImportError: cannot import name 'BaseSQLQueryEngine' from partially initialized module 'databuilder.query_engines.base_sql' (most likely due to a circular import) (/home/dave/projects/ebmdatalab/databuilder/databuilder/query_engines/base_sql.py)

We should get rid of the registration system and allow the environment to specify the backend by its dotted path e.g.

OPENSAFELY_BACKEND='databuilder.backends.tpp.TPPBackend'

As well as fixing the circular import problem and simplifying the code this naturally supports out-of-tree backends, which I think is desirable.

evansd added a commit that referenced this issue Jun 27, 2022
This fixes a [circular import][1] problem and at the same time makes it
possible to use databuilder with backends definitions that aren't
bundled within databuilder itself.

I don't currently anticipate users needing to supply backend names
themselves on the command line, so the extra verbosity shouldn't be a
problem here. But if it turns out they do need to do so then we can
define some short aliases for commonly used backends.

Closes #561

[1]: #561
evansd added a commit that referenced this issue Jun 27, 2022
The effect of this PR is that rather than specifying a backend using an
alias like:

    OPENSAFELY_BACKEND='tpp'

The backend is instead specified using its full dotted path:

    OPENSAFELY_BACKEND='databuilder.backends.tpp.TPPBackend'

This fixes a [circular import][1] problem, removes some code, and at the
same time makes it possible to use databuilder with backends definitions
that aren't bundled within databuilder itself.

I don't currently anticipate users needing to supply backend names
themselves on the command line, so the extra verbosity shouldn't be a
problem here. But if it turns out they do need to do so then we can
define a mapping of short aliases to full dotted paths for commonly used
backends.

Closes #561

[1]: #561
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant