feat(mssql): use odbc #7317
Wow, that was so fast. Thank you so much for prioritizing this! I think I'm slowly getting it to work. I had to modify the code a little to get it to accept the parameters I needed. These are some very rough changes I made just to brute-force the parameters through; I'm sure there's a cleaner way to do this.
conn_ib = ibis.mssql.connect(
    host=None,
    port=None,
    user=None,
    query={"odbc_connect": f"Driver=ODBC+Driver+17+for+SQL+Server;SERVER={server};DATABASE={database}"},
    **connect_args
)
where it seems that the …
EDIT: just to confirm, using the code block above with the version of the code in my branch, I've already successfully queried a simple table! 🙌
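For reference, SQLAlchemy's PyODBC dialect accepts a raw ODBC connection string through the `odbc_connect` query parameter, and its documentation URL-encodes that string with `urllib.parse.quote_plus` first. A minimal stdlib-only sketch, assuming a hypothetical `build_odbc_connect` helper (not part of ibis or SQLAlchemy) and placeholder server and database names:

```python
from urllib.parse import quote_plus


def build_odbc_connect(driver: str, server: str, database: str, **extra: str) -> str:
    """Hypothetical helper: assemble an ODBC connection string and URL-encode it
    so it can be used as the value of SQLAlchemy's ``odbc_connect`` parameter."""
    # Braces let the driver name contain spaces inside the ODBC string.
    parts = {"Driver": "{" + driver + "}", "Server": server, "Database": database, **extra}
    raw = ";".join(f"{key}={value}" for key, value in parts.items())
    # quote_plus escapes ';', '=', '{' and '}' and turns spaces into '+'.
    return quote_plus(raw)


encoded = build_odbc_connect("ODBC Driver 17 for SQL Server", "XXX.database.windows.net", "XXX")
url = f"mssql+pyodbc:///?odbc_connect={encoded}"
```

Encoding once in a helper like this avoids hand-writing `+` for spaces inside the f-string.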
@inigohidalgo Nice! I incorporated most of your changes; they look reasonable to me. Thank you for that!
I'm not a huge fan of passing … as a string in the query field when ibis' API already offers those arguments as part of the …
@inigohidalgo What does your ideal call to …
I'm going to have a look around, because from what I see I should be able to get the default … EDIT: although my host looks like "XXX-XXX-XXX.database.windows.net" instead of just a single "host", so I'm not sure whether that makes a difference.
Yeah, I'm not able to connect without passing … In my ideal call, the host and database parameters would be added to that …
I'm not sure we can special-case …
Absolutely, it wouldn't make any sense to have such a specific implementation, especially when it seems it SHOULD work from what I'm reading online. According to this, SQLAlchemy should be converting the connection string to "SERVER={server};DATABASE={database}", but it isn't working. Also, apparently SQLAlchemy adds a Trusted_Connection=Yes parameter which I'm supposed to remove, but I'm having trouble understanding where to remove it.
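The exact-match `cargs[0].replace(";Trusted_Connection=Yes", "")` used later in this thread only works if the attribute appears with exactly that spelling and casing, while ODBC connection-string keywords are case-insensitive. A slightly more defensive sketch, assuming a hypothetical `strip_odbc_attribute` helper (not part of SQLAlchemy or ibis) and a simple `key=value;` string without braced values that themselves contain `;`:

```python
def strip_odbc_attribute(conn_str: str, key: str) -> str:
    """Hypothetical helper: drop every ``key=value`` pair whose keyword matches
    ``key`` case-insensitively. Assumes no braced values containing ';'."""
    kept = []
    for pair in conn_str.split(";"):
        if not pair:
            continue  # skip empty segments, e.g. from a trailing ';'
        name = pair.split("=", 1)[0].strip()
        if name.lower() != key.lower():
            kept.append(pair)
    return ";".join(kept)


cleaned = strip_odbc_attribute(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=XXX.database.windows.net;"
    "DATABASE=XXX;Trusted_Connection=Yes",
    "trusted_connection",
)
```

Inside a `do_connect` listener this would be used as `cargs[0] = strip_odbc_attribute(cargs[0], "Trusted_Connection")`.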
At worst, since all our connections are going to go through my Kedro wrapper, I can handle this on my side, but that would make the wrapper less of a general Ibis wrapper and more of an Ibis+Azure one.
I think to attach that event listener you'd do this:
import ibis
from sqlalchemy import event
con = ibis.mssql.connect(...)
@event.listens_for(con.con, "do_connect")
def provide_token(dialect, conn_rec, cargs, cparams):
    ...
That did the trick!
import struct
import ibis
from azure import identity
from sqlalchemy import event
SQL_COPT_SS_ACCESS_TOKEN = 1256  # Connection option for access tokens, as defined in msodbcsql.h
TOKEN_URL = "https://database.windows.net/"  # The token URL for any Azure SQL database
connection_string = "mssql://XXX.database.windows.net/XXX?driver=ODBC+Driver+17+for+SQL+Server"
conn = ibis.connect(connection_string)
azure_credentials = identity.DefaultAzureCredential()
@event.listens_for(conn.con, "do_connect")
def provide_token(dialect, conn_rec, cargs, cparams):
    # remove the "Trusted_Connection" parameter that SQLAlchemy adds
    cargs[0] = cargs[0].replace(";Trusted_Connection=Yes", "")
    # create token credential
    raw_token = azure_credentials.get_token(TOKEN_URL).token.encode("utf-16-le")
    token_struct = struct.pack(f"<I{len(raw_token)}s", len(raw_token), raw_token)
    # apply it to keyword arguments
    cparams["attrs_before"] = {SQL_COPT_SS_ACCESS_TOKEN: token_struct}
conn.table(name="INPUT_VARIABLES_CONN_PT", schema="IGD").to_pandas()
One issue I have run into is that I'm having to monkeypatch …
Just a heads up: I'm going to be quite busy for the next 24-48h, so I'll be a little less responsive.
I think you should be able to write
conn = ibis.connect(
    "mssql://XXX.database.windows.net/XXX",
    query=dict(driver="ODBC+Driver+17+for+SQL+Server"),
)
Any … Can you give that a try?
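One thing worth checking with the query form: the driver value in the call above is already plus-encoded (`ODBC+Driver+17+for+SQL+Server`). Whether ibis re-encodes the values of `query` is not shown in this thread, so the stdlib-only sketch below only illustrates standard URL query encoding: a plain driver name round-trips cleanly, while encoding an already-encoded value turns each `+` into `%2B`:

```python
from urllib.parse import parse_qs, urlencode

driver = "ODBC Driver 17 for SQL Server"
# Encoding the plain name yields the plus-encoded form seen in the connection strings here.
query = urlencode({"driver": driver})
# Decoding the query string gives the plain name back.
decoded = parse_qs(query)["driver"][0]
# Encoding a value that is already plus-encoded escapes the '+' signs a second time.
double_encoded = urlencode({"driver": "ODBC+Driver+17+for+SQL+Server"})
```

If a double-encoded driver name ever shows up in an ODBC error, passing the plain name with spaces is the first thing to try.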
Given the complexity of using tokens, I think it makes sense to add a … That would make the connection code look like this:
import struct
import ibis
from azure import identity
SQL_COPT_SS_ACCESS_TOKEN = 1256  # Connection option for access tokens, as defined in msodbcsql.h
TOKEN_URL = "https://database.windows.net/"  # The token URL for any Azure SQL database
azure_credentials = identity.DefaultAzureCredential()
def provide_token(dialect, conn_rec, cargs, cparams):
    # remove the "Trusted_Connection" parameter that SQLAlchemy adds
    cargs[0] = cargs[0].replace(";Trusted_Connection=Yes", "")
    # create token credential
    raw_token = azure_credentials.get_token(TOKEN_URL).token.encode("utf-16-le")
    token_struct = struct.pack(f"<I{len(raw_token)}s", len(raw_token), raw_token)
    # apply it to keyword arguments
    cparams["attrs_before"] = {SQL_COPT_SS_ACCESS_TOKEN: token_struct}
connection_string = "mssql://XXX.database.windows.net/XXX?driver=ODBC+Driver+17+for+SQL+Server"
conn = ibis.connect(connection_string, token_provider=provide_token)
conn.table(name="INPUT_VARIABLES_CONN_PT", schema="IGD").to_pandas()
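The `token_struct` line above packs the token as a little-endian 4-byte length followed by the UTF-16-LE token bytes, which is the layout the code relies on for the `SQL_COPT_SS_ACCESS_TOKEN` attribute. A stdlib-only sketch with a dummy token string (no Azure call; `pack_access_token` is a hypothetical name) that packs and then unpacks the structure to make that layout visible:

```python
import struct


def pack_access_token(token: str) -> bytes:
    # Encode the token as UTF-16-LE, as provide_token does.
    raw = token.encode("utf-16-le")
    # "<I" is a little-endian unsigned 32-bit length prefix; "{n}s" is the raw bytes.
    return struct.pack(f"<I{len(raw)}s", len(raw), raw)


packed = pack_access_token("dummy-token")  # placeholder, not a real Azure token
(length,) = struct.unpack_from("<I", packed)  # read the 4-byte prefix back
payload = packed[4:4 + length].decode("utf-16-le")
```

Each ASCII character takes two bytes in UTF-16-LE, so the 11-character dummy token yields a 22-byte payload behind the 4-byte prefix.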
Indeed, that worked! Which is the more commonly recommended API, …
Start with … If there's something you can't do with …
Regarding …
Yep. It's basically this code at the end of …:
if token_provider is not None:
    sa.event.listens_for(engine, "do_connect")(token_provider)
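To show why calling `listens_for(engine, "do_connect")(token_provider)` is equivalent to decorating the function, here is a toy, stdlib-only model of a listener registry. It is not SQLAlchemy's implementation: the `Engine` class, `listens_for`, and `connect` below are simplified stand-ins that only mimic the idea that `do_connect` listeners can mutate `cargs` and `cparams` before the DBAPI connection is opened:

```python
from collections import defaultdict
from typing import Any, Callable


class Engine:
    """Toy stand-in (not SQLAlchemy's Engine) that fires 'do_connect' listeners."""

    def __init__(self, connect_string: str) -> None:
        self.connect_string = connect_string
        self.listeners: dict[str, list[Callable[..., Any]]] = defaultdict(list)

    def connect(self) -> tuple[list[str], dict[str, Any]]:
        cargs = [self.connect_string]
        cparams: dict[str, Any] = {}
        # Each listener may edit cargs/cparams in place before the "real" connect.
        for fn in self.listeners["do_connect"]:
            fn(None, None, cargs, cparams)  # dialect and conn_rec are unused in this toy
        return cargs, cparams


def listens_for(target: Engine, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Toy decorator factory: listens_for(target, name)(fn) registers fn."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        target.listeners[name].append(fn)
        return fn
    return register


def token_provider(dialect, conn_rec, cargs, cparams):
    cargs[0] = cargs[0].replace(";Trusted_Connection=Yes", "")
    cparams["attrs_before"] = {1256: b"<token bytes>"}  # placeholder token bytes


engine = Engine("DRIVER={ODBC Driver 17 for SQL Server};SERVER=XXX;Trusted_Connection=Yes")
# Same call shape as the snippet above: no decorator syntax needed.
listens_for(engine, "do_connect")(token_provider)
cargs, cparams = engine.connect()
```

Because registration is just a function call, the provider can be defined in any module and wired up wherever the engine is created.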
The main benefit is that you can put your token-providing code wherever you want. It doesn't need to be right next to the connection.
Cool! Is that something you'd be okay with adding to Ibis? I assume there'll be more people with the same issue, but when I searched the issues I couldn't find anyone with the same question as me.
Personally, yes, but I'll put up a separate PR for it where we can discuss whether folks think it's a good idea.
My hunch is that once people are able to use Azure services with ibis, it'll be an issue :)
One thing to note is that this likely won't be released until 8.0, because it's a pretty big change for MS SQL users (it's unclear how many of those we have). It seems like many would-be MS SQL users aren't currently using ibis because they require ODBC support, so for them this goes from zero to one, but any existing users will have to migrate to PyODBC.
Not a problem for us; it'll take a while to migrate our pipelines to Ibis, so in the meantime I'm happy to just build using either …
Sweet, yeah, assuming this gets merged, it'll be done in the …
One final kink I see is the difference between Ibis' …
Perhaps we can remove the …
And provide a …
Is the current …
I think the …
Ah, no, but that's how it used to be before this PR! In this PR, … A point of confusion is that there's the …
Hm, I might need to tinker with this a bit to work out how URL-based and non-URL-based connections work.
But at the moment, using … Would you keep the query argument in …
Those are two different … The first (… The second is the PyODBC notion of driver, which is the thing being put into …
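A stdlib-only sketch of how two notions of "driver" can sit in a single URL. It assumes (the comment above is truncated) that the first one is the DBAPI driver in SQLAlchemy's `dialect+driver://` scheme; the second is the ODBC driver name carried in the query string, which SQLAlchemy's PyODBC dialect places into the ODBC connection string. The URL is a placeholder:

```python
from urllib.parse import parse_qs, urlsplit

url = "mssql+pyodbc://XXX.database.windows.net/XXX?driver=ODBC+Driver+17+for+SQL+Server"
parts = urlsplit(url)
# SQLAlchemy-style "dialect+driver" scheme: the dialect and the Python DBAPI package.
dialect, _, dbapi_driver = parts.scheme.partition("+")
# ODBC driver name: the system-level driver that pyodbc loads.
odbc_driver = parse_qs(parts.query)["driver"][0]
```

Keeping the two apart in the API (scheme versus query) avoids overloading a single `driver` argument.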
Ok, that makes sense. I'm not sure whether it might lead to some confusion down the line, but as a start, keeping the …
I think to make …
LGTM
This PR moves the MSSQL backend to use ODBC as the underlying driver. This is necessary to support connecting to Azure services. The main user-facing API change is that you now need to set up ODBC to use the MSSQL backend.
Closes #6640.
Closes #6012.
xref #7306
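Since the backend now requires a working ODBC setup, `pyodbc.drivers()` is one way to list the ODBC driver names installed on a machine. The sketch below is a hypothetical helper (not part of ibis) that picks the newest "ODBC Driver NN for SQL Server" from such a list; the example input is illustrative, not real `pyodbc.drivers()` output:

```python
import re
from typing import Optional

_SQL_SERVER_DRIVER = re.compile(r"ODBC Driver (\d+) for SQL Server")


def pick_sql_server_driver(installed: list[str]) -> Optional[str]:
    """Hypothetical helper: return the highest-versioned Microsoft ODBC driver
    for SQL Server from a list of driver names, or None if there is none."""
    candidates = []
    for name in installed:
        match = _SQL_SERVER_DRIVER.fullmatch(name)
        if match:
            candidates.append((int(match.group(1)), name))
    return max(candidates)[1] if candidates else None


# Example input shaped like pyodbc.drivers() output (values are illustrative).
installed = ["SQLite3", "ODBC Driver 17 for SQL Server", "ODBC Driver 18 for SQL Server"]
best = pick_sql_server_driver(installed)
```

On a real machine the call would be `pick_sql_server_driver(pyodbc.drivers())`, and the result could be passed as the `driver` query value.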