Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support for Microsoft Fabric Warehouse #9494

Closed
1 task done
csubhodeep opened this issue Jul 1, 2024 · 16 comments · Fixed by #9700
Closed
1 task done

feat: Support for Microsoft Fabric Warehouse #9494

csubhodeep opened this issue Jul 1, 2024 · 16 comments · Fixed by #9700
Labels
feature Features or general enhancements mssql The Microsoft SQL Server backend
Milestone

Comments

@csubhodeep
Copy link

Is your feature request related to a problem?

NO.

What is the motivation behind your request?

xref #9454

Describe the solution you'd like

I would like to be able to interact with a typical Microsoft Fabric Warehouse, which has quite similar interfaces as any other Datawarehousing technologies that ibis already supports.
It would be nice if we could continue using the ibis.mssql backend for it.

What version of ibis are you running?

9.1.0

What backend(s) are you using, if any?

DuckDB & SQLServer

Code of Conduct

  • I agree to follow this project's Code of Conduct
@csubhodeep csubhodeep added the feature Features or general enhancements label Jul 1, 2024
@cpcloud
Copy link
Member

cpcloud commented Jul 3, 2024

@csubhodeep Have you tried using the Ibis mssql backend for this? Maybe it just works™?

@cpcloud
Copy link
Member

cpcloud commented Jul 3, 2024

@djouallah Is this something that might interest you?

@csubhodeep
Copy link
Author

@cpcloud yes I have tried the mssql backend and saw the error messages that i have mentioned in the discussion. It "kind of works".. but requires a bit of hacking.

@csubhodeep
Copy link
Author

JFYI.. I think for most of the methods like list_tables, tables, list_databases, list_schemas, create_table could be supported quickly because, i think, these are just suffering from a mere case-sensitivity issue, from the error message that I had shared in the thread we get the following

ProgrammingError: ('42S02', "[42S02] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name '<NAME OF WAREHOUSE>.information_schema.schemata'. (208) (SQLExecDirectW)")

In our data warehouse if I just execute the following

select *
from INFORMATION_SCHEMA.COLUMNS;

I get a nice table without any error. However, if I do the following (as in the error message)

select *
from information_schema.columns;

I get the exact same error on Fabric
image

@csubhodeep
Copy link
Author

@cpcloud @gforsyth Do we know by when could we solve this one?

Also none of the hack works if I try to create a table from an in-memory table made using using either ibis.memtable or pandas.DataFrame constructors.

It is now blocking us, hence, it would be nice to know if anyone could suggest any workarounds till this is officially supported.

@csubhodeep
Copy link
Author

csubhodeep commented Jul 21, 2024

UPDATE

As per the their official docs

Currently, Latin1_General_100_BIN2_UTF8 is the default and only supported collation for both tables and metadata.

That means Fabric is currently case-sensitive presumably till the end of this quarter (if their development goes according to their plan). As per their roadmap they would support more collations later.

Now, from the perspective of ibis, at least from the error messages, if I understood them correctly, accessing the information_schema is crucial for some of the methods to work correctly (such as create_table, table, list_tables etc.)

Therefore, to circumvent the above, I wrote a small Python snippet that creates - a) a new schema with the name that ibis expects i.e. lower-cased information_schema, and b) views that the original schema i.e. (upper cased) INFORMATION_SCHEMA has but all of their names and the column names in them in lower case.

query_to_get_all_org_views_and_columns = """
SELECT 
    o.name AS view_name, 
    c.name AS column_name
FROM sys.all_columns c
INNER JOIN (
    SELECT *
    FROM sys.all_objects o
    WHERE schema_id = 3   -- schema_id = 3 is the schema_id for the information_schema
    ) o 
    ON c.object_id = o.object_id
"""

df_is = pd.read_sql_query(
    query_to_get_all_org_views_and_columns, 
    connection  # this is the same `pyodbc` connection created previously using custom authentication
)

cursor = connection.cursor()

for view_name, gdf in df_is.groupby("view_name"):
    select_columns = ", ".join([f"{col} AS {col.lower()}" for col in gdf["column_name"]])
    
    create_view_query = f"CREATE VIEW information_schema.{view_name.lower()} AS SELECT {select_columns} FROM INFORMATION_SCHEMA.{view_name};"
    cursor.execute(create_view_query)
    cursor.commit()

With the above I achieved the following
image

And with that all API methods work normally. 😃

Ofcourse, this is not at all a solution and rather a hack but I just wanted to share this in case someone else is looking for a temporary workaround.

@deepyaman
Copy link
Contributor

Hi @csubhodeep! Thanks for digging into this further, and glad you were able to find a workaround. We just discussed this amongst the team just now, and we honestly don't have a much better solution at this time that wouldn't be equally hacky or have potential unintended side effects. Not sure if the support for case-insensitive collation will help here once released, as it sounds like it would affect EVERYTHING in the database (which could easily break existing logic).

At this point, the workaround might be the best path forward. Would also love to see if other people come across this issue and/or find your workaround helpful.

@MarkPryceMaherMSFT
Copy link

Full disclosure I work in the Fabric DW team at Microsoft, @csubhodeep is correct, this is due to the collation. You would see the same error if you tried this on a SQL Server in the same collation (Latin1_General_100_BIN2_UTF8). [TLRDR: the system tables/views are case sensitive too]

The good news, we are adding support for a case insensitive collation in Fabric DW, in Q3 this year. link

@gforsyth gforsyth added the mssql The Microsoft SQL Server backend label Jul 26, 2024
@gforsyth
Copy link
Member

Hey all -- I'm going to take a look at doing this.

We built this backend against mcr.microsoft.com/mssql/server:2022-latest and it defaults (or all MSSQL servers default?) to SQL_Latin1_General_CP1_CI_AS.

Given that current users aren't having issues with that collation, if we switch our testing collation to be case-sensitive, it should (?) make things work with Fabric and other case-sensitive collations without breaking anyone using a case-insensitive collation.

Famous last words? I'll report back.

@csubhodeep
Copy link
Author

@gforsyth with my limited understanding of DB internals (and Fabric), I would like to ask a stupid question, does that mean ibis.mssql backend would do multiple rounds of running all (DDL) queries one with different cases (like, for instance, if upper case fails try with lower case)?

@gforsyth
Copy link
Member

@csubhodeep -- no, that would be an unacceptable trade-off.

What I mean is that if we update the mssql backend to test against a case-sensitive collation (and if it works...), then we'll correctly refer to, say, INFORMATION_SCHEMA.SCHEMATA and that will work on Fabric.

For users who were using a case-insensitive collation, it won't matter because SCHEMATA == schemata

@csubhodeep
Copy link
Author

Ah ok I see, thanks for the explanations 😃

@gforsyth
Copy link
Member

@csubhodeep -- if you install from main, Ibis should work on Fabric now, since we are now testing against the same collation (Latin1_General_100_BIN2_UTF8)

@csubhodeep
Copy link
Author

@csubhodeep -- if you install from main, Ibis should work on Fabric now, since we are now testing against the same collation (Latin1_General_100_BIN2_UTF8)

ok thanks a lot I would try it out at the earliest.

@csubhodeep
Copy link
Author

Currently I am facing some issues while logging into our Fabric workspace using my corporate account. I would try and give a feedback as soon as it is resolved. Meanwhile, @MarkPryceMaherMSFT it would be really nice if you (or someone from your team) could test this out. Thanks ! 😃

@csubhodeep
Copy link
Author

Hello @gforsyth , I have tested this and most of the API methods listed in this page works.

Although kind of unrelated to this thread, I would suggest, for the read_* methods that have not been implemented yet, you could either mark them (with colors or symbols) or simply remove them till they are actually implemented.

Thanks a lot again for pushing this feature through. 😃

@cpcloud cpcloud added this to the 9.3 milestone Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements mssql The Microsoft SQL Server backend
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants
@cpcloud @gforsyth @csubhodeep @deepyaman @MarkPryceMaherMSFT and others