Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(table-api): unify exception type for all backends to TableNotFound when a table does not exist #9695

Merged
merged 41 commits into from
Sep 16, 2024

Conversation

ncclementi
Copy link
Contributor

@ncclementi ncclementi commented Jul 25, 2024

Closes #9468

TODO:

  • Decide wether we want to raise the original backend dependent exception when existing
  • pandas and dask, unclear what exactly are we doing when raising KeyError
  • datafusion raise exception, not sure where is being caught
  • exasol seems to be letting thee exception to the backend, need to catch it before.
  • bigquery and snowflake, do separate test
  • understand flink and how to handle the error
  • Trino test somethign weird with the quotes/name of table I think I should move the !r to the instance of the exception.
  • Druid seems to be letting the exception to the backend, need to catch it before.
  • impala seems to be letting thee exception to the backend, need to catch it before.
  • fix postgres test_create_and_drop_table
  • fix mysql exception catch

BREAKING CHANGE: Missing tables now all raise a single Ibis exception. Library code that catches backend-specific exceptions to handle missing tables will need to be adjusted to use TableNotFound instead.

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple changes needed, thanks for doing this!

ibis/backends/mysql/__init__.py Outdated Show resolved Hide resolved
ibis/backends/polars/__init__.py Outdated Show resolved Hide resolved
ibis/backends/postgres/tests/test_client.py Outdated Show resolved Hide resolved
ibis/backends/trino/tests/test_client.py Outdated Show resolved Hide resolved
ibis/common/exceptions.py Outdated Show resolved Hide resolved
@gforsyth gforsyth mentioned this pull request Jul 31, 2024
1 task
.sql(self.dialect)
)
except ProgrammingError as e:
if re.search(r"\bINVALID_INPUT\b", str(e)):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to get anything better than this matching, if anyone has any ideas, suggestions welcome

except Py4JJavaError as e:
# This seems to msg specific but not sure what a good work around is
if re.search(
rf"Table `{re.escape(table_name)}` was not found",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this, but I'm not sure if there is a better way to get the error matching here. Suggestions welcomed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpcloud do we think this is alright for flink? It's a bit specific but I didn't find an alternative, If so I'll remove the comment here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there's some poking around the java object that could be done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried multiple thinks, I wasn't able to land in anything better than this, I tried inspecting e.java_exception to get the getSQLState but I wasn't able. Maybe I'm missing something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and changed this to catch the error, and then check for table existence instead of matching.

INVALID_INPUT is way too general of a string to match on in any case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was commenting on the druid one...this is the flink one.

@ncclementi ncclementi marked this pull request as ready for review July 31, 2024 21:29
@ncclementi
Copy link
Contributor Author

ncclementi commented Jul 31, 2024

I think this is good to get a first path of review. I left some comments on the cases where the exception caught is not great/pretty and we are doing some for of string matching. If anyone has better ideas for those, suggestions are more than welcome.

Notes/help needed:

  • Exasol and Impala don't run on mac arm64 which makes it harder to test/fix those.
  • How do we trigger bigquery and snowflake to run?
  • Not quite sure how to fix the error for the PySpark 3.3.3 job.
  • druid window function fail: before there was a mark notimpl that raised PyDruidProgrammingError but after the addition of TableNotFound that revealed that some were because the table wasn't found and other is raised here:
pydruid.db.exceptions.ProgrammingError: druidException (INVALID_INPUT): The query 
contains window functions; To run these window functions, specify [enableWindowing]
in query context. (line [1], column [256])

one solution is to add also PyDruidProgrammingError to the marker, like

pytestmark = [
    pytest.mark.notimpl(
        ["druid"], raises=(com.OperationNotDefinedError, com.TableNotFound, PyDruidProgrammingError)
    )
]

is this the right approach?

@cpcloud
Copy link
Member

cpcloud commented Aug 1, 2024

The PySpark error class import needs an ImportError guard to handle the module path change from 3.3 to 3.5.

There are some examples around the codebase for this that were recently added.

@ncclementi ncclementi added the breaking change Changes that introduce an API break at any level label Aug 1, 2024
@cpcloud
Copy link
Member

cpcloud commented Aug 1, 2024

Added a Snowflake implementation. It's ... something.

@ncclementi
Copy link
Contributor Author

I added this to the 10.0 milestone as it will have breaking changes for those using Ibis as a library.

@cpcloud
Copy link
Member

cpcloud commented Aug 1, 2024

Ok, BigQuery implementation is up.

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple more missing from es

ibis/backends/pyspark/__init__.py Outdated Show resolved Hide resolved
ibis/backends/snowflake/__init__.py Outdated Show resolved Hide resolved
Comment on lines 527 to 528
if e.getErrorClass() == "TABLE_OR_VIEW_NOT_FOUND":
raise com.TableNotFound(table_name) from e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if e.getErrorClass() == "TABLE_OR_VIEW_NOT_FOUND":
raise com.TableNotFound(table_name) from e
if e.getErrorClass() == "TABLE_OR_VIEW_NOT_FOUND":
raise com.TableNotFound(table_name) from e
raise

You still need to raise for any other AnalysisException that isn't handled by the error class check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this check doesn't seem to be working for PySpark 3.3.3, I'm looking into it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit.

@@ -126,6 +126,21 @@ def _get_schema_using_query(self, query: str) -> sch.Schema:
schema[name] = dtype
return sch.Schema(schema)

def _table_exists(self, name: str):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thanks for adding this. It's way more clear.

@cpcloud
Copy link
Member

cpcloud commented Aug 1, 2024

I think this is good to go.

What an absolute hellscape this problem is.

Thanks for powering through @ncclementi!

@ncclementi
Copy link
Contributor Author

Thanks @cpcloud, great job you too! Thanks for the pairing and troubleshooting. When we are ready for 10.0 we can rebase and deal with conflicts if we have to.

@ncclementi
Copy link
Contributor Author

Hey y'all I tried to rebase this locally and did not go well 😞 so I'd need some help getting this one up to mark to be able to consolidate changes.

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple small things, but otherwise this LGTM.

ibis/backends/mysql/__init__.py Outdated Show resolved Hide resolved
ibis/backends/mysql/__init__.py Outdated Show resolved Hide resolved
@cpcloud cpcloud changed the title refactor: unify exception when TableNotFound refactor(table-api): unify exception type for all backends to TableNotFound when a table does not exist Sep 16, 2024
@cpcloud cpcloud merged commit 0c49e3b into ibis-project:main Sep 16, 2024
76 checks passed
ncclementi added a commit to ncclementi/ibis that referenced this pull request Sep 24, 2024
…otFound` when a table does not exist (ibis-project#9695)

Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
Co-authored-by: Gil Forsyth <gil@forsyth.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Changes that introduce an API break at any level
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: Consistent tablenotfound exception across backends
3 participants