great_expectations.exceptions.exceptions.MetricResolutionError: 'NoneType' object has no attribute 'setCallSite' #3622
Comments
UPDATE: if you run `my_validator.head()`, it fails with the same `setCallSite` error.
Hi @oscaratnc, when I add `force_reuse_spark_context: true` to my execution_engine config, everything works fine. Hope it helps.
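To make the workaround concrete, here is a sketch of a v3 (Batch Request) datasource config with that flag set. Names such as `my_spark_datasource` and `runtime_connector` are placeholders, not anything from this thread:

```python
# Sketch of a datasource config using the workaround described above.
# The flag tells SparkDFExecutionEngine to reuse the already-running
# SparkSession (e.g. the one databricks-connect created) instead of
# creating its own.
datasource_config = {
    "name": "my_spark_datasource",          # placeholder name
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SparkDFExecutionEngine",
        "force_reuse_spark_context": True,
    },
    "data_connectors": {
        "runtime_connector": {               # placeholder connector
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["batch_id"],
        }
    },
}
```

The same structure can be expressed in YAML inside `great_expectations.yml`; the key point is the `force_reuse_spark_context` entry under `execution_engine`.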
Thanks for opening this issue, @oscaratnc, and thanks for your response, @artemsachuk! Please let us know whether this resolves the issue. Otherwise we will review internally and be in touch soon.
Hello @artemsachuk, thanks for the direction. I made the change in the base.py file (class `ExecutionEngine`), but I am still getting the same result.
This is the expectation I am trying to run, and this is the result:
The `setCallSite` error appears again. Do you notice anything wrong with my configuration?
Hi @oscaratnc, it appears that you are not passing your Spark config into your ExecutionEngine. This is not a direct example, as it applies to V2, but you can take a look at this description to get a sense of how to do this.
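One way to pass Spark settings through to the execution engine is a `spark_config` mapping alongside the engine class name. The sketch below is an assumption about shape, not a config from this thread; the individual Spark keys are ordinary Spark conf entries you would mirror from your own databricks-connect session:

```python
# Hedged sketch: an execution_engine config that both reuses the
# existing Spark context and forwards explicit Spark conf entries.
# The spark_config keys shown are generic examples, not required values.
execution_engine_config = {
    "class_name": "SparkDFExecutionEngine",
    "force_reuse_spark_context": True,
    "spark_config": {
        # Mirror whatever your databricks-connect session already uses.
        "spark.sql.shuffle.partitions": "8",
        "spark.app.name": "ge_validation",   # placeholder app name
    },
}
```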
WOW, you're the best! It's working now, thanks a lot! I think we can close this issue :)
Great - thanks for letting us know!
Describe the bug
When trying to add an expectation to an expectation suite with a DataFrame coming from a SparkDFExecutionEngine datasource, it fails with the error in the title of this issue, specifically when calling `res = df.agg(*aggregate_cols).collect()` in the `resolve_metric_bundle` function within `sparkdf_execution_engine.py`.
This happens when using the databricks-connect setup to work with a Databricks cluster from a PyCharm instance with a GE context created; it does not fail if the context is not created. I also tried running a `.head()` call (which includes a `.collect()`) with and without creating a GE context, with the same result.
To Reproduce
Steps to reproduce the behavior:
When running the expectation logic it shows the error above.
I stepped through the code and traced it back to a `df.collect()` call in the `resolve_metric_bundle` function within `sparkdf_execution_engine.py`.
This seemed odd, so I tried calling `df.collect()` directly before running the expectation logic, and it failed on the same line. However, when I tried it without creating the context, it worked perfectly. Does GE generate its own Spark session? How can I use this kind of connection (PyCharm to Databricks) from the GE perspective? Do you have an example of something like this?
Expected behavior
Everything runs smoothly, the results for the expectations are shown, and we can save the expectation suite.
Environment (please complete the following information):
Additional context
When a collect is run independently there are two scenarios:
- GE Data Context created: `df.head()` (which includes a `collect()`) FAILS
- No GE Data Context: `df.head()` (which includes a `collect()`) SUCCEEDS
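Based on the traceback, a plausible reading of these two scenarios (an assumption, not confirmed by the GE maintainers in this thread) is that creating the GE context replaces or stops the SparkContext the DataFrame is bound to, so the later `collect()` calls `setCallSite` on a `None` JVM handle. The toy classes `StoppedContext` and `set_call_site` below are hypothetical stand-ins that reproduce only the shape of the error:

```python
# Minimal illustration of the suspected failure mode. This does not use
# pyspark; it only mimics the internal call pattern sc._jsc.setCallSite(...)
# against a context whose JVM handle has been cleared.
class StoppedContext:
    """Hypothetical stand-in for a SparkContext whose JVM gateway is gone."""
    _jsc = None  # a live context would hold a JVM-side SparkContext here

def set_call_site(sc, name):
    # Mirrors the internal call that fails in the reported traceback.
    return sc._jsc.setCallSite(name)

try:
    set_call_site(StoppedContext(), "head")
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'setCallSite'
```

This is consistent with the fix that worked: `force_reuse_spark_context: true` keeps GE on the already-running databricks-connect session instead of managing its own.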
Found this, but I don't know how applicable it is: https://issues.apache.org/jira/browse/SPARK-27335?jql=text%20~%20%22setcallsite%22