Describe the bug
Currently we are not able to reuse a Spark session when using a runtime data source.
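For context, the scenario is an already-running SparkSession (for example one provided by the hosting platform or created earlier in the application) whose settings Great Expectations should attach to instead of spinning up a fresh session; the settings below are placeholders, just to illustrate the kind of session that should be reused:

    # A SparkSession already exists before Great Expectations is involved.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("existing-app")                        # placeholder app name
        .config("spark.sql.shuffle.partitions", "8")    # placeholder setting we want to keep
        .getOrCreate()
    )

    # With force_reuse_spark_context set, the expectation is that Great Expectations
    # reuses this session rather than tearing it down or creating a new one.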
To Reproduce
I've looked into this fix in my current setup. Unfortunately, if I change my config from Datasource to SparkDFDatasource, I get the following error:

    return datasource.get_batch_list_from_batch_request(
    AttributeError: 'SparkDFDatasource' object has no attribute 'get_batch_list_from_batch_request'
Next, if I change it back to Datasource with the fix, I get the following error:

    datasource: Datasource = cast(Datasource, self.datasources[datasource_name])
    KeyError: 'my_spark_datasource'

This is due to the fact that the Datasource class, when instantiated, doesn't know what to do with the force_reuse_spark_context flag, and the resulting error gets hidden (this needs to be fixed). As a result, my_spark_datasource is never instantiated, which is why the KeyError above is thrown.
Here is a reference of what my data_source config looks like:

    {
        "my_spark_datasource": {
            "class_name": "Datasource",
            "force_reuse_spark_context": True,
            "execution_engine": {
                "class_name": "SparkDFExecutionEngine"
            },
            "data_connectors": {
                "my_runtime_data_connector": {
                    "module_name": "great_expectations.datasource.data_connector",
                    "class_name": "RuntimeDataConnector",
                    "batch_identifiers": [
                        "some_key"
                    ]
                }
            }
        }
    }

In this case I want a runtime batch, following the directions laid out here -> https://discuss.greatexpectations.io/t/how-to-validate-spark-dataframes-in-0-13/582
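To make the intended usage concrete, here is roughly how such a datasource would be used to validate an in-memory Spark DataFrame via a RuntimeBatchRequest, per the directions linked above; the DataFrame, asset name, and suite name are placeholders, not the exact code from my setup:

    from pyspark.sql import SparkSession

    from great_expectations.core.batch import RuntimeBatchRequest
    from great_expectations.data_context import DataContext

    # Build (or reuse) a Spark session and an in-memory DataFrame to validate.
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a", 1), ("b", 2)], ["letter", "number"])

    context = DataContext()  # loads the project config containing my_spark_datasource

    batch_request = RuntimeBatchRequest(
        datasource_name="my_spark_datasource",
        data_connector_name="my_runtime_data_connector",
        data_asset_name="my_data_asset",            # arbitrary name for the in-memory asset
        runtime_parameters={"batch_data": df},      # pass the DataFrame directly
        batch_identifiers={"some_key": "some_value"},
    )

    context.create_expectation_suite("my_suite", overwrite_existing=True)
    validator = context.get_validator(
        batch_request=batch_request,
        expectation_suite_name="my_suite",
    )
    validator.expect_column_values_to_not_be_null("number")

This is the path that currently fails with the errors above, because the datasource config cannot be instantiated once force_reuse_spark_context is present.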
I think the solution is to not only pass force_reuse_spark_context to the SparkDFDatasource but also to pass it to SparkDFExecutionEngine. I was able to get a working solution by adding the following to ExecutionEngineSchema.
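The exact snippet is not reproduced here, but as a rough sketch of the idea, exposing the flag on a marshmallow config schema could look something like the following; the class and field names are taken from the issue text or assumed for illustration, not copied from the actual great_expectations source:

    # Hypothetical sketch: let the execution_engine config block carry the flag so it
    # is deserialized and forwarded to SparkDFExecutionEngine instead of being dropped.
    from marshmallow import Schema, fields


    class ExecutionEngineSchema(Schema):  # name as referenced in this issue; the real class may differ
        class_name = fields.String(required=True)
        module_name = fields.String(required=False)
        # Proposed addition:
        force_reuse_spark_context = fields.Boolean(required=False)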
Expected behavior
Just the ability to use the Spark execution engine with the spark session reuse flag.
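For concreteness, the kind of config that should presumably work once the flag is forwarded to the engine is the same datasource as above, but with force_reuse_spark_context carried on the execution_engine block (a sketch of the proposal, not a tested config):

    {
        "my_spark_datasource": {
            "class_name": "Datasource",
            "execution_engine": {
                "class_name": "SparkDFExecutionEngine",
                "force_reuse_spark_context": True
            },
            "data_connectors": {
                "my_runtime_data_connector": {
                    "module_name": "great_expectations.datasource.data_connector",
                    "class_name": "RuntimeDataConnector",
                    "batch_identifiers": ["some_key"]
                }
            }
        }
    }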
Environment (please complete the following information):
Operating System: Linux
Great Expectations Version: 0.13.23
Additional context
I pretty much laid out what needs to be fixed; unfortunately, I have prior commitments and can't do the work myself.
I also think the solution proposed in #3126 will help.