Comparators error when using pyspark #225

Closed
hbashary opened this issue Jan 21, 2024 · 6 comments · Fixed by #226

Comments

@hbashary

hbashary commented Jan 21, 2024

Trying to run the example from the documentation with PySpark, but I keep getting the following error:

AttributeError: 'DiffOptions' object has no attribute 'withComparator'

Running this in a Glue notebook with Spark 3.3 and spark-extension_2.12-2.8.0. Same issue after upgrading to spark-extension_2.13-2.11.0. Is this method supported in the Python API?

Create two DataFrames:

```python
from pyspark.sql import Row

# `spark` is the SparkSession provided by the Glue notebook
df_1 = spark.createDataFrame([
    Row(id=1, value=1.0),
    Row(id=2, value=2.0),
    Row(id=3, value=3.0),
])

df_2 = spark.createDataFrame([
    Row(id=1, value=1.0),
    Row(id=2, value=2.02),
    Row(id=3, value=3.05),
])
```

Run the comparator example:

```python
from pyspark.sql.types import DoubleType
from gresearch.spark.diff import DiffOptions, DiffMode, DiffComparators

# Fails: DiffOptions has no method named withComparator
options = DiffOptions().with_change_column("changes") \
                       .withComparator(DiffComparators.epsilon(0.01).asRelative().asInclusive(), DoubleType)

df_1.diff_with_options(df_2, options, "id").show()
```

Error: AttributeError: 'DiffOptions' object has no attribute 'withComparator'

@EnricoMi
Contributor

You are right, the Python example code in DIFF.md was wrong; it should read `with_data_type_comparator(...)`.

Please modify your code as follows:

```diff
-.withComparator(DiffComparators.epsilon(0.01).asRelative().asInclusive(), DoubleType)
+.with_data_type_comparator(DiffComparators.epsilon(0.01).as_relative().as_inclusive(), DoubleType())
```

I have fixed DIFF.md.
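To see what the corrected comparator does with the data from this issue, here is a plain-Python sketch that mimics a relative, inclusive epsilon check. It is not spark-extension code: the rule it implements (two values are equal when `abs(left - right) <= epsilon * max(abs(left), abs(right))`, with `<` instead of `<=` when exclusive, and two nulls counting as equal) is an assumed reading of the epsilon comparator, and `within_epsilon` is a hypothetical helper.

```python
from typing import Optional


def within_epsilon(left: Optional[float], right: Optional[float], epsilon: float,
                   relative: bool = True, inclusive: bool = True) -> bool:
    """Hypothetical helper: True if left and right count as equal."""
    # Assumption: two nulls are equal; a null and a non-null value are not.
    if left is None or right is None:
        return left is None and right is None
    # Relative: scale epsilon by the larger magnitude; absolute: use it as-is.
    threshold = epsilon * max(abs(left), abs(right)) if relative else epsilon
    diff = abs(left - right)
    # Inclusive accepts a difference exactly at the threshold; exclusive does not.
    return diff <= threshold if inclusive else diff < threshold


# The `value` column of the two DataFrames from the issue, keyed by `id`.
left_values = {1: 1.0, 2: 2.0, 3: 3.0}
right_values = {1: 1.0, 2: 2.02, 3: 3.05}

# "N" = unchanged, "C" = changed, mirroring the diff action column.
result = {
    row_id: "N" if within_epsilon(left_values[row_id], right_values[row_id], 0.01) else "C"
    for row_id in left_values
}
print(result)  # {1: 'N', 2: 'N', 3: 'C'}
```

Under this assumed rule, row 2 stays unchanged because 0.02 is within 1% of 2.02, while row 3 is reported as changed because 0.05 exceeds 1% of 3.05. With `relative=False`, row 2 would be flagged as well, since 0.02 exceeds an absolute epsilon of 0.01.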

@hbashary
Author

hbashary commented Jan 22, 2024

Thanks for the quick response. One last question: the `map` comparator doesn't seem to be supported in Python.

```python
options = DiffOptions().with_change_column("changes") \
                       .with_data_type_comparator(DiffComparators.map[K,V](false))
```

Error: AttributeError: type object 'DiffComparators' has no attribute 'map'

@EnricoMi
Contributor

Right, the Python API does not support the map comparator yet. I haven't figured out how to pass the key and value types `K` and `V` from Python to Scala.

@hbashary
Author

Thanks, Enrico.

@EnricoMi
Contributor

EnricoMi commented Jan 24, 2024

I have found a way to provide the `MapDiffComparator` to the Python API: #226

That fix allows for `DiffComparators.map(IntegerType(), LongType())` in Python.
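A plain-Python sketch of what the map comparator's key-order flag is assumed to control. Map values are modelled here as ordered lists of (key, value) entries; with `key_order_sensitive=False`, two maps are equal if they hold the same associations in any order. The helper `maps_equal`, the list-of-pairs representation, the null handling, and the parameter name are hypothetical; the flag is assumed to correspond to the `false` argument in the Scala-style `map[K,V](false)` call above.

```python
from typing import Any, List, Optional, Tuple

# A map value as an ordered list of (key, value) entries, or None for null.
MapEntries = Optional[List[Tuple[Any, Any]]]


def maps_equal(left: MapEntries, right: MapEntries,
               key_order_sensitive: bool = False) -> bool:
    """Hypothetical helper: True if the two map values count as equal."""
    # Assumption: two null maps are equal; a null and a non-null map are not.
    if left is None or right is None:
        return left is None and right is None
    if key_order_sensitive:
        # Same entries in the same order.
        return left == right
    # Same key/value associations in any order (keys assumed unique).
    return dict(left) == dict(right)


a = [(1, 10), (2, 20)]
b = [(2, 20), (1, 10)]
print(maps_equal(a, b))                            # True
print(maps_equal(a, b, key_order_sensitive=True))  # False
```

The explicit key and value types in the Python call (such as `LongType()`) appear to stand in for the Scala type parameters `K` and `V`, which Python cannot express directly.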

@EnricoMi
Contributor

This has been released.
