
spark-snowflake connector is not ignoring the double quotes in the column name even after setting QUOTED_IDENTIFIERS_IGNORE_CASE to true #509

Open
chaurasiya opened this issue Apr 27, 2023 · 3 comments


@chaurasiya

chaurasiya commented Apr 27, 2023

Hi,
I'm using the spark-snowflake connector client library in my Spark application, against a Snowflake server that has case-sensitive attribute names. In the Snowflake console, after I execute ALTER SESSION SET QUOTED_IDENTIFIERS_IGNORE_CASE = TRUE; I don't need to provide double quotes ("") in SELECT queries; for example, SELECT c1 FROM schema.table works perfectly. I wanted the same behaviour in my Spark application, so I followed this documentation to set the above session parameter.
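For reference, the console session that works is just:

    ALTER SESSION SET QUOTED_IDENTIFIERS_IGNORE_CASE = TRUE;
    SELECT c1 FROM schema.table;  -- resolves without quoting "c1"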

Versions:
spark: 3.3.1
scala: 2.12.17
spark-snowflake: 2.11.1-spark_3.3
snowflake-jdbc: 3.13.29

Here is the code I used and the output. The DataFrame columns still come back wrapped in double quotes. Any help on this, please?

    val sfOptions = Map(
      "sfUrl" -> "<account_identifier>.azure.snowflakecomputing.com",
      "sfAccount" -> "<account_identifier>",
      "sfDatabase" -> "db",
      "sfWarehouse" -> "wh",
      "sfRole" -> "role",
      "pem_private_key" -> privateKey,
      "sfUser" -> "user@xx.com",
      "sfPassword" -> "dummy",
      "sfSchema" -> "schema",
      "autopushdown" -> "on",
      "QUOTED_IDENTIFIERS_IGNORE_CASE" -> "true"
    )
    val df = spark.read
      .format(SNOWFLAKE_SOURCE_NAME)
      .options(sfOptions)
      .option("query", "SELECT * FROM schema.table")
      .load()
    df.printSchema()

...
root
 |-- "c1": string (nullable = true)
 |-- "c2": timestamp (nullable = true)
 |-- "c3": decimal(38,0) (nullable = true)
 ...

@sfc-gh-mrui

@FabFlying

Same issue
spark: 3.3.2
scala: 2.12.17
spark-snowflake: 2.12.0-spark_3.3
snowflake-jdbc: 3.13.30

+----------+--------+---------+
| "Date"|"GroupA"|"MetricA"|
+----------+--------+---------+
|2001-01-01| B| 2.0|

The column names in the Spark DataFrame come back with double quotes, as in ["Date"], where they should be [Date] with no quotes around them.
A direct JDBC connection works fine.
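For comparison, here is a minimal sketch of what I mean by the direct JDBC connection (URL, credentials, and table name are placeholders):

    import java.sql.DriverManager
    import java.util.Properties

    val props = new Properties()
    props.put("user", "user@xx.com")  // placeholder
    props.put("password", "dummy")    // placeholder
    val conn = DriverManager.getConnection(
      "jdbc:snowflake://<account_identifier>.azure.snowflakecomputing.com", props)
    val stmt = conn.createStatement()
    stmt.execute("ALTER SESSION SET QUOTED_IDENTIFIERS_IGNORE_CASE = TRUE")
    val rs = stmt.executeQuery("SELECT MetricA FROM schema.table")  // no quotes needed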

@chaurasiya
Author

@sfc-gh-azhan @sfc-gh-abhatnagar @sfc-gh-bli Any help on this, please?

@10-varunshah

10-varunshah commented Jul 15, 2023

Hi @chaurasiya,

the option to use is keep_column_case = on, like this:

    val sfOptions = Map(
      "sfUrl" -> "<account_identifier>.azure.snowflakecomputing.com",
      "sfAccount" -> "<account_identifier>",
      "sfDatabase" -> "db",
      "sfWarehouse" -> "wh",
      "sfRole" -> "role",
      "pem_private_key" -> privateKey,
      "sfUser" -> "user@xx.com",
      "sfPassword" -> "dummy",
      "sfSchema" -> "schema",
      "autopushdown" -> "on",
      "keep_column_case" -> "on"
    )
    val df = spark.read
      .format(SNOWFLAKE_SOURCE_NAME)
      .options(sfOptions)
      .option("query", "SELECT * FROM schema.table")
      .load()
    df.printSchema()

...
root
 |-- c1: string (nullable = true)
 |-- c2: timestamp (nullable = true)
 |-- c3: decimal(38,0) (nullable = true)
 ...

You can refer to the explanation of the property in the official docs. Note that even though the explanation is given only for the write operation (writing a DataFrame from Spark to Snowflake), it works for reading data as well.
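If changing connector options isn't possible, a Spark-side workaround (my own sketch, not a connector feature) is to strip the surrounding quotes after loading:

    // rename every column, dropping the surrounding double quotes
    val cleaned = df.toDF(df.columns.map(_.stripPrefix("\"").stripSuffix("\"")): _*)
    cleaned.printSchema()  // c1, c2, c3 without quotes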

I validated this on Databricks 13.2 (Spark 3.4.0), so hopefully you can give it a try in your local Spark application as well and let me know if you see any issues.

Also, use columnmap if writing data to Snowflake to avoid such issues; for reads, this property is ignored.
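For example, a write-path sketch with columnmap (the table name and column mappings are placeholders; the target table must already exist for the mapping to apply):

    import org.apache.spark.sql.SaveMode

    df.write
      .format(SNOWFLAKE_SOURCE_NAME)
      .options(sfOptions)
      .option("dbtable", "target_table")               // placeholder: existing Snowflake table
      .option("columnmap", "Map(c1 -> C1, c2 -> C2)")  // DataFrame column -> Snowflake column
      .mode(SaveMode.Append)
      .save()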
