[BUG] Can not ansi cast decimal type to long type while fetching decimal column from data table #6128

Closed
razajafri opened this issue Jul 27, 2022 · 1 comment · Fixed by #6103
Labels: bug (Something isn't working)

Comments

Collaborator

razajafri commented Jul 27, 2022

Describe the bug
A CudfException is thrown when casting a decimal column to long with ANSI mode enabled.

Steps/Code to reproduce bug

spark.conf.set("spark.sql.ansi.enabled", true)
val df = Seq(222.22, 777.77).toDF("f")
val df2 = df.repartition(1).selectExpr("cast(f as decimal(5,2))")
val df3 = df2.repartition(1).selectExpr("cast(f as long)")
df3.show

Caused by: ai.rapids.cudf.CudfException: cuDF failure at: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-pre_release-29-cuda11/thirdparty/cudf/cpp/src/binaryop/binaryop.cpp:205: Unsupported operator for these types
  at ai.rapids.cudf.ColumnView.binaryOpVS(Native Method)
  at ai.rapids.cudf.ColumnView.binaryOp(ColumnView.java:1254)
  at ai.rapids.cudf.ColumnView.binaryOp(ColumnView.java:1244)
  at ai.rapids.cudf.BinaryOperable.lessThan(BinaryOperable.java:277)
  at ai.rapids.cudf.BinaryOperable.lessThan(BinaryOperable.java:284)
  at com.nvidia.spark.rapids.GpuCast$.$anonfun$assertValuesInRange$3(GpuCast.scala:641)

Expected behavior
The script should run to completion and the cast should not throw an exception.
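For reference, the expected result can be sketched on the plain JVM without Spark or a GPU. This is only an illustration, assuming that an ANSI cast from decimal to long drops the fractional part and raises an error when the value does not fit in a long; `AnsiDecimalToLongSketch` and `castToLong` are hypothetical names and are not part of Spark or spark-rapids.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical helper for illustration only; not Spark or spark-rapids code.
object AnsiDecimalToLongSketch {
  // Assumed semantics: truncate toward zero, fail on overflow.
  def castToLong(d: JBigDecimal): Long =
    // toBigInteger drops the fractional digits;
    // longValueExact throws ArithmeticException if the value is outside Long range.
    d.toBigInteger.longValueExact()
}
```

Under these assumptions, the two values from the repro, 222.22 and 777.77 as decimal(5,2), come out as 222 and 777, which matches the CPU output shown later in this thread.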

razajafri added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels Jul 27, 2022
razajafri self-assigned this Jul 27, 2022
Collaborator

gerashegalov commented Jul 28, 2022

Is there also a bug with timezone awareness? This repro falls back to CPU in a non-UTC timezone.

TZ=UTC+1 ~/dist/spark-3.2.1-bin-hadoop3.2/bin/spark-shell --conf spark.jars=$HOME/gits/NVIDIA/spark-rapids/dist/target/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar --conf spark.plugins=com.nvidia.spark.SQLPlugin

scala> spark.conf.set("spark.sql.ansi.enabled", true)

scala> val df = Seq(222.22, 777.77).toDF("f")
df: org.apache.spark.sql.DataFrame = [f: double]

scala> val df2 = df.repartition(1).selectExpr("cast(f as decimal(5,2))")
df2: org.apache.spark.sql.DataFrame = [f: decimal(5,2)]

scala> val df3 = df2.repartition(1).selectExpr("cast(f as long)")
df3: org.apache.spark.sql.DataFrame = [f: bigint]

scala> df3.show
22/07/27 23:12:07 WARN GpuOverrides: 
!Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
  @Expression <Alias> ansi_cast(ansi_cast(f#6 as bigint) as string) AS f#11 could run on GPU
    !Expression <Cast> ansi_cast(ansi_cast(f#6 as bigint) as string) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
      !Expression <Cast> ansi_cast(f#6 as bigint) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
        @Expression <AttributeReference> f#6 could run on GPU
  !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
    @Partitioning <SinglePartition$> could run on GPU
    !Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
      @Expression <Alias> ansi_cast(f#4 as decimal(5,2)) AS f#6 could run on GPU
        !Expression <Cast> ansi_cast(f#4 as decimal(5,2)) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
          @Expression <AttributeReference> f#4 could run on GPU
      !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
        @Partitioning <SinglePartition$> could run on GPU
        ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
          @Expression <AttributeReference> f#4 could run on GPU

22/07/27 23:12:07 WARN GpuOverrides: 
!Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
  @Expression <Alias> ansi_cast(ansi_cast(f#6 as bigint) as string) AS f#11 could run on GPU
    !Expression <Cast> ansi_cast(ansi_cast(f#6 as bigint) as string) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
      !Expression <Cast> ansi_cast(f#6 as bigint) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
        @Expression <AttributeReference> f#6 could run on GPU
  !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
    @Partitioning <SinglePartition$> could run on GPU
    !Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
      @Expression <Alias> ansi_cast(f#4 as decimal(5,2)) AS f#6 could run on GPU
        !Expression <Cast> ansi_cast(f#4 as decimal(5,2)) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
          @Expression <AttributeReference> f#4 could run on GPU
      !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
        @Partitioning <SinglePartition$> could run on GPU
        ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
          @Expression <AttributeReference> f#4 could run on GPU

22/07/27 23:12:07 WARN GpuOverrides: 
!Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
  @Expression <Alias> ansi_cast(ansi_cast(f#6 as bigint) as string) AS f#11 could run on GPU
    !Expression <Cast> ansi_cast(ansi_cast(f#6 as bigint) as string) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
      !Expression <Cast> ansi_cast(f#6 as bigint) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
        @Expression <AttributeReference> f#6 could run on GPU
  !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
    @Partitioning <SinglePartition$> could run on GPU
    !Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
      @Expression <Alias> ansi_cast(f#4 as decimal(5,2)) AS f#6 could run on GPU
        !Expression <Cast> ansi_cast(f#4 as decimal(5,2)) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
          @Expression <AttributeReference> f#4 could run on GPU
      !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
        @Partitioning <SinglePartition$> could run on GPU
        ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
          @Expression <AttributeReference> f#4 could run on GPU

22/07/27 23:12:07 WARN GpuOverrides: 
!Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
  @Partitioning <SinglePartition$> could run on GPU
  ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
    @Expression <AttributeReference> f#4 could run on GPU

22/07/27 23:12:07 WARN GpuOverrides: 
!Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
  @Expression <Alias> ansi_cast(ansi_cast(f#6 as bigint) as string) AS f#11 could run on GPU
    !Expression <Cast> ansi_cast(ansi_cast(f#6 as bigint) as string) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
      !Expression <Cast> ansi_cast(f#6 as bigint) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
        @Expression <AttributeReference> f#6 could run on GPU
  !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
    @Partitioning <SinglePartition$> could run on GPU
    !Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
      @Expression <Alias> ansi_cast(f#4 as decimal(5,2)) AS f#6 could run on GPU
        !Expression <Cast> ansi_cast(f#4 as decimal(5,2)) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
          @Expression <AttributeReference> f#4 could run on GPU

22/07/27 23:12:07 WARN GpuOverrides: 
!Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
  @Partitioning <SinglePartition$> could run on GPU
  !Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
    @Expression <Alias> ansi_cast(f#4 as decimal(5,2)) AS f#6 could run on GPU
      !Expression <Cast> ansi_cast(f#4 as decimal(5,2)) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
        @Expression <AttributeReference> f#4 could run on GPU

22/07/27 23:12:08 WARN GpuOverrides: 
!Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
  @Expression <Alias> ansi_cast(ansi_cast(f#6 as bigint) as string) AS f#11 could run on GPU
    !Expression <Cast> ansi_cast(ansi_cast(f#6 as bigint) as string) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
      !Expression <Cast> ansi_cast(f#6 as bigint) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
        @Expression <AttributeReference> f#6 could run on GPU

22/07/27 23:12:08 WARN GpuOverrides: 
!Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
  @Expression <Alias> ansi_cast(ansi_cast(f#6 as bigint) as string) AS f#11 could run on GPU
    !Expression <Cast> ansi_cast(ansi_cast(f#6 as bigint) as string) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
      !Expression <Cast> ansi_cast(f#6 as bigint) cannot run on GPU because Only UTC zone id is supported. Actual default zone id: GMT-01:00; Only UTC zone id is supported. Actual session local zone id: GMT-01:00
        @Expression <AttributeReference> f#6 could run on GPU

+---+
|  f|
+---+
|222|
|777|
+---+
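
Every fallback message above gives the same reason: both the JVM default zone and the session-local zone are reported as GMT-01:00 rather than UTC. As a rough illustration of that kind of check, and not the plugin's actual code, here is a minimal sketch using `java.time`; `UtcCheckSketch` and `isUtc` are hypothetical names.

```scala
import java.time.{ZoneId, ZoneOffset}

// Hypothetical helper for illustration only; the plugin's real check may differ.
object UtcCheckSketch {
  // normalized() turns a fixed-offset zone id such as "UTC" or "GMT-01:00"
  // into a plain ZoneOffset, so the comparison does not depend on spelling.
  def isUtc(zone: ZoneId): Boolean = zone.normalized() == ZoneOffset.UTC
}
```

With this sketch, `UtcCheckSketch.isUtc(ZoneId.of("UTC"))` is true and `UtcCheckSketch.isUtc(ZoneId.of("GMT-01:00"))` is false, mirroring the condition named in the warnings.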

sameerz removed the ? - Needs Triage (Need team to review and classify) label Aug 2, 2022