Databricks 12.2 Support [databricks] #8282

Merged: 75 commits, merged on Jun 2, 2023.
Commits
906e5cd  prep for 332db (andygrove, May 12, 2023)
ebcafd0  FileIndexOptionsShims for 332db (andygrove, May 12, 2023)
c69a884  GpuOptimizedCreateHiveTableAsSelectCommandShims for 332db (andygrove, May 12, 2023)
0ad5112  Move GpuInsertIntoHiveTableMeta shim to 332db (andygrove, May 12, 2023)
bf07992  Move HiveProviderCmdShims to 332db (andygrove, May 12, 2023)
8a4b6a0  Move SchemaUtilsShims to 332db (andygrove, May 12, 2023)
ba4f280  Move GpuBatchScanExec to 332db (andygrove, May 12, 2023)
8f66e5d  Move BatchScanExecMeta to 332db (andygrove, May 12, 2023)
69fc628  Move SparkShims to 332db (andygrove, May 12, 2023)
d4b49f4  Revert "Move SparkShims to 332db" (andygrove, May 12, 2023)
86bf257  Move SparkDateTimeExceptionShims and SparkDateTimeExceptionShims to 3… (andygrove, May 12, 2023)
d0592cc  Merge remote-tracking branch 'nvidia/branch-23.06' into 332db (andygrove, May 12, 2023)
daff5e8  add shim for creating ExecutedWriteSummary (andygrove, May 12, 2023)
9560632  add 332db shim tag for ParquetTimestampAnnotationShims (andygrove, May 12, 2023)
80cff17  revert changes to BatchScanExec (andygrove, May 12, 2023)
4f6c994  Revert "revert changes to BatchScanExec" (andygrove, May 12, 2023)
3007bda  revert changes to BatchScanExec (andygrove, May 12, 2023)
d318dd7  fix a compilation error in GpuBatchScanExec (andygrove, May 13, 2023)
46f700a  try fix CTAS support (andygrove, May 13, 2023)
105e99b  fix (andygrove, May 13, 2023)
a649b21  CTAS (andygrove, May 13, 2023)
95a00dc  CTAS (andygrove, May 13, 2023)
6d9cde0  GpuDataSource (andygrove, May 13, 2023)
b369058  Save progress (andygrove, May 15, 2023)
fe590ba  fix another compilation issue (andygrove, May 15, 2023)
d96826b  Add DeltaLogShim (andygrove, May 16, 2023)
f307daf  add 332db profile to aggregator (andygrove, May 16, 2023)
bcffcea  trying to get integration tests running (andygrove, May 16, 2023)
85ee74e  add SparkShimServiceProvider for 332db (andygrove, May 16, 2023)
2c28e48  fix package name (andygrove, May 16, 2023)
029f5fa  Merge remote-tracking branch 'nvidia/branch-23.06' into 332db (andygrove, May 16, 2023)
d81de34  scalastyle (andygrove, May 16, 2023)
a0ff02d  fix 340 shim for GpuFileFormatDataWriter (andygrove, May 16, 2023)
5128208  fix 333 shim for GpuFileFormatDataWriter (andygrove, May 16, 2023)
a0726d7  add 332db tag to ReaderUtils (andygrove, May 17, 2023)
17e09d8  fix shim issue resulting from upmerge (andygrove, May 17, 2023)
9571f53  fix shim issue resulting from upmerge (andygrove, May 17, 2023)
78e73cb  shim GpuWriteFiles for 332db (andygrove, May 17, 2023)
01d90df  shim GpuWriteFiles for 332db (andygrove, May 17, 2023)
9957cf7  fix regressions in hive_write_test and majority of regressions in par… (andygrove, May 17, 2023)
e2f2a4e  signoff (andygrove, May 17, 2023)
c3a4426  revert change to 330db shim (andygrove, May 17, 2023)
abe8662  fix duplicate 332db tags for GpuFileFormatWriter (andygrove, May 17, 2023)
8068205  Support KnownNullable in 332db (andygrove, May 17, 2023)
310252c  Support Empty2Null in 332db (andygrove, May 18, 2023)
a9009cd  skip test_hive_empty_generic_udf on 322db, add link to issue (andygrove, May 18, 2023)
4cb1c5a  skip test_parquet_read_nano_as_longs_true on 332db (andygrove, May 18, 2023)
1621abb  reinstate call to assertRemovable (andygrove, May 18, 2023)
4c4548a  update 332db copy of GpuOptimisticTransaction to match 330db copy whi… (andygrove, May 18, 2023)
f794861  revert intellij auto formatting (andygrove, May 18, 2023)
7c1e175  Revert a formatting change (andygrove, May 18, 2023)
3eaa819  update copyright year (andygrove, May 23, 2023)
958d5ff  Merge remote-tracking branch 'nvidia/branch-23.06' into 332db (andygrove, May 24, 2023)
4696951  fix test failures in delta_lake_write_test.py by adding DataWritingCo… (andygrove, May 24, 2023)
203581f  Fall back to CPU for MergeInto commands if notMatchedBySourceClauses … (andygrove, May 26, 2023)
a1d5aee  Revert "Fall back to CPU for MergeInto commands if notMatchedBySource… (andygrove, May 26, 2023)
279cddb  Fall back to CPU for MergeInto commands if notMatchedBySourceClauses … (andygrove, May 26, 2023)
111e1d8  skip failing delta lake integration tests and link to follow-on issue (andygrove, May 26, 2023)
b82a319  scalastyle (andygrove, May 26, 2023)
18d905f  Add fallback test for WHEN NOT MATCHED BY SOURCE (andygrove, May 30, 2023)
0b4e489  generated configs.md (andygrove, May 30, 2023)
6592573  fix merge conflict (andygrove, May 31, 2023)
6d5d11b  rename shim classes to avoid conflict (andygrove, May 31, 2023)
82087c4  Update delta-lake/delta-spark332db/src/main/scala/com/nvidia/spark/ra… (andygrove, May 31, 2023)
b97ce29  Update delta-lake/delta-spark332db/src/main/scala/com/databricks/sql/… (andygrove, May 31, 2023)
6df64ce  Update sql-plugin/src/main/spark332db/scala/com/nvidia/spark/rapids/s… (andygrove, May 31, 2023)
1e4993c  Merge remote-tracking branch 'nvidia/branch-23.08' into 332db (andygrove, May 31, 2023)
8d9c085  Fix compressed Hive text read on Databricks. (mythrocks, May 31, 2023)
50552c8  Move some shim code into new MergeIntoCommandMeta (andygrove, May 31, 2023)
a9b8ed7  Merge remote-tracking branch 'mythrocks/fix-hive-compressed-read-data… (andygrove, May 31, 2023)
dbccf6e  fix compilation error (andygrove, Jun 1, 2023)
39f4020  fix 332db shims for RapidsShuffleManager (andygrove, Jun 1, 2023)
6a3910f  Merge remote-tracking branch 'nvidia/branch-23.08' into 332db (andygrove, Jun 1, 2023)
bdc3b74  enable test_optimized_hive_bucketed_fallback for databricks 12.2 (andygrove, Jun 1, 2023)
b6d2045  enable test_int96_write_conf_with_write_exec for databricks 12.2 (andygrove, Jun 2, 2023)
17 changes: 17 additions & 0 deletions aggregator/pom.xml
@@ -505,6 +505,23 @@
       </dependency>
     </dependencies>
   </profile>
+  <profile>
+    <id>release332db</id>
+    <activation>
+      <property>
+        <name>buildver</name>
+        <value>332db</value>
+      </property>
+    </activation>
+    <dependencies>
+      <dependency>
+        <groupId>com.nvidia</groupId>
+        <artifactId>rapids-4-spark-delta-spark332db_${scala.binary.version}</artifactId>
+        <version>${project.version}</version>
+        <classifier>${spark.version.classifier}</classifier>
+      </dependency>
+    </dependencies>
+  </profile>
   <profile>
     <id>release333</id>
     <activation>
1 change: 1 addition & 0 deletions delta-lake/README.md
@@ -16,6 +16,7 @@ and directory contains the corresponding support code.
 | 2.2.x | Spark 3.3.x | `delta-22x` |
 | Databricks 10.4 | Databricks 10.4 | `delta-spark321db` |
 | Databricks 11.3 | Databricks 11.3 | `delta-spark330db` |
+| Databricks 12.2 | Databricks 12.2 | `delta-spark332db` |

Delta Lake is not supported on all Spark versions, and for Spark versions where it is not
supported the `delta-stub` project is used.
@@ -19,6 +19,7 @@ package com.nvidia.spark.rapids.delta
 import com.databricks.sql.transaction.tahoe.commands.{MergeIntoCommand, MergeIntoCommandEdge}
 import com.databricks.sql.transaction.tahoe.rapids.{GpuDeltaLog, GpuMergeIntoCommand}
 import com.nvidia.spark.rapids.{DataFromReplacementRule, RapidsConf, RapidsMeta, RunnableCommandMeta}
+import com.nvidia.spark.rapids.delta.shims.MergeIntoCommandMetaShim
 
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.execution.command.RunnableCommand
@@ -35,6 +36,7 @@ class MergeIntoCommandMeta(
       willNotWorkOnGpu("Delta Lake output acceleration has been disabled. To enable set " +
         s"${RapidsConf.ENABLE_DELTA_WRITE} to true")
     }
+    MergeIntoCommandMetaShim.tagForGpu(this, mergeCmd)
     val targetSchema = mergeCmd.migratedSchema.getOrElse(mergeCmd.target.schema)
     val deltaLog = mergeCmd.targetFileIndex.deltaLog
     RapidsDeltaUtils.tagForDeltaWrite(this, targetSchema, deltaLog, Map.empty, SparkSession.active)
@@ -64,6 +66,7 @@ class MergeIntoCommandEdgeMeta(
       willNotWorkOnGpu("Delta Lake output acceleration has been disabled. To enable set " +
         s"${RapidsConf.ENABLE_DELTA_WRITE} to true")
     }
+    MergeIntoCommandMetaShim.tagForGpu(this, mergeCmd)
     val targetSchema = mergeCmd.migratedSchema.getOrElse(mergeCmd.target.schema)
     val deltaLog = mergeCmd.targetFileIndex.deltaLog
     RapidsDeltaUtils.tagForDeltaWrite(this, targetSchema, deltaLog, Map.empty, SparkSession.active)
@@ -18,6 +18,7 @@ package com.nvidia.spark.rapids.delta
 
 import com.databricks.sql.transaction.tahoe.{DeltaConfigs, DeltaLog, DeltaOptions, DeltaParquetFileFormat}
 import com.nvidia.spark.rapids.{DeltaFormatType, FileFormatChecks, GpuOverrides, GpuParquetFileFormat, RapidsMeta, TypeSig, WriteFileOp}
+import com.nvidia.spark.rapids.delta.shims.DeltaLogShim
 
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.execution.datasources.DataSourceUtils
@@ -32,7 +33,7 @@ object RapidsDeltaUtils {
       options: Map[String, String],
       spark: SparkSession): Unit = {
     FileFormatChecks.tag(meta, schema, DeltaFormatType, WriteFileOp)
-    deltaLog.fileFormat() match {
+    DeltaLogShim.fileFormat(deltaLog) match {
       case _: DeltaParquetFileFormat =>
         GpuParquetFileFormat.tagGpuSupport(meta, spark, options, schema)
       case f =>
@@ -65,7 +66,7 @@ object RapidsDeltaUtils {
       orderableTypeSig.isSupportedByPlugin(t)
     }
     if (unorderableTypes.nonEmpty) {
-      val metadata = deltaLog.snapshot.metadata
+      val metadata = DeltaLogShim.getMetadata(deltaLog)
       val hasPartitioning = metadata.partitionColumns.nonEmpty ||
         options.get(DataSourceUtils.PARTITIONING_COLUMNS_KEY).exists(_.nonEmpty)
       if (!hasPartitioning) {
@@ -0,0 +1,32 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.nvidia.spark.rapids.delta.shims

import com.databricks.sql.transaction.tahoe.DeltaLog
import com.databricks.sql.transaction.tahoe.actions.Metadata

import org.apache.spark.sql.execution.datasources.FileFormat

object DeltaLogShim {
  def fileFormat(deltaLog: DeltaLog): FileFormat = {
    deltaLog.fileFormat()
  }

  def getMetadata(deltaLog: DeltaLog): Metadata = {
    deltaLog.snapshot.metadata
  }
}
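The DeltaLogShim above shows the shim-indirection pattern this PR applies throughout: shared code such as RapidsDeltaUtils calls a small per-version object instead of the Delta Lake API directly, so each Databricks source set can absorb API differences. The following is a minimal self-contained sketch of that pattern; `VersionedLog`, `Meta`, `LogShim`, and the "old vs. new accessor" split are hypothetical stand-ins for illustration, not the real Delta Lake or spark-rapids classes.

```scala
// Hypothetical stand-in for Delta Lake's Metadata.
final case class Meta(partitionColumns: Seq[String])

// Hypothetical stand-in for DeltaLog; pretend its metadata accessor
// changed between releases, so shared code cannot call it directly.
final class VersionedLog(meta: Meta) {
  def snapshotMetadata: Meta = meta        // accessor on the "old" API
  def unsafeVolatileMetadata: Meta = meta  // accessor on the "new" API
}

// One object with this exact signature lives in each version-specific
// source set; the build picks the right copy for the target version.
object LogShim {
  def getMetadata(log: VersionedLog): Meta = log.snapshotMetadata
}

// Shared code only ever goes through the shim, never the changed API,
// so it compiles unmodified against every supported version.
object SharedChecks {
  def hasPartitioning(log: VersionedLog): Boolean =
    LogShim.getMetadata(log).partitionColumns.nonEmpty
}
```

The design choice is that the shim object carries no state and only forwards calls, so swapping implementations per version changes nothing about the shared call sites.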
@@ -0,0 +1,25 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.nvidia.spark.rapids.delta.shims

import com.databricks.sql.transaction.tahoe.commands.{MergeIntoCommand, MergeIntoCommandEdge}
import com.nvidia.spark.rapids.delta.{MergeIntoCommandEdgeMeta, MergeIntoCommandMeta}

object MergeIntoCommandMetaShim {
  def tagForGpu(meta: MergeIntoCommandMeta, mergeCmd: MergeIntoCommand): Unit = {}
  def tagForGpu(meta: MergeIntoCommandEdgeMeta, mergeCmd: MergeIntoCommandEdge): Unit = {}
}
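Both `tagForGpu` overloads are no-ops for this version; the commit history ("Fall back to CPU for MergeInto commands if notMatchedBySourceClauses …") indicates the shim exists so a version whose merge command carries clauses the plugin cannot handle can request CPU fallback instead. A hypothetical sketch of such a tagging shim follows; `MergeCmd` and `CmdMeta` are illustrative stand-ins, not the real spark-rapids meta classes.

```scala
// Hypothetical stand-in for a merge command that may carry
// WHEN NOT MATCHED BY SOURCE clauses.
final case class MergeCmd(notMatchedBySourceClauses: Seq[String])

// Hypothetical stand-in for the plugin's meta node: recording any
// reason marks the plan so it stays on the CPU.
final class CmdMeta {
  private var reasons: List[String] = Nil
  def willNotWorkOnGpu(reason: String): Unit = reasons ::= reason
  def canThisBeReplaced: Boolean = reasons.isEmpty
  def explain: List[String] = reasons
}

// A version-specific shim that tags fallback when unsupported
// clauses are present, and otherwise leaves the meta untouched.
object MergeIntoCommandMetaShimSketch {
  def tagForGpu(meta: CmdMeta, mergeCmd: MergeCmd): Unit = {
    if (mergeCmd.notMatchedBySourceClauses.nonEmpty) {
      meta.willNotWorkOnGpu(
        "WHEN NOT MATCHED BY SOURCE clauses are not supported on GPU")
    }
  }
}
```

Because the call site in MergeIntoCommandMeta is identical across versions, only this shim body differs per source set.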