Skip to content

Commit

Permalink
Issue #1436: Fix restore delta table NotSerializableException for Had…
Browse files Browse the repository at this point in the history
…oop 2

When execute `restore` command on delta table by spark sql with Hadoop 2,  it reported `java.io.NotSerializableException: org.apache.hadoop.fs.Path`.

The issue is only in Hadoop 2 because [Path is serializable in Hadoop 3](https://issues.apache.org/jira/browse/HADOOP-13519).

## Description

Resolves #1436

Package new version of delta-core jar and put it under $SPARK_HOME/jars directory. Launch spark-sql and execute `restore table xxx TO VERSION AS OF xx` command on existed delta table, it executed successfully. Then execute `DESCRIBE HISTORY xxx` command on the delta table, it show `RESTORE` operation at the last commit.

spark-sql (default)> restore table default.people10m TO VERSION AS OF 4;
table_size_after_restore  num_of_files_after_restore  num_removed_files num_restored_files  removed_files_size  restored_files_size
1808  4 5 4 2260  1808
Time taken: 22.38 seconds, Fetched 1 row(s)

spark-sql (default)> DESCRIBE HISTORY default.people10m;
version timestamp userId  userName  operation operationParameters job notebook  clusterIreadVersion isolationLevel  isBlindAppend operationMetrics  userMetadata  engineInfo
7 2022-10-18 10:23:33.325 NULL  NULL  RESTORE {"timestamp":null,"version":"4"}  NULL  NULL  NULL  Serializable  false {"numOfFilesAfterRestore":"4","numRemovedFiles":"5","numRestoredFiles":"4","removedFilesSize":"2260","restoredFilesSize":"1808","tableSizeAfterRestore":"1808"} NULL  Apache-Spark/3.3.0 Delta-Lake/2.1.0-SNAPSHOT

## Does this PR introduce _any_ user-facing changes?

No

Closes #1440

Signed-off-by: Scott Sandre <scott.sandre@databricks.com>
GitOrigin-RevId: e4e453bc07ee43e893d7356c8c5f45c7dd5ebe14
  • Loading branch information
ChenShuai1981 authored and zsxwing committed Oct 19, 2022
1 parent ac13fcb commit 23751f8
Showing 1 changed file with 4 additions and 3 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ import org.apache.spark.sql.delta.actions.{AddFile, RemoveFile}
import org.apache.spark.sql.delta.catalog.DeltaTableV2
import org.apache.spark.sql.delta.sources.DeltaSQLConf
import org.apache.spark.sql.delta.util.DeltaFileOperations.absolutePath
import org.apache.hadoop.fs.Path

import org.apache.spark.sql.{Dataset, Row, SparkSession}
import org.apache.spark.sql.catalyst.TableIdentifier
Expand Down Expand Up @@ -244,17 +245,17 @@ case class RestoreTableCommand(

val spark: SparkSession = files.sparkSession

val path = deltaLog.dataPath
val pathString = deltaLog.dataPath.toString
val hadoopConf = spark.sparkContext.broadcast(
new SerializableConfiguration(deltaLog.newDeltaHadoopConf()))

import org.apache.spark.sql.delta.implicits._

val missedFiles = files
.mapPartitions { files =>
val path = new Path(pathString)
val fs = path.getFileSystem(hadoopConf.value.value)
val pathStr = path.toUri.getPath
files.filterNot(f => fs.exists(absolutePath(pathStr, f.path)))
files.filterNot(f => fs.exists(absolutePath(pathString, f.path)))
}
.map(_.path)
.head(100)
Expand Down

0 comments on commit 23751f8

Please sign in to comment.