.NET for Apache Spark 1.0.0 Release Notes

New Features/Improvements

  • DataFrame API Completeness for Spark 2.4.x (#621)(#598)(#623)(#635)(#642)(#665)(#698)
  • DataFrame API Completeness for Spark 3.0.x (#633)(#647)(#649)(#655)(#677)(#688)
  • Expose more Microsoft.Spark.ML.Feature classes (#574)(#586)(#608)(#652)(#703)
  • Support Microsoft.Spark.Sql.Types.ArrayType and Microsoft.Spark.Sql.Types.MapType (#670)(#689)
  • Ensure all calls from a given thread in the CLR are always executed in the same thread in the JVM (#641)
  • Expose Microsoft.Spark.Sql.Streaming.StreamingQueryManager (#670)(#690)
  • Broadcast encryption support (#489)
  • Support for Delta Lake 0.7.0 (#692)(#727)
  • Helper method that returns a DataFrame containing the Microsoft.Spark and Microsoft.Spark.Worker assembly version info. (#715)
  • Update version check logic on the Microsoft.Spark.Worker. (#718)
  • Update to Microsoft.Dotnet.Interactive 1.0.0-beta.20480.3 in Microsoft.Spark.Extensions.DotNet.Interactive (#694)
  • Override Microsoft.Spark.Sql.Column.ToString() to call Java's toString() (#624)
  • Refactor Microsoft.Spark.Sql.ArrowArrayHelpers (#636)(#646)

Bug Fixes

  • JvmBridge/Netty blocking connection deadlock mitigation (#714)(#735)
  • Fix concurrent reading of broadcast variable file during deserialization (#612)
  • Return UTF-8 encoded string from JVM => CLR (#661)
  • Add JVM CallbackClient connection back to connection pool (#681)

Infrastructure / Documentation / Etc.

  • Update Delta Lake tests against Delta Lake 0.6.1 and Delta Lake 0.7.0/Spark 3.0 (#588)(#692)
  • Add more DataFrame examples (#599)
  • Update Ubuntu instructions by prepending current directory in sample command (#603)
  • Delta Lake version annotations to Microsoft.Spark.Extensions.Delta (#632)
  • Hyperspace version annotations to Microsoft.Spark.Extensions.Hyperspace (#696)
  • Refactor unit tests (#638)
  • Add SimpleWorkerTests (#644)
  • Update to Arrow 0.15.1 (#653)
  • Remove Microsoft.Spark.Extensions.Azure.Synapse.Analytics from repo (#687)
  • Move Microsoft.Spark.Experimental to Microsoft.Spark (#691)
  • Fix DaemonWorkerTests.TestsDaemonWorkerTaskRunners and CallbackTests.TestCallbackServer tests (#699)
  • Improve build pipeline (#348)(#667)(#692)(#697)(#705)(#717)(#719)(#720)(#724)
  • microsoft-spark JAR renamed. (#293)(#728)(#729)

Breaking Changes

  • Prior versions (version < 1.0) of Microsoft.Spark and Microsoft.Spark.Worker are no longer compatible with 1.0. If you are planning to upgrade to 1.0, please check the migration guide.
  • microsoft-spark JAR name has changed. Please check the migration guide.

Known Issues

  • Broadcast variables do not work with dotnet-interactive (#561)
  • UDFs defined using class objects with closures do not work with dotnet-interactive (#619)
  • In dotnet-interactive, blocking Spark methods that require an external thread to unblock them do not work; e.g., StreamingQuery.AwaitTermination requires StreamingQuery.Stop to unblock it (#736)
  • GroupedMap does not work on Spark 3.0.0 (#654)
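
The blocking pattern behind issue #736 can be illustrated with a small, language-neutral sketch. This is not the Microsoft.Spark API: `FakeStreamingQuery`, `await_termination`, and `stop` are hypothetical stand-ins, written in Python only to show a call that stays blocked until a different thread releases it.

```python
import threading


class FakeStreamingQuery:
    """Hypothetical stand-in; NOT the Microsoft.Spark StreamingQuery class."""

    def __init__(self):
        self._stopped = threading.Event()

    def await_termination(self, timeout=None):
        # Blocks the calling thread until stop() is called elsewhere.
        # Returns True if stopped, False if the timeout expired first.
        return self._stopped.wait(timeout)

    def stop(self):
        # Must be invoked from another thread while the caller is blocked.
        self._stopped.set()


query = FakeStreamingQuery()

# A second thread calls stop() shortly after the main thread starts waiting.
stopper = threading.Timer(0.1, query.stop)
stopper.start()

terminated = query.await_termination(timeout=5)
stopper.join()

# With no external thread calling stop(), the wait only ends on timeout.
never_stopped = FakeStreamingQuery().await_termination(timeout=0.05)
```

In an environment where the second thread cannot run or cannot reach the blocked call, the first call would never return, which is the shape of the problem the issue describes.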

Supported Spark Versions

The following table lists the supported Spark versions and the microsoft-spark JAR to use with each:

| Spark Version | microsoft-spark JAR |
| --- | --- |
| 2.3.* | microsoft-spark-2-3_2.11-1.0.0.jar |
| 2.4.0, 2.4.1, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7 | microsoft-spark-2-4_2.11-1.0.0.jar |
| 2.4.2 | Not supported |
| 3.0.0, 3.0.1 | microsoft-spark-3-0_2.12-1.0.0.jar |
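
As a rough illustration of how the table above could be applied in a deployment script, the following sketch maps a concrete Spark version to the matching JAR name. The JAR names and supported versions are copied from the table; the function name `jar_for_spark` and the data layout are hypothetical and not part of .NET for Apache Spark.

```python
import re

# JAR names copied from the table above, keyed by Spark major.minor.
_JARS = {
    "2.3": "microsoft-spark-2-3_2.11-1.0.0.jar",
    "2.4": "microsoft-spark-2-4_2.11-1.0.0.jar",
    "3.0": "microsoft-spark-3-0_2.12-1.0.0.jar",
}

# Patch versions listed as supported; a missing key means any patch (2.3.*).
_SUPPORTED_PATCHES = {
    "2.4": {0, 1, 3, 4, 5, 6, 7},  # 2.4.2 is not supported
    "3.0": {0, 1},
}


def jar_for_spark(version: str) -> str:
    """Return the microsoft-spark JAR name for a Spark version (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Expected a version like '2.4.5', got {version!r}")

    major_minor = f"{match.group(1)}.{match.group(2)}"
    patch = int(match.group(3))

    if major_minor not in _JARS:
        raise ValueError(f"Spark {version} is not supported by release 1.0.0")

    allowed = _SUPPORTED_PATCHES.get(major_minor)
    if allowed is not None and patch not in allowed:
        raise ValueError(f"Spark {version} is not supported by release 1.0.0")

    return _JARS[major_minor]
```

Calling `jar_for_spark("2.4.5")` returns the 2.4 JAR name, while `jar_for_spark("2.4.2")` raises `ValueError`, mirroring the "Not supported" row.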

Supported Delta Versions

The following table lists the supported Delta versions and the Microsoft.Spark.Extensions.Delta version to use with each:

| Delta Version | Microsoft.Spark.Extensions.Delta |
| --- | --- |
| 0.1.0, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.6.1, 0.7.0 | 1.0.0 |

Supported Hyperspace Versions

The following table lists the supported Hyperspace versions and the Microsoft.Spark.Extensions.Hyperspace version to use with each:

| Hyperspace Version | Microsoft.Spark.Extensions.Hyperspace |
| --- | --- |
| 0.1.0, 0.2.0 | 1.0.0 |