Differences from upstream

  • SPARK-21195 - Automatically register new metrics from sources and wire default registry

  • SPARK-20952 - ParquetFileFormat should forward TaskContext to its ForkJoinPool

  • SPARK-20001 (SPARK-13587) - Support PythonRunner (and R) executing inside a Conda environment

  • SPARK-17059 - Allow FileFormat to specify partition pruning strategy via splits

  • SPARK-24345 - Improve ParseError stop location when offending symbol is a token

  • SPARK-23795 - Make AbstractLauncher#self() protected

  • SPARK-18079 - CollectLimitExec.executeToIterator should perform per-partition limits

  • SPARK-15777 (Partial fix) - Catalog federation

    • Make ExternalCatalog configurable beyond the in-memory and Hive implementations
    • The FileIndex for catalog tables is provided by the external catalog instead of the default implementation
  • Better pushdown of IN expressions to Parquet via UserDefinedPredicate (see SPARK-17091 for the original issue)

  • SafeLogging implemented for the following files:

    • core: Broadcast, CoarseGrainedExecutorBackend, CoarseGrainedSchedulerBackend, Executor, MemoryStore, SparkContext, TorrentBroadcast
    • kubernetes: ExecutorPodsAllocator, ExecutorPodsLifecycleManager, ExecutorPodsPollingSnapshotSource, ExecutorPodsSnapshot, ExecutorPodsWatchSnapshotSource, KubernetesClusterSchedulerBackend
    • yarn: YarnClusterSchedulerBackend, YarnSchedulerBackend
  • SPARK-26626 - Limit the maximum size of repeatedly substituted aliases

  • SPARK-25299 - Add the complete plugin tree for shuffle byte storage
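SPARK-18079 concerns applying the limit inside each partition before rows reach the driver, rather than fetching whole partitions and truncating afterwards. The sketch below illustrates that idea in Python. It is not the Spark code. Partitions are modeled as plain iterators, and `list(...)` stands in for computing a partition and shipping its rows to the driver. The function names `limited_partitions` and `collect_limit_to_iterator` are hypothetical.

```python
from itertools import chain, islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def limited_partitions(partitions: Iterable[Iterable[T]], limit: int) -> Iterator[List[T]]:
    """Hypothetical per-partition limit: each partition is truncated to at most
    `limit` rows before it is materialized, so no partition ships more rows
    than the final result could ever use."""
    for part in partitions:
        # list(...) stands in for running the partition and transferring its rows.
        yield list(islice(part, limit))


def collect_limit_to_iterator(partitions: Iterable[Iterable[T]], limit: int) -> Iterator[T]:
    """Hypothetical global limit on top of the per-partition limits. Partitions are
    pulled lazily, so once `limit` rows are emitted no further partition is materialized."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return islice(chain.from_iterable(limited_partitions(partitions, limit)), limit)


# Usage: count how many rows each (simulated) partition actually produces.
pulled = [0, 0, 0]


def partition(idx: int, size: int) -> Iterator[int]:
    for row in range(size):
        pulled[idx] += 1  # one increment per row this partition produces
        yield row


rows = list(collect_limit_to_iterator([partition(0, 2), partition(1, 100), partition(2, 100)], 5))
print(rows)    # [0, 1, 0, 1, 2]
print(pulled)  # [2, 5, 0]
```

In this sketch, the second partition produces only 5 of its 100 rows. The third partition is never started, because the limit is already satisfied before it is reached.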

Added

  • Gradle plugin to easily create custom Docker images for use with Kubernetes (k8s)
  • Filter rLibDir by existence so that daemon.R references the correct file (#460)
  • Implementation of the shuffle I/O plugins from SPARK-25299 that asynchronously backs up shuffle files to remote storage
  • Add a pre-installed Conda configuration and use it to find the rlib directory (#700)
  • Support Arrow serialization of Python 2 strings (#678)

Reverted

  • SPARK-25908 - Removal of monotonicallyIncreasingId, toDegrees, toRadians, approxCountDistinct, unionAll
  • SPARK-25862 - Removal of unboundedPreceding, unboundedFollowing, currentRow
  • SPARK-26127 - Removal of deprecated setters from tree regression and classification models
  • SPARK-25867 - Removal of KMeans computeCost
  • SPARK-26216 - Change to UserDefinedFunction type
    • SPARK-26323 - Scala UDF null checking
    • SPARK-26580 - Bring back the Scala 2.11 null behaviour for primitive types
  • SPARK-26133 - Removal of the old OneHotEncoder
  • SPARK-11215 - StringIndexer multi-column support
  • SPARK-26616 - No document frequency in IDFModel