Patch for SPARK-942 #50
Conversation
…the serializer when a 'DISK_ONLY' persist is called. This is in response to SPARK-942.
…ffer objects. This was previously done higher up the stack.
Conflicts: core/src/main/scala/org/apache/spark/CacheManager.scala
…ood data in the LargeIteratorSuite
… system variable 'spark.serializer.objectStreamReset', default is now 10000 (see the configuration sketch after this commit list).
…Buffer (rather than an Iterator). This will allow BlockStores to have slightly different behaviors depending on whether they get an Iterator or an ArrayBuffer. In the case of the MemoryStore, it needs to duplicate and cache an Iterator into an ArrayBuffer, but if handed an ArrayBuffer, it can skip the duplication.
…5 seconds. Confirmed that it still crashes an unpatched copy of Spark.
…rs. It doesn't try to invoke an OOM error anymore.
…. Now using trait 'Values'. Also modified BlockStore.putBytes call to return PutResult, so that it behaves like putValues.
…k into iterator-to-disk Conflicts: core/src/test/scala/org/apache/spark/storage/LargeIteratorSuite.scala
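The 'spark.serializer.objectStreamReset' property mentioned in the commits above bounds how long Java serialization holds references to already-written objects; resetting the stream periodically lets those objects be garbage collected. A minimal sketch of setting it, assuming a SparkConf-based setup (the app name and master below are placeholders; only the property name and default come from the commit message):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: ask the Java serializer to reset its object-reference
// cache every 10000 objects written, so serialized objects can be GC'd.
val conf = new SparkConf()
  .setAppName("objectStreamReset-example") // placeholder
  .setMaster("local[2]")                   // placeholder
  .set("spark.serializer.objectStreamReset", "10000")

val sc = new SparkContext(conf)
```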
Merged build triggered.
Merged build started.
Merged build triggered.
All automated tests passed.
I think I've covered all the formatting requests. Any other issues?
Thanks @kellrott for this patch - sorry it took us a long time to review it. I'm going to merge this now.
I've created SPARK-1201 (https://spark-project.atlassian.net/browse/SPARK-1201) to cover optimizations in cases other than DISK_ONLY. |
Fix race condition in SparkListenerSuite (fixes SPARK-908). (cherry picked from commit 215238c) Signed-off-by: Reynold Xin <rxin@apache.org>
## What changes were proposed in this pull request?

In Databricks, `SPARK_DIST_CLASSPATH` is used for the driver classpath and `SPARK_JARS_DIR` is empty. So we need to add `SPARK_DIST_CLASSPATH` to the `LAUNCH_CLASSPATH`. We cannot remove `SPARK_JARS_DIR` because Spark unit tests actually use it. Author: Yin Huai <yhuai@databricks.com> Closes apache#50 from yhuai/Add-SPARK_DIST_CLASSPATH-toLAUNCH_CLASSPATH.
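A rough sketch of the classpath logic described above, rendered in Scala for illustration (the real change lives in the `bin/spark-class` launcher script; the construction below follows the prose description, not the script's actual code):

```scala
// Hypothetical reconstruction: keep SPARK_JARS_DIR (unit tests rely on it)
// and append SPARK_DIST_CLASSPATH so setups that only set the latter work.
val jarsDir = sys.env.getOrElse("SPARK_JARS_DIR", "")
val distClasspath = sys.env.getOrElse("SPARK_DIST_CLASSPATH", "")

val launchClasspath = Seq(
  if (jarsDir.nonEmpty) s"$jarsDir/*" else "",
  distClasspath
).filter(_.nonEmpty).mkString(":")
```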
* Create README to better describe project purpose
* Add links to usage guide and dev docs
* Minor changes
* Refactor and Test of ConfigSecurity
* [SPK-64] removed ssl tricks on spark-env (apache#50)
* removed ssl tricks on spark-env
* test phase activated
* added changes requested from jlopez-malla
* changed properties and fixed typos
* changed signature for methods
Enable Octavia in LBaaS test of terraform-openstack-provider
…-spark:bump_lineage_logging_211 to netflix/2.1.1-unstable

Squashed commit of the following:

commit 347c0be48e6613b07d67b6efa9247e116b3a99b2
Author: Daniel Watson <dwatson@netflix.com>
Date: Tue Oct 8 09:55:43 2019 -0700

NETFLIX-BUILD: Bump lineage-logging to 0.1.20
This is a port of a pull request originally targeted at incubator-spark: https://github.com/apache/incubator-spark/pull/180
Essentially, if a user returns a generative iterator (from a flatMap operation), then when trying to persist the data, Spark would first unroll the iterator into an ArrayBuffer and then try to figure out if it could store the data. In cases where the user provided an iterator that generated more data than available memory, this would cause a crash. With this patch, if the user requests a persist with 'StorageLevel.DISK_ONLY', the iterator is unrolled directly as it is fed into the serializer.
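A minimal sketch of that difference, with invented names and a ByteArrayOutputStream standing in for the disk store (this is not the patch's actual code):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object UnrollSketch {
  // Pre-patch behavior, roughly: materialize the whole iterator first,
  // then serialize. A generative iterator larger than the heap dies here.
  def persistEagerly(values: Iterator[AnyRef]): Array[Byte] = {
    val unrolled = values.toArray // O(n) memory before anything is written
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    unrolled.foreach(out.writeObject)
    out.close()
    bytes.toByteArray
  }

  // Post-patch behavior for DISK_ONLY, roughly: feed elements into the
  // serializer one at a time as the iterator is consumed.
  def persistStreaming(values: Iterator[AnyRef]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    values.foreach(out.writeObject)
    out.close()
    bytes.toByteArray
  }
}
```

The streaming variant holds only one element at a time, which is what makes DISK_ONLY persists of generative iterators safe.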
To do this, two changes were made: