[SPARK-4307] Initialize FileDescriptor lazily in FileRegion. #3172

Closed
wants to merge 4 commits into from

Conversation

rxin
Contributor

@rxin rxin commented Nov 9, 2014

Netty's DefaultFileRegion requires a FileDescriptor in its constructor, which means we need to have an open file handle before any data is sent. In very large workloads, this could lead to "too many open files" errors due to the way these file descriptors are cleaned up. This pull request creates a new LazyFileRegion that initializes the FileDescriptor when we are sending data for the first time.
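To illustrate the idea, here is a minimal sketch of the lazy-open pattern using only `java.nio`. It is not the code in this patch: `LazyFileSegment` and `isOpen` are hypothetical names, the class does not implement Netty's `FileRegion` or do any reference counting, and it assumes single-threaded use. The point is only that constructing the object holds no file handle; the channel is opened on the first `transferTo` call.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

/** Hypothetical, simplified stand-in for a lazily opened file region. */
final class LazyFileSegment implements AutoCloseable {
  private final File file;
  private final long offset;   // start of the segment within the file
  private final long length;   // number of bytes in the segment
  private FileChannel channel; // stays null until the first transfer

  LazyFileSegment(File file, long offset, long length) {
    // No file handle is opened here; only the coordinates are recorded.
    this.file = file;
    this.offset = offset;
    this.length = length;
  }

  boolean isOpen() {
    return channel != null && channel.isOpen();
  }

  /** Transfers bytes starting at {@code position} (relative to the segment). */
  long transferTo(WritableByteChannel target, long position) throws IOException {
    if (position < 0 || position > length) {
      throw new IllegalArgumentException("position out of range: " + position);
    }
    long remaining = length - position;
    if (remaining == 0) {
      return 0L;
    }
    if (channel == null) {
      // First use: open the file only now, when data is actually being sent.
      channel = FileChannel.open(file.toPath(), StandardOpenOption.READ);
    }
    return channel.transferTo(offset + position, remaining, target);
  }

  @Override
  public void close() throws IOException {
    if (channel != null) {
      channel.close();
      channel = null;
    }
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("lazy-segment", ".bin");
    tmp.deleteOnExit();
    Files.write(tmp.toPath(), "hello world".getBytes(StandardCharsets.UTF_8));

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    WritableByteChannel target = Channels.newChannel(out);
    try (LazyFileSegment segment = new LazyFileSegment(tmp, 6, 5)) {
      System.out.println("open after construction: " + segment.isOpen());
      long sent = 0;
      while (sent < 5) {
        sent += segment.transferTo(target, sent); // may transfer in several steps
      }
      System.out.println("open after first transfer: " + segment.isOpen());
    }
    System.out.println("transferred: " + new String(out.toByteArray(), StandardCharsets.UTF_8));
  }
}
```

With many such segments queued for sending, only those that have started transferring hold a descriptor, which is the property the description above is after.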

@@ -22,6 +22,9 @@ import java.nio.ByteBuffer
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.network.netty.SparkTransportConf

nit: import order

@SparkQA

SparkQA commented Nov 9, 2014

Test build #23111 has started for PR 3172 at commit 04cddc8.

  • This patch merges cleanly.

this.conf = conf;
}

@VisibleForTesting

we probably don't need 2 VisibleForTesting constructors, do we? If someone wants to provide an explicit Executor, they can damn well create their own TransportConf. This is only done in a single unit test anyway.

@aarondav
Contributor

aarondav commented Nov 9, 2014

LGTM, just a few minor style nits.

@SparkQA

SparkQA commented Nov 9, 2014

Test build #23111 has finished for PR 3172 at commit 04cddc8.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public final class LazyFileRegion extends AbstractReferenceCounted implements FileRegion

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23111/
Test FAILed.

@SparkQA

SparkQA commented Nov 10, 2014

Test build #23143 has started for PR 3172 at commit 6ed369e.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 10, 2014

Test build #23143 has finished for PR 3172 at commit 6ed369e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public final class LazyFileRegion extends AbstractReferenceCounted implements FileRegion

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23143/
Test FAILed.

@SparkQA

SparkQA commented Nov 10, 2014

Test build #23147 has started for PR 3172 at commit d4564ae.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 10, 2014

Test build #23147 has finished for PR 3172 at commit d4564ae.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class IndexShuffleBlockManager(conf: SparkConf) extends ShuffleBlockManager
    • public final class LazyFileRegion extends AbstractReferenceCounted implements FileRegion

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23147/
Test FAILed.

@SparkQA

SparkQA commented Nov 10, 2014

Test build #515 has started for PR 3172 at commit d4564ae.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 10, 2014

Test build #515 has finished for PR 3172 at commit d4564ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor Author

rxin commented Nov 10, 2014

@aarondav comments addressed.

@aarondav
Contributor

Cool, everything LGTM. Would you mind adding a unit test for LazyFileRegion, though, in case people try to modify it in the future?

@rxin
Contributor Author

rxin commented Nov 11, 2014

cc @normanmaurer @trustin

Do you think we can add this directly to Netty? Doing this at the Spark level means Netty's Epoll transport won't support this feature.

More specifically, I was referring to LazyFileRegion.java.

@SparkQA

SparkQA commented Nov 11, 2014

Test build #23190 has started for PR 3172 at commit 0bdcdc6.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 11, 2014

Test build #23190 has finished for PR 3172 at commit 0bdcdc6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class IndexShuffleBlockManager(conf: SparkConf) extends ShuffleBlockManager
    • public final class LazyFileRegion extends AbstractReferenceCounted implements FileRegion

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23190/
Test PASSed.

@aarondav
Contributor

Merging into master and branch-1.2.0.

@asfgit asfgit closed this in ef29a9a Nov 11, 2014
asfgit pushed a commit that referenced this pull request Nov 11, 2014
Netty's DefaultFileRegion requires a FileDescriptor in its constructor, which means we need to have an open file handle before any data is sent. In very large workloads, this could lead to "too many open files" errors due to the way these file descriptors are cleaned up. This pull request creates a new LazyFileRegion that initializes the FileDescriptor when we are sending data for the first time.

Author: Reynold Xin <rxin@databricks.com>
Author: Reynold Xin <rxin@apache.org>

Closes #3172 from rxin/lazyFD and squashes the following commits:

0bdcdc6 [Reynold Xin] Added reference to Netty's DefaultFileRegion
d4564ae [Reynold Xin] Added SparkConf to the ctor argument of IndexShuffleBlockManager.
6ed369e [Reynold Xin] Code review feedback.
04cddc8 [Reynold Xin] [SPARK-4307] Initialize FileDescriptor lazily in FileRegion.

(cherry picked from commit ef29a9a)
Signed-off-by: Aaron Davidson <aaron@databricks.com>

@trustin

trustin commented Nov 11, 2014

@rxin, @normanmaurer told me via IM that he has a patch pending. Could you file an issue and CC him?

@rxin
Contributor Author

rxin commented Nov 11, 2014

Done - netty/netty#3129

@rxin rxin deleted the lazyFD branch December 2, 2014 05:55