[SPARK-4467] fix elements read count for ExtrenalSorter #3302

Closed
wants to merge 4 commits into from

Conversation

tsdeng
Contributor

@tsdeng tsdeng commented Nov 17, 2014

The elementsRead variable should be reset to 0 after each spill.

@AmplabJenkins

Can one of the admins verify this patch?

@sryza
Contributor

sryza commented Nov 17, 2014

As far as I can tell, elementsRead isn't used for anything. Would we be able to just remove it entirely?

@sryza
Contributor

sryza commented Nov 17, 2014

Also, mind filing a JIRA for this, or, if one already exists, including the name in the title here?

@tsdeng tsdeng changed the title fix elements read count for ExtrenalSorter [SPARK-4452] fix elements read count for ExtrenalSorter Nov 17, 2014
@tsdeng
Contributor Author

tsdeng commented Nov 17, 2014

I think it's used to control how often we check whether to spill. It's used in Spillable.scala.
Without this fix, we may spill very small files.

@tsdeng
Contributor Author

tsdeng commented Nov 17, 2014

I guess the original intention of this variable is to make sure there are at least 1K records before each spill. We saw a "too many files open" exception because this variable was not being updated correctly. Of course, this is not the root cause of the issue; I currently have another working branch that tries to tackle the deeper cause, as mentioned in https://issues.apache.org/jira/browse/SPARK-4452. In the meantime, I'm sending this PR to fix the updating of elementsRead and alleviate the problem.

@sryza
Contributor

sryza commented Nov 17, 2014

That makes sense, my IDE for some reason didn't show me the usage in Spillable.scala. In that case, this change makes sense.

Spilling is also based on the amount of memory taken up. Do you know what the thinking is for basing it on the number of records as well?

@tsdeng
Contributor Author

tsdeng commented Nov 17, 2014

From my understanding, this variable is used as a lower bound on the number of records per spill. It's useful when memory is really low.

@sryza
Contributor

sryza commented Nov 17, 2014

I don't entirely understand that line of argument. Why would we want to place a lower bound if the data structure is pushing the memory threshold? I filed https://issues.apache.org/jira/browse/SPARK-4456 to figure this out and document it better.

This patch LGTM though.

@tsdeng
Contributor Author

tsdeng commented Nov 17, 2014

Yeah, I guess we need that lower bound when the memory threshold is not "pushable", meaning when memory is too small and you cannot acquire more... I agree this behavior is kinda weird. Maybe it's used as "the last defense"? Please refer to https://issues.apache.org/jira/browse/SPARK-4452 for why this last defense saved my SVD job. I'm also working on another branch, documented in SPARK-4452, to make a deeper fix for memory allocation for spillable objects.

@andrewor14
Contributor

add to whitelist

@andrewor14
Contributor

I see, after a while we unconditionally try to spill every 32 elements regardless of whether the in-memory buffer has exceeded the spill threshold. This is a serious problem, and it seems to be just an omission in the original code, since we never update elementsRead in this code path. Changes here LGTM.

I think this is the first step towards fixing the too many files open issue that many are seeing. We still need to hunt down the root cause for why the lower bound for how much memory a data structure can have is not being accounted for properly.

@SparkQA

SparkQA commented Nov 17, 2014

Test build #23515 has started for PR 3302 at commit 74ca246.

  • This patch merges cleanly.

@mateiz
Contributor

mateiz commented Nov 17, 2014

@tsdeng this patch looks good but wouldn't it be better to set it to 0 in Spillable, after it calls spill()? Then it will fix this problem in all subclasses, and you can remove the code that sets it to 0 in ExternalAppendOnlyMap.

@sryza
Contributor

sryza commented Nov 17, 2014

> after a while we unconditionally try to spill every 32 elements regardless of whether the in-memory buffer has exceeded the spill threshold.

The code still verifies currentMemory >= myMemoryThreshold as well, right?

@mateiz
Contributor

mateiz commented Nov 18, 2014

@sryza yeah it does, the problem is just that myMemoryThreshold turns to 0 after you spill. The idea was to wait for at least 1000 more elements before requesting memory, but it currently doesn't, and it gets a 0 returned.
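
A simplified model may help make the behavior described here concrete. It is only a sketch under stated assumptions, not Spark's actual Spillable code: SpillModel, SpillDemo, grantPerRequest and resetAfterSpill are hypothetical names, the memory pool is faked, and each element is assumed to cost one unit of memory. It compares the spill count with and without resetting elementsRead after a spill when the pool grants no memory.

```scala
// Simplified model of the spill check. NOT Spark's actual Spillable trait:
// SpillModel, grantPerRequest and resetAfterSpill are hypothetical names.
class SpillModel(resetAfterSpill: Boolean, grantPerRequest: Long) {
  private val trackMemoryThreshold = 1000L // minimum elements before considering a spill
  private var elementsRead = 0L
  private var myMemoryThreshold = 0L       // memory granted so far; back to 0 after a spill
  var spillCount = 0

  /** Record one element; returns true if this call triggered a spill. */
  def insert(currentMemory: Long): Boolean = {
    elementsRead += 1
    if (elementsRead > trackMemoryThreshold && elementsRead % 32 == 0 &&
        currentMemory >= myMemoryThreshold) {
      // Ask the (fake) pool for more memory; a starved pool grants nothing.
      myMemoryThreshold += grantPerRequest
      if (myMemoryThreshold <= currentMemory) {
        spillCount += 1
        myMemoryThreshold = 0L                 // memory is released on spill
        if (resetAfterSpill) elementsRead = 0L // the fix under discussion
        return true
      }
    }
    false
  }
}

object SpillDemo {
  /** Feed n elements, one memory unit each, into a model whose pool is starved. */
  def spillsFor(n: Int, resetAfterSpill: Boolean): Int = {
    val model = new SpillModel(resetAfterSpill, grantPerRequest = 0L)
    var inMemory = 0L
    for (_ <- 1 to n) {
      inMemory += 1
      if (model.insert(inMemory)) inMemory = 0L // the collection is emptied on spill
    }
    model.spillCount
  }
}
```

In this model, SpillDemo.spillsFor(4000, resetAfterSpill = false) yields 94 spills (one every 32 elements once the count passes 1000), while SpillDemo.spillsFor(4000, resetAfterSpill = true) yields 3, each holding 1024 elements.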

@SparkQA

SparkQA commented Nov 18, 2014

Test build #23515 has finished for PR 3302 at commit 74ca246.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23515/

@tsdeng
Contributor Author

tsdeng commented Nov 18, 2014

@mateiz
Agreed, but elementsRead is a def in Spillable. Does adding a resetElementsRead method sound good to you?

@mateiz
Contributor

mateiz commented Nov 18, 2014

Oh, weird, it is. What about changing it to a var, any problems with that?

@mateiz
Contributor

mateiz commented Nov 18, 2014

Basically it is weird to have a var shared with subclasses, but I think this will be more obvious. Probably the very best way is to make elementsRead be modified only in Spillable, and add a method called addElementRead (or something like that) that subclasses can call to tell it to increment the count.

@tsdeng
Contributor Author

tsdeng commented Nov 18, 2014

Ha, my bad, elementsRead is a var. But I'd also like to keep it private or protected and add an addElementRead method to manipulate it.

@mateiz
Contributor

mateiz commented Nov 18, 2014

Sounds good.

@SparkQA

SparkQA commented Nov 18, 2014

Test build #23533 has started for PR 3302 at commit bb7ff28.

  • This patch merges cleanly.

asfgit pushed a commit that referenced this pull request Nov 18, 2014
This is the 1.1 version of #3302. There has been some refactoring in master so we can't cherry-pick that PR.

Author: Andrew Or <andrew@databricks.com>

Closes #3330 from andrewor14/sort-fetch-fail and squashes the following commits:

486fc49 [Andrew Or] Reset `elementsRead`
@SparkQA

SparkQA commented Nov 18, 2014

Test build #23533 has finished for PR 3302 at commit bb7ff28.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23533/


// subclass calls this method to notify reading an element
// it's used to check spilling frequency
protected def addElementsRead = elementsRead += 1
Contributor

Add parentheses to this and call it with addElementsRead() since it has side effects.
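
For illustration, the encapsulation discussed in this thread, combined with the parentheses convention, could look roughly like the sketch below. ReadCounter, ToySorter, spillAndReset and forceSpill are hypothetical names chosen for this example; this is not the code that was merged.

```scala
// Hypothetical sketch; ReadCounter and ToySorter are illustrative names,
// not classes from Spark.
trait ReadCounter {
  // Only this trait mutates the counter; subclasses get read-only access.
  private var _elementsRead = 0L
  protected def elementsRead: Long = _elementsRead

  // Declared and called with empty parentheses because it has a side effect.
  protected def addElementsRead(): Unit = { _elementsRead += 1 }

  // Subclasses supply the actual spill logic.
  protected def spill(): Unit

  // Resetting here, right after spill(), covers every subclass at once.
  protected def spillAndReset(): Unit = {
    spill()
    _elementsRead = 0L
  }
}

class ToySorter extends ReadCounter {
  var spills = 0
  protected def spill(): Unit = { spills += 1 }

  def insert(record: String): Unit = addElementsRead()
  def forceSpill(): Unit = spillAndReset()
  def count: Long = elementsRead
}
```

With this shape a subclass cannot forget the reset, and the empty parentheses on addElementsRead() tell the reader at each call site that state is being changed.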

@tsdeng tsdeng changed the title [SPARK-4452] fix elements read count for ExtrenalSorter [SPARK-4467] fix elements read count for ExtrenalSorter Nov 18, 2014
@SparkQA

SparkQA commented Nov 18, 2014

Test build #23553 has started for PR 3302 at commit 782c7de.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 18, 2014

Test build #23553 has finished for PR 3302 at commit 782c7de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23553/

@@ -132,7 +130,7 @@ class ExternalAppendOnlyMap[K, V, C](
   currentMap = new SizeTrackingAppendOnlyMap[K, C]
 }
 currentMap.changeValue(curEntry._1, update)
-elementsRead += 1
+addElementsRead
Contributor

you still need the () here and everywhere this is used

@andrewor14
Contributor

A few more minor comments. This LGTM otherwise

@SparkQA

SparkQA commented Nov 18, 2014

Test build #23564 has started for PR 3302 at commit 7b56ca0.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 18, 2014

Test build #23564 has finished for PR 3302 at commit 7b56ca0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23564/

@andrewor14
Contributor

@mateiz any other comments?

@mateiz
Contributor

mateiz commented Nov 19, 2014

@andrewor14 nope, it looks good.

@andrewor14
Contributor

Ok, merging into master and 1.2. Thanks @tsdeng.
The 1.1 version was previously merged at #3330.

@asfgit asfgit closed this in d75579d Nov 19, 2014
asfgit pushed a commit that referenced this pull request Nov 19, 2014
the elementsRead variable should be reset to 0 after each spilling

Author: Tianshuo Deng <tdeng@twitter.com>

Closes #3302 from tsdeng/fix_external_sorter_record_count and squashes the following commits:

7b56ca0 [Tianshuo Deng] fix method signature
782c7de [Tianshuo Deng] make elementsRead private, fix comment
bb7ff28 [Tianshuo Deng] update elemetsRead through addElementsRead method
74ca246 [Tianshuo Deng] fix elements read count

(cherry picked from commit d75579d)
Signed-off-by: Andrew Or <andrew@databricks.com>
6 participants