[SPARK-46687][PYTHON][CONNECT] Basic support of SparkSession-based memory profiler #44775
Conversation
The failure at https://github.com/xinrong-meng/spark/actions/runs/7648782322/job/20842144027 is unrelated to the PR changes. I will rebase on master.
python/pyspark/profiler.py
measures = self[code]
if not measures:
    continue  # skip if no measurement
linenos = range(min(measures), max(measures) + 1)
We may want to delay generating the full linenos until showing the results, to reduce the intermediate data?
Good idea! Updated.
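The suggestion above can be sketched in plain Python: keep only the raw per-line measurements while profiling, and expand them into a contiguous line range only when rendering the report. The names below (`measurements`, `render`) are illustrative, not the actual `profiler.py` identifiers.

```python
# Hypothetical per-UDF store: lineno -> memory usage in MiB.
measurements = {
    "udf_a": {8: 147.7, 10: 149.6, 11: 59.9, 13: 89.9},
}

def render(code: str) -> list[int]:
    """Expand the contiguous line range only at display time,
    so no intermediate range object is kept per code block."""
    measures = measurements[code]
    if not measures:
        return []  # nothing measured for this code block
    return list(range(min(measures), max(measures) + 1))
```

This keeps the accumulated state proportional to the number of measured lines rather than the full source span of every profiled UDF.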
============================================================
Profile of UDF<id=2>
============================================================
Filename: /var/folders/h_/60n1p_5s7751jx1st4_sk0780000gp/T/ipykernel_69451/109011680.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
8 147.7 MiB 147.7 MiB 20 @udf("string")
9 def a(x):
10 149.6 MiB 1.8 MiB 20 if TaskContext.get().partitionId() % 2 == 0:
11 59.9 MiB 0.1 MiB 8 return str(x)
12 else:
13 89.9 MiB 0.1 MiB 12 return None
Tested on Jupyter.
LGTM, pending tests.
Thanks! Merging to master.
Thank you @ueshin!
…s when codecov enabled

### What changes were proposed in this pull request?
This is a followup of #44775 that skips the tests with codecov on. It fails now (https://github.com/apache/spark/actions/runs/7709423681/job/21010676103) and the coverage report is broken.

### Why are the changes needed?
To recover the test coverage report.

### Does this PR introduce _any_ user-facing change?
No, test-only.

### How was this patch tested?
Manually tested.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45112 from HyukjinKwon/SPARK-46687-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Basic support for the SparkSession-based memory profiler, in both Spark Connect and non-Spark-Connect modes.
Why are the changes needed?
We need to make the memory profiler SparkSession-based to support memory profiling in Spark Connect.
Does this PR introduce any user-facing change?
Yes, the SparkSession-based memory profiler is available.
An example is shown below, along with its profile result:
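As a hedged sketch of the user-facing flow (the exact config key and `spark.profile.show` API should be confirmed against the PySpark documentation for this release; `to_str` is an illustrative UDF name):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()

# Enable the SparkSession-based memory profiler for Python UDFs
# (requires the `memory-profiler` package on the workers).
spark.conf.set("spark.sql.pyspark.udf.profiler", "memory")

@udf("string")
def to_str(x):
    return str(x)

spark.range(10).select(to_str("id")).collect()

# Render per-line memory usage for each profiled UDF.
spark.profile.show(type="memory")
```

The same code works unchanged against a Spark Connect session, since the profiler results are now tracked on the session rather than the SparkContext.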
How was this patch tested?
New and existing unit tests:
And manual tests on Jupyter notebook.
Was this patch authored or co-authored using generative AI tooling?
No.