[SPARK-39881][PYTHON] Fix erroneous check for black and reenable black validation. #37305
@@ -968,7 +968,7 @@ class _CountVectorizerParams(JavaParams, HasInputCol, HasOutputCol):

      def __init__(self, *args: Any):
          super(_CountVectorizerParams, self).__init__(*args)
-         self._setDefault(minTF=1.0, minDF=1.0, maxDF=2 ** 63 - 1, vocabSize=1 << 18, binary=False)
+         self._setDefault(minTF=1.0, minDF=1.0, maxDF=2**63 - 1, vocabSize=1 << 18, binary=False)
FWIW, I think this is presumably a false positive from Black (per https://peps.python.org/pep-0008/#other-recommendations). I believe we use one space around these operators.
Doesn't this fall into this rule?
If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies). Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator:
2**63 - 1
The - has a lower priority compared to the power operator?
Is this style change because we are upgrading black?
Apparently, yes. I just tested: the old version does not make this change, the new version does. Playing with it shows that the behaviour is specific to the power operator. I'll check whether I can find more details in the black releases.
Found the doc that mentions this:
Almost all operators will be surrounded by single spaces, the only exceptions are unary operators (+, -, and ~), and power operators when both operands are simple. For powers, an operand is considered simple if it's only a NAME, numeric CONSTANT, or attribute access (chained attribute access is allowed), with or without a preceding unary operator.
And the commit.
The style change is from 22.1.0.
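To make the quoted rule concrete, here is a minimal sketch that approximates the "simple operand" test using Python's standard ast module. It is not Black's implementation (Black works on its own parse tree); the function names is_simple_operand and power_operands_are_simple are hypothetical, and treating True, False and None as names is an assumption of this sketch.

```python
import ast


def is_simple_operand(node: ast.expr) -> bool:
    """Approximate the documented rule: NAME, numeric CONSTANT, or attribute
    access, optionally preceded by a unary operator (+, -, ~)."""
    # Strip one optional leading unary operator.
    if isinstance(node, ast.UnaryOp) and isinstance(
        node.op, (ast.UAdd, ast.USub, ast.Invert)
    ):
        node = node.operand
    # Numeric literals count as simple. True/False/None are constants in
    # `ast`; this sketch treats them like names (an assumption).
    if isinstance(node, ast.Constant):
        return node.value is None or isinstance(
            node.value, (bool, int, float, complex)
        )
    # Chained attribute access (a.b.c) must bottom out in a plain name.
    while isinstance(node, ast.Attribute):
        node = node.value
    return isinstance(node, ast.Name)


def power_operands_are_simple(expression: str) -> bool:
    """Return True if both operands of the outermost ** are simple."""
    tree = ast.parse(expression, mode="eval")
    # ast.walk is breadth-first, so the first ** found is the outermost one.
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Pow):
            return is_simple_operand(node.left) and is_simple_operand(node.right)
    raise ValueError("expression contains no ** operator")


if __name__ == "__main__":
    for source in (
        "2 ** 63 - 1",
        "True ** b_psser",
        "datetime.date(1994, 1, 1) ** b_psser",
    ):
        print(source, "->", power_operands_are_simple(source))
```

Under this approximation, the first two expressions have simple operands (so the spaces around ** would be removed), while the call expression datetime.date(1994, 1, 1) is not a simple operand, so the spaces stay.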
@ueshin Please see - psf/black#538
Looks like ** is not covered by PEP 8, and the main reason for Black's change is readability, so I personally think Black's choice is right.
Otherwise looks fine to me.
dev/lint-python
Outdated
$BLACK_BUILD 2> /dev/null
if [ $? -ne 0 ]; then
$PYTHON_EXECUTABLE -c 'import black' &> /dev/null
Could you explain why this should be changed? Isn't the existing code "$PYTHON_EXECUTABLE -m black" enough to check for the Black installation?
The reason is very simple:
> python -m black
Usage: python -m black [OPTIONS] SRC ...
One of 'SRC' or 'code' is required.
> echo $?
1
> python -m black
/System/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python: No module named black
> echo $?
1
> python -c 'import black'
> echo $?
0
> python -c 'import this_does_not_exist'
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'this_does_not_exist'
> echo $?
1
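The session above shows that python -m black exits with status 1 both when Black is installed (no SRC argument given) and when it is missing, whereas python -c 'import black' only fails when the module cannot be imported. The same check can be sketched in Python; the helper name module_importable is hypothetical and not part of dev/lint-python.

```python
import subprocess
import sys


def module_importable(module_name: str, python_executable: str = sys.executable) -> bool:
    """Return True if `python -c 'import <module>'` exits with status 0.

    Mirrors the shell check `$PYTHON_EXECUTABLE -c 'import black' &> /dev/null`:
    the exit code depends only on whether the import succeeds, not on how
    the module's command-line interface reacts to missing arguments.
    """
    result = subprocess.run(
        [python_executable, "-c", f"import {module_name}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    for name in ("json", "this_does_not_exist", "black"):
        print(name, "importable:", module_importable(name))
```

Running the import in a subprocess, rather than in the current interpreter, keeps the check tied to whichever Python executable the lint script is configured to use.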
self.assertRaises(TypeError, lambda: datetime.date(1994, 1, 1) ** b_psser)
self.assertRaises(TypeError, lambda: datetime.datetime(1994, 1, 1) ** b_psser)
I'm not sure why Black doesn't reformat this, too...
Shouldn't it be reformatted like:
- self.assertRaises(TypeError, lambda: datetime.date(1994, 1, 1) ** b_psser)
- self.assertRaises(TypeError, lambda: datetime.datetime(1994, 1, 1) ** b_psser)
+ self.assertRaises(TypeError, lambda: datetime.date(1994, 1, 1)**b_psser)
+ self.assertRaises(TypeError, lambda: datetime.datetime(1994, 1, 1)**b_psser)
???
I'm no black expert, but from the discussion I had with @HyukjinKwon above, it looks like it's an operator precedence thing.
If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies). Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator:
in
self.assertRaises(TypeError, lambda: datetime.date(1994, 1, 1) ** b_psser)
The binary operator ** is not paired with another, lower-priority operator.
Yeah, I understand about the priority but just wonder why Black did:
- self.assertRaises(TypeError, lambda: True ** b_psser)
+ self.assertRaises(TypeError, lambda: True**b_psser)
but didn't
- self.assertRaises(TypeError, lambda: datetime.date(1994, 1, 1) ** b_psser)
- self.assertRaises(TypeError, lambda: datetime.datetime(1994, 1, 1) ** b_psser)
+ self.assertRaises(TypeError, lambda: datetime.date(1994, 1, 1)**b_psser)
+ self.assertRaises(TypeError, lambda: datetime.datetime(1994, 1, 1)**b_psser)
It seems like just inconsistent behavior from Black, so maybe we can keep it as is for now.
The reason is explained here #37305 (comment) - in short, the operands are not both simple (True vs datetime.datetime(1994, 1, 1)).
Anyway, +1 for upgrading the version to the latest.
Can one of the admins verify this patch?
Can we also add [PS] to the PR title?
This should not be categorized as [PS] but PySpark in general.
@grundprinzip Could you use [PYTHON] instead?
Reverting unnecessary change.
Updated the PR description to indicate why the black version was updated.
LGTM, some nits but it's also OK for me, so just go ahead if you think it's ok!
Also, CI failed due to a git clone networking issue in "Run / Scala 2.13 build with SBT"; I think we can pass it by re-triggering.
I updated the remaining comments. Can we merge the PR?
Merged to master.
What changes were proposed in this pull request?
The previously committed check for running black did not actually work and caused code to be committed that does not follow the linter rules. This patch fixes the way we check whether black is locally installed and updates the dev/reformat-python script. In addition, we run the script to fix existing style issues. Similar to the original PR #32779, this patch only applies the black checks to the pandas code. The black version is updated in this PR because, in an empty virtualenv, the selected version of click ends up in a conflict due to an underspecified version requirement for click. See psf/black#2964.
Why are the changes needed?
We have linter rules, we should actually address them.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Manual testing.