-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert percentages stored as strings to numerics in formula calculations #3156
Merged
Merged
Changes from 4 commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
1e3f579
Convert percentages stored as a string to numeric
fdjohnston 2177247
Adding tests for conversion of string percentage to numeric
fdjohnston 6439be5
Correcting PHPStan type warning
fdjohnston dc5c6b9
Tightening up percentage check
fdjohnston ab7b3b1
Using Named Capture groups to handle edge cases
fdjohnston 0b350b1
Slight optimization: Only use pow if exponent is greater than zero
fdjohnston bb845fa
Further Simplification
fdjohnston 64a197e
Styling fixes
MarkBaker 6f3f450
Improve readability
MarkBaker 04fb3bb
Merge branch 'master' into master
MarkBaker File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a couple of little tricks that we can do with the regexp to extract the numeric value directly, rather than doing the
str_replace()
call.To avoid capture of the complete string, I've set that as a non-capture group, because we're only interested in whether the
%
character appears, we don't need to actually capture it or any padding around it; and then I've set capture groups for the actual numeric part of the string. I've also named these groups (PrefixedValue
andPostfixedValue
) so that they can be referenced by name in the$matches
array.Then, if a match is identified, we can extract the numeric part directly by name, whether it's prefixed by the
%
or postfixed.The null coalescence identifies whether the number is prefixed or postfixed (using the named groups from
$matches
); and then does the division by 100 and casts the result to a float. There's no function call overhead; just PHP operations, so it should be more efficient if it's being called in a loop thatr checks a lot of cell values.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great trick with the regexp. I agree that eliminating the
str_replace
would be a big improvement.Unfortunately the changes to the regexp break the negative percentage cases. I'll see if I can run with your idea and get it working for negative values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes! I was being a bit to quick and casual with the regexp changes. If the capture group is expanded to encompass the sign as well, then we're potentially also capturing whitespace too, which would break other things.
One option would be to have additional pre/postfix capture groups for the sign, and a match there would trigger a
* -1
operationand then
So we're still using only operations, and no function calls.
Just to be bloody minded and awkward, MS Excel recognises a
%
prefix with the sign before or after the%
, so%-2.5
and-%2.5
are both recognised as numeric values. Fortunately with a postfix%
, the sign must be before the numeric part.And, of course, the sign could be a
+
as well as a-
; and the numeric part could be in scientific format,-2.5E-4%
.But if you get the basic negative working, I'll merge and then do some more tweaking myself to handle a few of those edge cases; and refactor it all into a separate helper.
I did say right from the start that this was a lot more complicated once you start to look at it in detail. ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You did warn me! I can't imagine what the code for Excel's formula parser must look like.
I'll take a crack at improving the regexp like you suggest.
Should we be concerned about any of the other spreadsheet applications? I'm focused entirely on Excel because it was the root of my issue, but I suppose it's worth considering how Open Office or Libre Office works? Won't that be a nightmare if they handle these various combinations of
%
,+
, and-
differently.... I don't have either installed at the moment but I'll download a copy tonight and see how they behave.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't worry about other spreadsheet applications: once it's merged, I'll take your code and do some reworking for the additional Excel cases that I know about while I'm refactoring it all into a helper class, and I'll do a systematic check using Open/Libre Office, Gnumeric and even WPS while I'm doing that.
There are sometimes discrepancies between the different Office Suite spreadsheets, which we handle using the compatibility mode - mostly in the implementation of functions - but we treat Excel itself as the default. The worst is that they aren't consistent between versions, even MS Excel has made changes between versions. We don't cater for that; but we always strive for the latest version... and that means retesting everything whenever there's a new major release of Excel/LibreOffice/Gnumeric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay sounds good. I was able to get it working for combinations of
+
,-
, and scientific notation by expanding on the approach you suggested.I broke out all the different pieces of the final calculation because it was getting a bit difficult to follow as single line, owing to all the null coalescing checks. We could simplify it by moving all those checks inline.
Note that, as I mention in the commit, I did have to add the
PREG_UNMATCHED_AS_NULL
flag to thepreg_match
, otherwise I was getting '' back when a capture group wasn't found.Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated my Excel sheet with various permutations.
Excel Test Cases.xlsx