Convert percentages stored as strings to numerics in formula calculations #3156

Merged
merged 10 commits into from
Nov 10, 2022
4 changes: 2 additions & 2 deletions src/PhpSpreadsheet/Calculation/Calculation.php
@@ -5110,8 +5110,8 @@ private function validateBinaryOperand(&$operand, &$stack)
$this->debugLog->writeDebugLog('Evaluation Result is %s', $this->showTypeDetails($operand));

return false;
- } elseif (!Shared\StringHelper::convertToNumberIfFraction($operand)) {
- // If not a numeric or a fraction, then it's a text string, and so can't be used in mathematical binary operations
+ } elseif (!Shared\StringHelper::convertToNumberIfFraction($operand) && !Shared\StringHelper::convertToNumberIfPercent($operand)) {
+ // If not a numeric, a fraction or a percentage, then it's a text string, and so can't be used in mathematical binary operations
$stack->push('Error', '#VALUE!');
$this->debugLog->writeDebugLog('Evaluation Result is a %s', $this->showTypeDetails('#VALUE!'));

18 changes: 18 additions & 0 deletions src/PhpSpreadsheet/Shared/StringHelper.php
@@ -11,6 +11,8 @@ class StringHelper
// Fraction
const STRING_REGEXP_FRACTION = '~^\s*(-?)((\d*)\s+)?(\d+\/\d+)\s*$~';

const STRING_REGEXP_PERCENT = '~^( *-? *\% *[0-9]+\.?[0-9*]* *| *\-? *[0-9]+\.?[0-9]* *\% *)$~';

/**
* Control characters array.
*
@@ -560,6 +562,22 @@ public static function convertToNumberIfFraction(string &$operand): bool

// function convertToNumberIfFraction()

/**
* Identify whether a string contains a percentage, and if so,
* convert it to a numeric.
*
* @param string $operand string value to test
*/
public static function convertToNumberIfPercent(string &$operand): bool
{
if (preg_match(self::STRING_REGEXP_PERCENT, $operand, $match)) {
Member

There's a couple of little tricks that we can do with the regexp to extract the numeric value directly, rather than doing the str_replace() call.

~^(?: *-? *\% *(?<PrefixedValue>[0-9]+\.?[0-9*]*) *| *\-? *(?<PostfixedValue>[0-9]+\.?[0-9]*) *\% *)$~

To avoid capture of the complete string, I've set that as a non-capture group, because we're only interested in whether the % character appears, we don't need to actually capture it or any padding around it; and then I've set capture groups for the actual numeric part of the string. I've also named these groups (PrefixedValue and PostfixedValue) so that they can be referenced by name in the $matches array.

Then, if a match is identified, we can extract the numeric part directly by name, whether it's prefixed by the % or postfixed.

$poperand = (float) ($match['PostfixedValue'] ?? $match['PrefixedValue']) / 100.0;

The null coalescing operator picks whichever named group matched (prefixed or postfixed) from $match; the value is then cast to a float and divided by 100. There's no function call overhead, just PHP operations, so it should be more efficient if it's being called in a loop that checks a lot of cell values.
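
As a rough illustration of the named-group idea described in this comment, the following sketch wraps the suggested pattern in a standalone function. The function name `percentFromString()` is hypothetical and not part of `StringHelper`, and the pattern drops the `*` inside the first character class, on the assumption that it is a typo.

```php
<?php

// Sketch only: percentFromString() is a hypothetical helper, not PhpSpreadsheet code.
// The pattern follows the review comment, minus the `*` inside the first character class.
function percentFromString(string $text): ?float
{
    $pattern = '~^(?: *-? *% *(?<PrefixedValue>[0-9]+\.?[0-9]*) *'
        . '| *-? *(?<PostfixedValue>[0-9]+\.?[0-9]*) *% *)$~';

    if (preg_match($pattern, $text, $match) !== 1) {
        return null; // not recognised as a percentage string
    }

    // For a prefixed match, PostfixedValue is a trailing unmatched group and is
    // absent from $match, so ?? falls through to PrefixedValue.
    return (float) ($match['PostfixedValue'] ?? $match['PrefixedValue']) / 100.0;
}
```

With this sketch, `percentFromString('2.5%')` and `percentFromString(' % 2.5')` both give 0.025, and `percentFromString('2%2')` gives null. The sign is matched but never captured, so `percentFromString('-7%')` comes back as 0.07 rather than -0.07.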

Contributor Author

Great trick with the regexp. I agree that eliminating the str_replace would be a big improvement.
Unfortunately the changes to the regexp break the negative percentage cases. I'll see if I can run with your idea and get it working for negative values.

Member

Ah, yes! I was being a bit too quick and casual with the regexp changes. If the capture group is expanded to encompass the sign as well, then we're potentially also capturing whitespace, which would break other things.

One option would be to have additional pre/postfix capture groups for the sign, and a match there would trigger a * -1 operation

~^(?: *(?<PrefixedNegative>-?) *\% *(?<PrefixedValue>[0-9]+\.?[0-9*]*) *| *(?<PostfixedNegative>\-?) *(?<PostfixedValue>[0-9]+\.?[0-9]*) *\% *)$~

and then

$sign = $match['PostfixedNegative'] ?? $match['PrefixedNegative'];
$poperand *= ($sign === '-') ? -1.0 : 1.0;

So we're still using only operations, and no function calls.

Just to be bloody minded and awkward, MS Excel recognises a % prefix with the sign before or after the %, so %-2.5 and -%2.5 are both recognised as numeric values. Fortunately with a postfix %, the sign must be before the numeric part.
And, of course, the sign could be a + as well as a -; and the numeric part could be in scientific format, -2.5E-4%.

But if you get the basic negative working, I'll merge and then do some more tweaking myself to handle a few of those edge cases; and refactor it all into a separate helper.

I did say right from the start that this was a lot more complicated once you start to look at it in detail. ;-)
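
The sign-group suggestion in this comment could look roughly like the sketch below. It assumes the second sign group is meant to be named `PostfixedNegative` (the name the follow-up snippet reads), uses a hypothetical function name `signedPercentFromString()`, and drops the `*` from the first character class on the assumption that it is a typo.

```php
<?php

// Sketch only: signedPercentFromString() is a hypothetical helper, not PhpSpreadsheet code.
function signedPercentFromString(string $text): ?float
{
    // Separate capture groups for the sign and the digits in each alternative,
    // so padding spaces are never captured along with the number.
    $pattern = '~^(?: *(?<PrefixedNegative>-?) *% *(?<PrefixedValue>[0-9]+\.?[0-9]*) *'
        . '| *(?<PostfixedNegative>-?) *(?<PostfixedValue>[0-9]+\.?[0-9]*) *% *)$~';

    if (preg_match($pattern, $text, $match) !== 1) {
        return null; // not recognised as a percentage string
    }

    $value = (float) ($match['PostfixedValue'] ?? $match['PrefixedValue']) / 100.0;
    $sign = $match['PostfixedNegative'] ?? $match['PrefixedNegative'];

    // A captured '-' flips the sign; an empty capture leaves the value positive.
    return $value * (($sign === '-') ? -1.0 : 1.0);
}
```

Here `signedPercentFromString('-2.75%')` and `signedPercentFromString(' - % 2.75 ')` both give -0.0275. The edge cases listed above are deliberately not covered: `%-2.5` (sign after the `%`), a leading `+`, and scientific notation all return null in this sketch.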

Contributor Author

You did warn me! I can't imagine what the code for Excel's formula parser must look like.
I'll take a crack at improving the regexp like you suggest.

Should we be concerned about any of the other spreadsheet applications? I'm focused entirely on Excel because it was the root of my issue, but I suppose it's worth considering how Open Office or Libre Office behave. It will be a nightmare if they handle these various combinations of %, +, and - differently. I don't have either installed at the moment, but I'll download a copy tonight and see how they behave.

Member

Don't worry about other spreadsheet applications: once it's merged, I'll take your code and do some reworking for the additional Excel cases that I know about while I'm refactoring it all into a helper class, and I'll do a systematic check using Open/Libre Office, Gnumeric and even WPS while I'm doing that.

There are sometimes discrepancies between the different Office Suite spreadsheets, which we handle using the compatibility mode (mostly in the implementation of functions), but we treat Excel itself as the default. The worst part is that they aren't consistent between versions; even MS Excel has made changes between versions. We don't cater for that, but we always strive to match the latest version, and that means retesting everything whenever there's a new major release of Excel/LibreOffice/Gnumeric.

Contributor Author

Okay sounds good. I was able to get it working for combinations of +, -, and scientific notation by expanding on the approach you suggested.
I broke out all the different pieces of the final calculation because it was getting a bit difficult to follow as a single line, owing to all the null coalescing checks. We could simplify it by moving all those checks inline.
Note that, as I mention in the commit, I did have to add the PREG_UNMATCHED_AS_NULL flag to the preg_match, otherwise I was getting '' back when a capture group wasn't found.
Let me know what you think.
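
To make the role of `PREG_UNMATCHED_AS_NULL` concrete, here is a minimal sketch of one way the extended matching could be written. It is not the code from the commit: the function name `parsePercent()`, the group names, and the decision to reject a sign on both sides of a leading `%` are all assumptions made for illustration.

```php
<?php

// Sketch only: parsePercent() and its group names are hypothetical, not the merged code.
function parsePercent(string $text): ?float
{
    // Digits, optional decimal part, optional exponent such as E-4.
    $number = '[0-9]+\.?[0-9]*(?:[eE][-+]?[0-9]+)?';
    $pattern = '~^(?: *(?<SignBefore>[-+])? *% *(?<SignAfter>[-+])? *(?<PrefixedValue>' . $number . ') *'
        . '| *(?<PostfixedSign>[-+])? *(?<PostfixedValue>' . $number . ') *% *)$~';

    if (preg_match($pattern, $text, $match, PREG_UNMATCHED_AS_NULL) !== 1) {
        return null; // not recognised as a percentage string
    }

    // With PREG_UNMATCHED_AS_NULL, groups that took no part in the match are null
    // rather than '', so ?? selects whichever alternative actually matched.
    $digits = $match['PostfixedValue'] ?? $match['PrefixedValue'];
    if (isset($match['SignBefore'], $match['SignAfter'])) {
        return null; // assumption: a sign on both sides of a leading % is rejected
    }
    $sign = $match['PostfixedSign'] ?? $match['SignBefore'] ?? $match['SignAfter'] ?? '+';

    return (($sign === '-') ? -1.0 : 1.0) * (float) $digits / 100.0;
}
```

Under these assumptions, `parsePercent('%-2.5')` and `parsePercent('-%2.5')` both give -0.025, `parsePercent('+5%')` gives 0.05, and `parsePercent('2.5-%')` gives null because a postfixed `%` requires the sign before the number.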

Contributor Author

Updated my Excel sheet with various permutations.
Excel Test Cases.xlsx

$operand = ((float)(str_replace(['%', ' '], '', $match[0])) / 100.00);
return true;
}

return false;
}

/**
* Get the decimal separator. If it has not yet been set explicitly, try to obtain number
* formatting information from locale.
12 changes: 12 additions & 0 deletions tests/PhpSpreadsheetTests/Calculation/CalculationTest.php
@@ -180,6 +180,18 @@ public function testCellWithFormulaTwoIndirect(): void
self::assertEquals('9', $cell3->getCalculatedValue());
}

public function testCellWithStringPercentage(): void
{
$spreadsheet = new Spreadsheet();
$workSheet = $spreadsheet->getActiveSheet();
$cell1 = $workSheet->getCell('A1');
$cell1->setValue('2%');
$cell2 = $workSheet->getCell('B1');
$cell2->setValue('=100*A1');

self::assertEquals('2', $cell2->getCalculatedValue());
}

public function testBranchPruningFormulaParsingSimpleCase(): void
{
$calculation = Calculation::getInstance();
97 changes: 97 additions & 0 deletions tests/PhpSpreadsheetTests/Shared/StringHelperTest.php
@@ -148,4 +148,101 @@ public function providerFractions(): array
'improper fraction' => ['1.75', '7/4'],
];
}

/**
* @dataProvider providerPercentages
*/
public function testPercentage(string $expected, string $value): void
{
$originalValue = $value;
$result = StringHelper::convertToNumberIfPercent($value);
if ($result === false) {
self::assertSame($expected, $originalValue);
self::assertSame($expected, $value);
} else {
self::assertSame($expected, (string) $value);
self::assertNotEquals($value, $originalValue);
}
}

public function providerPercentages(): array
{
return [
'non-percentage' => ['10', '10'],
'single digit percentage' => ['0.02', '2%'],
'two digit percentage' => ['0.13', '13%'],
'negative single digit percentage' => ['-0.07', '-7%'],
'negative two digit percentage' => ['-0.75', '-75%'],
'large percentage' => ['98.45', '9845%'],
'small percentage' => ['0.0005', '0.05%'],
'percentage with decimals' => ['0.025', '2.5%'],
'trailing percent with space' => ['0.02', '2 %'],
'trailing percent with leading and trailing space' => ['0.02', ' 2 % '],
'leading percent with decimals' => ['0.025', ' % 2.5'],

//These should all fail
'percent only' => ['%', '%'],
'nonsense percent' => ['2%2', '2%2'],
'negative leading percent' => ['-0.02', '-%2'],

//Percent position permutations
'permutation_1' => ['0.02', '2%'],
'permutation_2' => ['0.02', ' 2%'],
'permutation_3' => ['0.02', '2% '],
'permutation_4' => ['0.02', ' 2 % '],
'permutation_5' => ['0.0275', '2.75% '],
'permutation_6' => ['0.0275', ' 2.75% '],
'permutation_7' => ['0.0275', ' 2.75 % '],
'permutation_8' => [' 2 . 75 %', ' 2 . 75 %'],
'permutation_9' => [' 2.7 5 % ', ' 2.7 5 % '],
'permutation_10' => ['-0.02', '-2%'],
'permutation_11' => ['-0.02', ' -2% '],
'permutation_12' => ['-0.02', '- 2% '],
'permutation_13' => ['-0.02', '-2 % '],
'permutation_14' => ['-0.0275', '-2.75% '],
'permutation_15' => ['-0.0275', ' -2.75% '],
'permutation_16' => ['-0.0275', '-2.75 % '],
'permutation_17' => ['-0.0275', ' - 2.75 % '],
'permutation_18' => ['0.02', '2%'],
'permutation_19' => ['0.02', '% 2 '],
'permutation_20' => ['0.02', ' %2 '],
'permutation_21' => ['0.02', ' % 2 '],
'permutation_22' => ['0.0275', '%2.75 '],
'permutation_23' => ['0.0275', ' %2.75 '],
'permutation_24' => ['0.0275', ' % 2.75 '],
'permutation_25' => [' %2 . 75 ', ' %2 . 75 '],
'permutation_26' => [' %2.7 5 ', ' %2.7 5 '],
'permutation_27' => [' % 2 . 75 ', ' % 2 . 75 '],
'permutation_28' => [' % 2.7 5 ', ' % 2.7 5 '],
'permutation_29' => ['-0.0275', '-%2.75 '],
'permutation_30' => ['-0.0275', ' - %2.75 '],
'permutation_31' => ['-0.0275', '- % 2.75 '],
'permutation_32' => ['-0.0275', ' - % 2.75 '],
'permutation_33' => ['0.02', '2%'],
'permutation_34' => ['0.02', '2 %'],
'permutation_35' => ['0.02', ' 2%'],
'permutation_36' => ['0.02', ' 2 % '],
'permutation_37' => ['0.0275', '2.75%'],
'permutation_38' => ['0.0275', ' 2.75 % '],
'permutation_39' => ['2 . 75 % ', '2 . 75 % '],
'permutation_40' => ['-0.0275', '-2.75% '],
'permutation_41' => ['-0.0275', '- 2.75% '],
'permutation_42' => ['-0.0275', ' - 2.75% '],
'permutation_43' => ['-0.0275', ' -2.75 % '],
'permutation_44' => ['-2. 75 % ', '-2. 75 % '],
'permutation_45' => ['%', '%'],
'permutation_46' => ['0.02', '%2 '],
'permutation_47' => ['0.02', '% 2 '],
'permutation_48' => ['0.02', ' %2 '],
'permutation_49' => ['0.02', '% 2 '],
'permutation_50' => ['0.02', ' % 2 '],
'permutation_51' => ['0.02', ' 2 % '],
'permutation_52' => ['-0.02', '-2%'],
'permutation_53' => ['-0.02', '- %2'],
'permutation_54' => ['-0.02', ' -%2 '],
'permutation_55' => ['2%2', '2%2'],
'permutation_56' => [' 2% %', ' 2% %'],
'permutation_57' => [' % 2 -', ' % 2 -'],
];
}
}