-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MATCH Problems with Int/Float Compare and Wildcards #3142
Conversation
Fix PHPOffice#3141. Function matchSmallestValue did not recognize that an integer could match a float. Adding test cases, it seems that matchFirstValue had the same problem. However, matchLargestValue seemed to handle things correctly - but see below. In addition, the wildcard logic in matchFirstValue is faulty. It ignored tilde as a wildcard character. Although it would have been easy to just add that, I think it was wrong to determine on its own if a wildcard was in use. Just using the already available wildcard functions whenever comparing two strings is sufficient. I note that Excel doesn't seem to follow its own rules for MATCH (https://support.microsoft.com/en-us/office/match-function-e8dffd45-c762-47d6-bf89-533f4a37673a?ns=excel&version=90&syslcid=1033&uilcid=1033&appver=zxl900&helpid=xlmain11.chm60112&ui=en-us&rs=en-us&ad=us). PhpSpreadsheet's results match Excel's, so no problem. However, when match_type is not zero, the match array is supposed to be sorted, so I would expect `#N/A` when it isn't; but that's not how Excel operates. I have no idea what Excel is doing. If `MATCH(2,{2,0,4,3},1)` isn't `#N/A` because of the unsorted array, then surely it should be `1` (item 1 of the array is the largest number less than or equal to the lookup value); but Excel and PhpSpreadsheet (before and after changes) return `2`. I have moved this example to be the first of the test cases. One would think strings would behave similarly. But, no - see the second test case. This time Excel does look for an exact match. But the existing logic doesn't get the matching result in PhpSpreadsheet. It requires a whole new block of code, one which doesn't work correctly for numeric lookup value. Ugh. LibreOffice doesn't always agree with Excel. It seems that it will use wildcard matching even when the match type is not zero (Excel documentation says wildcards are only for type zero, which is just as well because I don't really know what greater/less mean when wildcards are involved). I have not attempted to duplicate this behavior. For the record, Gnumeric agrees with Excel here.
No intention to act on Scrutinizer "complexity" warnings. |
@MarkBaker I seem to have opened up a can of worms here (I should have paid more attention to your "not straightforward" comment). I think more test cases are probably needed. In your comments on the original issue, you said "we can't then use the same comparison for numerics and non-numeric strings" - I think my code, but not necessarily the tests, are taking care of that. Do you have additional test cases to suggest? |
The most obvious additional tests would be using numeric strings (containing both |
{ | ||
$wildcardLookup = ((bool) preg_match('/([\?\*])/', $lookupValue)); | ||
$wildcard = WildcardMatch::wildcard($lookupValue); | ||
if (is_string($lookupValue)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need an is not numeric check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, although this is another confusing aspect of MATCH. If you look at the test to which I added the label 'string compared to number type -1', numeric 6 does not seem to match string '6', and that test returned #N/A
both before and after this change, as does Excel. I will add more tests.
Another surprise (which will require a code change) - Excel accepts anything numeric for match type, paying attention only to the sign - floats, numeric strings, values other than 0/1/-1. Also, non-numeric strings for match type will generate VALUE error, not NA. |
Add support for LibreOffice matching wildcard strings when type is not zero. Add support for type to be specified as integer other than 0/1/-1, or as float, or as numeric string; non-numeric string should case `#VALUE!` error. I have found an example of undefined behavior (unsorted array where type is non-zero) where PhpSpreadsheet does not produce the same result as Excel. It is present as a new `incomplete` test case. I can fix it, but not without breaking other tests where the proper behavior is undefined. IMO, this is not a problem we should be concerned about. Many test cases are added. Chances are I will add some more before merging this change.
### Added - Extended flag options for the Reader `load()` and Writer `save()` methods - Apply Row/Column limits (1048576 and XFD) in ReferenceHelper [PR #3213](#3213) - Allow the creation of In-Memory Drawings from a string of binary image data, or from a stream. [PR #3157](#3157) - Xlsx Reader support for Pivot Tables [PR #2829](#2829) - Permit Date/Time Entered on Spreadsheet to be calculated as Float [Issue #1416](#1416) [PR #3121](#3121) ### Changed - Nothing ### Deprecated - Direct update of Calculation::suppressFormulaErrors is replaced with setter. - Font public static variable defaultColumnWidths replaced with constant DEFAULT_COLUMN_WIDTHS. - ExcelError public static variable errorCodes replaced with constant ERROR_CODES. - NumberFormat constant FORMAT_DATE_YYYYMMDD2 replaced with existing identical FORMAT_DATE_YYYYMMDD. ### Removed - Nothing ### Fixed - Fixed handling for `_xlws` prefixed functions from Office365 [Issue #3245](#3245) [PR #3247](#3247) - Conditionals formatting rules applied to rows/columns are removed [Issue #3184](#3184) [PR #3213](#3213) - Treat strings containing currency or accounting values as floats in Calculation Engine operations [Issue #3165](#3165) [PR #3189](#3189) - Treat strings containing percentage values as floats in Calculation Engine operations [Issue #3155](#3155) [PR #3156](#3156) and [PR #3164](#3164) - Xlsx Reader Accept Palette of Fewer than 64 Colors [Issue #3093](#3093) [PR #3096](#3096) - Use Locale-Independent Float Conversion for Xlsx Writer Custom Property [Issue #3095](#3095) [PR #3099](#3099) - Allow setting AutoFilter range on a single cell or row [Issue #3102](#3102) [PR #3111](#3111) - Xlsx Reader External Data Validations Flag Missing [Issue #2677](#2677) [PR #3078](#3078) - Reduces extra memory usage on `__destruct()` calls [PR #3092](#3092) - Additional properties for Trendlines [Issue #3011](#3011) [PR #3028](#3028) - Calculation suppressFormulaErrors fix [Issue #1531](#1531) [PR #3092](#3092) - Permit Date/Time Entered on Spreadsheet to be Calculated as Float [Issue #1416](#1416) [PR #3121](#3121) - Incorrect Handling of Data Validation Formula Containing Ampersand [Issue #3145](#3145) [PR #3146](#3146) - Xlsx Namespace Handling of Drawings, RowAndColumnAttributes, MergeCells [Issue #3138](#3138) [PR #3136](#3137) - Generation3 Copy With Image in Footer [Issue #3126](#3126) [PR #3140](#3140) - MATCH Function Problems with Int/Float Compare and Wildcards [Issue #3141](#3141) [PR #3142](#3142) - Fix ODS Read Filter on number-columns-repeated cell [Issue #3148](#3148) [PR #3149](#3149) - Problems Formatting Very Small and Very Large Numbers [Issue #3128](#3128) [PR #3152](#3152) - XlsxWrite preserve line styles for y-axis, not just x-axis [PR #3163](#3163) - Xlsx Namespace Handling of Drawings, RowAndColumnAttributes, MergeCells [Issue #3138](#3138) [PR #3137](#3137) - More Detail for Cyclic Error Messages [Issue #3169](#3169) [PR #3170](#3170) - Improved Documentation for Deprecations - many PRs [Issue #3162](#3162)
Fix #3141. Function matchSmallestValue did not recognize that an integer could match a float. Adding test cases, it seems that matchFirstValue had the same problem. However, matchLargestValue seemed to handle things correctly - but see below.
In addition, the wildcard logic in matchFirstValue is faulty. It ignored tilde as a wildcard character. Although it would have been easy to just add that, I think it was wrong to determine on its own if a wildcard was in use. Just using the already available wildcard functions whenever comparing two strings is sufficient.
I note that Excel doesn't seem to follow its own rules for MATCH (https://support.microsoft.com/en-us/office/match-function-e8dffd45-c762-47d6-bf89-533f4a37673a?ns=excel&version=90&syslcid=1033&uilcid=1033&appver=zxl900&helpid=xlmain11.chm60112&ui=en-us&rs=en-us&ad=us). PhpSpreadsheet's results match Excel's, so no problem. However, when match_type is not zero, the match array is supposed to be sorted, so I would expect
#N/A
when it isn't; but that's not how Excel operates. I have no idea what Excel is doing. IfMATCH(2,{2,0,4,3},1)
isn't#N/A
because of the unsorted array, then surely it should be1
(item 1 of the array is the largest number less than or equal to the lookup value); but Excel and PhpSpreadsheet (before and after changes) return2
. I have moved this example to be the first of the test cases. And, if you have a theory for why this works way (after all PhpSpreadsheet can match it), try it withMATCH(5,{8,6,3,1},1)
, and see if your theory works for the Excel result of#N/A
(which PhpSpreadsheet doesn't duplicate).One would think strings would behave similarly. But, no - see the second test case. This time Excel does look for an exact match. But the existing logic doesn't get the matching result in PhpSpreadsheet. It requires a whole new block of code, one which doesn't work correctly for numeric lookup value. Ugh.
LibreOffice doesn't always agree with Excel. It seems that it will use wildcard matching even when the match type is not zero (Excel documentation says wildcards are only for type zero, which is just as well because I don't really know what greater/less mean when wildcards are involved). This behavior is now duplicated in PhpSpreadsheet. For the record, Gnumeric agrees with Excel here.
This is:
Checklist:
Why this change is needed?
Provide an explanation of why this change is needed, with links to any Issues (if appropriate).
If this is a bugfix or a new feature, and there are no existing Issues, then please also create an issue that will make it easier to track progress with this PR.