Complete implementation of NSScanner #2582
Frameworks/Foundation/NSScanner.mm
Outdated
if (!scannedPeriod) {
scannedPeriod = true;
} else {
goto done_scanning_hex_double; // Stop scanning once a second period has been scanned
Since break inside a switch in a for loop breaks the switch and not the outer loop, this unfortunately seemed to be the best out of a group of bad options:
- Add a done bool that is checked on every iteration: this causes ++i to occur once more than necessary unless it's the end of the string, making the set of _scanLocation = 1 at the end more complicated than necessary
- Move the switch to a helper function: would need to manipulate both mantissa and exponent through out-pointers, seems unnecessarily messy
- Add a continue to every 'good' case, then add a break outside the switch that the 'bad' cases can fall to: wwwwaayyy too messy
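For reference, a minimal C++ sketch of the flag-based option from the list above. The helper name `scanUntilSecondPeriod` and its simplified rules (digits and at most one period) are assumptions for illustration, not code from the PR. Advancing `i` only while the flag is clear avoids the extra `++i` that the first bullet worries about.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not from the PR: returns the index at which scanning
// stopped. Scanning stops at a second '.' or at any non-digit character.
inline std::size_t scanUntilSecondPeriod(const std::string& text) {
    bool scannedPeriod = false;
    bool done = false;
    std::size_t i = 0;
    while (i < text.size() && !done) {
        switch (text[i]) {
            case '.':
                if (scannedPeriod) {
                    done = true; // a bare break here would only leave the switch
                } else {
                    scannedPeriod = true;
                }
                break;
            default:
                if (text[i] < '0' || text[i] > '9') {
                    done = true;
                }
                break;
        }
        // i only advances while the flag is clear, so it stays on the
        // character that ended the scan
        if (!done) {
            ++i;
        }
    }
    return i;
}
```

With this shape, `scanUntilSecondPeriod("1.5.2")` stops on the second period, so the caller can store the returned index as the new scan location without adjusting it.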
nit: I'm not sure if our coding standards handle gotos, but I think if this is the best option then we should make it follow the macro-style naming convention of ALL_CAPS so it really stands out.
Frameworks/Foundation/NSScanner.mm
Outdated
} else if (!hasSign && unicode == '+') {
sign = 1;
hasSign = TRUE;
hasSign = YES;
} else if (unicode >= '0' && unicode <= '9') {
iswdigit?
Frameworks/Foundation/NSScanner.mm
Outdated
} else if (unicode >= '0' && unicode <= '9') {
if (!hasOverflow) {
int c = unicode - '0';

// Inspired by http://www.math.utoledo.edu/~dbastos/overflow.html
if ((long_long_MAX - c) / 10 < value)
hasOverflow = TRUE;
if ((longLongMax - c) / 10 < value)
aaaa my eyes they are burning
nit: please fix these weirdos 😄
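The guard in the hunk above tests whether `value * 10 + c` would exceed the maximum before doing the multiplication. A minimal sketch of that check in isolation, assuming a non-negative `value` and a digit `c` in 0 to 9; `appendDigit` is a hypothetical name, not a function from the PR:

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: appends decimal digit c (0-9) to a non-negative value.
// Returns false and leaves value untouched if value * 10 + c would overflow.
inline bool appendDigit(long long& value, int c) {
    const long long maxValue = std::numeric_limits<long long>::max();
    // value * 10 + c > maxValue is equivalent to value > (maxValue - c) / 10
    // under integer division, and the right-hand side cannot overflow.
    if ((maxValue - c) / 10 < value) {
        return false;
    }
    value = value * 10 + c;
    return true;
}
```

Starting from 922337203685477580, appending 7 lands exactly on the maximum, while appending 8 is rejected.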
Frameworks/Foundation/NSScanner.mm
Outdated
if ([_skipSet characterIsMember:unicode])
state = STATE_SPACE;
else if (unicode == '0') {
case STATE_START:
can these states be unified?
(the first two)
Frameworks/Foundation/NSScanner.mm
Outdated
}

return TRUE;
return YES;
}

/**
@Status Interoperable
*/
- (BOOL)scanHexInt:(unsigned*)valuep {
can we implement scanHexInt in terms of scanHexLongLong? we'd have to account for overflow behaviour, but it would kill half our hex reading code
Frameworks/Foundation/NSScanner.mm
Outdated
value = check;
else {
value = -1;
if (0xF000000000000000 & value) {
doesn't this need ULL at the end? hmm
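On the suffix question: in C++11 and later, an unsuffixed hexadecimal literal takes the first of `int`, `unsigned int`, `long`, `unsigned long`, `long long`, `unsigned long long` that can represent it, so this constant is already an unsigned 64-bit type without `ULL`. A small sketch that checks this at compile time; `topNibbleSet` is a hypothetical helper mirroring the condition in the hunk:

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// Type the compiler picks for the unsuffixed literal
using TopNibbleMaskType = decltype(0xF000000000000000);

static_assert(std::is_unsigned<TopNibbleMaskType>::value, "unsigned without a suffix");
static_assert(sizeof(TopNibbleMaskType) * CHAR_BIT >= 64, "at least 64 bits wide");
static_assert(0xF000000000000000 == 0xF000000000000000ULL, "same value with or without ULL");

// Hypothetical helper: true if any of the top four bits of a 64-bit value is set
inline bool topNibbleSet(unsigned long long value) {
    return (0xF000000000000000 & value) != 0;
}
```

Adding the suffix anyway is harmless and arguably states the intent more clearly.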
bool mantissaIsNegative = false;
bool exponentIsNegative = false;
bool scannedPeriod = false;
bool hasValue = false;
nit: your new code has hasValue as a bool but the old code has it as BOOL; standardize?
Frameworks/Foundation/NSScanner.mm
Outdated
done_scanning_hex_double:; // Allows breaking out of the loop from inside the switch
if (hasValue) {
if (result) {
is there an overflow case to be had for decimals?
you mean besides the exponent getting too large and causing ldexp to return HUGE_VAL or 0?
if the mantissa becomes too precise to represent correctly, the latter (least significant) bits get chopped off during the double operations in STATE_MANTISSA
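A short sketch of the two behaviours described in this reply, using only the standard `std::ldexp`. The helper names are hypothetical, and the precision example assumes IEEE 754 doubles with a 53-bit significand.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: combines a scanned mantissa with a binary exponent
inline double combine(double mantissa, int exponent) {
    return std::ldexp(mantissa, exponent);
}

// Hypothetical helper: adds a hex digit 1 in the 14th fractional place (2^-56)
// to 1.0. The digit is below double precision, so the sum is still 1.0.
inline double addDigitBelowPrecision() {
    double mantissa = 1.0;
    mantissa += std::ldexp(1.0, -56);
    return mantissa;
}
```

Here `combine(1.0, 5000)` yields `HUGE_VAL`, `combine(1.0, -5000)` yields 0, and `addDigitBelowPrecision()` returns exactly 1.0.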
Frameworks/Foundation/NSScanner.mm
Outdated
return StubReturn();
}
- (BOOL)isAtEnd {
@synchronized(self) {
none of the other APIs are @synchronized... should they be?
NSScanner is in the list of thread-unsafe Foundation classes.
static const double c_floatingPtTolerance = 0.00001;

// Helper template function that calls a scan_______ function specified by the selector and does the nececssary casts
nit: nececssary
}

TEST(NSScanner, ScanHexLongLong) {
testScanHexIntegral<unsigned long long>(@selector(scanHexLongLong:));
I'd like to see an additional test for something that fits in 64 but not 32; this doesn't have that
Frameworks/Foundation/NSScanner.mm
Outdated
} else if (unicode >= 'a' && unicode <= 'f') {
if (scannedPeriod) {
mantissa += (mantissaDigitPlace /= 16) * (unicode - 'a' + 10);
you can use the public struct NSDecimal to keep track of this.
Would be longer code, and we don't need NSDecimal's greater precision, since we only need exactly as much precision as would fit in a double's mantissa.
Frameworks/Foundation/NSScanner.mm
Outdated
if (result) {
*result = static_cast<float>(doubleValue);
}
return YES;
}

/**
@Status Interoperable
@Notes
*/
- (BOOL)scanUnsignedLongLong:(unsigned long long*)pValue {
super nit: no implicit int
?
unsigned and long are just modifiers, not actual types, so saying unsigned long long makes the compiler infer the type which by default is int. It's common to elide the int, but I'm not a fan 🐋
ah, that. we as well as the docs forgo the int, so I'm leaving as is.
Frameworks/Foundation/NSScanner.mm
Outdated
} else {
*valuep = long_long_MIN;
*valuep = std::numeric_limits<long long>::min();
should this be lowest()?
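For context on `min()` versus `lowest()`: the two are identical for integer types and only differ for floating-point types, where `min()` is the smallest positive normal value. A minimal sketch; `mostNegative` is a hypothetical helper name:

```cpp
#include <cassert>
#include <limits>

// Integer types: min() and lowest() are the same value
static_assert(std::numeric_limits<long long>::min() == std::numeric_limits<long long>::lowest(),
              "identical for integer types");

// Floating-point types: min() is positive, lowest() is the most negative finite value
static_assert(std::numeric_limits<double>::min() > 0.0, "smallest positive normal value");
static_assert(std::numeric_limits<double>::lowest() == -std::numeric_limits<double>::max(),
              "most negative finite value");

// Hypothetical helper: the most negative finite value for any arithmetic type
template <typename T>
constexpr T mostNegative() {
    return std::numeric_limits<T>::lowest();
}
```

So for `long long` either spelling gives the same result; `lowest()` only matters if the same code is later reused for `float` or `double`.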
}

TEST(NSScanner, ScanLongLong) {
testScanSignedIntegral<long long>(@selector(scanLongLong:));
nit: they're everywhere 😿
???
- (BOOL)scanHexInt:(unsigned int*)intValue;
- (BOOL)scanHexLongLong:(unsigned long long*)result;
- (BOOL)scanInteger:(NSInteger*)value;
- (BOOL)scanInt:(int*)intValue;
- (BOOL)scanLongLong:(long long*)longLongValue;
- (BOOL)scanString:(NSString*)string intoString:(NSString* _Nullable*)stringValue;
- (BOOL)scanString:(NSString* _Nonnull)string intoString:(NSString* _Nullable*)stringValue;
- (BOOL)scanUnsignedLongLong:(unsigned long long*)unsignedLongLongValue;
nit: they're everywhere 😿
?
@@ -185,45 +150,43 @@ - (BOOL)scanInteger:(int*)valuep {
}
}

return TRUE;
return YES;
}

/**
@Status Interoperable
*/
- (BOOL)scanLongLong:(long long*)valuep {
ugh. I am pretty sure this works, but why not use strtoll here?
was not sure how we felt about swapping out large chunks of the previous code. there are also unichar -> char conversion implications I haven't thought about yet, but it should be fine I think...
Frameworks/Foundation/NSScanner.mm
Outdated
std::vector<char> charsToScan;
charsToScan.reserve(_string.length - startLocation);
for (NSUInteger i = startLocation; i < length; ++i, ++pScan) {
if ((*pScan >= '0') && (*pScan <= '9')) {
(*pScan >= '0') && (*pScan <= '9'
isdigit?
Frameworks/Foundation/NSScanner.mm
Outdated
continue;
} else {
return FALSE;
if (_charactersToBeSkipped) {
_charactersToBeSkipped
This method is going to be particularly slow.
The default case (skip whitespace, string is space separated) has low n, so it might not be as bad as you think. That said, is there an improvement you'd like to suggest? (Fetching more of the string at once?)
Frameworks/Foundation/NSScanner.mm
Outdated
}

/**
@Status Stub
@Status Interoperable
@Notes
*/
- (BOOL)scanHexDouble:(double*)result {
scanHexDouble
could sscanf_s be used for the heavy lifting here?
sscanf doesn't return where it stopped scanning. you can sort of work around that by specifying "%a%s" as the format, but that scans the whole rest of the string unnecessarily... you could also do a bunch of preprocessing to give it a prepped state, but then it just ends up being the same, but with sscanf instead of ldexp
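For comparison, a CRT call that does report where it stopped is `strtod`, which takes an end pointer and, per C99 and C++11, accepts hexadecimal floating-point input. This is a sketch of that alternative, not what the PR does; it assumes the CRT in use implements the C99 hex-float syntax, and `parseHexDoublePrefix` is a hypothetical name. Note that `strtod` also accepts decimal input, `inf`, `nan`, and leading whitespace, so a real scanner would still need to validate the prefix itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: parses a leading floating-point value from text and
// reports how many chars were consumed. Returns false if nothing was parsed.
inline bool parseHexDoublePrefix(const char* text, double* value, std::size_t* consumed) {
    char* end = nullptr;
    const double parsed = std::strtod(text, &end);
    if (end == text) {
        return false; // no conversion was performed
    }
    *value = parsed;
    *consumed = static_cast<std::size_t>(end - text);
    return true;
}
```

For the input `"0x1.8p1 rest"` this yields 3.0 with 7 characters consumed, leaving the scan location just before the space.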
Frameworks/Foundation/NSScanner.mm
Outdated
@@ -174,7 +139,7 @@ - (BOOL)scanInteger:(int*)valuep {

// This assumes sizeof(long long) >= sizeof(int).
if (![self scanLongLong:&scanValue]) {
return FALSE;
return NO;
} else if (valuep) {
if (scanValue > LONG_MAX) {
LONG_MAX
NSIntegerMax, etc.
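One way to avoid hard-coding a particular limit macro is to clamp against the limits of the destination type itself. A minimal sketch, assuming the destination is a signed integer type no wider than `long long` and that saturating on overflow is the desired behaviour; `clampTo` is a hypothetical helper, not part of the PR:

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: saturates a long long to the range of TTarget.
// Assumes TTarget is a signed integer type no wider than long long.
template <typename TTarget>
TTarget clampTo(long long value) {
    const long long maxValue = static_cast<long long>(std::numeric_limits<TTarget>::max());
    const long long minValue = static_cast<long long>(std::numeric_limits<TTarget>::min());
    if (value > maxValue) {
        return std::numeric_limits<TTarget>::max();
    }
    if (value < minValue) {
        return std::numeric_limits<TTarget>::min();
    }
    return static_cast<TTarget>(value);
}
```

With a helper like this the same clamping logic serves `int`, `long`, and an `NSInteger`-like type regardless of how wide each is on the target platform.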
Frameworks/Foundation/NSScanner.mm
Outdated
pScanStart += _location;
char* pScan = (char*)[_string UTF8String] + startLocation;
(char*)[_string UTF8String] + startLocation
This math won't work out well if there are non-ASCII characters in the string.
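A small illustration of why the offset breaks: a scan location counts UTF-16 code units, while a UTF-8 buffer is indexed in bytes, and the two diverge as soon as a non-ASCII character appears. The sample strings and the helper `utf8OffsetForUtf16Index` are hypothetical, and the helper assumes well-formed UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "é42": é (U+00E9) is one UTF-16 code unit but two UTF-8 bytes
inline std::u16string sampleUtf16() { return std::u16string(u"\u00E9" u"42"); }
inline std::string sampleUtf8() { return std::string("\xC3\xA9" "42"); }

// Hypothetical helper: byte offset in the UTF-8 form that corresponds to a
// UTF-16 code-unit index. Assumes the input is well-formed UTF-16.
inline std::size_t utf8OffsetForUtf16Index(const std::u16string& s, std::size_t index) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < index && i < s.size(); ++i) {
        const char16_t unit = s[i];
        if (unit < 0x80) {
            bytes += 1;
        } else if (unit < 0x800) {
            bytes += 2;
        } else if (unit >= 0xD800 && unit <= 0xDBFF) {
            bytes += 4; // high surrogate: the whole pair encodes to 4 bytes
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            bytes += 0; // low surrogate: already counted with its high surrogate
        } else {
            bytes += 3;
        }
    }
    return bytes;
}
```

In the sample, UTF-16 index 1 is the digit `4`, but byte 1 of the UTF-8 buffer is the second byte of `é`; the digit actually sits at byte offset 2.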
Frameworks/Foundation/NSScanner.mm
Outdated
// copy a scannable range of the string to a new buffer, adjusting for these constraints
char decimalSeparator;
if ((!_locale) ||
(![[_locale objectForKey:NSLocaleDecimalSeparator] getCString:&decimalSeparator maxLength:1 encoding:NSASCIIStringEncoding])) {
NSASCIIStringEncoding
In some locales the decimal separator won't be representable as an ASCII character.
658f544
to
03989dc
Compare
Frameworks/Foundation/NSScanner.mm
Outdated
// Helper function for implementing a scan function for a numeric type using a CRT function such as wcstoll
template <typename TNumeric>
static BOOL __scanNumeric(NSScanner* self,
super nit: this could really be c++ bool.
also, scanner instead of self?
Frameworks/Foundation/NSScanner.mm
Outdated
template <typename TNumeric>
static BOOL __scanNumeric(NSScanner* self,
TNumeric* valuep,
TNumeric (*localeFunc)(const wchar_t*, wchar_t**, int, _locale_t), // eg: _wcstoll_l
nit: you don't need a locale function here, you could use lambda to bind the locale when you need it (at the calling site).
Frameworks/Foundation/NSScanner.mm
Outdated
_locale = nil;
// Overload of __scanNumeric for CRT functions that do not take a base parameter
template <typename TNumeric>
static BOOL __scanNumeric(NSScanner* self,
nit: lambdas to bind base when needed?
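A sketch of what binding the base (or locale) at the call site could look like, so that a single helper serves CRT functions with and without a base parameter. `scanNumeric`, `scanHexLongLong`, and `scanLongLong` here are hypothetical free functions over a plain `wchar_t` buffer, not the PR's methods; only the standard `std::wcstoull` and `std::wcstoll` are used.

```cpp
#include <cassert>
#include <cwchar>
#include <functional>

// Hypothetical stand-in for the PR's helper: the callable only sees the start
// pointer and the end out-pointer, so base or locale is bound by the caller.
template <typename TNumeric>
bool scanNumeric(const wchar_t* text,
                 TNumeric* valuep,
                 const wchar_t** endp,
                 const std::function<TNumeric(const wchar_t*, wchar_t**)>& scanFunc) {
    wchar_t* end = nullptr;
    const TNumeric value = scanFunc(text, &end);
    if (end == text) {
        return false; // nothing was consumed
    }
    if (valuep) {
        *valuep = value;
    }
    if (endp) {
        *endp = end;
    }
    return true;
}

// Base 16 is bound inside the lambda
inline bool scanHexLongLong(const wchar_t* text, unsigned long long* valuep, const wchar_t** endp) {
    std::function<unsigned long long(const wchar_t*, wchar_t**)> scanFunc =
        [](const wchar_t* start, wchar_t** end) { return std::wcstoull(start, end, 16); };
    return scanNumeric(text, valuep, endp, scanFunc);
}

// Base 10 is bound inside the lambda
inline bool scanLongLong(const wchar_t* text, long long* valuep, const wchar_t** endp) {
    std::function<long long(const wchar_t*, wchar_t**)> scanFunc =
        [](const wchar_t* start, wchar_t** end) { return std::wcstoll(start, end, 10); };
    return scanNumeric(text, valuep, endp, scanFunc);
}
```

The `std::function` is built as a named variable before the call because template argument deduction would fail if a lambda were passed directly to the `std::function<TNumeric(...)>` parameter.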
Frameworks/Foundation/NSScanner.mm
Outdated
result = NO;
break;
const NSUInteger length = _string.length;
NSUInteger i = [self _indexOfNextUnskippedCharacter];
nit: this could go inside the initializer part for the for statement
it could in this instance, but there are other instances in the file where i needs to be available outside the for loop and I'd rather keep them in line
Understood. You can ignore any and all the nit comments.
Frameworks/Foundation/NSScanner.mm
Outdated
range.length = [string length];
int oldLocation = _location;
- (BOOL)scanString:(NSString* _Nonnull)string intoString:(NSString* _Nullable*)stringp {
if (string.length == 0) {
the method uses both string.length and [string length]; it would be good to stick to one convention. also, it may even make sense to stick it into a constant.
Frameworks/Foundation/NSScanner.mm
Outdated
state = STATE_P;

} else {
goto DONE_SCANNING_HEXDOUBLE;
instead of the goto here, please set a flag (doneScanningDouble) and outside the switch, check for the flag and break out of the for loop. We don't like goto's for a reason and this is easy to not use it here.
Frameworks/Foundation/NSScanner.mm
Outdated
double mantissaDigitPlace = 1;

const NSUInteger length = _string.length;
NSUInteger i = [self _indexOfNextUnskippedCharacter];
nit, can be for initializer.
Frameworks/Foundation/NSScanner.mm
Outdated
const NSUInteger length = _string.length;
NSUInteger i = [self _indexOfNextUnskippedCharacter];
for (; i < length; ++i) {
unichar unicode = _stringChars[i];
nit: ch or even c is more appropriate than unicode.
Frameworks/Foundation/NSScanner.mm
Outdated
goto DONE_SCANNING_HEXDOUBLE; // Stop scanning once a second period has been scanned
}

} else if (unicode >= '0' && unicode <= '9') {
you could extract the value of the current character for digits and hex first, and then do the scannedPeriod check. Better yet, you could have a multiplier value which is 16 before you scan the period and 1/16 after it, then you don't need the if checks at all. also, isdigit instead of actual comparison here?
leaving the comparison since it better contextualizes the arithmetic and is more in line with the other checks
if checks would still be necessary even using that multiplier value, since the mantissa advances in different directions before/after the period ie:
0x12, scan 3 = 0x12 * 16 + 0x3 = 0x123 (multiplier applies to previous number, is constant)
0x1.2, scan 3 = 0x1.2 + 0x3 / 16^2 = 0x1.23 (multiplier applies to new digit, changes)
I could combine scanned period and the multiplier into one var, but I think that would hurt readability.
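The two directions described in this reply can be sketched as follows. This is a simplified stand-in for the mantissa state only (no sign, `0x` prefix, or exponent), and `scanHexMantissa` is a hypothetical name rather than code from the PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: accumulates hex digits around an optional period.
// Stops at the first character that is neither a hex digit nor the first period.
inline double scanHexMantissa(const std::string& text) {
    double mantissa = 0.0;
    double digitPlace = 1.0; // weight of the most recent fractional digit
    bool scannedPeriod = false;
    for (char ch : text) {
        int digit = 0;
        if (ch >= '0' && ch <= '9') {
            digit = ch - '0';
        } else if (ch >= 'a' && ch <= 'f') {
            digit = ch - 'a' + 10;
        } else if (ch >= 'A' && ch <= 'F') {
            digit = ch - 'A' + 10;
        } else if (ch == '.' && !scannedPeriod) {
            scannedPeriod = true;
            continue;
        } else {
            break;
        }
        if (scannedPeriod) {
            // After the period the weight applies to the new digit and shrinks
            digitPlace /= 16;
            mantissa += digitPlace * digit;
        } else {
            // Before the period the constant factor 16 applies to the running total
            mantissa = mantissa * 16 + digit;
        }
    }
    return mantissa;
}
```

For `"123"` the result is 0x123 (291), and for `"1.23"` it is 1 + 2/16 + 3/256, matching the two worked lines in the reply.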
Frameworks/Foundation/NSScanner.mm
Outdated
switch (state) {
case STATE_START:
if (unicode == '-') {
If these are meant to be UTF16 unichars, then need L'-' etc in front of the chars. Not sure if that is automatic.
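On whether the prefix is needed: for ASCII characters the narrow, wide, and UTF-16 literals all have the same numeric value on ASCII-compatible execution character sets, and the comparison promotes both sides to `int`, so `unicode == '-'` behaves the same as `unicode == L'-'`. A sketch under that assumption, with `unichar` modelled as a 16-bit unsigned type and `isMinus` as a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

using unichar = std::uint16_t; // assumption: mirrors Foundation's 16-bit unichar

// Holds for ASCII characters when the execution character set is ASCII-compatible
static_assert(u'-' == '-', "UTF-16 and narrow literals agree for ASCII");
static_assert(L'x' == 'x', "wide and narrow literals agree for ASCII");

// Hypothetical helper: both operands are promoted to int before comparing
inline bool isMinus(unichar ch) {
    return ch == '-';
}
```

This does not extend to non-ASCII characters, where a narrow literal's value depends on the execution character set.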
Frameworks/Foundation/NSScanner.mm
Outdated
return _isCaseSensitive;
+ (instancetype)localizedScannerWithString:(NSString* _Nonnull)string {
NSScanner* ret = [self scannerWithString:string];
ret.locale = [NSLocale currentLocale];
nit: do it via an init,
there is no equivalent public init, seems like unnecessary frills to make one just for this
Frameworks/Foundation/NSScanner.mm
Outdated
state = STATE_SIGN;
mantissaIsNegative = true;
} else if (unicode == '+') {
state = STATE_SIGN;
state = STATE_SIGN;
you could just use the NSDecimal struct and if the precision is not too high, it would only take up the first slot of the mantissa array.
Also it would make the code much simpler, with tracking all the states.
Frameworks/Foundation/NSScanner.mm
Outdated
case STATE_ZERO:
if (unicode == 'x' || unicode == 'X') {
state = STATE_MANTISSA;
} else {
else
nit: can just return NO
? the top if case would fall through...
Frameworks/Foundation/NSScanner.mm
Outdated
case STATE_P:
if (!hasValue) {
goto DONE_SCANNING_HEXDOUBLE;
goto
goto :(
03989dc
to
4b278f6
Compare
Ping

// Helper function for implementing a scan function for a numeric type using a CRT function such as wcstoll
template <typename TNumeric>
static BOOL __scanNumeric(NSScanner* scanner,
nit: make templated on lambda so we don't have to use std::function, and then just inline the lambdas into the __scanNumeric calls
would rather have it be strongly-typed - if it turns out we are bottlenecked by std::function overhead, we can make this change then
Frameworks/Foundation/NSScanner.mm
Outdated
std::function<long(const wchar_t*, wchar_t**)> scanFunc = [self](const wchar_t* scanStart, wchar_t** scanEnd) {
return _crtLocale ? _wcstol_l(scanStart, scanEnd, 10, _crtLocale) : wcstol(scanStart, scanEnd, 10);
};
return __scanNumeric(self, reinterpret_cast<long*>(valuep), scanFunc);
this may not be a valid cast on all platforms. prefer a temporary that you copy into NSInteger for truncation purposes
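A minimal sketch of the temporary-based alternative suggested here: let the CRT write into a genuine `long`, then copy it into the destination type with an explicit conversion. `Integer` is a hypothetical stand-in for `NSInteger` (whose width varies by platform), and `scanInteger` is a free function over a `wchar_t` buffer rather than the PR's method.

```cpp
#include <cassert>
#include <cwchar>

using Integer = long long; // hypothetical stand-in for NSInteger

// Hypothetical helper: scans a decimal integer without aliasing Integer* as long*
inline bool scanInteger(const wchar_t* text, Integer* valuep) {
    wchar_t* end = nullptr;
    const long scanned = std::wcstol(text, &end, 10); // the CRT produces a real long
    if (end == text) {
        return false; // nothing was consumed
    }
    if (valuep) {
        *valuep = static_cast<Integer>(scanned); // explicit copy into the target type
    }
    return true;
}
```

Because the value is converted rather than reinterpreted, this stays well-defined whether `Integer` is narrower than, wider than, or the same size as `long`.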
- Implemented the following functions/properties:
  - locale
  - copyWithZone:
  - localizedScannerWithString:
  - scanDecimal:
  - scanHexDouble:
  - scanHexFloat:
- Added unit tests for NSScanner (there were none, including for previously implemented features)
- Fixed a number of edge cases that were incorrect

Fixes microsoft#2083
…g CRT functions
- Changed NSScanner to prefetch all characters in the string at init
- Added tests that check for differing behaviors between 32-bit and 64-bit types
- Misc CR feedback
- Renamed the states in scanHexDouble to reflect the current expected state rather than previous state
4b278f6
to
5c9b489
Compare
Fixes #2083