-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Date.parse
#289
Improve Date.parse
#289
Conversation
- produce readable output for Date objects in repl
- rewrite Date.parse() with separate parsers - return `NaN` for out of bounds field values as specified - accept up to 9 decimals for millisecond fraction but truncate at 3 - accept many more alternative date/time formats - add test cases in tests/test_builtin.js
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM but I believe the spec allows for some implementation-dependentness and we should decide whether we want to align with V8 (laxer) or SM (stricter.)
Although I don't generally subscribe to Postel's maxim, I'm inclined to side with V8 here because date strings are often entered by humans, not machines.
repl.js
Outdated
@@ -938,6 +938,8 @@ import * as os from "os"; | |||
std.puts(a); | |||
} else if (stack.indexOf(a) >= 0) { | |||
std.puts("[circular]"); | |||
} else if (a instanceof Date) { | |||
std.puts("Date " + a.toGMTString().__quote()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, in my never-ceasing purge of everything and anything non-standard, this method had so far escaped my notice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a toSource
, which albeit non-standard is available on some systems and less surprising. For the Date
object output, I suggest std.puts(`Date "${a.toGMTString()}"`);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not format the string here, in place? I'm not a huge fan of swapping one non-standard method for another. toSource() was a Mozilla-ism, as far as I'm aware, and even they removed it.
edit: doesn't need to hold up this pull request since __quote is a pre-existing thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not format the string here, in place?
I used std.puts(`Date "${a.toGMTString()}"`);
in the latest commit. Is this what you meant?
How to you produce the source for a string reliably without some kind of toSource()
method?
I guess I could write it in JS and add it in repl.js
or another package loaded by repl.js
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I mean, doing it inside repl.js.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented it in Javascript, then as I proceeded to remove String.prototype.__quote
, I realized JSON.stringify
does the job. Patch submitted.
} | ||
|
||
/* parse toISOString format */ | ||
static BOOL js_date_parse_isostring(const uint8_t *sp, int fields[9], BOOL *is_local) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static BOOL js_date_parse_isostring(const uint8_t *sp, int fields[9], BOOL *is_local) { | |
static BOOL js_date_parse_isostring(const uint8_t *sp, int (*fields)[9], BOOL *is_local) { |
That way callers must pass in an exactly nine-element array (and you have to dereference them here as (*fields)[i]
)
Another alternative is C99-style int fields[static 9]
syntax. Then callers must pass in an at least nine-element array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. I don't like the C99 syntax too much. Does MSAN or ASAN catch out of bound accesses for arrays passed as pointers (without a length specified by either syntax)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the time. Not 100% of the time though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a macro to encapsulate the C99 syntax and used that. It is less surprising to casual readers than the array pointer trick, and also allows for larger arrays if necessary, as is the case for get_date_fields()
/ set_date_fields()
combinations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to disable the C99 syntax for MSVC that does not support it even though it does define __STDC_VERSION__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The int (*fields)[9]
syntax works in msvc; it's a c89-ism that we use quite extensively in libuv.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
int (*fields)[9]
syntax works in msvc; it's a c89-ism that we use quite extensively in libuv.
Of course, but this syntax would force the length to be 9
whereas get_date_fields
needs at least 9
and set_date_fields
at least 7
. It is not a problem if MSVC cannot verify the minimum length as other compilers already do for 99,9% of the code.
return FALSE; | ||
} | ||
} | ||
if (string_skip_char(sp, &p, 'T')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as my other comment: V8 accepts lowercase 't', spidermonkey doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
T
and Z
are specified in uppercase for the ISO date/time format. Matching these in lowercase too does not seem necessary, but could become a side effect of the string conversion if I were to uppercase the string unconditionally. I am not sure it is worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd personally lean on the side of acceptance and forgiveness here but you're not wrong it's per spec so 🤷
quickjs.c
Outdated
} else | ||
if (c == '(') { /* skip parenthesized phrase */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else | |
if (c == '(') { /* skip parenthesized phrase */ | |
} else if (c == '(') { /* skip parenthesized phrase */ |
Also, I feel slightly offended that this list leaves out CET and CEST :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a matter of style... I like to align the if
tests at the indentation level. I will check if the rest of the code is consistent with that... else if
on the same line wins 8 to 1, yet for long lists of chained tests, I find it more readable to break the line between the else
and the if
.
I would gladly add more timezone abbreviations and use a table. Do V8 or SM support other ones?
I found this source: https://en.wikipedia.org/wiki/List_of_time_zone_abbreviations but it is long and some of the abbreviations are ambiguous.
This site also has a trove of useful information: https://www.timeanddate.com/time/zones/
I wrote a more general parser and added the European zones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do V8 or SM support other ones?
Both SM's and V8's parser go well above and beyond the minimum functionality the spec mandates.
They both know how to parse e.g. (Central European Standard Time)
(and in a case-insensitive manner); quite possibly they lean into ICU for the actual string parsing.
quickjs.c
Outdated
} else | ||
if (c == ')') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else | |
if (c == ')') { | |
} else if (c == ')') { |
- add `minimum_length` macro to specify argument array sizes - improve `string_get_milliseconds` readability - fix `upper_ascii` - add `js_tzabbr` and `string_get_tzabbr` to handle timezone abbreviations - v8.js: parse all environement variables and output them - v8.txt: add environement variables when specified - no longer use `String.prototype.__quote()` to output `Date` objects in repl
- disable C99 `minimum_length()` syntax for MSVC
v8.js
Outdated
let env = {}, envstr = ""; | ||
for (let s of source.matchAll(/environment variables:(.+)/ig)) { | ||
for (let m of s[1].matchAll(/\s*([\S]+)=([\S]+)/g)) { | ||
env[m[1]] = m[2]; | ||
envstr += ` ${m[1]}=${m[2]}`; | ||
} | ||
} | ||
print(`=== ${file}${envstr}`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let env = {}, envstr = ""; | |
for (let s of source.matchAll(/environment variables:(.+)/ig)) { | |
for (let m of s[1].matchAll(/\s*([\S]+)=([\S]+)/g)) { | |
env[m[1]] = m[2]; | |
envstr += ` ${m[1]}=${m[2]}`; | |
} | |
} | |
print(`=== ${file}${envstr}`); | |
let env = {}, envstr = "" | |
for (let s of source.matchAll(/environment variables:(.+)/ig)) { | |
for (let m of s[1].matchAll(/\s*([\S]+)=([\S]+)/g)) { | |
env[m[1]] = m[2] | |
envstr += ` ${m[1]}=${m[2]}` | |
} | |
} | |
print(`=== ${file}${envstr}`) |
No semicolons in my beautiful pristine JS code, please :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK for consistency, but not my style :)
@chqrlie is this ready to go? I'm planning on cutting a 0.4.0 release once this lands. |
Yes. just waiting for Ben's final approval |
- use JSON.stringify to output Date and string values in repl.js - remove String.prototype.__quote
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only briefly skimmed it this time because, well, it's 10 AM on a Sunday morning and I want to go outside and play with the kids, but LGTM :)
Improve Date.parse
NaN
for out of bounds field values as specified