fix(dates): improve date string parsing #1646

Open
wants to merge 5 commits into
base: v4

baodrate
Contributor

A set of improvements related to the parsing of date/datetime values.

Changes:

  • Use Luxon to parse date/datetime strings.
    • This avoids Date.parse's inconsistency between date-only strings (assumed UTC) and datetime strings (assumed local timezone). (closes Incorrect timezone conversion for 'date' frontmatter #1615)
    • It also allows the date string's timezone to be carried along with the DateTime object, producing more friendly and semantically-correct timestamps.
  • CreatedModifiedDate plugin:
    • Make date handling more consistent such that file dates are optional everywhere, i.e. dates are not rendered unless the configured date type was sourced
  • ContentIndex plugin:
    • Make rss feed and sitemap use the appropriate date type: published and modified, respectively
      • the ordering of pages in the sitemap and rss feed will change
  • Config changes:
    • Allow configurable default timezone
    • Allow the user to set fallback values for the defaultDateType setting
      • default changed
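
To illustrate the Date.parse inconsistency mentioned in the first item, here is a minimal sketch that uses only the built-in Date (no Luxon). The sample strings are illustrative and are not taken from the Quartz codebase.

```typescript
// A date-only ISO string is parsed as midnight UTC...
const dateOnly = new Date("2024-09-09")
// ...while a date-time string without an offset is parsed as local time.
const dateTime = new Date("2024-09-09T00:00:00")

// The two instants differ by the build machine's UTC offset.
// getTimezoneOffset() returns minutes, positive when local time is behind UTC.
const gapMs = dateTime.getTime() - dateOnly.getTime()
const offsetMs = dateTime.getTimezoneOffset() * 60000

console.log(dateOnly.toISOString()) // always 2024-09-09T00:00:00.000Z
console.log(gapMs === offsetMs) // true; the gap is zero only on a machine running in UTC
```

On any build machine that is not in UTC, the same calendar day written in two styles therefore lands on two different instants.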

Some slight refactoring was done along the way to make type annotations more consistent and type-checking more useful, but effort was made to avoid any changes to how quartz behaves or the structure of the files that are emitted (except where otherwise noted)

Apologies for the large PR. I tried to keep it as small as possible, but a lot of things touch the dates and I didn't want to leave the codebase in a regressed state between PRs (for example, the fallback-values change to defaultDateType compensates for the change to the handling of missing dates). Each commit should make sense on its own, if that helps with reviewing, but I can split this PR into multiple ones if requested.

Use Luxon to parse date/datetime strings.

This avoids the `Date.parse`'s inconsistency between date-only (assumed
UTC) and datetime (assumed local timezone) strings. (closes jackyzha0#1615)

It also allows the date string's timezone to be carried along with the
DateTime object, producing more friendly and semantically-correct
timestamps.
@saberzero1
Collaborator

Hey, thanks for this PR.

I have a few questions. I'll leave a detailed review of the code changes later today.

  • If I understand correctly, dates are hidden in the content meta instead of displaying the file or git date if no created/modified/published frontmatter key is present?
  • Several places use datetime instead of date in your proposed changes. Does this impact any displayed dates on the generated sites? (Like a date + time in content meta instead of date.)
  • Does the implemented solution still work if the build timezone and deploy timezone differ? If not, does setting the timezone in the config resolve this?

@baodrate
Contributor Author

If I understand correctly, dates are hidden in the content meta instead of displaying the file or git date if no created/modified/published frontmatter key is present?

Dates are hidden in the content meta if the file does not have them. This has not changed. In both the main branch (v4) and this PR, the <p class="meta"> element will be empty if file.dates is empty. This is the case if you disable the CreatedModifiedDate plugin, or if you set CreatedModifiedDate's Options.priority to []

Prior to this PR, CreatedModifiedDate sets created/modified/published all to Date.now() if not found in any of the configured sources. This PR changes this behavior so that file.dates.published defaults to undefined. In other words, the date will be null if it is missing from your configured sources. The behavior of prioritizing date sources based on CreatedModifiedDate's Options.priority has not changed
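
A minimal sketch of the behavior described above, assuming a simplified model of the plugin: `pickDate`, `DateSource`, and the shape of `found` are hypothetical names for illustration, not the actual Quartz implementation.

```typescript
type DateSource = "frontmatter" | "git" | "filesystem"
type MaybeDate = Date | undefined

// Hypothetical helper: return the first date found among the sources, in priority order.
function pickDate(found: Partial<Record<DateSource, Date>>, priority: DateSource[]): MaybeDate {
  for (const source of priority) {
    const value = found[source]
    if (value !== undefined) return value
  }
  // Before this PR the plugin fell back to Date.now() at this point;
  // in this sketch (as in the PR) the date stays undefined instead.
  return undefined
}

const fsDate = new Date("2024-09-09T00:00:00Z")
const withDefaults = pickDate({ filesystem: fsDate }, ["frontmatter", "git", "filesystem"])
const withoutFs = pickDate({ filesystem: fsDate }, ["frontmatter", "git"])

console.log(withDefaults?.toISOString()) // 2024-09-09T00:00:00.000Z
console.log(withoutFs) // undefined
```

With the default priority the filesystem date is still found, so the difference only shows up once `filesystem` is removed from the list.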

Several places use datetime instead of date in your proposed changes. Does this impact any displayed dates on the generated sites? (Like a date + time in content meta instead of date.)

The actual rendering of visible dates in the pages is done in quartz/components/Date.tsx. The only change to the visible output is:

-export function formatDate(d: Date, locale: ValidLocale = "en-US"): string {
-  return d.toLocaleDateString(locale, {
-    year: "numeric",
-    month: "short",
-    day: "2-digit",
-  })
+export function formatDate(d: DateTime, locale: ValidLocale = "en-US"): string {
+  return d.toLocaleString(
+    {
+      year: "numeric",
+      month: "short",
+      day: "2-digit",
+    },
+    { locale: locale },
+  )
 }
 
 export function Date({ date, locale }: Props) {
-  return <time datetime={date.toISOString()}>{formatDate(date, locale)}</time>
+  return <time datetime={date.toISO() || ""}>{formatDate(date, locale)}</time>
 }

Luxon's DateTime.toLocaleString() takes the same Intl.DateTimeFormat format options as Date.toLocaleDateString(), and produces the same output. So nothing has changed here.

Furthermore, JavaScript's Date object actually represents "a single moment in time in a platform-independent format" (internally it is just an epoch timestamp), so it is already the equivalent of a "date-time" object rather than a calendar date.
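
As a rough check of that claim without pulling in Luxon, the sketch below compares Date.toLocaleDateString() with a direct Intl.DateTimeFormat call using the same options. Intl.DateTimeFormat stands in for Luxon here, on the assumption that Luxon delegates locale formatting to it; `formatWithDate` and `formatWithIntl` are hypothetical names.

```typescript
const options: Intl.DateTimeFormatOptions = { year: "numeric", month: "short", day: "2-digit" }

// Old path: the built-in Date method, formatted in the system time zone.
function formatWithDate(d: Date, locale: string = "en-US"): string {
  return d.toLocaleDateString(locale, options)
}

// Stand-in for the new path: Intl.DateTimeFormat with the same options,
// optionally pinned to an explicit time zone.
function formatWithIntl(d: Date, locale: string = "en-US", timeZone?: string): string {
  const opts = timeZone ? { ...options, timeZone } : options
  return new Intl.DateTimeFormat(locale, opts).format(d)
}

const instant = new Date("2024-09-09T12:00:00Z")
console.log(formatWithDate(instant) === formatWithIntl(instant)) // true
console.log(formatWithIntl(instant, "en-US", "UTC")) // Sep 09, 2024
```

One caveat worth keeping in mind: the two paths agree when they format in the same zone. A value that carries its own zone may fall on a different calendar day than the same instant viewed in the system zone.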

Does the implemented solution still work if the build timezone and deploy timezone differ? If not, does setting the timezone in the config resolve this?

The way quartz currently works, the timestamp is hard-coded into the HTML of the page, so the only thing that matters is the timezone of the system at build-time, and the default timezone from the config.

given these options in quartz.config.ts:

...
locale: "de-DE",
timezone: "Europe/Berlin",
defaultDateType: "modified",
...

| Frontmatter `updated` | Before PR |
| --- | --- |
| 2024-09-09T02:11:00+07:00 | `<time datetime="2024-09-08T19:11:00.000Z">08. Sept. 2024</time>` |
| 2024-09-09T02:11:00 | `<time datetime="2024-09-09T07:11:00.000Z">09. Sept. 2024</time>` |

| Frontmatter `updated` | After PR |
| --- | --- |
| 2024-09-09T02:11:00+07:00 | `<time datetime="2024-09-09T02:11:00.000+07:00">09. Sept. 2024</time>` |
| 2024-09-09T02:11:00 | `<time datetime="2024-09-09T02:11:00.000+02:00">09. Sept. 2024</time>` |

With a system timezone of America/Chicago and quartz.config.ts:

...
locale: "de-DE",
timezone: undefined,
defaultDateType: "modified",
...

| Frontmatter `updated` | After PR |
| --- | --- |
| 2024-09-09T02:11:00+07:00 | `<time datetime="2024-09-09T02:11:00.000+07:00">09. Sept. 2024</time>` |
| 2024-09-09T02:11:00 | `<time datetime="2024-09-09T02:11:00.000-05:00">09. Sept. 2024</time>` |

tl;dr: With Luxon, the DateTime keeps the timezone of the date string. If the datetime string does not have a timezone, it is assumed to be in the configured timezone setting, or (if not set) the build system's local time zone.
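
The precedence in the tl;dr can be sketched as a small lookup. This is only an illustration of the ordering, not the PR's code: `resolveZone` and `OFFSET_RE` are hypothetical, and the sketch only decides which zone applies without doing any parsing itself.

```typescript
// Matches an explicit offset ("Z", "+07:00", "-0500") that follows a time component.
// Requiring the time part keeps a date-only string like "2024-09-09" from matching.
const OFFSET_RE = /T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(Z|[+-]\d{2}(?::?\d{2})?)$/

// Hypothetical helper: decide which zone a frontmatter date string belongs to.
function resolveZone(input: string, configuredTimezone?: string): string {
  const offset = OFFSET_RE.exec(input)?.[1]
  if (offset !== undefined) return offset // 1. offset written in the string itself
  if (configuredTimezone) return configuredTimezone // 2. `timezone` from the config
  return Intl.DateTimeFormat().resolvedOptions().timeZone // 3. build machine's zone
}

console.log(resolveZone("2024-09-09T02:11:00+07:00", "Europe/Berlin")) // +07:00
console.log(resolveZone("2024-09-09T02:11:00", "Europe/Berlin")) // Europe/Berlin
console.log(resolveZone("2024-09-09T02:11:00")) // the build machine's zone
```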

@saberzero1
Collaborator

If I understand correctly, dates are hidden in the content meta instead of displaying the file or git date if no created/modified/published frontmatter key is present?

Dates are hidden in the content meta if the file does not have them. This has not changed. In both the main branch (v4) and this PR, the <p class="meta"> element will be empty if file.dates is empty. This is the case if you disable the CreatedModifiedDate plugin, or if you set CreatedModifiedDate's Options.priority to []

Quartz's default priority is frontmatter, git, filesystem. Filesystem is always present, usually resulting in the deploy date. By default a date will be displayed, even if a frontmatter key is not present.

Therefore, by default, we expect a date to be visible.

Prior to this PR, CreatedModifiedDate sets created/modified/published all to Date.now() if not found in any of the configured sources. This PR changes this behavior so that file.dates.published defaults to undefined. In other words, the date will be null if it is missing from your configured sources. The behavior of prioritizing date sources based on CreatedModifiedDate's Options.priority has not changed

This is a change in default behavior, as default configuration always has a date (filesystem)

Several places use datetime instead of date in your proposed changes. Does this impact any displayed dates on the generated sites? (Like a date + time in content meta instead of date.)

The actual rendering of visible dates in the pages is done in quartz/components/Date.tsx. The only change to the visible output is:

-export function formatDate(d: Date, locale: ValidLocale = "en-US"): string {
-  return d.toLocaleDateString(locale, {
-    year: "numeric",
-    month: "short",
-    day: "2-digit",
-  })
+export function formatDate(d: DateTime, locale: ValidLocale = "en-US"): string {
+  return d.toLocaleString(
+    {
+      year: "numeric",
+      month: "short",
+      day: "2-digit",
+    },
+    { locale: locale },
+  )
 }
 
 export function Date({ date, locale }: Props) {
-  return <time datetime={date.toISOString()}>{formatDate(date, locale)}</time>
+  return <time datetime={date.toISO() || ""}>{formatDate(date, locale)}</time>
 }

Luxon's DateTime.toLocaleString() takes the same Intl.DateTimeFormat format options as Date.toLocaleDateString(), and produces the same output. So nothing has changed here.

Furthermore, JavaScript's Date object actually represents "a single moment in time in a platform-independent format" (internally it is just an epoch timestamp), so it is already the equivalent of a "date-time" object rather than a calendar date.

Does the implemented solution still work if the build timezone and deploy timezone differ? If not, does setting the timezone in the config resolve this?

The way quartz currently works, the timestamp is hard-coded into the HTML of the page, so the only thing that matters is the timezone of the system at build-time, and the default timezone from the config.

given these options in quartz.config.ts:

...
locale: "de-DE",
timezone: "Europe/Berlin",
defaultDateType: "modified",
...

| Frontmatter `updated` | Before PR |
| --- | --- |
| 2024-09-09T02:11:00+07:00 | `<time datetime="2024-09-08T19:11:00.000Z">08. Sept. 2024</time>` |
| 2024-09-09T02:11:00 | `<time datetime="2024-09-09T07:11:00.000Z">09. Sept. 2024</time>` |

| Frontmatter `updated` | After PR |
| --- | --- |
| 2024-09-09T02:11:00+07:00 | `<time datetime="2024-09-09T02:11:00.000+07:00">09. Sept. 2024</time>` |
| 2024-09-09T02:11:00 | `<time datetime="2024-09-09T02:11:00.000+02:00">09. Sept. 2024</time>` |

With a system timezone of America/Chicago and quartz.config.ts:

...
locale: "de-DE",
timezone: undefined,
defaultDateType: "modified",
...

| Frontmatter `updated` | After PR |
| --- | --- |
| 2024-09-09T02:11:00+07:00 | `<time datetime="2024-09-09T02:11:00.000+07:00">09. Sept. 2024</time>` |
| 2024-09-09T02:11:00 | `<time datetime="2024-09-09T02:11:00.000-05:00">09. Sept. 2024</time>` |

If I understand correctly, the locale is used for the display format, and the timezone for the actual value?

tl;dr: With Luxon, the DateTime keeps the timezone of the date string. If the datetime string does not have a timezone, it is assumed to be in the configured timezone setting, or (if not set) the build system's local time zone.

Thanks for the detailed explanation. I'll get back to this during the weekend.

@baodrate baodrate marked this pull request as draft December 13, 2024 12:03
@baodrate baodrate force-pushed the date-parsing branch 2 times, most recently from 6cf3b40 to acef63a Compare December 13, 2024 12:53
@baodrate
Contributor Author

Therefore, by default, we expect a date to be visible.

Yes. It's a sensible default. That particular change (re: missing dates) was a very slight adjustment to make the edge cases a little more consistent. It shouldn't affect the regular use case at all

This is a change in default behavior, as default configuration always has a date (filesystem)

It is a change in the meaning of (e.g.) file.dates.created in the code, but it is not a change in user-observable behavior, since the default configuration still always returns a created date. The change is only observable if the user disables the filesystem source from their CreatedModifiedDate config

If I understand correctly, the locale is used for the display format, and the timezone for the actual value?

This is accurate, but to avoid any ambiguity:

unchanged behavior: The locale setting affects only the format of the displayed date on the page. It does not change the value of that date. (2024-09-09T02:11:00+07:00) will continue to display as "Sep 09, 2024" translated to the user's locale (e.g. 2024年9月09日 if your locale is zh)

changed behavior: The timezone setting affects the interpretation of non-timezone datetime strings. i.e., the exact seconds-since-epoch they represent. for example:

  • before: 2024-09-09T02:11:00 and 2024-09-09 were interpreted to mean "Sep 9th, 2024 02:11 in the local timezone" and "Sep 9th, 2024 00:00 in UTC", respectively
  • now: they are both interpreted to mean "in the local timezone"
    • unless QuartzConfig.configuration.timezone is set, where it will be used instead
    • (i.e., if a user does not explicitly write a timezone, we assume they mean their local time, whether or not they include the time portion)
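
To make the "now" rule concrete, here is a small sketch that uses the built-in Date rather than Luxon and ignores the timezone config setting entirely. `parseAsLocal` is a hypothetical helper, not something from this PR; it only shows that both spellings can be made to land on the same instant.

```typescript
// Hypothetical helper: treat a date-only string as local midnight by giving it
// an explicit time component before handing it to Date.
function parseAsLocal(input: string): Date {
  const isDateOnly = /^\d{4}-\d{2}-\d{2}$/.test(input)
  return new Date(isDateOnly ? `${input}T00:00:00` : input)
}

const fromDateOnly = parseAsLocal("2024-09-09")
const fromDateTime = parseAsLocal("2024-09-09T00:00:00")

// Both strings now represent the same seconds-since-epoch value.
console.log(fromDateOnly.getTime() === fromDateTime.getTime()) // true
```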

Hopefully this truth table is more clear than my previous one. This one assumes your local timezone is America/Sao_Paulo and your locale is en-CA. tz here stands for your quartz config's timezone setting

.toLocaleString() is used to format the human-readable dates in quartz and toISO is used in the <time datetime=...> attribute and in the machine-readable outputs (e.g. sitemap)

  • 2014-01-01T00:00:00+09:00

| type | .toLocaleString | .toISO |
| --- | --- | --- |
| Date | 2013-12-31, 13:00 GMT-02:00 | 2013-12-31T15:00:00.000Z |
| DateTime | 2013-12-31, 13:00 GMT-02:00 | 2013-12-31T13:00:00.000-02:00 |
| DateTime w/ tz: 'UTC+3' | 2013-12-31, 18:00 GMT+03:00 | 2013-12-31T13:00:00.000-02:00 |

  • 2014-01-01T00:00:00

| type | .toLocaleString | .toISO |
| --- | --- | --- |
| Date | 2014-01-01, 04:00 GMT-02:00 | 2014-01-01T06:00:00.000Z |
| DateTime | 2014-01-01, 00:00 GMT-02:00 | 2014-01-01T00:00:00.000-02:00 |
| DateTime w/ tz: 'UTC+3' | 2014-01-01, 05:00 GMT+03:00 | 2014-01-01T00:00:00.000-02:00 |

  • 2014-01-01

| type | .toLocaleString | .toISO |
| --- | --- | --- |
| Date | 2013-12-31, 22:00 GMT-02:00 | 2014-01-01T00:00:00.000Z |
| DateTime | 2014-01-01, 00:00 GMT-02:00 | 2014-01-01T00:00:00.000-02:00 |
| DateTime w/ tz: 'UTC+3' | 2014-01-01, 05:00 GMT+03:00 | 2014-01-01T00:00:00.000-02:00 |

Make date handling more consistent such that file dates are optional
everywhere, i.e. dates are not rendered unless the CreatedModifiedDate
plugin sourced the configured date type.
Explicitly define Content Index types to improve type checking.

Make rss feed and sitemap use the appropriate date type:
published and modified, respectively.
Allows the user to set fallback values for the `defaultDateType` setting

Prior to this commit, if the date was not set, it would default to the
current time. e.g. if using `defaultDateType == "published"` or if
the CreatedModifiedDate plugin's `priority` setting is set without
`filesystem`
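
A rough sketch of what fallback values for `defaultDateType` could look like. The PR text does not spell out the shape of the setting, so accepting either a single type or an ordered list is an assumption here, and `getDate`, `DateType`, and `FileDates` are hypothetical names.

```typescript
type DateType = "created" | "modified" | "published"
type FileDates = Partial<Record<DateType, Date>>

// Hypothetical helper: try each configured date type in order and
// return undefined (rather than the current time) when none is set.
function getDate(dates: FileDates | undefined, defaultDateType: DateType | DateType[]): Date | undefined {
  const candidates = Array.isArray(defaultDateType) ? defaultDateType : [defaultDateType]
  for (const type of candidates) {
    const value = dates?.[type]
    if (value !== undefined) return value
  }
  return undefined
}

const modified = new Date("2024-09-09T00:00:00Z")
console.log(getDate({ modified }, "published")) // undefined
console.log(getDate({ modified }, ["published", "modified"]) === modified) // true
```

With a single type the date can be missing; an ordered list lets a page without a published date still show its modified date.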
@baodrate baodrate marked this pull request as ready for review December 16, 2024 09:33
@aarnphm
Collaborator

aarnphm commented Dec 17, 2024

I might be super wrong and ignorant here, but would it make sense to infer the timezone from the locale alone by default?

Users can still customize the timezone if needed; my worry is that introducing another timezone field into the config might confuse users.

Comment on lines -51 to -55
created ||= file.data.frontmatter.date as MaybeDate
modified ||= file.data.frontmatter.lastmod as MaybeDate
modified ||= file.data.frontmatter.updated as MaybeDate
modified ||= file.data.frontmatter["last-modified"] as MaybeDate
published ||= file.data.frontmatter.publishDate as MaybeDate
Collaborator

tbh we should dealias all of the frontmatter stuff on the frontmatter transformers instead of doing it here.

Let me submit a quick PR for this.

Comment on lines +71 to +72
created ||= DateTime.fromMillis(st.birthtimeMs || Math.min(st.ctimeMs, st.mtimeMs))
modified ||= DateTime.fromMillis(st.mtimeMs)
Collaborator

this would probably add a bit of performance regression in terms of parsing time.

Can you run a quick bench comparing before and after for build time?

@saberzero1
Collaborator

I might be super wrong and ignorant here, but would it make sense to infer the timezone from the locale alone by default?

Users can still customize the timezone if needed; my worry is that introducing another timezone field into the config might confuse users.

Just my two cents, but most users use the default en-US because they prefer English components (and it is the default.)

I feel like it might result in the same confusion as at the moment. Perhaps we should consider only using locale if timezone is missing (with a default to system).

Also consider that locales can easily span several timezones.

@aarnphm
Collaborator

aarnphm commented Dec 17, 2024

I feel like it might result in the same confusion as at the moment. Perhaps we should consider only using locale if timezone is missing (with a default to system).

fwiw we can derive timezone from "build" machine, or have a default tz set like shown.

but yeah, the Cartesian product of all possible solutions might be confusing, I'm afraid

Successfully merging this pull request may close these issues.

Incorrect timezone conversion for 'date' frontmatter