Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(date): Add Intl-Based Tests for Date Definitions to Ensure Completeness of Weekdays and Months #2912

Open
wants to merge 27 commits into
base: next
Choose a base branch
from

Conversation

sossost
Copy link

@sossost sossost commented May 18, 2024

#2904
This PR is about Intl-based tests for the date definitions in Faker.js. These tests ensure that the weekdays and months are correctly defined and returned across different locales, enhancing the accuracy and completeness of the data.

@sossost sossost requested a review from a team as a code owner May 18, 2024 14:17
Copy link

netlify bot commented May 18, 2024

Deploy Preview for fakerjs ready!

Name Link
🔨 Latest commit 78c7183
🔍 Latest deploy log https://app.netlify.com/sites/fakerjs/deploys/667202f079e1040008bd04f0
😎 Deploy Preview https://deploy-preview-2912.fakerjs.dev
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

codecov bot commented May 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.95%. Comparing base (80d5aad) to head (78c7183).

Additional details and impacted files
@@            Coverage Diff             @@
##             next    #2912      +/-   ##
==========================================
- Coverage   99.95%   99.95%   -0.01%     
==========================================
  Files        2987     3051      +64     
  Lines      216052   217787    +1735     
  Branches      603      948     +345     
==========================================
+ Hits       215952   217686    +1734     
- Misses        100      101       +1     
Files Coverage Δ
src/locales/af_ZA/date/index.ts 100.00% <100.00%> (ø)
src/locales/af_ZA/date/month.ts 100.00% <100.00%> (ø)
src/locales/af_ZA/date/weekday.ts 100.00% <100.00%> (ø)
src/locales/af_ZA/index.ts 100.00% <100.00%> (ø)
src/locales/ar/date/month.ts 100.00% <100.00%> (ø)
src/locales/ar/date/weekday.ts 100.00% <100.00%> (ø)
src/locales/az/date/month.ts 100.00% <100.00%> (ø)
src/locales/az/date/weekday.ts 100.00% <100.00%> (ø)
src/locales/cs_CZ/date/month.ts 100.00% <100.00%> (ø)
src/locales/cs_CZ/date/weekday.ts 100.00% <100.00%> (ø)
... and 143 more

... and 1 file with indirect coverage changes

@ST-DDT
Copy link
Member

ST-DDT commented May 18, 2024

Is this PR ready for review yet?
Because I dont see Intl in the tests anywhere.

@ST-DDT ST-DDT added p: 1-normal Nothing urgent c: test m: date Something is referring to the date module labels May 18, 2024
@ST-DDT ST-DDT linked an issue May 18, 2024 that may be closed by this pull request
@ST-DDT ST-DDT added this to the vAnytime milestone May 18, 2024
@sossost
Copy link
Author

sossost commented May 18, 2024

@ST-DDT
If I create tests with JavaScript Intl.DateTimeForamt, most of them will not pass.
Did I proceed correctly??

     it('should use Intl.DateTimeFormat to get the month name in the correct locale', () => {
          const date = new Date(2020, 0, 1);
          const intlMonth = new Intl.DateTimeFormat(locale, {
            month: 'long',
          }).format(date);
          expect(localizedFaker.definitions.date.month.wide).toContain(
            intlMonth
          );
      });

      it('should use Intl.DateTimeFormat to get the abbreviated month name in the correct locale', () => {
          const date = new Date(2020, 0, 1);
          const intlMonth = new Intl.DateTimeFormat(locale, {
            month: 'short',
          }).format(date);
          expect(localizedFaker.definitions.date.month.abbr).toContain(
            intlMonth
          );
      });

@ST-DDT
Copy link
Member

ST-DDT commented May 18, 2024

I'm currently on mobile and thus cannot run the tests myself.
The tests look good to me (although all 12 months need to be checked).
Also I'm not sure whether the current setting returns only the month or the full date string, for that I need some of the failure messages.

@sossost
Copy link
Author

sossost commented May 18, 2024

@ST-DDT
These are some failure messages.

AssertionError: expected [ 'Luni', 'Marți', 'Miercuri', 'Joi', 'Vineri', 'Sâmbătă', 'Duminică' ] to include 'sâmbătă'
AssertionError: expected [ 'нед', 'пон', 'вто', 'сре', 'чет', 'пет', 'саб' ] to include 'саб.'
AssertionError: expected [ 'Hé', 'Ke', 'Sze', 'Csüt', 'Pé', 'Szo', 'Va' ] to include 'V'

Failure factors can range from missing data to capitalization, dots, etc.

   describe.each(Object.entries(allLocales))(
          'for locale %s',
          (locale, fakerLocale) => {
            let localizedFaker: Faker;
            const weekdays = Array.from(
              { length: 7 },
              (_, i) => new Date(2020, 0, i + 4)
            ); // January 4-10, 2020 are Sunday to Saturday

            beforeAll(() => {
              localizedFaker = new Faker({ locale: [fakerLocale, en, base] });
            });

            it(`should return random value from date.weekday.wide_context array for context option in ${locale}`, () => {
              if (localizedFaker.definitions.date.month.wide_context) {
                const weekday = localizedFaker.date.weekday({ context: true });
                expect(
                  localizedFaker.definitions.date.weekday.wide_context
                ).toContain(weekday);
              }
            });

            it(`should return random value from date.weekday.abbr_context array for abbreviated and context option in ${locale}`, () => {
              if (localizedFaker.definitions.date.month.abbr_context) {
                const weekday = localizedFaker.date.weekday({
                  abbreviated: true,
                  context: true,
                });
                expect(
                  localizedFaker.definitions.date.weekday.abbr_context
                ).toContain(weekday);
              }
            });

            it('should use Intl.DateTimeFormat to get the weekday name in the correct locale', () => {
              for (const date of weekdays) {
                const intlWeekday = new Intl.DateTimeFormat(locale, {
                  weekday: 'long',
                }).format(date);
                expect(localizedFaker.definitions.date.weekday.wide).toContain(
                  intlWeekday
                );
              }
            });

            it('should use Intl.DateTimeFormat to get the abbreviated weekday name in the correct locale', () => {
              for (const date of weekdays) {
                const intlWeekday = new Intl.DateTimeFormat(locale, {
                  weekday: 'short',
                }).format(date);
                expect(localizedFaker.definitions.date.weekday.abbr).toContain(
                  intlWeekday
                );
              }
            });
          }
        );
        

This is my current test code.

@ST-DDT
Copy link
Member

ST-DDT commented May 18, 2024

Please note: We want the test to ensure that we have the correct values, currently the values might be invalid.

AssertionError: expected [ 'Luni', 'Marți', 'Miercuri', 'Joi', 'Vineri', 'Sâmbătă', 'Duminică' ] to include 'sâmbătă'

If I insert sâmbătă or Sâmbătă into google translate and translate it back and forth, then I get the lowercase variant.

@ST-DDT
Copy link
Member

ST-DDT commented May 18, 2024

Could you please fix the invalid values and replace it with correct ones?

Maybe using a onetime script like:

for [locale] in allLocales {
  const weekdayPath = resolve("src/locales/${locale}/date/weekday.ts");
  const storedWeekdays = await import(weekdayPath) as DateDefinition["weekday"];
  const long = [];
  const abbr = [];
  for (const i=0;i<7;i++) {
    long.push(new Intl...);
    abbr.push(new Intl....);
  }
  storedWeekdays.long = long;
  storedWeekdays.abbr = abbr;
  writeFileSync(weekdayPath, "export default "+JSON.toString(storedWeekdays));
}

And run pnpm run preflight afterwards.

EDIT: Please post the script here, so that we can use it during the review.

@ST-DDT
Copy link
Member

ST-DDT commented May 18, 2024

Please merge next into your branch to avoid merge conflicts, because we recently merged #2902.

@ST-DDT
Copy link
Member

ST-DDT commented May 18, 2024

EDIT: Please post the script here, so that we can use it during the review.

If you post the script here, we can later integrate it in our generate locale script, so that all locales always have the correct values.

Or if you want to do that directly you can add it after generateLocaleFile:

promises.push(
// src/locale/<locale>.ts
// eslint-disable-next-line unicorn/prefer-top-level-await -- Disabled for performance
generateLocaleFile(locale),
// src/locales/**/index.ts
// eslint-disable-next-line unicorn/prefer-top-level-await -- Disabled for performance
generateRecursiveModuleIndexes(pathModules, locale, 'LocaleDefinition', 1)
);

@sossost
Copy link
Author

sossost commented May 19, 2024

@ST-DDT
When I run the script that creates the files in the date folder, all tests now pass.
However, there are a few things to check.
The data previously entered for each locale is also If Intl returns English, it will be converted to English.
Also, the 'vd' code was returned by Intl in my local language, so I added an exception condition.

I am a novice developer, so please understand my embarrassing generator code.

@matthewmayer
Copy link
Contributor

Rather than just overwriting the existing data, could we output a table showing where the data produced by Intl differs from what we already have in Faker? That would make it easier to see discrepancies, and loop in native speakers if needed to see why.

@sossost
Copy link
Author

sossost commented May 19, 2024

@matthewmayer
That seems like a good idea. Can someone please help?? I have other things to do, so it might take some time.

@matthewmayer
Copy link
Contributor

matthewmayer commented May 19, 2024

Here are the discrepancies i could find, for "wide" data (the abbr/short data has more variation), ignoring case-only differences and ignoring data where Intl doesnt seem to provide correct data eg dv and sr_RS_latin.

data type locale Faker Intl
weekday wide fa پتچشنبه,جمعه,چهارشنبه,دوشنبه,سه شنبه,شنبه,یکشنبه پنجشنبه,جمعه,چهارشنبه,دوشنبه,سه‌شنبه,شنبه,یکشنبه
weekday wide he יום חמישי,יום ראשון,יום רביעי,יום שישי,יום שלישי,יום שני,שבת יום חמישי,יום ראשון,יום רביעי,יום שבת,יום שישי,יום שלישי,יום שני
weekday wide pt_BR Domingo,Quarta,Quinta,Sábado,Segunda,Sexta,Terça domingo,quarta-feira,quinta-feira,sábado,segunda-feira,sexta-feira,terça-feira
weekday wide pt_PT Domingo,Quarta,Quinta,Sábado,Segunda,Sexta,Terça domingo,quarta-feira,quinta-feira,sábado,segunda-feira,sexta-feira,terça-feira
weekday wide ur اتور,بدھ,پیر,جمعرات,جمعہ,منگل,ہفتہ اتوار,بدھ,پیر,جمعرات,جمعہ,منگل,ہفتہ
weekday wide zh_CN 星期二,星期六,星期三,星期四,星期天,星期五,星期一 星期二,星期六,星期日,星期三,星期四,星期五,星期一
weekday wide zh_TW 星期一,星期二,星期三,星期五,星期六,星期天,星期四 星期一,星期二,星期三,星期五,星期六,星期日,星期四
month wide ar آب,آذَار,أَيَّار,أَيْلُول,تِشْرِين ٱلْأَوَّل,تِشْرِين ٱلثَّانِي,تَمُّوز,حَزِيرَان,شُبَاط,كَانُون ٱلْأَوَّل,كَانُون ٱلثَّانِي,نَيْسَان أبريل,أغسطس,أكتوبر,ديسمبر,سبتمبر,فبراير,مارس,مايو,نوفمبر,يناير,يوليو,يونيو
month wide ur اپریل,اکتوبر,اگست,جنوری,جولائ,جون,دسمبر,ستمبر,فروری,مارچ,مئ,نومبر اپریل,اکتوبر,اگست,جنوری,جولائی,جون,دسمبر,ستمبر,فروری,مارچ,مئی,نومبر
month wide uz_UZ_latin Aprel,Avgust,Dekabr,Fevral,Iyul,Iyun,Mart,May,Noyabr,Oktyabr,Sentyabr,Yanvar Aprel,Avgust,Dekabr,Fevral,Iyul,Iyun,Mart,May,Noyabr,Oktabr,Sentabr,Yanvar
month wide vi Tháng Ba,Tháng Bảy,Tháng Chín,Tháng Giêng,Tháng Hai,Tháng Mười,Tháng Mười Hai,Tháng Mười Một,Tháng Năm,Tháng Sáu,Tháng Tám,Tháng Tư Tháng 1,Tháng 10,Tháng 11,Tháng 12,Tháng 2,Tháng 3,Tháng 4,Tháng 5,Tháng 6,Tháng 7,Tháng 8,Tháng 9

i think you really need a language speaker to check these manually. For example, in Chinese which I do know, both 星期天 and 星期日 are both perfectly valid words for "Sunday".

In Vietnamese it looks like we are spelling out the month numbers versus using digits.

In Uzbek there seems to be some debate about the correct transliteration for September and October - see https://daryo.uz/2018/01/15/sentabr-yoziladimi-yoki-sentyabr , perhaps @Mirazyzz can help clarify!

test/modules/date.spec.ts Outdated Show resolved Hide resolved
test/modules/date.spec.ts Outdated Show resolved Hide resolved
test/modules/date.spec.ts Outdated Show resolved Hide resolved
test/modules/date.spec.ts Outdated Show resolved Hide resolved
test/modules/date.spec.ts Outdated Show resolved Hide resolved
scripts/generate-date-for-locale.ts Outdated Show resolved Hide resolved
scripts/generate-date-for-locale.ts Outdated Show resolved Hide resolved
scripts/generate-date-for-locale.ts Outdated Show resolved Hide resolved
scripts/generate-date-for-locale.ts Outdated Show resolved Hide resolved
scripts/generate-date-for-locale.ts Outdated Show resolved Hide resolved
@Mirazyzz
Copy link
Contributor

Here are the discrepancies i could find, for "wide" data (the abbr/short data has more variation), ignoring case-only differences and ignoring data where Intl doesnt seem to provide correct data eg dv and sr_RS_latin.

data type locale Faker Intl
weekday wide fa پتچشنبه,جمعه,چهارشنبه,دوشنبه,سه شنبه,شنبه,یکشنبه پنجشنبه,جمعه,چهارشنبه,دوشنبه,سه‌شنبه,شنبه,یکشنبه
weekday wide he יום חמישי,יום ראשון,יום רביעי,יום שישי,יום שלישי,יום שני,שבת יום חמישי,יום ראשון,יום רביעי,יום שבת,יום שישי,יום שלישי,יום שני
weekday wide pt_BR Domingo,Quarta,Quinta,Sábado,Segunda,Sexta,Terça domingo,quarta-feira,quinta-feira,sábado,segunda-feira,sexta-feira,terça-feira
weekday wide pt_PT Domingo,Quarta,Quinta,Sábado,Segunda,Sexta,Terça domingo,quarta-feira,quinta-feira,sábado,segunda-feira,sexta-feira,terça-feira
weekday wide ur اتور,بدھ,پیر,جمعرات,جمعہ,منگل,ہفتہ اتوار,بدھ,پیر,جمعرات,جمعہ,منگل,ہفتہ
weekday wide zh_CN 星期二,星期六,星期三,星期四,星期天,星期五,星期一 星期二,星期六,星期日,星期三,星期四,星期五,星期一
weekday wide zh_TW 星期一,星期二,星期三,星期五,星期六,星期天,星期四 星期一,星期二,星期三,星期五,星期六,星期日,星期四
month wide ar آب,آذَار,أَيَّار,أَيْلُول,تِشْرِين ٱلْأَوَّل,تِشْرِين ٱلثَّانِي,تَمُّوز,حَزِيرَان,شُبَاط,كَانُون ٱلْأَوَّل,كَانُون ٱلثَّانِي,نَيْسَان أبريل,أغسطس,أكتوبر,ديسمبر,سبتمبر,فبراير,مارس,مايو,نوفمبر,يناير,يوليو,يونيو
month wide ur اپریل,اکتوبر,اگست,جنوری,جولائ,جون,دسمبر,ستمبر,فروری,مارچ,مئ,نومبر اپریل,اکتوبر,اگست,جنوری,جولائی,جون,دسمبر,ستمبر,فروری,مارچ,مئی,نومبر
month wide uz_UZ_latin Aprel,Avgust,Dekabr,Fevral,Iyul,Iyun,Mart,May,Noyabr,Oktyabr,Sentyabr,Yanvar Aprel,Avgust,Dekabr,Fevral,Iyul,Iyun,Mart,May,Noyabr,Oktabr,Sentabr,Yanvar
month wide vi Tháng Ba,Tháng Bảy,Tháng Chín,Tháng Giêng,Tháng Hai,Tháng Mười,Tháng Mười Hai,Tháng Mười Một,Tháng Năm,Tháng Sáu,Tháng Tám,Tháng Tư Tháng 1,Tháng 10,Tháng 11,Tháng 12,Tháng 2,Tháng 3,Tháng 4,Tháng 5,Tháng 6,Tháng 7,Tháng 8,Tháng 9
i think you really need a language speaker to check these manually. For example, in Chinese which I do know, both 星期天 and 星期日 are both perfectly valid words for "Sunday".

In Vietnamese it looks like we are spelling out the month numbers versus using digits.

In Uzbek there seems to be some debate about the correct transliteration for September and October - see https://daryo.uz/2018/01/15/sentabr-yoziladimi-yoki-sentyabr , perhaps @Mirazyzz can help clarify!

Month names are derived from Russian, and therefore IMO the proper usage would be "sentYAbr" to reflect the Russian "сентЯбрь." Since there is no direct equivalent of the letter "Я" in our language, we use the combination of "Y" and "A." However, the debate on this matter is still ongoing, and no official approach has been decided yet. Interestingly, some official school books use both variations on different pages.

Given this uncertainty, I suggest we choose the variation that minimizes our workload and simplifies our process.

@sossost
Copy link
Author

sossost commented May 21, 2024

@ST-DDT No! thank you Actually, this is a bit of a difficult issue for me because I still lack a lot. Thank you very much for your direct and indirect help.

@sossost
Copy link
Author

sossost commented May 22, 2024

  • Creates unnecessary definitions for all descendant locales, e.g. en_AU, en_GB, en_HK when these should probably just inherit from en.
  • Creates definitions for base locale, which it shouldn't

Currently, it does not because Intl uses de-de and we use de_DE. This might have to be changed and then filtered for redundancy though.

@sossost Could you please fix that? (Please pull before changing any code)

  • Doesn't make seperate arrays for wide and wide_context or abbr and abbr_context in locales where these are different e.g. fr or ru - see documentation of "context" parameter at fakerjs.dev/api/date#month

I'm not sure whether I get you correctly here. The context variants arent touched or generated because AFAIK there is no Intl way to do that.

Sorry I didn't understand exactly what to do. Can you give me more hints??

@ST-DDT
Copy link
Member

ST-DDT commented May 22, 2024

Sorry I didn't understand exactly what to do. Can you give me more hints??

  1. Replace all
new Intl.DateTimeFormat({ locale, ...});

With

new Intl.DateTimeFormat({ locale: locale.replaceAll('_', '-'), ...});
  1. Find all instances where locale.date.weekday/month is the same as the parent locale. E.g. en is the parent of en_US and thus the locale files would be identical so the later should be skipped/not created. Some locales such as uz_UZ_latin dont have parents so they need the additional files.

Copy link
Member

@ST-DDT ST-DDT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your help. Please let us know, if this is getting to complicated/time consuming for you.

scripts/generate-locales.ts Outdated Show resolved Hide resolved
scripts/generate-locales.ts Outdated Show resolved Hide resolved
scripts/generate-locales.ts Outdated Show resolved Hide resolved
@matthewmayer
Copy link
Contributor

With a big thanks to @sosost for their work so far, but I'm starting to feel trying to do this as a script that generates everything at once is maybe a mistake.

Instead perhaps we can modify it so it's run manually and you give it a single locale as a parameter, and it attempts to generate the days and months for that locale only.

That way we don't need to code in lots of edge cases, and it's easy to run it for missing locales, and review the diff for one locale at a time.

Once generated, it's not like the days of the week or months of the year will ever change ...

@ST-DDT
Copy link
Member

ST-DDT commented Jun 12, 2024

Sorry, for the delay. I will tackle this once I have some free time.

@ST-DDT ST-DDT self-assigned this Jun 13, 2024
@ST-DDT
Copy link
Member

ST-DDT commented Jun 18, 2024

Ready for review.

The locales can be more easily reviewed using:

rm -rdf src/locales
git reset next -- src/locales
git restore src/locales

pnpm run preflight

git add src/locales

@ST-DDT ST-DDT requested review from matthewmayer, xDivisionByZerox and a team June 18, 2024 22:04
@ST-DDT ST-DDT added the c: infra Changes to our infrastructure or project setup label Jun 18, 2024
Copy link
Contributor

@matthewmayer matthewmayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think this should be part of preflight

As we learned in #2906 ICU data is not stable so we could easily have the situation where this generate different dates on different versions of node.

I think it'll be better to have this as a separate standalone script that can be run on demand to check the data and fill in missing locales, but any discrepancies in existing locales should be checked by a native speaker.

@ST-DDT
Copy link
Member

ST-DDT commented Jun 19, 2024

Added to the meeting notes for discussion.

@matthewmayer
Copy link
Contributor

I tested with Node 20.0.0 and 22.2.0. There are differences in data for en-AU, eo, es-MX, mk, uk and vi. Mostly minor things like tweaks to the shortened versions, capitalization etc. But enough to cause havoc with people running preflight with different node versions.

@ST-DDT ST-DDT added the s: needs decision Needs team/maintainer decision label Jun 19, 2024
@ST-DDT
Copy link
Member

ST-DDT commented Jun 20, 2024

Team Notice

We talked about it in the team meeting, but we haven't been able to decide anything yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: infra Changes to our infrastructure or project setup c: test m: date Something is referring to the date module p: 1-normal Nothing urgent s: needs decision Needs team/maintainer decision
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create Intl based tests for date definitions
6 participants