-
Notifications
You must be signed in to change notification settings - Fork 62
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. I've added a few minor comments.
Additionally to those: I think it would be beneficial to translate the error messages. That was for example done here: https://github.com/common-voice/sentence-collector/blob/main/server/lib/validation/languages/ckb.js . We decided not to translate these error messages through the normal UI translation process, as that would require quite some work for not much benefit. However if somebody is adding sentences in a given language, it's likely they would understand a translated error message anyway. Translating those error messages would then enable contributors to contribute without knowing English. Do you think that would be a good improvement here?
regex: /[<>+*#@%^[\]()\/]/, | ||
error: "Sentence should not contain symbols", | ||
}, { | ||
// 7 or more repeating characters in a row is likely a non-formal spelling or difficult to read. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, did that happen often in Sentence Collector? 7 repeating characters seems like a lot, but I have absolutely no language experience apart from latin-based languages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sometimes happens when people dump uncleaned sentences directly crawled from web. For examples sentences with long tailing dots, such as額........
Such sentences are most likely junk.
error: "Sentence should not contain emojis or other special Unicode symbols", | ||
}, { | ||
regex: /[\u5427](\s|$)/, | ||
error: 'Sentence should not end with Mandarin particles', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this message correct? This would also reject \u5427
followed by a space. Is that also considered ending a sentence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sometimes a space indicates a pause or the end of a sentence. I have amended this rule in the latest commit.
Thanks for reviewing! It is now fixed in the latest commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. One more small comment that I didn't catch before and then this can get merged :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
This will be part of the next release. I can't say yet when that will be, but there will be a new comment once the release is done. |
Great, thank you so much! |
🎉 This PR is included in version 2.17.0 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
No description provided.