Introduce 'lenient' mode for hostname validation. #122
Good thinking and thank you! 🙂 👍
Also, I wonder if that would make a difference for the @brave-intl team (@NejcZdovc and @mrose17 mostly, as they depend on it in bat-publisher).
Regarding the settings, I was thinking that since we already have a way to specify it via the By lenient I meant more permissive. Which means that by default, underscores are not allowed in hostnames (strict mode, which is the current behavior). When
@oncletom Since this PR also introduces some options, shall we open an issue about the API and the way to specify options and have a more high-level discussion there?
I like the idea of lenient mode, but I think the implementation should include an option to apply the lenient rules only to subdomains below the base domain (the label directly under the TLD/public suffix). Another option could be three modes: strict, lenient, and sloppy. Sloppy would allow even base domains to contain non-standard characters, lenient would check that the base domain is RFC compliant and skip validation of the subdomain portion only, and strict would check the whole FQDN for RFC compliance. For general purposes, validating the base domain would still be of some importance. Registrars generally follow the RFCs on valid characters, though I am not sure all public suffixes validate the subdomains users are allowed to create. I believe the default should be what I describe as "lenient" mode, with "strict" and "sloppy" being options.
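The three modes described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from this PR or from tld.js: `validateHostname`, `labelsAreStrict`, and the `mode` values are hypothetical names, and the public suffix is passed in by the caller instead of being looked up in the Public Suffix List.

```javascript
// Hypothetical sketch: none of these names come from tld.js.
// Strict label: letters, digits and hyphens, no leading or trailing hyphen,
// between 1 and 63 characters.
const STRICT_LABEL = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;

function labelsAreStrict(labels) {
  return labels.length > 0 && labels.every((label) => STRICT_LABEL.test(label));
}

// `publicSuffix` is supplied by the caller (e.g. "co.uk"); a real
// implementation would derive it from the Public Suffix List.
function validateHostname(hostname, publicSuffix, mode = 'lenient') {
  if (typeof hostname !== 'string' || hostname.length === 0 || hostname.length > 255) {
    return false;
  }

  const labels = hostname.split('.');

  // Checks shared by all modes: no empty labels, no label above 63 characters.
  if (labels.some((label) => label.length === 0 || label.length > 63)) {
    return false;
  }

  if (mode === 'sloppy') {
    return true; // any characters accepted, including in the base domain
  }
  if (mode === 'strict') {
    return labelsAreStrict(labels); // every label must be compliant
  }

  // 'lenient': only the base domain (one label plus the public suffix)
  // has to be compliant; subdomain labels are not inspected further.
  const suffixLabels = publicSuffix.split('.');
  const baseCount = suffixLabels.length + 1;
  if (labels.length < baseCount) {
    return false;
  }
  const actualSuffix = labels.slice(-suffixLabels.length).join('.').toLowerCase();
  if (actualSuffix !== publicSuffix.toLowerCase()) {
    return false;
  }
  return labelsAreStrict(labels.slice(-baseCount));
}
```

With this sketch, `foo_bar.example.co.uk` passes in lenient mode but fails in strict mode, while `foo.exa_mple.co.uk` passes only in sloppy mode.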
I do support the idea of the fix, of course, as the problem was pretty annoying when it hit me. It feels weird to me, though, that I suddenly have to create an instance of Either way, happy to see the discussion here.
Thanks a lot for your feedback!
@ZLightning I understand that more flexibility would be needed in some cases. On the other hand, we need to think about what goals tld.js has. We had a similar discussion regarding URL parsing, which is not a trivial task. In the end we decided to keep the good-enough implementation from Node.js but allow users of tld.js to plug in their own hostname extraction functions to fit their needs.
Regarding domain validation, I wonder if a similar approach would make sense? The current function is simple but maybe good enough for most needs. It is also relatively straightforward to use, fast to run, and should work in most cases. When this is not enough, we could consider plugging in another validation function to be used by tld.js instead (which could come from a validation library, for example). In fact, to go a step further, it would make sense to allow overriding all functions from the public API which do not have to do with public suffix/tld extraction, but are provided for convenience (
@ikari-pl I agree with you, creating a new instance of tld.js to allow any form of customization is cumbersome. On the other hand, we need to be careful, as options applied to one function might impact the behavior of other functions of tld.js, in which case applying options globally by instantiating a new tld.js makes it clear that they might impact different parts of the library. In the end this is a trade-off, and maybe we need to allow both: give an extra argument to What about this:
Because these changes could introduce some breaking changes (not all of them, of course), maybe this could be done in two steps?
What do you think?
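The trade-off discussed above (options set once on an instance versus passed per call, plus a pluggable validation function) could look something like the following. This is only a sketch: `createParser`, `isValid`, and the `validateHostname` option are illustrative names chosen here and are not the tld.js API.

```javascript
// Hypothetical sketch: `createParser`, `isValid` and `validateHostname`
// are illustrative names, not the tld.js API.
const LABEL = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;

// Simple default: dot-separated labels of letters, digits and hyphens.
function defaultValidateHostname(hostname) {
  return (
    typeof hostname === 'string' &&
    hostname.length > 0 &&
    hostname.split('.').every((label) => LABEL.test(label))
  );
}

function createParser(globalOptions = {}) {
  // Instance-wide validator, falling back to the built-in one.
  const globalValidate = globalOptions.validateHostname || defaultValidateHostname;

  return {
    // A per-call option takes precedence over the instance-wide one.
    isValid(hostname, callOptions = {}) {
      const validate = callOptions.validateHostname || globalValidate;
      return validate(hostname);
    },
  };
}

// A user-supplied validator that additionally accepts underscores.
const allowUnderscores = (hostname) =>
  hostname.split('.').every((label) => /^[a-z0-9_-]{1,63}$/i.test(label));

const strictParser = createParser();
const lenientParser = createParser({ validateHostname: allowUnderscores });
```

Here `lenientParser.isValid('foo_bar.example.com')` uses the custom validator for every call, while `strictParser.isValid('foo_bar.example.com', { validateHostname: allowUnderscores })` overrides it for a single call only.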
I'm all in favour of going step by step and, to do so, starting with function arguments (rather than a factory argument) 👍 I don't know how many people assume it's strict or lenient at the moment; from the comments I'd say
I think it is essential to distinguish between the domain part and the host part. There are possibly no registered domains containing underscores, even though the RFC would allow them, but there are tons of real-world hostnames containing _ out there, especially from big providers. I personally think you should allow _ in the host part by default, without any setting. I think we all want to use this wonderful repo in the real world. I am parsing billions of logs daily with it and had to add some workarounds to allow _ in the host part.
@taskinosman Thanks for your comment. As mentioned in a previous comment, if you need this change quickly you could give tldts a shot. By default it will allow
This PR should address the long-standing #73 issue. It is now possible to enable a more permissive hostname validation using the lenientHostnameValidation option:
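As a rough illustration of what such a flag controls, here is a self-contained stand-in. Only the option name `lenientHostnameValidation` comes from this PR; the `isValidHostname` function, its signature, and the exact rules are assumptions made for this sketch and may differ from the actual tld.js implementation.

```javascript
// Hypothetical stand-in, not the tld.js implementation: only the option
// name `lenientHostnameValidation` comes from this PR.
function isValidHostname(hostname, { lenientHostnameValidation = false } = {}) {
  if (typeof hostname !== 'string' || hostname.length === 0 || hostname.length > 255) {
    return false;
  }

  // Strict: letters, digits and hyphens (no leading or trailing hyphen).
  // Lenient: the same rules, but underscores are accepted as well.
  const label = lenientHostnameValidation
    ? /^[a-z0-9_]([a-z0-9_-]{0,61}[a-z0-9_])?$/i
    : /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;

  return hostname.split('.').every((part) => label.test(part));
}

console.log(isValidHostname('foo_bar.example.com')); // false
console.log(isValidHostname('foo_bar.example.com', { lenientHostnameValidation: true })); // true
```

In this sketch the default stays strict, matching the current behavior described earlier in the thread, and the permissive rules apply only when the flag is set explicitly.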