get_tld(), get_components(), and other #17
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,106 @@ | ||
This file describes the exact behavior of the methods in different edge cases and | ||
explains the general logic. The description covers the behavior of get_tld, | ||
get_tld_unsafe, get_sld, get_sld_unsafe, split_domain, and split_domain_unsafe. | ||
|
||
The unsafe versions of the methods save significant resources in large-scale | ||
applications of the library where the data has already been converted to | ||
lowercase and missing data has a None value. This preprocessing can be done in | ||
Spark/Dask, for example, so the library does not have to repeat it per call. | ||
For ad hoc usage, the original functions are sufficient. | ||
|
||
1. General difference between the get_*() and get_*_unsafe() methods: | ||
get_*_unsafe() does not check whether the input string is None and does not | ||
convert it to lowercase. | ||
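
A minimal sketch of the relationship described in item 1, assuming the safe variant is a thin wrapper around the unsafe one. The names toy_tld and toy_tld_unsafe are hypothetical stand-ins, not the library's functions, and the toy treats the last label as the TLD instead of consulting a public suffix list:

```python
from typing import Optional


def toy_tld_unsafe(fqdn: str) -> Optional[str]:
    """Hypothetical unsafe variant: no None check, no lowercasing."""
    # rpartition splits on the last dot; element 2 is the text after it.
    label = fqdn.rpartition('.')[2]
    # An empty last label is reported as None.
    return label or None


def toy_tld(fqdn: Optional[str]) -> Optional[str]:
    """Hypothetical safe variant: handle None, normalize case, then delegate."""
    if fqdn is None:
        return None
    return toy_tld_unsafe(fqdn.lower())
```

With this split, a caller that has already normalized its data in bulk can call the unsafe variant directly and skip the per-call check and lower() conversion.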
|
||
2. The methods listed above work only with non-canonical FQDN strings - | ||
Listed above means all or just the unsafe methods? |
||
the trailing dot must be removed before calling the method. This restriction | ||
makes it possible to get rid of fuzzy logic in edge cases. | ||
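
One way a caller might satisfy the restriction in item 2 is to strip the root dot before handing the name to the library. This is only a sketch; strip_root_dot is a hypothetical helper and is not part of the library:

```python
def strip_root_dot(fqdn: str) -> str:
    """Remove one trailing dot (the canonical-form root dot), if present."""
    return fqdn[:-1] if fqdn.endswith('.') else fqdn
```

Only a single dot is removed, rather than using rstrip('.'), so that an input with repeated trailing dots is not silently turned into a valid-looking name.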
|
||
3. DNS does not support empty labels - if a label is detected to be empty, | ||
None is returned. | ||
|
||
4. Every method processes the provided FQDN in reverse order, from the last | ||
label towards the start of the string. It stops as soon as its specific task is | ||
completed, so no validation occurs outside of this scope. For example, | ||
``` | ||
get_tld('......com') -> 'com' | ||
``` | ||
as leading dots are not processed. | ||
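
The right-to-left processing can be sketched with a lazy generator. This is an illustration of the idea only, not the library's implementation; labels_from_right is a hypothetical name:

```python
from typing import Iterator


def labels_from_right(fqdn: str) -> Iterator[str]:
    """Yield labels one at a time, starting from the end of the string."""
    end = len(fqdn)
    while end >= 0:
        # rfind searches fqdn[0:end] from the right; -1 means no dot is left.
        start = fqdn.rfind('.', 0, end)
        yield fqdn[start + 1:end]
        # Continue left of the dot just found; -1 terminates the loop.
        end = start


# Taking only the first item never looks at the leading dots.
last = next(labels_from_right('......com'))
```

Because the generator is lazy, a consumer that needs only the last label stops after one step, which is why malformed text further to the left goes unnoticed.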
The split_domain method is based on the get_sld method - it returns everything | ||
in front of the get_sld() result as a prefix. | ||
For the example above: | ||
@vadym-t you might also add here a non-edge case example. The split_domain() method offers a new capability to the library, one that folks might get from other libraries, but your only example is the edge. Suggest something like: |
||
``` | ||
split_domain('......com') -> ('....',None,'com') | ||
``` | ||
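
The following toy sketch reproduces the edge-case result above and shows what a non-edge input looks like under the same logic. It rests on loud assumptions: toy_split is a hypothetical name, the public suffix is taken to be just the last label (no suffix list is consulted), and the middle element is the bare label in front of the suffix. Only the ('....', None, 'com') tuple comes from this description; the real split_domain() may differ for other inputs.

```python
from typing import Optional, Tuple

Parts = Tuple[Optional[str], Optional[str], Optional[str]]


def toy_split(fqdn: str) -> Parts:
    """Toy split: (prefix, label before the suffix, last label as suffix)."""
    # Peel off the last label first, working from the right.
    head, _, suffix = fqdn.rpartition('.')
    # Then peel off the label directly in front of it.
    prefix, _, label = head.rpartition('.')
    # Empty pieces are reported as None (an assumption of this sketch).
    return (prefix or None, label or None, suffix or None)


edge = toy_split('......com')
plain = toy_split('www.example.com')
```

In this toy, plain is ('www', 'example', 'com'), while edge keeps the unprocessed leading dots as the prefix.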
Edge cases and expected behavior | ||
The behavior of the library is best illustrated with small examples | ||
(boolean arguments are omitted when they do not affect the behavior): | ||
|
||
|
@vadym-t suggest to add here a why sentence:
Unsafe versions of the methods will significantly save resources on large-scale applications of the library where the data has already been converted to lowercase and missing data has a None value. This can be done in Spark/Dask, for example, and result in a significant reduction in computational resources. For adhoc usage, the original functions are sufficient.
Added