This repository has been archived by the owner on Feb 3, 2024. It is now read-only.

Merge pull request #259 from maarten-boot/development
add central compiled regex pool; add tm tld; fix detection of 2 level tld's; fix tld com is sometimes different: example lg.com
maarten-boot authored Jan 30, 2023
2 parents d42a7f5 + ee42fc0 commit 0f95223
Showing 27 changed files with 184 additions and 30 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -84,6 +84,9 @@ Raise an issue https://github.com/DannyCork/python-whois/issues/new

2023-01-27: maarten_boot
* add autodetect via iana tld file (this has only tld's)
* add a central collection of all compiled regexes and reuse them: REG_COLLECTION_BY_KEY in _0_init_tld.py
* refresh testdata now that tld has dot instead of _ if more than one level
* add additional strings meaning domain does not exist

## Support
* Python 3.x is supported.
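Note on the two-level TLD change: the test data in this commit switches the reported tld from 'co_jp' to 'co.jp'. The detection code itself is not part of the hunks shown here, so the following is only a minimal sketch of one way to pick the longest known suffix. `KNOWN_TLDS` and `split_tld` are hypothetical names, and the set is a hand-picked sample rather than the IANA list.

```python
from typing import Optional, Set, Tuple

# Hypothetical sample of known suffixes; the real project builds its own table.
KNOWN_TLDS: Set[str] = {"com", "jp", "co.jp", "uk", "co.uk", "xyz"}


def split_tld(domain: str, known: Set[str] = KNOWN_TLDS) -> Optional[Tuple[str, str]]:
    """Return (name, tld) using the longest known suffix, or None if no suffix is known."""
    labels = domain.lower().strip(".").split(".")
    # Start at 1 so the whole domain is never treated as its own TLD;
    # earlier start positions give longer suffixes, so the first hit is the longest.
    for start in range(1, len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in known:
            return ".".join(labels[:start]), candidate
    return None
```

With this sample set, `split_tld("meta.co.jp")` yields the tld `co.jp` rather than `jp`, which matches the dotted form in the refreshed test output.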
2 changes: 1 addition & 1 deletion testdata/example.com/nameservers
@@ -1,2 +1,2 @@
example.com name server a.iana-servers.net.
example.com name server b.iana-servers.net.
example.com name server a.iana-servers.net.
2 changes: 1 addition & 1 deletion testdata/example.org/input
@@ -58,7 +58,7 @@ Name Server: a.iana-servers.net
Name Server: b.iana-servers.net
DNSSEC: signedDelegation
URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
>>> Last update of WHOIS database: 2023-01-04T15:03:07Z <<<
>>> Last update of WHOIS database: 2023-01-27T13:06:45Z <<<

For more information on Whois status codes, please visit https://icann.org/epp

2 changes: 1 addition & 1 deletion testdata/example.org/nameservers
@@ -1,2 +1,2 @@
example.org name server b.iana-servers.net.
example.org name server a.iana-servers.net.
example.org name server b.iana-servers.net.
6 changes: 3 additions & 3 deletions testdata/google.com/input
@@ -32,12 +32,12 @@ Tech State/Province: CA
Tech Country: US
Tech Email: Select Request Email Form at https://domains.markmonitor.com/whois/google.com
Name Server: ns1.google.com
Name Server: ns2.google.com
Name Server: ns3.google.com
Name Server: ns4.google.com
Name Server: ns3.google.com
Name Server: ns2.google.com
DNSSEC: unsigned
URL of the ICANN WHOIS Data Problem Reporting System: http://wdprs.internic.net/
>>> Last update of WHOIS database: 2023-01-04T14:55:04+0000 <<<
>>> Last update of WHOIS database: 2023-01-27T13:04:53+0000 <<<

For more information on WHOIS status codes, please visit:
https://www.icann.org/resources/pages/epp-status-codes
44 changes: 44 additions & 0 deletions testdata/hello.xyz/input
@@ -0,0 +1,44 @@
[Querying whois.nic.xyz]
[whois.nic.xyz]
Domain Name: HELLO.XYZ
Registry Domain ID: D2208533-CNIC
Registrar WHOIS Server: whois.namecheap.com
Registrar URL: https://namecheap.com
Updated Date: 2022-03-14T11:17:22.0Z
Creation Date: 2014-03-20T15:01:22.0Z
Registry Expiry Date: 2023-03-20T23:59:59.0Z
Registrar: Namecheap
Registrar IANA ID: 1068
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Registrant Organization: Privacy service provided by Withheld for Privacy ehf
Registrant State/Province: Capital Region
Registrant Country: IS
Registrant Email: Please query the RDDS service of the Registrar of Record identified in this output for information on how to contact the Registrant, Admin, or Tech contact of the queried domain name.
Admin Email: Please query the RDDS service of the Registrar of Record identified in this output for information on how to contact the Registrant, Admin, or Tech contact of the queried domain name.
Tech Email: Please query the RDDS service of the Registrar of Record identified in this output for information on how to contact the Registrant, Admin, or Tech contact of the queried domain name.
Name Server: DNS1.REGISTRAR-SERVERS.COM
Name Server: DNS2.REGISTRAR-SERVERS.COM
DNSSEC: unsigned
Billing Email: Please query the RDDS service of the Registrar of Record identified in this output for information on how to contact the Registrant, Admin, or Tech contact of the queried domain name.
Registrar Abuse Contact Email: abuse@namecheap.com
Registrar Abuse Contact Phone: +1.9854014545
URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
>>> Last update of WHOIS database: 2023-01-27T13:06:56.0Z <<<

For more information on Whois status codes, please visit https://icann.org/epp

>>> IMPORTANT INFORMATION ABOUT THE DEPLOYMENT OF RDAP: please visit
https://www.centralnic.com/support/rdap <<<

The Whois and RDAP services are provided by CentralNic, and contain
information pertaining to Internet domain names registered by our
our customers. By using this service you are agreeing (1) not to use any
information presented here for any purpose other than determining
ownership of domain names, (2) not to store or reproduce this data in
any way, (3) not to use any high-volume, automated, electronic processes
to obtain data from this service. Abuse of this service is monitored and
actions in contravention of these terms will result in being permanently
blacklisted. All data is (c) CentralNic Ltd (https://www.centralnic.com)

Access to the Whois and RDAP services is rate limited. For more
information, visit https://registrar-console.centralnic.com/pub/whois_guidance.
2 changes: 2 additions & 0 deletions testdata/hello.xyz/nameservers
@@ -0,0 +1,2 @@
hello.xyz name server dns1.registrar-servers.com.
hello.xyz name server dns2.registrar-servers.com.
15 changes: 15 additions & 0 deletions testdata/hello.xyz/output
@@ -0,0 +1,15 @@

test domain: <<<<<<<<<< hello.xyz >>>>>>>>>>>>>>>>>>>>
name str 'hello.xyz'
tld str 'xyz'
registrar str 'Namecheap'
registrant_country str 'IS'
creation_date datetime.datetime 2014-03-20 15:01:22
expiration_date datetime.datetime 2023-03-20 23:59:59
last_updated datetime.datetime 2022-03-14 11:17:22
status str 'clientTransferProhibited https://icann.org/epp#clientTransferProhibited'
statuses list ['clientTransferProhibited https://icann.org/epp#clientTransferProhibited']
dnssec bool False
name_servers list ['dns1.registrar-servers.com', 'dns2.registrar-servers.com']
registrant str 'Privacy service provided by Withheld for Privacy ehf'
emails list ['abuse@namecheap.com']
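Note on the date fields: the input above carries timestamps such as 2014-03-20T15:01:22.0Z, and the expected output shows them as plain datetime values. The library's own date parser is not shown in this commit, so the snippet below is only a sketch of how such strings can be turned into datetime objects with the standard library. `FORMATS` and `parse_whois_date` are hypothetical names.

```python
from datetime import datetime
from typing import Optional

# Formats seen in the test data of this commit; illustrative, not exhaustive.
FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%fZ",  # e.g. 2014-03-20T15:01:22.0Z
    "%Y-%m-%dT%H:%M:%SZ",     # e.g. 2023-01-25T20:09:06Z
)


def parse_whois_date(text: str) -> Optional[datetime]:
    """Try each known format in turn; return None when none of them fits."""
    text = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None
```

Returning None for unknown formats mirrors the `expiration_date NoneType None` seen for meta.com in this test data, although the actual reason for that value is not visible in the diff.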
1 change: 1 addition & 0 deletions testdata/make_testdata.sh
@@ -15,6 +15,7 @@ getDomains()
meta.co.jp # jp has [registrar] type keywords not registrar:
meta.kr # has both korean and english text
meta.com.tr # has utf 8 response text and different formatting style
hello.xyz # has sometimes IANA Source beginning on mac
)
}

2 changes: 1 addition & 1 deletion testdata/meta.co.jp/nameservers
@@ -1,3 +1,3 @@
meta.co.jp name server ns2.meta.co.jp.
meta.co.jp name server ns2.sphere.ad.jp.
meta.co.jp name server ns.meta.co.jp.
meta.co.jp name server ns2.sphere.ad.jp.
2 changes: 1 addition & 1 deletion testdata/meta.co.jp/output
@@ -1,7 +1,7 @@

test domain: <<<<<<<<<< meta.co.jp >>>>>>>>>>>>>>>>>>>>
name str 'meta.co.jp'
tld str 'co_jp'
tld str 'co.jp'
registrar str 'JPRS'
registrant_country str ''
creation_date datetime.datetime 1994-04-01 00:00:00
2 changes: 1 addition & 1 deletion testdata/meta.co.uk/input
@@ -31,7 +31,7 @@
c.ns.facebook.com
d.ns.facebook.com

WHOIS lookup made at 15:03:07 04-Jan-2023
WHOIS lookup made at 13:06:46 27-Jan-2023

--
This WHOIS information is provided for free by Nominet UK the central registry
2 changes: 1 addition & 1 deletion testdata/meta.co.uk/nameservers
@@ -1,4 +1,4 @@
meta.co.uk name server c.ns.facebook.com.
meta.co.uk name server d.ns.facebook.com.
meta.co.uk name server a.ns.facebook.com.
meta.co.uk name server c.ns.facebook.com.
meta.co.uk name server b.ns.facebook.com.
6 changes: 3 additions & 3 deletions testdata/meta.co.uk/output
@@ -1,7 +1,7 @@

test domain: <<<<<<<<<< meta.co.uk >>>>>>>>>>>>>>>>>>>>
name str 'meta.co.uk'
tld str 'co_uk'
tld str 'co.uk'
registrar str 'Hogan Lovells International LLP [Tag = LOVELLSLLP]'
registrant_country str ''
creation_date datetime.datetime 2001-11-01 00:00:00
@@ -10,7 +10,7 @@ last_updated datetime.datetime 2022-07-28 00:00:00
status str 'Registered until expiry date.'
statuses list ['Registered until expiry date.']
dnssec bool False
name_servers list ['a.ns.facebook.com', 'b.ns.facebook.com']
name_servers list ['a.ns.facebook.com', 'b.ns.facebook.com', 'c.ns.facebook.com', 'd.ns.facebook.com']
owner str ''
registrant str ''
registrant str 'Meta Platforms, Inc.'
emails list ['']
2 changes: 1 addition & 1 deletion testdata/meta.com.sg/output
@@ -1,7 +1,7 @@

test domain: <<<<<<<<<< meta.com.sg >>>>>>>>>>>>>>>>>>>>
name str 'meta.com.sg'
tld str 'com_sg'
tld str 'com.sg'
registrar str 'SINGNET PTE LTD'
registrant_country str ''
creation_date datetime.datetime 1998-12-03 17:04:50
2 changes: 1 addition & 1 deletion testdata/meta.com.tr/input
@@ -33,4 +33,4 @@ Expires on..............: 2026-Dec-27.


** Whois Server:
Last Update Time: 2023-01-04T18:01:59+03:00
Last Update Time: 2023-01-27T16:04:30+03:00
2 changes: 1 addition & 1 deletion testdata/meta.com.tr/output
@@ -1,7 +1,7 @@

test domain: <<<<<<<<<< meta.com.tr >>>>>>>>>>>>>>>>>>>>
name str 'meta.com.tr'
tld str 'com_tr'
tld str 'com.tr'
registrar str 'ODTÜ GELİŞTİRME VAKFI BİLGİ TEKNOLOJİLERİ SAN. VE TİC. A.Ş.'
registrant_country str ''
creation_date datetime.datetime 2006-12-28 00:00:00
6 changes: 3 additions & 3 deletions testdata/meta.com/input
@@ -6,9 +6,9 @@ Domain Name: META.COM
Registry Domain ID: 1433704_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.registrarsafe.com
Registrar URL: https://www.registrarsafe.com
Updated Date: 2022-07-27T19:07:55Z
Updated Date: 2023-01-25T20:09:06Z
Creation Date: 1991-01-21T05:00:00Z
Registrar Registration Expiration Date: 2031-01-22T05:00:00Z
Registrar Registration Expiration Date: 2032-01-22T05:00:00Z
Registrar: RegistrarSafe, LLC
Registrar IANA ID: 3237
Registrar Abuse Contact Email: abusecomplaints@registrarsafe.com
@@ -64,7 +64,7 @@ Name Server: A.NS.FACEBOOK.COM
Name Server: D.NS.FACEBOOK.COM
DNSSEC: unsigned
URL of the ICANN WHOIS Data Problem Reporting System: http://wdprs.internic.net/
>>> Last update of WHOIS database: 2023-01-04T15:03:11Z <<<
>>> Last update of WHOIS database: 2023-01-27T13:06:50Z <<<

Search results obtained from the RegistrarSafe, LLC WHOIS database are
provided by RegistrarSafe, LLC for information purposes only, to assist
4 changes: 2 additions & 2 deletions testdata/meta.com/nameservers
@@ -1,4 +1,4 @@
meta.com name server a.ns.facebook.com.
meta.com name server b.ns.facebook.com.
meta.com name server d.ns.facebook.com.
meta.com name server c.ns.facebook.com.
meta.com name server b.ns.facebook.com.
meta.com name server a.ns.facebook.com.
2 changes: 1 addition & 1 deletion testdata/meta.com/output
@@ -6,7 +6,7 @@ registrar str 'RegistrarSafe, LLC'
registrant_country str 'US'
creation_date datetime.datetime 1991-01-21 05:00:00
expiration_date NoneType None
last_updated datetime.datetime 2022-07-27 19:07:55
last_updated datetime.datetime 2023-01-25 20:09:06
status str 'clientDeleteProhibited https://www.icann.org/epp#clientDeleteProhibited'
statuses list ['clientDeleteProhibited https://www.icann.org/epp#clientDeleteProhibited', 'clientTransferProhibited https://www.icann.org/epp#clientTransferProhibited', 'clientUpdateProhibited https://www.icann.org/epp#clientUpdateProhibited', 'serverDeleteProhibited https://www.icann.org/epp#serverDeleteProhibited', 'serverTransferProhibited https://www.icann.org/epp#serverTransferProhibited', 'serverUpdateProhibited https://www.icann.org/epp#serverUpdateProhibited']
dnssec bool False
6 changes: 3 additions & 3 deletions testdata/meta.kr/nameservers
@@ -1,4 +1,4 @@
meta.kr name server ns4.lovellsnames.org.
meta.kr name server ns3.lovellsnames.org.
meta.kr name server ns2.lovellsnames.org.
meta.kr name server ns1.lovellsnames.org.
meta.kr name server ns2.lovellsnames.org.
meta.kr name server ns3.lovellsnames.org.
meta.kr name server ns4.lovellsnames.org.
2 changes: 1 addition & 1 deletion testdata/meta.kr/output
@@ -10,6 +10,6 @@ last_updated datetime.datetime 2022-04-05 00:00:00
status str ''
statuses list ['']
dnssec bool False
name_servers list []
name_servers list ['ns1.lovellsnames.org', 'ns2.lovellsnames.org', 'ns3.lovellsnames.org', 'ns4.lovellsnames.org']
registrant str 'Gabia C&S'
emails list ['domreg@101domain.com']
42 changes: 41 additions & 1 deletion whois/_0_init_tld.py
@@ -14,6 +14,7 @@

Verbose = False
TLD_RE: Dict[str, Any] = {}
REG_COLLECTION_BY_KEY: Dict = {}


def validTlds():
@@ -70,7 +71,10 @@ def get_tld_re(tld: str, override: bool = False) -> Any:

    # we want now to exclude _server hints
    tld_re = dict(
        (k, re.compile(v, re.IGNORECASE) if (isinstance(v, str) and k[0] != "_") else v) for k, v in tmp.items()
        # (k, re.compile(v, re.IGNORECASE) if (isinstance(v, str) and k[0] != "_") else v) for k, v in tmp.items()
        # dont recompile each re by themselves, reuse existing compiled re
        (k, REG_COLLECTION_BY_KEY[k][v] if (isinstance(v, str) and k[0] != "_") else v)
        for k, v in tmp.items()
    )

    # meta domains start with _: examples _centralnic and _donuts
@@ -110,10 +114,46 @@ def initOne(tld, override: bool = False):
print(f"{tld} -> {tld2}", file=sys.stderr)


def buildRegCollection(zz: Dict):
    regCollection = {}
    # get all regexes
    for name in zz:
        # print(name)
        z = zz[name]
        for key in z:
            if key is None:
                continue

            if key.startswith("_"):
                continue

            if key in ["extend"]:
                continue

            if key not in regCollection:
                regCollection[key] = {}

            reg = z[key]
            if reg is None:
                continue

            if reg in regCollection[key] and regCollection[key][reg] is not None:
                # we already have a compiled regex, no need to do it again
                continue

            regCollection[key][reg] = None
            if isinstance(reg, str):
                regCollection[key][reg] = re.compile(reg, flags=re.IGNORECASE)

    return regCollection


def initOnImport():
    global REG_COLLECTION_BY_KEY
    # here we run the import processing
    # we load all tld's on import so we dont lose time later
    # we keep ZZ so we can later reuse it if we want to override or update tld's
    REG_COLLECTION_BY_KEY = buildRegCollection(ZZ)
    override = False
    for tld in ZZ.keys():
        initOne(tld, override)
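
Note on the regex pool: the idea behind buildRegCollection can be tried in isolation. The snippet below is a minimal sketch and not the project's code. `SAMPLE_ZZ` and `build_pool` are hypothetical names, and the patterns and server name are made up for illustration. It compiles each distinct (key, pattern) pair once, so TLD entries that share a pattern also share the compiled object.

```python
import re
from typing import Dict, Optional, Pattern

# Hypothetical stand-in for the per-TLD table; keys and patterns are illustrative only.
SAMPLE_ZZ: Dict[str, Dict[str, Optional[str]]] = {
    "com": {"extend": None, "registrar": r"Registrar:\s?(.+)", "_server": "whois.example.invalid"},
    "net": {"extend": "com", "registrar": r"Registrar:\s?(.+)"},
    "org": {"extend": "com", "registrar": r"Sponsoring Registrar:\s?(.+)"},
}


def build_pool(table: Dict[str, Dict[str, Optional[str]]]) -> Dict[str, Dict[str, Pattern[str]]]:
    """Map field name -> pattern text -> compiled regex, compiling each pair only once."""
    pool: Dict[str, Dict[str, Pattern[str]]] = {}
    for entry in table.values():
        for key, pattern in entry.items():
            # Skip inheritance markers, "_" hints and missing patterns, as the commit does.
            if key == "extend" or key.startswith("_") or not isinstance(pattern, str):
                continue
            by_pattern = pool.setdefault(key, {})
            if pattern not in by_pattern:
                by_pattern[pattern] = re.compile(pattern, flags=re.IGNORECASE)
    return pool


POOL = build_pool(SAMPLE_ZZ)
```

In this sample, "com" and "net" use the same registrar pattern, so `POOL["registrar"]` ends up with two compiled entries instead of three. A later lookup of the form `POOL[key][pattern]` then replaces a fresh `re.compile` call, which appears to be what the changed line in get_tld_re relies on.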
10 changes: 10 additions & 0 deletions whois/_1_query.py
@@ -50,11 +50,16 @@ def do_query(
    k = ".".join(dl)

    if cache_file:
        if verbose:
            print(f"using cache file: {cache_file}", file=sys.stderr)
        cache_load(cache_file)

    # actually also whois uses cache, so if you really dont want to use cache
    # you should also pass the --force-lookup flag (on linux)
    if force or k not in CACHE or CACHE[k][0] < time.time() - cache_age:
        if verbose:
            print(f"force = {force}", file=sys.stderr)

        # slow down before so we can force individual domains at a slower tempo
        if slow_down:
            time.sleep(slow_down)
@@ -159,6 +164,8 @@ def _do_whois_query(
        return testWhoisPythonFromStaticTestData(dl, ignore_returncode, server, verbose)

    cmd = makeWhoisCommandToRun(dl, server, verbose)
    if verbose:
        print(cmd, file=sys.stderr)

    # LANG=en is added to make the ".jp" output consistent across all environments
    p = subprocess.Popen(
@@ -169,6 +176,9 @@ def _do_whois_query(
    )

    r = p.communicate()[0].decode(errors="ignore")
    if verbose:
        print(r, file=sys.stderr)

    if ignore_returncode is False and p.returncode not in [0, 1]:
        raise WhoisCommandFailed(r)
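
Note on the cache check in do_query: the condition `force or k not in CACHE or CACHE[k][0] < time.time() - cache_age` decides whether a fresh whois call is made. The sketch below isolates that test so it can be exercised with a fixed clock. `needs_lookup` and its `now` parameter are hypothetical, and the (timestamp, text) entry layout is an assumption: the diff only shows that index 0 holds a timestamp.

```python
import time
from typing import Dict, Optional, Tuple

# Assumed layout: key -> (time the entry was stored, raw whois text).
CACHE: Dict[str, Tuple[float, str]] = {}


def needs_lookup(key: str, force: bool, cache_age: float, now: Optional[float] = None) -> bool:
    """True when the caller forces a lookup, the key is unknown, or the entry is too old."""
    now = time.time() if now is None else now
    return force or key not in CACHE or CACHE[key][0] < now - cache_age


# An entry stored at t=1000 with a 60 second lifetime is fresh at t=1030 and stale at t=1100.
CACHE["example.com"] = (1000.0, "raw whois text")
fresh = needs_lookup("example.com", force=False, cache_age=60, now=1030.0)
stale = needs_lookup("example.com", force=False, cache_age=60, now=1100.0)
```

Passing the clock in as a parameter is only a convenience for testing; the code in the commit reads `time.time()` directly.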

3 changes: 3 additions & 0 deletions whois/_2_parse.py
@@ -86,6 +86,8 @@ def handleShortResponse(
    # NOTE: from here s is lowercase only
    # ---------------------------------
    noneStrings = [
        "the domain has not been registered",
        "no match found for",
        "no matching record",
        "not found",
        "no data found",
@@ -98,6 +100,7 @@ def handleShortResponse(
        "no whois server is known for this kind of object",
        "nameserver not found",
        "malformed request",  # this means this domain is not in whois as it is on top of a registered domain
        "no match",
    ]

    for i in noneStrings:
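
Note on the noneStrings list: the body of the loop is not part of this hunk, so the following is a sketch under the assumption that each entry is tested as a substring of the lowercased response. `NONE_STRINGS` is a shortened copy of the list and `looks_unregistered` is a hypothetical helper, not a function of the library.

```python
from typing import Sequence

# Subset of the markers from the hunk above, all lowercase.
NONE_STRINGS = (
    "the domain has not been registered",
    "no match found for",
    "no matching record",
    "not found",
    "no match",
)


def looks_unregistered(whois_text: str, markers: Sequence[str] = NONE_STRINGS) -> bool:
    """Lowercase the response once, then report whether any marker occurs in it."""
    s = whois_text.strip().lower()
    return any(marker in s for marker in markers)
```

If the real check is indeed a substring test, the newly added "no match" entry already covers "no match found for" and "no matching record"; keeping the longer strings is harmless and documents the wording each registry uses.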