Optimise select() for long subdomains #268

Open: wants to merge 2 commits into main

elliotwutingfeng commented Oct 29, 2023

The current implementation of select() searches for the longest matching TLD by scanning the hostname from the right end all the way to the left end.

This approach is necessary to handle edge cases like example.s3.cn-north-1.amazonaws.com.cn, where

  • s3.cn-north-1.amazonaws.com.cn and com.cn are valid.
  • but the intermediates cn-north-1.amazonaws.com.cn and amazonaws.com.cn are not valid.
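
To make the edge case concrete, here is a minimal sketch of a right-to-left scan. It is not the library's code: `TOY_RULES` and `matching_suffixes` are hypothetical names, and the rule set is a hand-picked toy hash standing in for `@rules`, with no wildcard or exception rules.

```ruby
# Hypothetical toy rule set standing in for the library's @rules index.
TOY_RULES = {
  "cn" => true,
  "com.cn" => true,
  "s3.cn-north-1.amazonaws.com.cn" => true
}.freeze

# Build candidate suffixes from the rightmost label to the leftmost and
# collect every candidate that appears in the rule set.
def matching_suffixes(name, rules = TOY_RULES)
  query = nil
  matches = []
  name.split(".").reverse.each do |part|
    query = query.nil? ? part : "#{part}.#{query}"
    matches << query if rules.key?(query)
  end
  matches
end

p matching_suffixes("example.s3.cn-north-1.amazonaws.com.cn")
# => ["cn", "com.cn", "s3.cn-north-1.amazonaws.com.cn"]
```

The scan cannot stop at the first miss: amazonaws.com.cn and cn-north-1.amazonaws.com.cn do not match, yet the longer s3.cn-north-1.amazonaws.com.cn does.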

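However, this penalises hostnames with long subdomains like a.very.long.subdomain.example.co.uk: every extra label costs another lookup, even when the candidate is already longer than any rule and so cannot match.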

We can terminate the search early by limiting the number of candidates to [parts.size, @max_rule_size].min, where parts.size is the number of parts in the hostname and @max_rule_size is the number of parts in the longest rule in @rules.
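
As an illustration of the bound, here is a rough sketch against toy rule sets. The helper names (`max_rule_size`, `bounded_matching_suffixes`) and the hashes are assumptions made for this example, not the code in this PR, and wildcard and exception rules are ignored.

```ruby
# Hypothetical toy rule sets; the real @rules index holds the full list.
UK_RULES = { "uk" => true, "co.uk" => true }.freeze
AWS_RULES = {
  "cn" => true,
  "com.cn" => true,
  "s3.cn-north-1.amazonaws.com.cn" => true
}.freeze

# Number of dot-separated parts in the longest rule (0 for an empty set).
def max_rule_size(rules)
  rules.keys.map { |rule| rule.count(".") + 1 }.max || 0
end

# Right-to-left scan that never builds a candidate longer than the longest rule.
def bounded_matching_suffixes(name, rules, max_size = max_rule_size(rules))
  parts = name.split(".").reverse
  limit = [parts.size, max_size].min
  matches = []
  query = nil
  index = 0
  while index < limit
    query = query.nil? ? parts[index] : "#{parts[index]}.#{query}"
    matches << query if rules.key?(query)
    index += 1
  end
  matches
end

# Seven labels in the hostname, but only two candidates are checked.
p bounded_matching_suffixes("a.very.long.subdomain.example.co.uk", UK_RULES)
# => ["uk", "co.uk"]

# The longest rule has five parts, so the edge case is still found.
p bounded_matching_suffixes("example.s3.cn-north-1.amazonaws.com.cn", AWS_RULES)
# => ["cn", "com.cn", "s3.cn-north-1.amazonaws.com.cn"]
```

The bound is safe because a candidate with more parts than the longest rule cannot equal any rule, so skipping it never changes the result.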

I also replaced the Kernel#loop call with a bounded while loop, which is faster; this is possible because the current break condition can be expressed as the loop condition.
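
The conversion can be sketched as below. Both methods are hypothetical stand-ins rather than the actual select() body; they only build the right-anchored candidates so the two loop shapes can be compared. Kernel#loop yields to a block and rescues StopIteration, which is a plausible source of the overhead, but the timings printed here are only indicative.

```ruby
require "benchmark"

# Loop/break form: the exit test sits in the middle of the body.
def suffixes_with_loop(parts)
  out = []
  index = 0
  query = parts[index]
  loop do
    out << query
    index += 1
    break if index >= parts.size

    query = "#{parts[index]}.#{query}"
  end
  out
end

# While form: the former break condition, negated, becomes the loop condition.
# Unlike the loop form, it returns [] instead of [nil] for empty input.
def suffixes_with_while(parts)
  out = []
  return out if parts.empty?

  query = parts[0]
  out << query
  index = 1
  while index < parts.size
    query = "#{parts[index]}.#{query}"
    out << query
    index += 1
  end
  out
end

PARTS = "a.very.long.subdomain.example.co.uk".split(".").reverse.freeze

loop_time = Benchmark.realtime { 50_000.times { suffixes_with_loop(PARTS) } }
while_time = Benchmark.realtime { 50_000.times { suffixes_with_while(PARTS) } }
puts format("loop: %.4fs  while: %.4fs", loop_time, while_time)
```

For non-empty input both forms produce the same candidates in the same order.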

Before

$ ruby test/benchmarks/bm_find_all.rb 1000000
Rehearsal -------------------------------------------------------------
NAME_SHORT                  2.348576   0.000000   2.348576 (  2.350146)
NAME_SHORT (noprivate)      2.444302   0.000000   2.444302 (  2.445995)
NAME_MEDIUM                 2.890648   0.000000   2.890648 (  2.892380)
NAME_MEDIUM (noprivate)     3.014823   0.000000   3.014823 (  3.017137)
NAME_LONG                   3.705042   0.002693   3.707735 (  3.710142)
NAME_LONG (noprivate)       3.727960   0.000000   3.727960 (  3.730321)
NAME_WILD                   3.657520   0.000000   3.657520 (  3.659759)
NAME_WILD (noprivate)       3.815247   0.000000   3.815247 (  3.817492)
NAME_EXCP                   4.420996   0.000000   4.420996 (  4.423570)
NAME_EXCP (noprivate)       4.408350   0.000000   4.408350 (  4.411540)
IAAA                        2.604410   0.000000   2.604410 (  2.605894)
IAAA (noprivate)            2.688674   0.000000   2.688674 (  2.690398)
IZZZ                        2.605931   0.000000   2.605931 (  2.607543)
IZZZ (noprivate)            2.679484   0.000000   2.679484 (  2.681334)
PAAA                        4.506107   0.000000   4.506107 (  4.509242)
PAAA (noprivate)            4.174697   0.000000   4.174697 (  4.177737)
PZZZ                        4.618712   0.000000   4.618712 (  4.622306)
PZZZ (noprivate)            4.323496   0.000000   4.323496 (  4.327372)
JP                          4.151477   0.000000   4.151477 (  4.154904)
JP (noprivate)              4.230317   0.000000   4.230317 (  4.234143)
IT                          2.645423   0.000000   2.645423 (  2.647490)
IT (noprivate)              2.731147   0.000000   2.731147 (  2.733281)
COM                         2.672895   0.000000   2.672895 (  2.675236)
COM (noprivate)             2.796167   0.000000   2.796167 (  2.798951)
--------------------------------------------------- total: 81.865094sec

                                user     system      total        real
NAME_SHORT                  2.455661   0.000000   2.455661 (  2.458051)
NAME_SHORT (noprivate)      2.465275   0.000000   2.465275 (  2.468431)
NAME_MEDIUM                 2.946424   0.000000   2.946424 (  2.949358)
NAME_MEDIUM (noprivate)     3.023296   0.000000   3.023296 (  3.025300)
NAME_LONG                   3.770850   0.000000   3.770850 (  3.773397)
NAME_LONG (noprivate)       3.828416   0.000000   3.828416 (  3.830904)
NAME_WILD                   3.749617   0.000000   3.749617 (  3.752038)
NAME_WILD (noprivate)       3.827687   0.000000   3.827687 (  3.830190)
NAME_EXCP                   4.418445   0.000000   4.418445 (  4.421315)
NAME_EXCP (noprivate)       4.531002   0.000000   4.531002 (  4.535273)
IAAA                        2.699374   0.000000   2.699374 (  2.700931)
IAAA (noprivate)            2.768779   0.000000   2.768779 (  2.771347)
IZZZ                        2.699160   0.000000   2.699160 (  2.702339)
IZZZ (noprivate)            2.766278   0.000000   2.766278 (  2.769706)
PAAA                        4.706753   0.000000   4.706753 (  4.711835)
PAAA (noprivate)            4.363877   0.000000   4.363877 (  4.367030)
PZZZ                        4.716710   0.000000   4.716710 (  4.722447)
PZZZ (noprivate)            4.109007   0.000000   4.109007 (  4.111433)
JP                          3.937950   0.000000   3.937950 (  3.941688)
JP (noprivate)              4.065472   0.000000   4.065472 (  4.070663)
IT                          2.628695   0.000000   2.628695 (  2.630612)
IT (noprivate)              2.718972   0.000000   2.718972 (  2.721554)
COM                         2.647181   0.000000   2.647181 (  2.649369)
COM (noprivate)             2.714115   0.000000   2.714115 (  2.715725)

After

$ ruby test/benchmarks/bm_find_all.rb 1000000
Rehearsal -------------------------------------------------------------
NAME_SHORT                  2.237599   0.000000   2.237599 (  2.239443)
NAME_SHORT (noprivate)      2.336548   0.000000   2.336548 (  2.338574)
NAME_MEDIUM                 2.713107   0.000000   2.713107 (  2.714795)
NAME_MEDIUM (noprivate)     2.830825   0.000000   2.830825 (  2.832685)
NAME_LONG                   3.042471   0.000000   3.042471 (  3.044456)
NAME_LONG (noprivate)       3.019529   0.003196   3.022725 (  3.024463)
NAME_WILD                   2.978485   0.000000   2.978485 (  2.980252)
NAME_WILD (noprivate)       3.088728   0.000000   3.088728 (  3.090743)
NAME_EXCP                   3.682105   0.000000   3.682105 (  3.684332)
NAME_EXCP (noprivate)       3.815742   0.000000   3.815742 (  3.818032)
IAAA                        2.458039   0.000000   2.458039 (  2.459425)
IAAA (noprivate)            2.496389   0.000000   2.496389 (  2.497893)
IZZZ                        2.404844   0.000000   2.404844 (  2.406255)
IZZZ (noprivate)            2.463744   0.000000   2.463744 (  2.465130)
PAAA                        3.515573   0.000000   3.515573 (  3.517585)
PAAA (noprivate)            3.193961   0.000000   3.193961 (  3.195845)
PZZZ                        3.587199   0.000000   3.587199 (  3.589388)
PZZZ (noprivate)            3.254129   0.000000   3.254129 (  3.256092)
JP                          3.783495   0.000000   3.783495 (  3.785693)
JP (noprivate)              3.885775   0.003331   3.889106 (  3.891664)
IT                          2.513112   0.000000   2.513112 (  2.514673)
IT (noprivate)              2.599210   0.000000   2.599210 (  2.600769)
COM                         2.539283   0.000000   2.539283 (  2.540692)
COM (noprivate)             2.485424   0.000000   2.485424 (  2.486922)
--------------------------------------------------- total: 70.931843sec

                                user     system      total        real
NAME_SHORT                  2.218905   0.000000   2.218905 (  2.220197)
NAME_SHORT (noprivate)      2.282971   0.000000   2.282971 (  2.284161)
NAME_MEDIUM                 2.707217   0.000000   2.707217 (  2.708815)
NAME_MEDIUM (noprivate)     2.781946   0.000000   2.781946 (  2.783615)
NAME_LONG                   3.018843   0.000000   3.018843 (  3.020559)
NAME_LONG (noprivate)       3.079345   0.000000   3.079345 (  3.081143)
NAME_WILD                   3.041727   0.000000   3.041727 (  3.043618)
NAME_WILD (noprivate)       3.079496   0.000000   3.079496 (  3.081228)
NAME_EXCP                   3.655873   0.000000   3.655873 (  3.658370)
NAME_EXCP (noprivate)       3.754648   0.000000   3.754648 (  3.756916)
IAAA                        2.507284   0.000000   2.507284 (  2.509283)
IAAA (noprivate)            2.540126   0.000000   2.540126 (  2.541872)
IZZZ                        2.466202   0.000000   2.466202 (  2.467584)
IZZZ (noprivate)            2.544616   0.000000   2.544616 (  2.546141)
PAAA                        3.622206   0.000000   3.622206 (  3.624447)
PAAA (noprivate)            3.272909   0.000000   3.272909 (  3.274831)
PZZZ                        3.675658   0.000000   3.675658 (  3.677843)
PZZZ (noprivate)            3.318359   0.000000   3.318359 (  3.320537)
JP                          3.882480   0.000000   3.882480 (  3.885434)
JP (noprivate)              3.971438   0.000000   3.971438 (  3.974437)
IT                          2.548282   0.000000   2.548282 (  2.549875)
IT (noprivate)              2.609304   0.000000   2.609304 (  2.610879)
COM                         2.569648   0.000000   2.569648 (  2.571186)
COM (noprivate)             2.497100   0.000000   2.497100 (  2.498543)

elliotwutingfeng marked this pull request as ready for review October 29, 2023 08:00
weppos (Owner) commented Nov 21, 2023

Thanks for your contribution @elliotwutingfeng. I need some time to review the changes.
