Switch List implementation to use Trie-based lookup #134

Open: wants to merge 9 commits into main

Commits on Feb 10, 2017

  1. Experimenting with an initial version of a Trie

    ➜  publicsuffix-ruby git:(thesis-trie) ✗ ruby test/profilers/list_profsize.rb
    8061 rules:
     1,631,751   PublicSuffix::List size
       421,868   Size of @rules
     1,313,514   Size of @trie
    weppos committed Feb 10, 2017
    d01b29d

Commits on Feb 11, 2017

  1. Testing 4 different implementations of Tries

    1. Hash-based trie where children are referenced using a Hash
       and storing one node per word char
    2. Hash-based trie (as 1) where each node contains a word char
       and values are stored as Symbol
    3. Hash-based trie (as 1) where each node contains a word part
       and values are stored as String
    4. Array-based trie where children are referenced using an Array
       and creating a mapping for each letter of the alphabet
    
    Some caveats:
    
    - 4) doesn't play nicely with an alphabet that contains
      non-ASCII chars, as the mapping would be hard to achieve
    - 2) doesn't play nicely with an alphabet that contains
      non-ASCII chars, as there's a risk of memory issues
      with versions of Ruby where Symbols are not garbage collected
      (dynamically created Symbols are only collected since Ruby 2.2)
    - The current list is Unicode (not Punycode, for now), hence
      neither 2) nor 4) is usable in practice
    - 3) implicitly saves space, as there is no need to store the "."
      separators; silly as it may seem, the current list has 8750
      dots (across 8061 rules)
    - The memory cost is the cost of the Trie structure AND the cost
      of the strings allocated to store the words (including ".").
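The difference between layouts 1) and 3) can be sketched in Ruby. This is a minimal illustration, not the gem's TrieHash or TrieHashParts code: `HashTrie`, its tokenizer block and `node_count` are hypothetical names, and the four rules are an arbitrary sample rather than the real list.

```ruby
# Hypothetical Hash-based trie, parametrized by how a word is split into keys.
class HashTrie
  Node = Struct.new(:children, :terminal)

  def initialize(&tokenizer)
    @tokenizer = tokenizer
    @root = Node.new({}, false)
  end

  # Creates one node per key (char or label), reusing existing ones.
  def insert(word)
    node = @tokenizer.call(word).reduce(@root) do |current, key|
      current.children[key] ||= Node.new({}, false)
    end
    node.terminal = true
    self
  end

  def include?(word)
    node = @root
    @tokenizer.call(word).each do |key|
      node = node.children[key]
      return false if node.nil?
    end
    node.terminal
  end

  # Number of nodes, excluding the root.
  def node_count(node = @root)
    node.children.values.sum { |child| 1 + node_count(child) }
  end
end

rules = %w[com github.io gitlab.io co.uk]

# Layout 1: one node per character, so every "." becomes a node too.
chars = HashTrie.new { |word| word.chars }
# Layout 3: one node per dot-separated label, so no "." is ever stored.
parts = HashTrie.new { |word| word.split(".") }
rules.each { |rule| chars.insert(rule); parts.insert(rule) }

p chars.node_count # => 21
p parts.node_count # => 7
```

Even on this tiny sample the label-per-node layout needs a third of the nodes, because separators are never materialized and each label is a single node.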
    
    ---
    
    Memory comparison:
    
        ➜  publicsuffix-ruby git:(thesis-trie) ruby test/profilers/tries_prosize.rb
           943,325   @trie_hash
           598,730   @trie_symbol
           312,361   @trie_parts
         1,627,182   @trie_array
    
    HashTrie:
    
        Total allocated: 23745976 bytes (333807 objects)
        Total retained:  16647216 bytes (172460 objects)
    
        allocated memory by gem
        -----------------------------------
          23745936  publicsuffix-ruby/lib
                40  other
    
        allocated memory by file
        -----------------------------------
          23745936  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb
                40  test/profilers/tries_profiler.rb
    
        allocated memory by location
        -----------------------------------
          12042560  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8
           6892920  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16
           3516640  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:44
           1293696  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83
               120  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39
                40  test/profilers/tries_profiler.rb:16
    
        allocated memory by class
        -----------------------------------
          12042560  Hash
           6140656  String
           2297720  Array
           2297680  PublicSuffix::TrieHash::Node
            967320  Enumerator
                40  PublicSuffix::TrieHash
    
        allocated objects by gem
        -----------------------------------
            333806  publicsuffix-ruby/lib
                 1  other
    
        allocated objects by file
        -----------------------------------
            333806  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb
                 1  test/profilers/tries_profiler.rb
    
        allocated objects by location
        -----------------------------------
            172323  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16
             87916  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:44
             57442  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8
             16122  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83
                 3  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39
                 1  test/profilers/tries_profiler.rb:16
    
        allocated objects by class
        -----------------------------------
            153418  String
             57443  Array
             57442  Hash
             57442  PublicSuffix::TrieHash::Node
              8061  Enumerator
                 1  PublicSuffix::TrieHash
    
        retained memory by gem
        -----------------------------------
          16647176  publicsuffix-ruby/lib
                40  other
    
        retained memory by file
        -----------------------------------
          16647176  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb
                40  test/profilers/tries_profiler.rb
    
        retained memory by location
        -----------------------------------
          12042560  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8
           4595280  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16
              9296  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83
                40  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39
                40  test/profilers/tries_profiler.rb:16
    
        retained memory by class
        -----------------------------------
          12042560  Hash
           2306936  String
           2297680  PublicSuffix::TrieHash::Node
                40  PublicSuffix::TrieHash
    
        retained objects by gem
        -----------------------------------
            172459  publicsuffix-ruby/lib
                 1  other
    
        retained objects by file
        -----------------------------------
            172459  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb
                 1  test/profilers/tries_profiler.rb
    
        retained objects by location
        -----------------------------------
            114882  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16
             57442  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8
               134  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83
                 1  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39
                 1  test/profilers/tries_profiler.rb:16
    
        retained objects by class
        -----------------------------------
             57575  String
             57442  Hash
             57442  PublicSuffix::TrieHash::Node
                 1  PublicSuffix::TrieHash
    
        Retained String Report
        -----------------------------------
              6728  "."
              5987  "a"
              4263  "o"
              3636  "i"
              3027  "e"
              3012  "n"
              2918  "u"
              2868  "m"
                    ...
    
    HashTrieSymbol:
    
        Total allocated: 21449376 bytes (276392 objects)
        Total retained:  14350616 bytes (115045 objects)
    
        allocated memory by gem
        -----------------------------------
          21449336  publicsuffix-ruby/lib
                40  other
    
        allocated memory by file
        -----------------------------------
          21448296  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb
              1040  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb
                40  test/profilers/tries_profiler.rb
    
        allocated memory by location
        -----------------------------------
          12042560  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8
           4595280  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16
           3516640  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:44
           1293696  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83
              1040  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb:9
               120  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39
                40  test/profilers/tries_profiler.rb:18
    
        allocated memory by class
        -----------------------------------
          12042560  Hash
           3843536  String
           2297720  Array
           2297680  PublicSuffix::TrieHashSymbol::Node
            967320  Enumerator
               520  Symbol
                40  PublicSuffix::TrieHashSymbol
    
        allocated objects by gem
        -----------------------------------
            276391  publicsuffix-ruby/lib
                 1  other
    
        allocated objects by file
        -----------------------------------
            276365  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb
                26  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb
                 1  test/profilers/tries_profiler.rb
    
        allocated objects by location
        -----------------------------------
            114882  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16
             87916  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:44
             57442  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8
             16122  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83
                26  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb:9
                 3  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39
                 1  test/profilers/tries_profiler.rb:18
    
        allocated objects by class
        -----------------------------------
             95990  String
             57443  Array
             57442  Hash
             57442  PublicSuffix::TrieHashSymbol::Node
              8061  Enumerator
                13  Symbol
                 1  PublicSuffix::TrieHashSymbol
    
        retained memory by gem
        -----------------------------------
          14350576  publicsuffix-ruby/lib
                40  other
    
        retained memory by file
        -----------------------------------
          14349536  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb
              1040  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb
                40  test/profilers/tries_profiler.rb
    
        retained memory by location
        -----------------------------------
          12042560  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8
           2297640  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16
              9296  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83
              1040  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb:9
                40  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39
                40  test/profilers/tries_profiler.rb:18
    
        retained memory by class
        -----------------------------------
          12042560  Hash
           2297680  PublicSuffix::TrieHashSymbol::Node
              9816  String
               520  Symbol
                40  PublicSuffix::TrieHashSymbol
    
        retained objects by gem
        -----------------------------------
            115044  publicsuffix-ruby/lib
                 1  other
    
        retained objects by file
        -----------------------------------
            115018  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb
                26  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb
                 1  test/profilers/tries_profiler.rb
    
        retained objects by location
        -----------------------------------
             57442  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8
             57441  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16
               134  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83
                26  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb:9
                 1  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39
                 1  test/profilers/tries_profiler.rb:18
    
        retained objects by class
        -----------------------------------
             57442  Hash
             57442  PublicSuffix::TrieHashSymbol::Node
               147  String
                13  Symbol
                 1  PublicSuffix::TrieHashSymbol
    
        Retained String Report
        -----------------------------------
                 1  "*.compute-1.amazonaws.com"
                 1  "*.compute.amazonaws.com.cn"
                 1  "*.githubcloudusercontent.com"
                 1  "0"
                 1  "1"
                    ...
    
    HashTrieParts:
    
        Total allocated: 6263412 bytes (98963 objects)
        Total retained:  3392172 bytes (43476 objects)
    
        allocated memory by gem
        -----------------------------------
           6263372  publicsuffix-ruby/lib
                40  other
    
        allocated memory by file
        -----------------------------------
           3971787  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb
           2291585  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb
                40  test/profilers/tries_profiler.rb
    
        allocated memory by location
        -----------------------------------
           2291585  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb:29
           2232560  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8
           1739107  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16
               120  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39
                40  test/profilers/tries_profiler.rb:20
    
        allocated memory by class
        -----------------------------------
           2232560  Hash
           1574772  String
            967320  Enumerator
            909040  Array
            579680  PublicSuffix::TrieHashParts::Node
                40  PublicSuffix::TrieHashParts
    
        allocated objects by gem
        -----------------------------------
             98962  publicsuffix-ruby/lib
                 1  other
    
        allocated objects by file
        -----------------------------------
             57967  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb
             40995  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb
                 1  test/profilers/tries_profiler.rb
    
        allocated objects by location
        -----------------------------------
             43472  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16
             40995  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb:29
             14492  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8
                 3  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39
                 1  test/profilers/tries_profiler.rb:20
    
        allocated objects by class
        -----------------------------------
             39363  String
             22554  Array
             14492  Hash
             14492  PublicSuffix::TrieHashParts::Node
              8061  Enumerator
                 1  PublicSuffix::TrieHashParts
    
        retained memory by gem
        -----------------------------------
           3392132  publicsuffix-ruby/lib
                40  other
    
        retained memory by file
        -----------------------------------
           3392067  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb
                65  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb
                40  test/profilers/tries_profiler.rb
    
        retained memory by location
        -----------------------------------
           2232560  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8
           1159467  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16
                65  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb:29
                40  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39
                40  test/profilers/tries_profiler.rb:20
    
        retained memory by class
        -----------------------------------
           2232560  Hash
            579892  String
            579680  PublicSuffix::TrieHashParts::Node
                40  PublicSuffix::TrieHashParts
    
        retained objects by gem
        -----------------------------------
             43475  publicsuffix-ruby/lib
                 1  other
    
        retained objects by file
        -----------------------------------
             43474  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb
                 1  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb
                 1  test/profilers/tries_profiler.rb
    
        retained objects by location
        -----------------------------------
             28981  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16
             14492  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8
                 1  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39
                 1  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb:29
                 1  test/profilers/tries_profiler.rb:20
    
        retained objects by class
        -----------------------------------
             14492  Hash
             14492  PublicSuffix::TrieHashParts::Node
             14491  String
                 1  PublicSuffix::TrieHashParts
    
        Retained String Report
        -----------------------------------
              1792  "jp"
               756  "no"
               549  "museum"
               370  "it"
               332  "com"
                    ...
    
    HashTrieArray:
    
        Total allocated: 27171176 bytes (276366 objects)
        Total retained:  20072416 bytes (115019 objects)
    
        allocated memory by gem
        -----------------------------------
          27171136  publicsuffix-ruby/lib
                40  other
    
        allocated memory by file
        -----------------------------------
          27171136  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb
                40  test/profilers/tries_profiler.rb
    
        allocated memory by location
        -----------------------------------
          17765400  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:14
           4595280  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:22
           3516640  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:50
           1293696  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:89
               120  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:45
                40  test/profilers/tries_profiler.rb:22
    
        allocated memory by class
        -----------------------------------
          20063120  Array
           3843016  String
           2297680  PublicSuffix::TrieArray::Node
            967320  Enumerator
                40  PublicSuffix::TrieArray
    
        allocated objects by gem
        -----------------------------------
            276365  publicsuffix-ruby/lib
                 1  other
    
        allocated objects by file
        -----------------------------------
            276365  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb
                 1  test/profilers/tries_profiler.rb
    
        allocated objects by location
        -----------------------------------
            114882  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:22
             87916  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:50
             57442  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:14
             16122  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:89
                 3  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:45
                 1  test/profilers/tries_profiler.rb:22
    
        allocated objects by class
        -----------------------------------
            114885  Array
             95977  String
             57442  PublicSuffix::TrieArray::Node
              8061  Enumerator
                 1  PublicSuffix::TrieArray
    
        retained memory by gem
        -----------------------------------
          20072376  publicsuffix-ruby/lib
                40  other
    
        retained memory by file
        -----------------------------------
          20072376  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb
                40  test/profilers/tries_profiler.rb
    
        retained memory by location
        -----------------------------------
          17765400  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:14
           2297640  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:22
              9296  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:89
                40  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:45
                40  test/profilers/tries_profiler.rb:22
    
        retained memory by class
        -----------------------------------
          17765400  Array
           2297680  PublicSuffix::TrieArray::Node
              9296  String
                40  PublicSuffix::TrieArray
    
        retained objects by gem
        -----------------------------------
            115018  publicsuffix-ruby/lib
                 1  other
    
        retained objects by file
        -----------------------------------
            115018  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb
                 1  test/profilers/tries_profiler.rb
    
        retained objects by location
        -----------------------------------
             57442  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:14
             57441  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:22
               134  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:89
                 1  /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:45
                 1  test/profilers/tries_profiler.rb:22
    
        retained objects by class
        -----------------------------------
             57442  Array
             57442  PublicSuffix::TrieArray::Node
               134  String
                 1  PublicSuffix::TrieArray
    
        Retained String Report
        -----------------------------------
                 1  "*.compute-1.amazonaws.com"
                 1  "*.compute.amazonaws.com.cn"
                 1  "*.githubcloudusercontent.com"
                 1  "accident-investigation.aero"
                 1  "accident-prevention.aero"
                 1  "air-traffic-control.aero"
                    ...
    weppos committed Feb 11, 2017
    01d178f
  2. Leverage Trie compression

    In the first iteration I completely missed the point that, since the
    domain name system is hierarchical, storing the reversed string (or the
    reversed parts) increases compression.
    
    This way, strings sharing a common suffix, such as:
    
    - io
    - github.io
    - gitlab.io
    
    make better use of Trie compression: once reversed, the common suffix
    becomes a common prefix, so the node for io is shared by the paths of
    the other two suffixes.
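To make the effect concrete, here is a small Ruby sketch (a hypothetical `PartsTrie`, not the gem's code) that stores the three suffixes above with one node per label, once in forward order and once reversed, and counts the nodes. It omits end-of-rule markers to keep the focus on node sharing.

```ruby
# Hypothetical label-per-node trie; each node is a plain Hash of label => child.
class PartsTrie
  def initialize
    @root = {}
  end

  def insert(labels)
    labels.reduce(@root) { |node, label| node[label] ||= {} }
    self
  end

  # Number of nodes, excluding the root.
  def node_count(node = @root)
    node.values.sum { |child| 1 + node_count(child) }
  end
end

suffixes = %w[io github.io gitlab.io]

forward  = PartsTrie.new
reversed = PartsTrie.new
suffixes.each do |suffix|
  labels = suffix.split(".")
  forward.insert(labels)          # "github.io" -> ["github", "io"]
  reversed.insert(labels.reverse) # "github.io" -> ["io", "github"]
end

p forward.node_count  # => 5 (io; github -> io; gitlab -> io)
p reversed.node_count # => 3 (io -> github, gitlab)
```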
    
    As a result of this change, the memory usage of every Trie decreased drastically:
    
    Before:
    
        ➜  publicsuffix-ruby git:(thesis-trie) ruby test/profilers/tries_prosize.rb
           943,325   @trie_hash
           598,730   @trie_symbol
           312,361   @trie_parts
         1,627,182   @trie_array
    
    After:
    
        ➜  publicsuffix-ruby git:(thesis-trie) ✗ ruby test/profilers/tries_prosize.rb
           624,813   @trie_hash
           399,660   @trie_symbol
           197,291   @trie_parts
           982,347   @trie_array
    
        ➜  publicsuffix-ruby git:(thesis-trie) ruby test/profilers/tries_profiler.rb hash
    
        Total allocated: 17,067,504 bytes (262,240 objects)
        Total retained:  10,433,288 bytes (112,605 objects)
    
        ➜  publicsuffix-ruby git:(thesis-trie) ruby test/profilers/tries_profiler.rb hash-symbol
    
        Total allocated: 15,567,184 bytes (224,732 objects)
        Total retained:  8,932,968 bytes (75,097 objects)
    
        ➜  publicsuffix-ruby git:(thesis-trie) ruby test/profilers/tries_profiler.rb hash-parts
    
        Total allocated: 7,388,993 bytes (130,792 objects)
        Total retained:  1,438,762 bytes (24,513 objects)
    
        ➜  publicsuffix-ruby git:(thesis-trie) ruby test/profilers/tries_profiler.rb array
    
        Total allocated: 18,700,776 bytes (224,706 objects)
        Total retained:  12,066,560 bytes (75,071 objects)
    weppos committed Feb 11, 2017
    faeff13
  3. Promote the part-based hash Trie to the primary Trie implementation

        ➜  publicsuffix-ruby git:(thesis-trie) ✗ ruby test/profilers/tries_prosize.rb
           263,451   @rules
           194,536   @trie
    weppos committed Feb 11, 2017
    f4e34e9
  4. Integrate the Trie into the List

    Change the Trie to store associative key/value pairs instead of a plain set of words. The key is the rule, the value is the rule's metadata.
    
        ➜  publicsuffix-ruby git:(thesis-trie) ✗ ruby test/profilers/list_profsize.rb
        8061 rules:
           482,019   PublicSuffix::List size
           263,451   Size of @rules
           307,630   Size of @trie
    
    The Hash (@rules, 263,451) is still a little smaller than the Trie (@trie, 307,630).
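A minimal sketch of the key/value change, assuming a label-per-node trie with reversed labels as in the previous commits: the hypothetical `RuleTrie` keeps the rule's metadata on the node where the rule ends, and a `nil` value marks intermediate nodes that are not rules. The `{ private: ... }` metadata shape is an illustration only.

```ruby
# Hypothetical trie mapping each rule to its metadata (not the gem's class).
class RuleTrie
  Node = Struct.new(:children, :value)

  def initialize
    @root = Node.new({}, nil)
  end

  # Labels are stored reversed, so "github.io" becomes io -> github.
  def []=(rule, metadata)
    node = rule.split(".").reverse.reduce(@root) do |current, label|
      current.children[label] ||= Node.new({}, nil)
    end
    node.value = metadata
  end

  # Returns the metadata for an exact rule, or nil if it is not a rule.
  def [](rule)
    node = @root
    rule.split(".").reverse.each do |label|
      node = node.children[label]
      return nil if node.nil?
    end
    node.value
  end
end

trie = RuleTrie.new
trie["com"] = { private: false }
trie["github.io"] = { private: true }

p trie["github.io"][:private] # => true
p trie["io"]                  # => nil (intermediate node, not a rule)
```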
    weppos committed Feb 11, 2017
    5868396
  5. Optimize the Trie removing redundant objects

    Merge the entry into the trie node. This also makes it possible to drop
    the entry's "length" attribute, which the trie doesn't need because the
    length can already be determined from the node's level in the tree.
    
    Before:
    
        ➜  publicsuffix-ruby git:(thesis-trie) ✗ ruby test/profilers/list_profsize.rb
        8061 rules:
           482,019   PublicSuffix::List size
           263,451   Size of @rules
           307,630   Size of @trie
    
    After:
    
        ➜  publicsuffix-ruby git:(thesis-trie) ✗ ruby test/profilers/list_profsize.rb
        8061 rules:
           490,391   PublicSuffix::List size
           263,451   Size of @rules
           226,985   Size of @trie
    
    The Trie is now smaller than the Hash by ~36 KB (226,985 vs 263,451).
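The depth argument can be sketched as follows. In this hypothetical `DepthTrie` (not the gem's class), the rule type is stored directly on the node, and a longest-match walk returns the rule length as the number of labels traversed, so no separate entry object or `length` attribute is kept.

```ruby
# Hypothetical trie whose nodes carry the rule data directly.
class DepthTrie
  Node = Struct.new(:children, :type) # type is nil for non-rule nodes

  def initialize
    @root = Node.new({}, nil)
  end

  def insert(rule, type)
    node = rule.split(".").reverse.reduce(@root) do |current, label|
      current.children[label] ||= Node.new({}, nil)
    end
    node.type = type
  end

  # Walks the reversed labels of +name+ and returns [type, length] for the
  # deepest rule node on the path, or nil. The length is simply the depth.
  def longest_match(name)
    node = @root
    match = nil
    name.split(".").reverse.each_with_index do |label, index|
      node = node.children[label]
      break if node.nil?
      match = [node.type, index + 1] if node.type
    end
    match
  end
end

trie = DepthTrie.new
trie.insert("uk", :normal)
trie.insert("co.uk", :normal)

p trie.longest_match("www.example.co.uk") # => [:normal, 2]
p trie.longest_match("example.com")       # => nil
```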
    weppos committed Feb 11, 2017
    2f20390
  6. Implement full Trie-based version of the PSL

    This commit handles wildcard and exception rules, and passes all the tests.
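For reference, the Public Suffix List matching rules (an exception rule wins; otherwise the matching rule with the most labels prevails; `*` matches one whole label; if nothing matches, the implicit `*` rule applies) can be implemented on a reversed-label trie roughly as below. This is a hedged sketch with a hypothetical `PslTrie` class, not the implementation in this PR, and it ignores IDN/Punycode normalization.

```ruby
# Hypothetical PSL matcher on a reversed-label trie (not the gem's code).
# "*" is stored as a literal label; a leading "!" marks an exception rule.
class PslTrie
  Node = Struct.new(:children, :kind) # kind: nil, :rule or :exception

  def initialize(rules)
    @root = Node.new({}, nil)
    rules.each { |rule| insert(rule) }
  end

  # Returns the public suffix of +name+.
  def public_suffix(name)
    labels = name.downcase.split(".").reverse
    matches = []
    collect(@root, labels, 0, matches)

    exception = matches.find { |kind, _| kind == :exception }
    length = if exception
      # The suffix is the exception rule minus its leftmost label.
      exception[1] - 1
    else
      # Longest matching rule; fall back to the implicit "*" rule (1 label).
      matches.map { |_, depth| depth }.max || 1
    end
    labels.first(length).reverse.join(".")
  end

  private

  def insert(rule)
    exception = rule.start_with?("!")
    labels = (exception ? rule[1..-1] : rule).split(".").reverse
    node = labels.reduce(@root) do |current, label|
      current.children[label] ||= Node.new({}, nil)
    end
    node.kind = exception ? :exception : :rule
  end

  # Depth-first walk following both the exact label and any "*" child,
  # recording [kind, depth] for every rule node reached.
  def collect(node, labels, depth, matches)
    matches << [node.kind, depth] if node.kind
    return if depth == labels.size
    [labels[depth], "*"].uniq.each do |key|
      child = node.children[key]
      collect(child, labels, depth + 1, matches) if child
    end
  end
end

trie = PslTrie.new(%w[com uk co.uk *.ck !www.ck])
p trie.public_suffix("example.co.uk") # => "co.uk"
p trie.public_suffix("foo.bar.ck")    # => "bar.ck"
p trie.public_suffix("www.ck")        # => "ck"
p trie.public_suffix("example.test")  # => "test"
```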
    
    ---
    
    Benchmark Hash vs Trie:
    
        ➜  publicsuffix-ruby git:(thesis-trie) ✗ WHAT=hash ruby test/benchmarks/bm_find.rb
        Rehearsal -------------------------------------------------------------
        NAME_SHORT                  0.530000   0.010000   0.540000 (  0.540001)
        NAME_MEDIUM                 0.600000   0.000000   0.600000 (  0.608115)
        NAME_LONG                   0.780000   0.010000   0.790000 (  0.796897)
        NAME_WILD                   0.900000   0.020000   0.920000 (  0.961535)
        NAME_EXCP                   1.020000   0.020000   1.040000 (  1.094007)
        IAAA                        0.620000   0.010000   0.630000 (  0.649537)
        IZZZ                        0.590000   0.000000   0.590000 (  0.604190)
        PAAA                        1.030000   0.020000   1.050000 (  1.082507)
        PZZZ                        0.970000   0.020000   0.990000 (  1.009199)
        JP                          0.920000   0.010000   0.930000 (  0.939533)
        IT                          0.610000   0.010000   0.620000 (  0.618309)
        COM                         0.630000   0.000000   0.630000 (  0.642974)
        ---------------------------------------------------- total: 9.330000sec
    
                                        user     system      total        real
        NAME_SHORT                  0.580000   0.010000   0.590000 (  0.592958)
        NAME_MEDIUM                 0.680000   0.010000   0.690000 (  0.698372)
        NAME_LONG                   0.820000   0.010000   0.830000 (  0.830893)
        NAME_WILD                   0.810000   0.010000   0.820000 (  0.831984)
        NAME_EXCP                   0.960000   0.010000   0.970000 (  0.981469)
        IAAA                        0.600000   0.010000   0.610000 (  0.611947)
        IZZZ                        0.610000   0.000000   0.610000 (  0.626348)
        PAAA                        0.970000   0.020000   0.990000 (  0.982282)
        PZZZ                        0.990000   0.010000   1.000000 (  1.012680)
        JP                          0.940000   0.010000   0.950000 (  0.954031)
        IT                          0.610000   0.010000   0.620000 (  0.627587)
        COM                         0.620000   0.010000   0.630000 (  0.636131)
    
        ➜  publicsuffix-ruby git:(thesis-trie) ✗ WHAT=trie ruby test/benchmarks/bm_find.rb
        Rehearsal -------------------------------------------------------------
        NAME_SHORT                  0.700000   0.010000   0.710000 (  0.722887)
        NAME_MEDIUM                 0.750000   0.010000   0.760000 (  0.767034)
        NAME_LONG                   0.790000   0.010000   0.800000 (  0.802235)
        NAME_WILD                   0.770000   0.010000   0.780000 (  0.786366)
        NAME_EXCP                   0.810000   0.010000   0.820000 (  0.832109)
        IAAA                        0.680000   0.000000   0.680000 (  0.690577)
        IZZZ                        0.690000   0.010000   0.700000 (  0.694839)
        PAAA                        0.810000   0.010000   0.820000 (  0.826133)
        PZZZ                        0.790000   0.010000   0.800000 (  0.803508)
        JP                          0.830000   0.000000   0.830000 (  0.855188)
        IT                          0.710000   0.010000   0.720000 (  0.714962)
        COM                         0.670000   0.010000   0.680000 (  0.687400)
        ---------------------------------------------------- total: 9.100000sec
    
                                        user     system      total        real
        NAME_SHORT                  0.690000   0.010000   0.700000 (  0.706099)
        NAME_MEDIUM                 0.730000   0.010000   0.740000 (  0.749351)
        NAME_LONG                   0.750000   0.010000   0.760000 (  0.765484)
        NAME_WILD                   0.770000   0.010000   0.780000 (  0.781182)
        NAME_EXCP                   0.800000   0.000000   0.800000 (  0.815244)
        IAAA                        0.670000   0.010000   0.680000 (  0.682966)
        IZZZ                        0.670000   0.010000   0.680000 (  0.682771)
        PAAA                        0.830000   0.010000   0.840000 (  0.847581)
        PZZZ                        0.810000   0.010000   0.820000 (  0.829023)
        JP                          0.810000   0.000000   0.810000 (  0.831782)
        IT                          0.680000   0.010000   0.690000 (  0.691071)
        COM                         0.660000   0.010000   0.670000 (  0.669978)
    weppos committed Feb 11, 2017
    76e37ce
  7. Do not pre-allocate the children Hash

    Ruby allocates a non-trivial amount of memory even for an empty Hash.
    Allocate the children Hash only when the first child is added, so that
    nodes without children don't use unnecessary extra memory.
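A sketch of the lazy-initialization pattern, using a hypothetical `LazyNode` class (not the gem's node): leaf nodes keep `@children` as `nil`, and the Hash is created only on the first insertion.

```ruby
# Hypothetical trie node with a lazily allocated children Hash.
class LazyNode
  def initialize
    @children = nil # no Hash until the first child is added
  end

  # Returns the child for +label+, creating it (and the Hash) on demand.
  def fetch_or_add(label)
    @children ||= {}
    @children[label] ||= LazyNode.new
  end

  # Returns the child for +label+, or nil, without allocating anything.
  def child(label)
    @children && @children[label]
  end

  def leaf?
    @children.nil?
  end
end

root = LazyNode.new
io = root.fetch_or_add("io")
io.fetch_or_add("github")

p root.leaf?               # => false
p io.child("github").leaf? # => true
p io.child("missing")      # => nil
```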
    
    Pre-initialize children:
    
        226,985   Size of @trie
    
        ➜  publicsuffix-ruby git:(thesis-trie) ruby test/profilers/initialization_profiler.rb
        Total allocated: 8950176 bytes (117512 objects)
        Total retained:  2475538 bytes (40477 objects)
    
    Lazy-initialize children:
    
        219,329   Size of @trie
    
        ➜  publicsuffix-ruby git:(thesis-trie) ✗ ruby test/profilers/initialization_profiler.rb
        Total allocated: 8643936 bytes (109856 objects)
        Total retained:  2169298 bytes (32821 objects)
    weppos committed Feb 11, 2017
    a6193d3
  8. Document the Trie

    weppos committed Feb 11, 2017
    45e3361