Optimize id matching #519

ypconstante · 2024-01-03T15:00:08Z

Today the id matching goes through all node attributes trying to find an exact id, even after checking an id attribute with a different value.

This PR changes the check to stop at the first id attribute it finds instead of continuing to check the remaining attributes.

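With attributes as lists, performance is basically the same (157.35 vs. 158.37 ips), but with attributes as maps it improved significantly (130.97 to 158.48 ips, about 21% higher throughput).
In both cases memory usage is reduced (3.07 MB to 2.89 MB for lists, 3.86 MB to 2.89 MB for maps).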
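
To illustrate the idea, here is a minimal sketch of an id check that stops at the first id attribute. The module `IdMatchSketch` and the function `id_match?/2` are hypothetical names chosen for this example; this is not the code in lib/floki/selector.ex, only one way the behavior described above could be written for both attribute representations.

```elixir
defmodule IdMatchSketch do
  # Hypothetical helper for illustration only; not Floki's actual implementation.

  # Attributes as a map: one pattern match both looks up the "id" key and
  # checks that its value equals the wanted id (note the pin operator).
  def id_match?(attributes, id) when is_map(attributes) do
    match?(%{"id" => ^id}, attributes)
  end

  # Attributes as a list of {name, value} tuples: decide at the first "id"
  # attribute and never look at the rest of the list.
  def id_match?([{"id", value} | _rest], id), do: value == id
  def id_match?([_other | rest], id), do: id_match?(rest, id)
  def id_match?([], _id), do: false
end
```

With this shape, a node whose first id attribute has a different value is rejected immediately, even if a later attribute happens to repeat the id name.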

##### With input bench #####
Name                               ips        average  deviation         median         99th %
attributes_as_maps (pr)         158.48        6.31 ms    ±23.60%        6.19 ms        9.87 ms
attributes_as_lists (pr)        158.37        6.31 ms    ±25.44%        6.15 ms       10.31 ms
attributes_as_lists             157.35        6.36 ms    ±22.13%        6.25 ms        9.67 ms
attributes_as_maps              130.97        7.64 ms    ±21.17%        7.52 ms       11.90 ms

Comparison:
attributes_as_maps (pr)         158.48
attributes_as_lists (pr)        158.37 - 1.00x slower +0.00458 ms
attributes_as_lists             157.35 - 1.01x slower +0.0453 ms
attributes_as_maps              130.97 - 1.21x slower +1.33 ms

Memory usage statistics:

Name                        Memory usage
attributes_as_maps (pr)          2.89 MB
attributes_as_lists (pr)         2.89 MB - 1.00x memory usage -0.00183 MB
attributes_as_lists              3.07 MB - 1.06x memory usage +0.182 MB
attributes_as_maps               3.86 MB - 1.33x memory usage +0.97 MB
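
As a reading aid for the comparison rows, the short calculation below reproduces the "1.21x slower +1.33 ms" figure for attributes_as_maps from the ips column. The variable names are made up for this example, and the inputs are the rounded values printed above, so this is an approximation of what Benchee computes from its unrounded measurements.

```elixir
# ips values copied from the Benchee table above (already rounded by Benchee).
ips_maps_pr = 158.48
ips_maps_before = 130.97

# Ratio of throughputs: how many times slower the old code is than the PR.
slowdown = Float.round(ips_maps_pr / ips_maps_before, 2)

# Difference in average time per run, in milliseconds (1000 ms / ips).
extra_ms = Float.round(1000 / ips_maps_before - 1000 / ips_maps_pr, 2)

IO.puts("#{slowdown}x slower +#{extra_ms} ms")
```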

# Read a file located next to this benchmark script.
read_file = fn name ->
  __ENV__.file
  |> Path.dirname()
  |> Path.join(name)
  |> File.read!()
end

html_input = read_file.("medium.html")

# Parse the same document twice: attributes as lists (the default) and as maps.
[{"html", _, _} = html_attributes_as_lists | _] = Floki.parse_document!(html_input)
[{"html", _, _} = html_attributes_as_maps | _] = Floki.parse_document!(html_input, attributes_as_maps: true)

# Run the same id selector against both trees, measuring time and memory.
Benchee.run(
  %{
    "attributes_as_lists" => fn selector -> Floki.Finder.find(html_attributes_as_lists, selector) end,
    "attributes_as_maps" => fn selector -> Floki.Finder.find(html_attributes_as_maps, selector) end
  },
  time: 10,
  memory_time: 2,
  inputs: %{
    "bench" => "#cite_ref-Pocock1939_1-1"
  }
)

philss

Added a suggestion to try, but LGTM!

lib/floki/selector.ex

philss

One last thing :)

lib/floki/selector.ex

Optimize id matching

f7491c6

philss reviewed Jan 3, 2024

lib/floki/selector.ex (outdated, resolved)

Use pattern matching to get id value

a2172ab

philss reviewed Jan 4, 2024

lib/floki/selector.ex (outdated, resolved)

Check equality in the pattern match

a754b67

philss merged commit 18a2cf8 into philss:main Jan 4, 2024
9 checks passed

ypconstante deleted the optimize-id-matching branch January 4, 2024 17:14
