Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize id matching #519

Merged
merged 3 commits into from
Jan 4, 2024
Merged

Conversation

ypconstante
Copy link
Contributor

Today the id matching goes through all node attributes trying to find an exact id, even after checking an id attribute with a different value.

This PR changes the check find the first attribute id instead of keep checking the remaining attributes.

When using attributes as lists performance is basically the same, but for attributes as maps performance improved significantly.
For both cases the memory usage is reduced.

##### With input bench #####
Name                               ips        average  deviation         median         99th %
attributes_as_maps (pr)         158.48        6.31 ms    ±23.60%        6.19 ms        9.87 ms
attributes_as_lists (pr)        158.37        6.31 ms    ±25.44%        6.15 ms       10.31 ms
attributes_as_lists             157.35        6.36 ms    ±22.13%        6.25 ms        9.67 ms
attributes_as_maps              130.97        7.64 ms    ±21.17%        7.52 ms       11.90 ms

Comparison:
attributes_as_maps (pr)         158.48
attributes_as_lists (pr)        158.37 - 1.00x slower +0.00458 ms
attributes_as_lists             157.35 - 1.01x slower +0.0453 ms
attributes_as_maps              130.97 - 1.21x slower +1.33 ms

Memory usage statistics:

Name                        Memory usage
attributes_as_maps (pr)          2.89 MB
attributes_as_lists (pr)         2.89 MB - 1.00x memory usage -0.00183 MB
attributes_as_lists              3.07 MB - 1.06x memory usage +0.182 MB
attributes_as_maps               3.86 MB - 1.33x memory usage +0.97 MB
read_file = fn name ->
  __ENV__.file
  |> Path.dirname()
  |> Path.join(name)
  |> File.read!()
end

html_input = read_file.("medium.html")

[{"html", _, _} = html_attributes_as_lists | _] = Floki.parse_document!(html_input)
[{"html", _, _} = html_attributes_as_maps | _] = Floki.parse_document!(html_input, attributes_as_maps: true)

Benchee.run(
  %{
    "attributes_as_lists" => fn selector -> Floki.Finder.find(html_attributes_as_lists, selector)  end,
    "attributes_as_maps" => fn selector -> Floki.Finder.find(html_attributes_as_maps, selector)  end
  },
  time: 10,
  memory_time: 2,
  inputs: %{
    "bench" => "#cite_ref-Pocock1939_1-1"
  }
)

Copy link
Owner

@philss philss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a suggestion to try, but LGTM!

lib/floki/selector.ex Outdated Show resolved Hide resolved
Copy link
Owner

@philss philss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing :)

lib/floki/selector.ex Outdated Show resolved Hide resolved
@philss philss merged commit 18a2cf8 into philss:main Jan 4, 2024
9 checks passed
@ypconstante ypconstante deleted the optimize-id-matching branch January 4, 2024 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants