Merge branch 'master' into 0.32+configurable-self-closing-tags-take-2
* master: (23 commits)
  Bump ex_doc from 0.28.4 to 0.28.5 (philss#416)
  Bump earmark from 1.4.26 to 1.4.27 (philss#415)
  Bump credo from 1.6.5 to 1.6.6 (philss#414)
  Bump dialyxir from 1.1.0 to 1.2.0 (philss#413)
  Bump credo from 1.6.4 to 1.6.5 (philss#412)
  Show retrieval of data attributes (philss#410)
  Release v0.33.1
  Improve the README.md file
  Fix warnings related to the tokenizer
  Change sponsors links - using GitHub Sponsors
  Release v0.33.0
  Added case insentive variation of fl-contains (philss#409)
  Bump html5ever from 0.13.0 to 0.13.1 (philss#407)
  Bump earmark from 1.4.25 to 1.4.26 (philss#406)
  Minor stuff I changed while reading (philss#404)
  Bump earmark from 1.4.24 to 1.4.25 (philss#403)
  Bump html5ever from 0.12.0 to 0.13.0 (philss#401)
  Bump ex_doc from 0.28.3 to 0.28.4 (philss#402)
  Remove case of 2 for tuple of :pi (philss#400)
  Bump earmark from 1.4.23 to 1.4.24 (philss#398)
  ...
inoas committed Aug 24, 2022
2 parents a418551 + 55d54a4 commit 6138a92
Showing 18 changed files with 182 additions and 120 deletions.
2 changes: 1 addition & 1 deletion .github/FUNDING.yml
@@ -1,2 +1,2 @@
github: philss
ko_fi: philipsampaio
custom: https://www.paypal.com/donate?business=EMDBKWVHVEB7Q&currency_code=USD
16 changes: 10 additions & 6 deletions .github/workflows/ci.yml
@@ -1,6 +1,10 @@
name: CI

on: [push, pull_request]
on:
pull_request:
push:
branches:
- master

jobs:
test:
@@ -13,12 +17,12 @@ jobs:
strategy:
fail-fast: false
matrix:
elixir: ["1.12.3", "1.8.2"]
otp: ["24.1", "22.3"]
elixir: ["1.13.3", "1.10.4"]
otp: ["24.3.2", "22.3.4"]
parser: [fast_html, html5ever, mochiweb]
exclude:
- elixir: "1.8.2"
otp: "24.1"
- elixir: "1.10.4"
otp: "24.3.2"

steps:
- uses: actions/checkout@v2
@@ -44,7 +48,7 @@ jobs:

- name: Check format
run: mix format --check-formatted
if: matrix.elixir == '1.12.3'
if: matrix.elixir == '1.13.3'

- name: Run test
run: |-
22 changes: 21 additions & 1 deletion CHANGELOG.md
@@ -7,6 +7,24 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased][unreleased]

## [0.33.1] - 2022-06-28

### Fixed

- Remove some warnings for unused code.

## [0.33.0] - 2022-06-28

### Added

- Add support for searching elements that contain text in a case-insensitive manner with
`fl-icontains` - thanks [@nuno84](https://github.com/nuno84)

### Changed

- Drop support for Elixir 1.8 and 1.9.
- Fix and improve internal things - thanks [@derek-zhou](https://github.com/derek-zhou) and [@hissssst](https://github.com/hissssst)

## [0.32.1] - 2022-03-24

### Fixed
@@ -596,7 +614,9 @@ of the parent element inside HTML.

- Elixir version requirement from "~> 1.0.0" to ">= 1.0.0".

[unreleased]: https://github.com/philss/floki/compare/v0.32.1...HEAD
[unreleased]: https://github.com/philss/floki/compare/v0.33.1...HEAD
[0.33.1]: https://github.com/philss/floki/compare/v0.33.0...v0.33.1
[0.33.0]: https://github.com/philss/floki/compare/v0.32.1...v0.33.0
[0.32.1]: https://github.com/philss/floki/compare/v0.32.0...v0.32.1
[0.32.0]: https://github.com/philss/floki/compare/v0.31.0...v0.32.0
[0.31.0]: https://github.com/philss/floki/compare/v0.30.1...v0.31.0
25 changes: 12 additions & 13 deletions README.md
@@ -9,7 +9,7 @@

**Floki is a simple HTML parser that enables search for nodes using CSS selectors**.

[Check the documentation](https://hexdocs.pm/floki).
[Check the documentation 📙](https://hexdocs.pm/floki).

## Usage

@@ -61,7 +61,7 @@ Add Floki to your `mix.exs`:
```elixir
defp deps do
[
{:floki, "~> 0.32.0"}
{:floki, "~> 0.33.0"}
]
end
```
@@ -112,16 +112,14 @@ Extracting the files is needed only once.

#### Using `html5ever` as the HTML parser

Rust needs to be installed on the system in order to compile html5ever. To do that, please
[follow the instruction](https://www.rust-lang.org/en-US/install.html) presented in the official page.

After Rust is set up, you need to add `html5ever` NIF to your dependency list:
This dependency is written with a NIF using [Rustler](https://github.com/rusterlium/rustler), but
you don't need to install anything to compile it thanks to [RustlerPrecompiled](https://hexdocs.pm/rustler_precompiled/).

```elixir
defp deps do
[
{:floki, "~> 0.32.0"},
{:html5ever, "~> 0.9.0"}
{:floki, "~> 0.33.0"},
{:html5ever, "~> 0.13.0"}
]
end
```
@@ -136,7 +134,7 @@ Then you need to configure your app to use `html5ever`:
config :floki, :html_parser, Floki.HTMLParser.Html5ever
```

For more info, check the article [Rustler - Safe Erlang and Elixir NIFs in Rust](http://hansihe.com/2017/02/05/rustler-safe-erlang-elixir-nifs-in-rust.html).
Notice that you can pass the HTML parser as an option in `parse_document/2` and `parse_fragment/2`.
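
As a minimal sketch of the per-call option mentioned in the line above, the snippet below passes `:html_parser` to `parse_document/2` and `parse_fragment/2`. It assumes network access so `Mix.install/1` can fetch Floki, and it uses the built-in `Floki.HTMLParser.Mochiweb` backend only because that one needs no extra dependency; the sample HTML is made up for illustration.

```elixir
# Assumption: run as a standalone script with network access for Mix.install/1.
Mix.install([{:floki, "~> 0.33.0"}])

# Illustrative input, not taken from the commit.
html = "<html><body><p class=\"greeting\">Hello</p></body></html>"

# The parser module is chosen for this call only; the application-wide
# `config :floki, :html_parser, ...` setting is left untouched.
{:ok, document} = Floki.parse_document(html, html_parser: Floki.HTMLParser.Mochiweb)
{:ok, fragment} = Floki.parse_fragment("<p>Hello</p>", html_parser: Floki.HTMLParser.Mochiweb)

IO.inspect(Floki.find(document, "p.greeting"), label: "document match")
IO.inspect(fragment, label: "fragment")
```

Swapping in `Floki.HTMLParser.Html5ever` or `Floki.HTMLParser.FastHtml` should work the same way once the matching dependency is in `deps`.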

#### Using `fast_html` as the HTML parser

@@ -148,7 +146,7 @@ First, add `fast_html` to your dependencies:
```elixir
defp deps do
[
{:floki, "~> 0.32.0"},
{:floki, "~> 0.33.0"},
{:fast_html, "~> 2.0"}
]
end
@@ -259,9 +257,10 @@ Here you find all the [CSS selectors](https://www.w3.org/TR/selectors/#selectors

There are also some selectors based on non-standard specifications. They are:

| Pattern | Description |
|----------------------|-----------------------------------------------------|
| E:fl-contains('foo') | an E element that contains "foo" inside a text node |
| Pattern | Description |
|-----------------------|------------------------------------------------------------------------|
| E:fl-contains('foo') | an E element that contains "foo" inside a text node |
| E:fl-icontains('foo') | an E element that contains "foo" inside a text node (case insensitive) |
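
To illustrate the difference between the two non-standard selectors in the table, here is a small sketch comparing `fl-contains` with `fl-icontains`. It assumes network access for `Mix.install/1`, and the sample markup is invented for the example.

```elixir
# Assumption: run as a standalone script with network access for Mix.install/1.
Mix.install([{:floki, "~> 0.33.0"}])

# Illustrative markup, not taken from the commit.
html = """
<ul>
  <li>Floki parses HTML</li>
  <li>FLOKI finds nodes</li>
  <li>Something else</li>
</ul>
"""

{:ok, document} = Floki.parse_document(html)

# fl-contains compares the text as written, so only the first item matches.
case_sensitive = Floki.find(document, "li:fl-contains('Floki')")

# fl-icontains ignores case, so the first two items match.
case_insensitive = Floki.find(document, "li:fl-icontains('floki')")

IO.puts("fl-contains matched #{length(case_sensitive)} item(s)")
IO.puts("fl-icontains matched #{length(case_insensitive)} item(s)")
```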

## Special thanks

2 changes: 1 addition & 1 deletion config/config.exs
@@ -1,4 +1,4 @@
use Mix.Config
import Config

config :logger, :console,
format: "$metadata $message\n",
4 changes: 4 additions & 0 deletions lib/floki.ex
@@ -570,6 +570,8 @@ defmodule Floki do
iex> Floki.attribute([{"a", [{"class", "foo"}, {"href", "https://google.com"}], ["Google"]}], "a", "class")
["foo"]
iex> Floki.attribute([{"a", [{"href", "https://google.com"}, {"data-name", "google"}], ["Google"]}], "a[data-name]", "data-name")
["google"]
"""

@spec attribute(binary | html_tree | html_node, binary, binary) :: list
@@ -588,6 +590,8 @@ defmodule Floki do
iex> Floki.attribute([{"a", [{"href", "https://google.com"}], ["Google"]}], "href")
["https://google.com"]
iex> Floki.attribute([{"a", [{"href", "https://google.com"}, {"data-name", "google"}], ["Google"]}], "data-name")
["google"]
"""

@spec attribute(binary | html_tree | html_node, binary) :: list
59 changes: 30 additions & 29 deletions lib/floki/html/tokenizer.ex
@@ -746,36 +746,37 @@ defmodule Floki.HTML.Tokenizer do

# § tokenizer-script-data-escape-start-state: re-entrant

@spec script_data_escape_start(binary(), State.t()) :: State.t()
def script_data_escape_start(<<?-, html::binary>>, s) do
script_data_escape_start_dash(
html,
%{
s
| tokens: append_char_token(s, @hyphen_minus)
}
)
end

def script_data_escape_start(html, s) do
script_data(html, s)
end
## Unused
# @spec script_data_escape_start(binary(), State.t()) :: State.t()
# def script_data_escape_start(<<?-, html::binary>>, s) do
# script_data_escape_start_dash(
# html,
# %{
# s
# | tokens: append_char_token(s, @hyphen_minus)
# }
# )
# end

# def script_data_escape_start(html, s) do
# script_data(html, s)
# end

# § tokenizer-script-data-escape-start-dash-state

defp script_data_escape_start_dash(<<?-, html::binary>>, s) do
script_data_escaped_dash_dash(
html,
%{
s
| tokens: append_char_token(s, @hyphen_minus)
}
)
end
# defp script_data_escape_start_dash(<<?-, html::binary>>, s) do
# script_data_escaped_dash_dash(
# html,
# %{
# s
# | tokens: append_char_token(s, @hyphen_minus)
# }
# )
# end

defp script_data_escape_start_dash(html, s) do
script_data(html, s)
end
# defp script_data_escape_start_dash(html, s) do
# script_data(html, s)
# end

# § tokenizer-script-data-escaped-state

@@ -1740,15 +1741,15 @@ defmodule Floki.HTML.Tokenizer do
end

defp comment_end(html, s) do
new_comment = %Comment{s.token | data: [s.token.data | "--"]}
new_comment = %Comment{s.token | data: [s.token.data | ["--"]]}

comment(html, %{s | token: new_comment})
end

# § tokenizer-comment-end-bang-state

defp comment_end_bang(<<?-, html::binary>>, s) do
new_comment = %Comment{s.token | data: [s.token.data | "--!"]}
new_comment = %Comment{s.token | data: [s.token.data | ["--!"]]}

comment_end_dash(html, %{s | token: new_comment})
end
@@ -1772,7 +1773,7 @@ defmodule Floki.HTML.Tokenizer do
end

defp comment_end_bang(html, s) do
new_comment = %Comment{s.token | data: [s.token.data | "--!"]}
new_comment = %Comment{s.token | data: [s.token.data | ["--!"]]}

comment(html, %{s | token: new_comment})
end
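
The `comment_end` and `comment_end_bang` hunks above change `[s.token.data | "--"]` to `[s.token.data | ["--"]]`. The commit log only says this fixes tokenizer warnings, so the following is a hedged sketch of the difference between the two shapes rather than a statement of the author's reasoning. The `data` value is a made-up stand-in for `s.token.data`.

```elixir
# Hypothetical stand-in for the accumulated comment data (iodata).
data = ["<!", "comment"]

# A binary in tail position produces an improper list.
improper = [data | "--"]

# Wrapping the binary in a list keeps the result a proper list.
proper = [data | ["--"]]

# Both shapes are valid iodata and flatten to the same binary.
IO.inspect(IO.iodata_to_binary(improper) == IO.iodata_to_binary(proper), label: "same binary")

# Only the proper list is safe for length/1 and Enum functions.
IO.inspect(List.improper?(improper), label: "improper?")
IO.inspect(length(proper), label: "proper length")
```

Because both forms serialize identically, the change should not alter the tokenizer's output; it only affects the shape of the intermediate list.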
37 changes: 21 additions & 16 deletions lib/floki/selector.ex
@@ -49,6 +49,9 @@ defmodule Floki.Selector do
defp classes(%{classes: classes}), do: ".#{Enum.join(classes, ".")}"
end

@wildcards [nil, "*"]
defguardp is_wildcard(x) when x in @wildcards

@doc false

# Returns if a given node matches with a given selector.
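
The hunk above introduces a private guard backed by a module attribute. A standalone sketch of that pattern follows; the module name `WildcardSketch` and the simplified `type_match?/2` (plain strings instead of `%HTMLNode{}` structs) are illustrative assumptions, not code from Floki.

```elixir
defmodule WildcardSketch do
  # Values that should match any node, mirroring the attribute in the diff.
  @wildcards [nil, "*"]

  # `in` is allowed in guards when the right-hand side is a compile-time list.
  defguardp is_wildcard(x) when x in @wildcards

  # One clause covers both wildcard forms instead of one clause per literal.
  def type_match?(_node_type, type) when is_wildcard(type), do: true
  def type_match?(node_type, type), do: node_type == type
end

IO.inspect(WildcardSketch.type_match?("div", nil), label: "nil selector")
IO.inspect(WildcardSketch.type_match?("div", "*"), label: "star selector")
IO.inspect(WildcardSketch.type_match?("div", "span"), label: "mismatch")
```

Note that `defguardp` has to appear before the clauses that use it, which is why the diff places it near the top of the module.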
@@ -94,31 +97,29 @@ defmodule Floki.Selector do
end)
end

defp namespace_match?(_node, nil), do: true
defp namespace_match?(_node, "*"), do: true
defp namespace_match?(_node, namespace) when is_wildcard(namespace), do: true
defp namespace_match?(%HTMLNode{type: :pi}, _type), do: false

defp namespace_match?(%HTMLNode{type: type_maybe_with_namespace}, namespace) do
case String.split(type_maybe_with_namespace, ":") do
[ns, _type] ->
ns == namespace
[^namespace, _type] ->
true

[_type] ->
_ ->
false
end
end

defp type_match?(_node, nil), do: true
defp type_match?(_node, "*"), do: true
defp type_match?(_node, type) when is_wildcard(type), do: true
defp type_match?(%HTMLNode{type: :pi}, _type), do: false

defp type_match?(%HTMLNode{type: type_maybe_with_namespace}, type) do
case String.split(type_maybe_with_namespace, ":") do
[_ns, tp] ->
tp == type
[_ns, ^type] ->
true

[tp] ->
tp == type
[^type] ->
true

_ ->
false
@@ -131,11 +132,10 @@ defmodule Floki.Selector do
defp classes_matches?(%HTMLNode{attributes: []}, _), do: false

defp classes_matches?(%HTMLNode{attributes: attributes}, classes) do
Enum.all?(classes, fn class ->
selector = %AttributeSelector{match_type: :includes, attribute: "class", value: class}

AttributeSelector.match?(attributes, selector)
end)
case :proplists.get_value("class", attributes, nil) do
nil -> false
class -> classes -- String.split(class, ~r/\s+/) == []
end
end

defp attributes_matches?(_node, []), do: true
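
The rewritten `classes_matches?/2` above replaces a per-class `AttributeSelector` lookup with a single list subtraction. The sketch below isolates that check so it can be run on its own; `ClassSketch` is a hypothetical module that takes the attribute list directly rather than an `%HTMLNode{}`.

```elixir
defmodule ClassSketch do
  # `attributes` is a list of {name, value} tuples, as in Floki's HTML tree.
  def classes_match?(attributes, classes) do
    case :proplists.get_value("class", attributes, nil) do
      nil ->
        false

      class ->
        # Every selector class must be removable from the node's class list;
        # an empty remainder means all of them were present.
        classes -- String.split(class, ~r/\s+/) == []
    end
  end
end

attrs = [{"href", "#"}, {"class", "btn  btn-primary large"}]

IO.inspect(ClassSketch.classes_match?(attrs, ["btn", "large"]), label: "subset")
IO.inspect(ClassSketch.classes_match?(attrs, ["btn", "missing"]), label: "not a subset")
IO.inspect(ClassSketch.classes_match?([{"href", "#"}], ["btn"]), label: "no class attribute")
```

One thing to keep in mind with this approach: `--` removes one occurrence per element, so a selector that repeats a class (for example `["a", "a"]`) would not match a node whose `class` attribute lists `a` only once.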
@@ -210,6 +210,11 @@ defmodule Floki.Selector do
PseudoClass.match_contains?(tree, html_node, pseudo_class)
end

# Case insensitive contains
defp pseudo_class_match?(html_node, pseudo_class = %{name: "fl-icontains"}, tree) do
PseudoClass.match_icontains?(tree, html_node, pseudo_class)
end

defp pseudo_class_match?(html_node, %{name: "root"}, tree) do
PseudoClass.match_root?(html_node, tree)
end