Add a function like findRE that will return all sub/group matching. #10594

Closed
razonyang opened this issue Jan 1, 2023 · 4 comments · Fixed by #10626

@razonyang
Contributor

razonyang commented Jan 1, 2023

Related topic: https://discourse.gohugo.io/t/regular-expressions-sub-matching-groups-not-supported/15368

I would like to create functions that parse HTML like the following and return parts of it.

<ul>
  <li><a href="#foo">Foo</a></li>
  <li><a href="#bar">Bar</a></li>
</ul>
pattern := `<a.*href="(.+)">(.+)</a>`

Result (something like this)

[
  [
    "<a href=\"#foo\">Foo</a>",
    "#foo",
    "Foo"
  ],
  [
    "<a href=\"#bar\">Bar</a>",
    "#bar",
    "Bar"
  ]
]
@razonyang razonyang changed the title Add a function like findRE that will return all sub-matching. Add a function like findRE that will return all sub/group matching. Jan 1, 2023
@jmooring
Member

jmooring commented Jan 1, 2023

I believe that FindAllStringSubmatch provides the desired functionality, but I am concerned that site developers will have a difficult time wrapping their heads around this.

For your specific example, you can do this:

{{ $str := `
  <ul>
    <li><a href="#foo">Foo</a></li>
    <li><a href="#bar">Bar</a></li>
  </ul>
`}}


{{ $s := slice }}
{{ range $match := findRE `<a.+a>` $str }}
  {{ $s = $s | append (
    slice (
      slice
        $match
        (replaceRE `.+"(.+)".+` "$1" $match)
        (replaceRE `.*>(.+)<.+` "$1" $match)
      )
    )
  }}
{{ end }}

The resulting data structure (jsonified)...

[
  [
    "\u003ca href=\"#foo\"\u003eFoo\u003c/a\u003e",
    "#foo",
    "Foo"
  ],
  [
    "\u003ca href=\"#bar\"\u003eBar\u003c/a\u003e",
    "#bar",
    "Bar"
  ]
]
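
For comparison, here is a minimal Go sketch of what a single FindAllStringSubmatch call returns for the pattern and HTML from the original request. The names linkPattern and findAllSubmatches are illustrative only and are not part of Hugo; this shows the standard library behavior, not how a template function would necessarily be implemented.

```go
package main

import (
	"fmt"
	"regexp"
)

// linkPattern is the pattern from the issue: group 1 captures the href,
// group 2 captures the link text. (Hypothetical name, for illustration.)
var linkPattern = regexp.MustCompile(`<a.*href="(.+)">(.+)</a>`)

// findAllSubmatches returns one slice per match: the full match first,
// followed by each capture group in order.
func findAllSubmatches(s string) [][]string {
	return linkPattern.FindAllStringSubmatch(s, -1)
}

func main() {
	html := `
<ul>
  <li><a href="#foo">Foo</a></li>
  <li><a href="#bar">Bar</a></li>
</ul>`

	// Each line of input yields one match, because "." does not
	// match a newline by default in Go regular expressions.
	for _, m := range findAllSubmatches(html) {
		fmt.Printf("full=%q href=%q text=%q\n", m[0], m[1], m[2])
	}
}
```

Each inner slice has the same shape as the requested result: the whole match, then the href, then the link text.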

@razonyang
Contributor Author

@jmooring Thank you for the workaround, but I'm concerned about the performance difference between a single FindAllStringSubmatch call and findRE plus N replaceRE calls. I think a new function is worthwhile if the former is clearly faster, but I don't know how to benchmark it.

> but I am concerned that site developers will have a difficult time wrapping their heads around this.

I think that can be addressed with documentation.
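
One way the comparison above could be measured is sketched below in plain Go, outside Hugo. It assumes the workaround maps to one FindAllString call plus one ReplaceAllString call per group, which approximates but does not reproduce Hugo's template functions (template execution overhead and any regexp caching are not modeled). The function names viaSubmatch and viaFindAndReplace and the synthetic input are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

var (
	// Pattern from the issue, used with a single submatch call.
	groupRE = regexp.MustCompile(`<a.*href="(.+)">(.+)</a>`)
	// Patterns from the workaround: one find, then one replace per group.
	matchRE = regexp.MustCompile(`<a.+a>`)
	hrefRE  = regexp.MustCompile(`.+"(.+)".+`)
	textRE  = regexp.MustCompile(`.*>(.+)<.+`)
)

// viaSubmatch extracts all matches and groups in one pass.
func viaSubmatch(s string) [][]string {
	return groupRE.FindAllStringSubmatch(s, -1)
}

// viaFindAndReplace mirrors the workaround: find each match, then
// derive every group with a separate replace over that match.
func viaFindAndReplace(s string) [][]string {
	var out [][]string
	for _, m := range matchRE.FindAllString(s, -1) {
		out = append(out, []string{
			m,
			hrefRE.ReplaceAllString(m, "$1"),
			textRE.ReplaceAllString(m, "$1"),
		})
	}
	return out
}

func main() {
	// Init registers the testing flags; it is needed when calling
	// testing.Benchmark outside of "go test".
	testing.Init()

	// Synthetic input: 1000 list items, one link per line.
	input := strings.Repeat("<li><a href=\"#item\">Item</a></li>\n", 1000)

	resSubmatch := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			viaSubmatch(input)
		}
	})
	resWorkaround := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			viaFindAndReplace(input)
		}
	})

	fmt.Println("FindAllStringSubmatch:    ", resSubmatch)
	fmt.Println("FindAllString + 2 replace:", resWorkaround)
}
```

The printed timings depend on the machine, the Go version, and the input, so they indicate only the relative cost of the two approaches on this synthetic data.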

@bep
Member

bep commented Jan 17, 2023

I don't think we need to benchmark to show the value, and I have needed this myself in some cases.

@bep bep self-assigned this Jan 17, 2023
@bep bep added this to the v0.109.0 milestone Jan 17, 2023
bep added a commit to bep/hugo that referenced this issue Jan 17, 2023
bep added a commit to bep/hugo that referenced this issue Jan 17, 2023
bep added a commit that referenced this issue Jan 17, 2023
@github-actions

github-actions bot commented Feb 8, 2023

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 8, 2023