Add method to get inner/outer HTML in WP_HTML_Tag_Processor #60046
Interestingly, the Interactivity API already has a class, built on top of the Tag Processor, with methods that do what this issue describes; it seems to use them to extract the HTML. I haven't looked deeply into this, but when I saw it I thought: why not build a general-purpose method right into the Tag Processor? There might be reasons, and I think there is a plan to bring more functionality into the HTML API.

The method in question is `get_content_between_balanced_template_tags`. I'm not sure what "balanced" means here.
@dmsnell, can you provide some technical feedback?
Thanks for the inquiry, @bfintal. If you follow the broad roadmap for the HTML API, you will note that functions like … The Interactivity API is a kind of test-bed for this work, though hopefully its custom parser will be replaced with the HTML Processor during the 6.6 release cycle.

"Balanced" is a common idea for matching tag content: if we assume that an HTML document always has an opening and a closing tag for each element, then we can parse it with a simple stack. This works reasonably well in practice, but it still fails in a number of common edge cases. For example, among the web's highest-ranked pages, many closing …

The HTML Processor incorporates the rules of the HTML5 specification so that nobody needs to worry about when an element is opened and closed. The funny thing is that its logic ends up being much simpler than all the over-simplified attempts:

```php
// Sketch: keep advancing token by token while the element that
// $opening_tag opened is still open according to the HTML5 rules.
while ( $processor->next_token() && $processor->still_open( $opening_tag ) ) {
	continue;
}
```

This aside, there still remain open questions about how to represent inner and outer HTML with respect to escaping, decoding, and composition. I encourage people to explore the existing interfaces and to share feedback in #core-html-api, but I'd caution against building structural parsers for production: it's almost impossible to know what is and isn't inner HTML without implementing the semantic rules of HTML5.
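To make the "balanced" stack idea from the comment above concrete, here is a minimal, self-contained PHP sketch. `naive_balanced_inner()` is a hypothetical helper written only for illustration, not a WordPress API, and it is exactly the kind of structural parser the comment cautions against using in production: because it tracks a single tag name, the stack reduces to a depth counter, and it handles well-formed input but gives up where HTML5 implies a closing tag.

```php
<?php
/**
 * Toy "balanced tags" matcher (hypothetical helper, not part of WordPress).
 *
 * Assumes every element has explicit opening and closing tags. Because only
 * one tag name is tracked, the stack reduces to a depth counter. It ignores
 * comments, void elements, attributes containing ">", and every HTML5 rule
 * for implied closing tags.
 */
function naive_balanced_inner( string $html, string $tag ): ?string {
	$tag = strtolower( $tag );

	// Group 1 is "/" for closing tags (empty otherwise); group 2 is the tag name.
	preg_match_all(
		'~<(/?)([a-z][a-z0-9]*)[^>]*>~i',
		$html,
		$matches,
		PREG_SET_ORDER | PREG_OFFSET_CAPTURE
	);

	$depth       = 0;
	$inner_start = null;

	foreach ( $matches as $match ) {
		// Only tags with the target name change the nesting depth.
		if ( strtolower( $match[2][0] ) !== $tag ) {
			continue;
		}

		$tag_text   = $match[0][0];
		$tag_offset = $match[0][1];

		if ( '/' !== $match[1][0] ) {
			// Opening tag: the outermost one marks where the inner HTML begins.
			if ( 0 === $depth ) {
				$inner_start = $tag_offset + strlen( $tag_text );
			}
			++$depth;
		} elseif ( $depth > 0 ) {
			// Closing tag: returning to depth 0 means the outermost element closed.
			--$depth;
			if ( 0 === $depth ) {
				return substr( $html, $inner_start, $tag_offset - $inner_start );
			}
		}
	}

	// The outermost element was never explicitly closed.
	return null;
}

// Well-formed input: the counter finds the matching closer.
var_dump( naive_balanced_inner( '<div><div>inner</div> tail</div>', 'div' ) );
// string(21) "<div>inner</div> tail"

// HTML5 implicitly closes the first LI when the second one opens, so its inner
// HTML is "One". The toy matcher never sees a </li> and gives up instead.
var_dump( naive_balanced_inner( '<ul><li>One<li>Two</ul>', 'li' ) );
// NULL
```

The second call is the kind of edge case where the simple stack falls apart; an HTML5-aware parser knows that the second `<li>` closes the first.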
Good news! In WordPress 6.5 this is even easier, because of the newly introduced `get_modifiable_text()` method:

```php
$processor = new WP_HTML_Tag_Processor( $html );

// Visit each STYLE element and read its raw text contents.
while ( $processor->next_tag( 'STYLE' ) ) {
	$contents = $processor->get_modifiable_text();
	analyze_style( $contents ); // Placeholder for your own CSS handling.
}
```

Unfortunately, there's no support yet for modifying the modifiable text. If you want to do that, come join us in Slack and we can discuss how, or link to a PR in your project and I'd be happy to review it.

I'm going to close this issue because:

- we already plan on adding inner/outer HTML support, just not yet; and
- HTML API development is tracked in the linked discussion and on Core Trac.

Feel free to continue responding.
What problem does this address?
With `WP_HTML_Tag_Processor`, you can get an attribute or the tag name, but there is no way to get the inner HTML or outer HTML. The class is great for traversing HTML, and it would be great if it could also be used as an alternative to regex for grabbing HTML content.
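As a minimal, self-contained illustration of why a regex is a fragile way to grab HTML content, here is a sketch of a regex that extracts `<style>` contents. The helper name and the pattern are hypothetical, not taken from the scenario described next; the point is that the regex also matches markup inside an HTML comment, which an HTML parser never treats as a STYLE element.

```php
<?php
// Hypothetical regex-based extractor for the contents of <style> tags.
function regex_style_contents( string $html ): array {
	preg_match_all( '~<style\b[^>]*>(.*?)</style>~is', $html, $matches );
	return $matches[1]; // Captured text between each <style ...> and </style>.
}

// The first "style" lives inside an HTML comment, so it is not a real element.
$html  = '<!-- <style>old</style> --><style>p { color: red; }</style>';
$found = regex_style_contents( $html );

// The regex reports both, even though only the second is a STYLE element.
var_dump( count( $found ) ); // int(2)
```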
Scenario: right now I'm using the `render_block` filter to grab the contents of some `<style>...</style>` tags via regex.

What is your proposed solution?
Add the methods `get_inner_html` and `get_outer_html`, which would return the inner and outer HTML of the tag where the current "pointer" is. If they were added, I should be able to do: