HtmlHandler, for normalizing tag cases #24

Closed
wants to merge 471 commits into from


kirbysayshi

As I thought through #20 and #22, I realized that the problem was not with the parser itself but with the results it produced. Rather than hacking on the parser and breaking things like RSS/XML support, I decided a better approach would be to create another handler, called HtmlHandler. It embraces the case-insensitive nature of HTML tags and upper-cases all tag names with toUpperCase() to respect the standard. When reserializing, the printHtml method (provided by tomdz) now lower-cases all tag names with toLowerCase(), because it's printing HTML, not XML/RSS.
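To illustrate the idea of normalizing case in the handler rather than in the parser, here is a minimal, self-contained sketch. It is not the HtmlHandler from this PR: the class name `CaseNormalizingHandler`, its node shape, and this `printHtml` are hypothetical, and the callbacks are driven by hand instead of by a real parser. Tag names are stored upper-cased, closing tags match case-insensitively, and serialization lower-cases them again.

```javascript
// Hypothetical case-normalizing DOM builder (illustrative only).
class CaseNormalizingHandler {
  constructor() {
    this.root = { type: "root", children: [] };
    this.stack = [this.root]; // currently open nodes, root at index 0
  }

  // Opening tag: `name` may arrive in any case ("Div", "DIV", "div").
  onopentag(name, attribs) {
    const el = { type: "tag", name: name.toUpperCase(), attribs, children: [] };
    this.stack[this.stack.length - 1].children.push(el);
    this.stack.push(el);
  }

  ontext(data) {
    this.stack[this.stack.length - 1].children.push({ type: "text", data });
  }

  // Closing tag: compared upper-cased, so </DIV> closes <div>.
  onclosetag(name) {
    const upper = name.toUpperCase();
    for (let i = this.stack.length - 1; i > 0; i--) {
      if (this.stack[i].name === upper) {
        this.stack.length = i; // drop the matched element and anything above it
        return;
      }
    }
  }
}

// Serialize back to HTML with lower-cased tag names.
// Simplified: no attribute escaping and no special handling of void elements.
function printHtml(nodes) {
  return nodes
    .map((node) => {
      if (node.type === "text") return node.data;
      const tag = node.name.toLowerCase();
      const attrs = Object.entries(node.attribs)
        .map(([k, v]) => ` ${k}="${v}"`)
        .join("");
      return `<${tag}${attrs}>${printHtml(node.children)}</${tag}>`;
    })
    .join("");
}

// Usage: simulate a parser reporting mixed-case tags.
const h = new CaseNormalizingHandler();
h.onopentag("Div", { id: "x" });
h.ontext("hi");
h.onclosetag("DIV");
console.log(h.root.children[0].name); // DIV
console.log(printHtml(h.root.children)); // <div id="x">hi</div>
```

Keeping the normalization in a separate handler leaves the parser's output untouched for XML/RSS consumers, which is the trade-off the PR description argues for.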

I've updated all tests and added a few for scenarios where tags have mixed case. This fork is currently in production on https://citational.com.

Please let me know your thoughts; I'm more than willing to hear alternative opinions!

fb55 and others added 30 commits June 2, 2012 20:45
fix of htmlparser.DomUtils.getOuterHTML for directives
yep, it's insanely short
to get a signal when there won't be any more attributes coming
they are now available as `domhandler`
'case numbers are faster to compare

NOT breaking due to last commit
Attention: The DOM changes slightly.
…quoted attribute values.

Require self-closing tags to be void
…g the attributes count. Here's a different way to accomplish the same thing.
This reverts commit 181c31b.
…close is implied by other tags being opened, and these are closed when those tags are opened. This helps correctly parse things like lists and tables with unterminated LI or TD tags.
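The implied-close behavior described in the commit above can be sketched as a lookup from a newly opened tag to the still-open tags it ends. This is a minimal illustration, not the parser's code: the `openImpliesClose` table and `openTag` helper are hypothetical names, and the rule set is a small assumed subset covering the list and table cases mentioned.

```javascript
// Hypothetical rules: opening the key tag implicitly closes any of these
// tags while they sit on top of the open-element stack (simplified subset).
const openImpliesClose = {
  li: new Set(["li"]),
  td: new Set(["td", "th"]),
  th: new Set(["td", "th"]),
  tr: new Set(["tr", "td", "th"]),
};

// Open `name` on a stack of open tag names, first popping implied closes.
// Simplification: popping stops at the first tag not in the implied set.
// Returns the implicitly closed names, innermost first.
function openTag(stack, name) {
  const closed = [];
  const implied = openImpliesClose[name];
  if (implied) {
    while (stack.length > 0 && implied.has(stack[stack.length - 1])) {
      closed.push(stack.pop());
    }
  }
  stack.push(name);
  return closed;
}

// Usage: "<ul><li>a<li>b": the second <li> closes the first.
const stack = [];
openTag(stack, "ul");
openTag(stack, "li");
console.log(openTag(stack, "li")); // [ 'li' ]
console.log(stack); // [ 'ul', 'li' ]
```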
fb55 added 29 commits August 18, 2013 20:07
also replaced call to `Array#slice` with setting the stack's `length` property
failed previously (only for FeedHandler tests), fixed now due to DomHandler upgrade (which removed the `ignoreWhitespace` option)
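The commit that replaced `Array#slice` with setting the stack's `length` relies on a standard JavaScript array behavior: assigning a smaller `length` truncates the array in place. The sketch below contrasts the two approaches; the helper names are hypothetical and the surrounding parser code is not shown in this PR page.

```javascript
// Dropping the tail of a stack of open tags, starting at index `pos`.

// 1. slice(): allocates a new array; the caller must reassign its reference.
function truncateWithSlice(stack, pos) {
  return stack.slice(0, pos);
}

// 2. length assignment: shrinks the existing array in place, so no new
//    array is allocated and every holder of the reference sees the change.
function truncateWithLength(stack, pos) {
  stack.length = pos;
  return stack;
}

const a = ["html", "body", "ul", "li"];
const b = truncateWithSlice(a, 2);
console.log(a.length, b.length); // 4 2
truncateWithLength(a, 2);
console.log(a); // [ 'html', 'body' ]
```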
9 participants