attr name support all w3c defined characters
fix #95,#100
yaniswang committed May 1, 2016
1 parent a6d57f6 commit b6a4d1a
Showing 4 changed files with 15 additions and 6 deletions.
5 changes: 3 additions & 2 deletions CHANGE.md
@@ -13,8 +13,9 @@ add:

fix:

-1. fix: report error evidence if tag attrs include `\r\n`
-2. fix: space-tab-mixed-disabled issue #119
+1. report error evidence if tag attrs include `\r\n`
+2. space-tab-mixed-disabled issue #119
+3. attr name support all w3c defined characters

## ver 0.9.10 (2015-10-12)

2 changes: 1 addition & 1 deletion lib/htmlhint.js

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions src/htmlparser.js
@@ -33,8 +33,8 @@ var HTMLParser = (function(undefined){
var self = this,
mapCdataTags = self._mapCdataTags;

-var regTag=/<(?:\/([^\s>]+)\s*|!--([\s\S]*?)--|!([^>]*?)|([\w\-:]+)((?:\s+[\w\-:]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]*))?)*?)\s*(\/?))>/g,
-regAttr = /\s*([\w\-:]+)(?:\s*=\s*(?:(")([^"]*)"|(')([^']*)'|([^\s"'>]*)))?/g,
+var regTag=/<(?:\/([^\s>]+)\s*|!--([\s\S]*?)--|!([^>]*?)|([^\s"'>\/=\x00-\x0F\x7F\x80-\x9F]+)((?:\s+[^\s"'>\/=\x00-\x0F\x7F\x80-\x9F]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]*))?)*?)\s*(\/?))>/g,

felixfbecker May 1, 2016

@yaniswang This is not correct! Element tag names and attribute names do not share the same restrictions, see http://www.w3.org/TR/html-markup/syntax.html#tag-name:

HTML elements all have names that only use characters in the range 0–9, a–z, and A–Z.

However, many frontend frameworks also use hyphens and colons in tag names, so the previous regTag was totally fine.

If you want to accept more tag names, you could allow all valid XML names, from http://www.w3.org/TR/xml/#NT-NameStartChar:

[4]    NameStartChar    ::=    ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF] 
[4a]   NameChar         ::=    NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5]    Name             ::=    NameStartChar (NameChar)* 

that would be

/[A-Z_a-z:\xC0-\xD6\xD8-\xF6\xF8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD][\w:\-.\xB7\xC0-\xD6\xD8-\xF6\xF8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\u0300-\u036F\u203F-\u2040]*/

Which comes pretty close to the old regex [\w\-:]+ (alphanumeric characters, hyphens, colons).
Please next time comment on a PR before calling out someone's code as bad and "fixing" it without looking at the actual HTML spec...
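
To make the difference concrete, here is a minimal sketch that compares the old character class with the one this commit introduces. The two patterns are copied from the diff and anchored with `^` and `$` for testing; the sample names are hypothetical and chosen only for illustration.

```javascript
// Old name class, used for both tag names and attribute names before this commit.
var oldName = /^[\w\-:]+$/;
// New name class introduced by this commit (anchored here for whole-string tests).
var newName = /^[^\s"'>\/=\x00-\x0F\x7F\x80-\x9F]+$/;

// Hypothetical sample names, not taken from the HTMLHint test suite.
var samples = ['alt', 'data-id', 'xlink:href', 'a.b', 'c*d', '(click)'];

var results = samples.map(function (name) {
    return { name: name, old: oldName.test(name), now: newName.test(name) };
});

results.forEach(function (r) {
    console.log(r.name + '  old=' + r.old + '  new=' + r.now);
});
```

Names such as `a.b`, `c*d` and `(click)` pass only the new class. That is the intended effect for attribute names, but because the commit applies the same class to the tag-name group of `regTag`, a tag called `c*d` would be accepted as well.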

yaniswang May 1, 2016

Author Contributor

67c8922

Tks, just fixed.

felixfbecker May 1, 2016

Nice, any ETA for release?

yaniswang May 1, 2016

Author Contributor

Soon

yaniswang May 2, 2016

Author Contributor

Just published: https://www.npmjs.com/package/htmlhint

And the site is up to v0.9.13 too: http://htmlhint.com/

+regAttr = /\s*([^\s"'>\/=\x00-\x0F\x7F\x80-\x9F]+)(?:\s*=\s*(?:(")([^"]*)"|(')([^']*)'|([^\s"'>]*)))?/g,
regLine = /\r?\n/g;

var match, matchIndex, lastIndex = 0, tagName, arrAttrs, tagCDATA, attrsCDATA, arrCDATA, lastCDATAIndex = 0, text;
10 changes: 9 additions & 1 deletion test/htmlparser.spec.js
@@ -289,9 +289,17 @@ describe('HTMLParser: Object parse', function(){
expect(attrAlt.name).to.be('alt');
expect(attrAlt.value).to.be('abc');
expect(attrAlt.quote).to.be("");
+var attrAB = arrEvents[1].attrs[3];
+expect(attrAB.name).to.be('a.b');
+expect(attrAB.value).to.be('ccc');
+expect(attrAB.quote).to.be("");
+var attrCD = arrEvents[1].attrs[4];
+expect(attrCD.name).to.be('c*d');
+expect(attrCD.value).to.be('ddd');
+expect(attrCD.quote).to.be("");
done();
});
-parser.parse('<img width="200" height=\'300\' alt=abc>');
+parser.parse('<img width="200" height=\'300\' alt=abc a.b=ccc c*d=ddd>');
});

it('should parse end tag', function(done){
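
The behaviour the new test asserts can also be checked outside the test suite. The sketch below copies `regAttr` from the diff and loops over an attribute string with it; `parseAttrs` is a hypothetical helper written for this illustration, not a function from HTMLHint, and the real parser may assemble its attribute objects differently.

```javascript
// regAttr as introduced by this commit, copied from src/htmlparser.js in the diff.
var regAttr = /\s*([^\s"'>\/=\x00-\x0F\x7F\x80-\x9F]+)(?:\s*=\s*(?:(")([^"]*)"|(')([^']*)'|([^\s"'>]*)))?/g;

// Hypothetical helper: collects name, value and quote for each attribute.
function parseAttrs(attrText) {
    var attrs = [];
    var m;
    regAttr.lastIndex = 0; // the regex is global, so reset its position first
    while ((m = regAttr.exec(attrText)) !== null) {
        // Groups 2/3: double-quoted, groups 4/5: single-quoted, group 6: unquoted.
        var quote = m[2] || m[4] || '';
        var value = m[3] !== undefined ? m[3]
                  : m[5] !== undefined ? m[5]
                  : m[6] !== undefined ? m[6]
                  : '';
        attrs.push({ name: m[1], value: value, quote: quote });
    }
    return attrs;
}

// Attribute portion of the <img> tag used in the updated test.
var attrs = parseAttrs(' width="200" height=\'300\' alt=abc a.b=ccc c*d=ddd');
console.log(attrs);
```

With the old `[\w\-:]+` class, `a.b` and `c*d` would not be matched as whole attribute names, which is why the test only passes after this change.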