Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML: better support for template syntax #35

Closed
stroborobo opened this issue Jul 27, 2015 · 23 comments
Closed

HTML: better support for template syntax #35

stroborobo opened this issue Jul 27, 2015 · 23 comments

Comments

@stroborobo
Copy link

Hi,

I just noticed that if I have a "Type" field in my data struct and use minify for my go template files, it lowercases "Type".

So this:

<option value="0" {{ if eq .Type 0 }}selected{{ end }}>Foo</option>

Becomes this:

<option value=0 {{ if eq .type 0 }}selected{{ end }}>Foo</option>

And it seems like it's not possible to use text/template and minify for a javascript file together, does it? I know, that's probably quite an edge case :)

EDIT: Just tested this, doesn't work unfortunately, but I could still use minify as a package and minify the generates JS, so that's not too much of a problem for my case.

@tdewolff
Copy link
Owner

That's correct, the minifiers work according to the specs, and templates are really a syntax layer over original file format. You could make a preprocessor to skip everything between double brackets, but you have to somehow reinsert them after minifying. Like using a unique ID for replacement and afterwards substitute back.

I'm still thinking about a better solution, but it's not as easy as it looks...

@tdewolff tdewolff changed the title Better support for go template syntax? HTML: better support for Go template syntax Sep 21, 2015
@tdewolff tdewolff changed the title HTML: better support for Go template syntax HTML: better support for template syntax Jan 28, 2016
@tdewolff
Copy link
Owner

Some thoughts I needed to write down:

It would be an awesome feature if the minifier (and then specifically the HTML minifier) could handle Go templates correctly (ignore {{ ... }}). But it would be even better to make that more general so it would ignore common constructs such as <?php ... ?>, <% ... %>, {{ ... }}, etc.

I don't really want to embed that in the HTML parser so it can stay spec compliant and doesn't get cluttered-up with exceptions. Rather there should be a construct where you can wrap the HTML minifier in a 'template-tag-ignorer' or the other way around, so that the user can decide which tags should be omitted but the design stays modular.

Not only should this added module omit template tags from the input (easy) but it should also write the ignored parts to the output at the right point because output often lags behind input (hard). The way I see it the options are:

  • Add the template tag handling to the HTML parser and minifier (bad modularisation)
  • Replace template tags by a unique symbol and replace those symbols with the original content after minification (slow and ugly)
  • Somehow synchronize input and output so that whenever it ignores a template tag for the HTML parser it writes it to the output too (hard)

The problem with the last option is that it is hard because writing to output always lags behind reading from input, but do we know by how much? If we know that the template tag is at the front of the input buffer (not somewhere further on in the buffer due to peeking), than we can safely assume that we can immediately write to output. Does this hold when the template tag is in an attribute name or value?

Anyone any ideas?

Another problem is the following (awful) example: <span attr="data<?php echo 'data"'; ?>> resulting in <span attr="datadata">, but if you ignore the PHP part (which is what the HTML processor would do), it doesn't know that the attribute has ended. This is not solvable and probably rare, so it is a limitation (which is alright I think).

@omeid
Copy link

omeid commented Jan 28, 2016

Shouldn't one minify the result of templates instead of templates to begin with?
I think lots of optimization maybe missed if templates are minified instead of the actual compiled html.

@tdewolff
Copy link
Owner

Yes, I think so too especially since minification is so fast. But if you would minify templates you only have to minify once (and accept that the inserted HTML from the template will not be minified) and not after generation (typically per client request).

I don't think this one has a high priority, but still it would be a great feature nonetheless.

@lpar
Copy link
Contributor

lpar commented Apr 28, 2016

I don't understand why the minifier is lowercasing .Type in the original example. Surely that's a bug? If the file is an XML flavor of HTML, attributes are case-sensitive; if it's HTML5, custom data attributes are case sensitive.

@tdewolff
Copy link
Owner

tdewolff commented Apr 28, 2016

XML flavor should be parsed by the XML minifier, the HTML minifier is for HTML5 only.

The HTML5 spec specifies that tags and attributes are case-insensitive. It also appears that data tags are case-insensitive when defining them. Accessing them through jQuery is case sensitive though, in the way that it has to be all lower-case or camel case. This means that the attribute definition is internally first converted to lower-case before being matched by jQuery. Lowering case improves gzip compression rates.

For example, try this in the W3 validator:

<!doctype html>
<html>
<head>
<title>test</title>
</head>
<body>
  <div data-one="a" data-ONE="b"></div>
</body>
</html>

it will say

Error: Duplicate attribute data-one.

@zengabor
Copy link

zengabor commented May 3, 2018

Commented out these 5 lines which fixed the issue for me as I don’t need lowercasing at all: https://github.com/tdewolff/parse/blob/master/util.go#L12-L16

Thanks for the amazing project!

@alex-bacart
Copy link

@tdewolff sorry for commenting on a closed issue. What do you think about adding new option to Minifier? Something like DontLowercase bool it will be really helpful and would allow to write such a code

<html lang="{{.Lang}}">

@tdewolff
Copy link
Owner

That'll probably solve the problem. I'm going to look into supporting this soon!

@alex-bacart
Copy link

alex-bacart commented Jul 4, 2020

@tdewolff can you please take a look at my PR?
I really don't like the name of an option DontLowercaseAttributes so maybe you'll pick a better one.

@infogulch
Copy link

infogulch commented Jul 24, 2023

How about a multi-pass solution like this:

  1. Parse the original file with template parser into an AST
  2. Walk the AST to find the byte spans with "special" template logic, collect the spans into a big list
  3. Pass the original file plus the list of spans to the minifier
  4. The minifier trims bytes as if the indicated spans don't exist, but keeps track of where they would have gone.
    a. I suspect this can be done with basic (albeit tedious) byte counting
  5. When the minifier emits the minified html it re-inserts the template spans from the original file unmodified into the output at the right location
  6. The result is then fed back into the template parser and used like normal

The minifier API might look something like this: m.BytesPreserveSpans("text/html", b, []Span{{24,42},{106,301},...})

I think this may be the only "correct" way of doing it. If we're minifying just once at program startup it's not a big deal to do multiple passes.

@tdewolff
Copy link
Owner

tdewolff commented Jul 27, 2023

Hey @infogulch , thanks for the effort! I believe the problem is a bit more difficult. For example, if your input is:

{{ $foobar := 'attr-value">' }}
<p attr="{{ print $foobar }} p-tag-content</p>

You'll get into trouble at the HTML parser anyways, as it'll believe that p-tag-content is part of the <p> attribute! This is not fixed with a multi-pass solution, which actually I believe can all happen in a single-pass anyways (also for performance considerations).

I've been thinking recently due to your post, that perhaps the best solution is adding a bit of template support to the HTML parser. It is a common enough problem to warrant the changes to the HTML parser, and it would actually be very little work. Principally, we could detect template tags ({{}}, <?php ?>, <% %>) only inside the content of attributes and the content of tags. I believe this is that the Go template language enforces anyways? When encountered, we could just skip ahead till the end of the template tag. I'm pretty confident that would solve 99% of the cases, what is your take on this?

@infogulch
Copy link

As far as your first example goes, that's not valid for Go's html/template ( https://go.dev/play/p/pehr0IaMZTo
), and even attempting to solve that problem runs straight into the halting problem. In my opinion if your template code breaks across html syntax boundaries then there's no reasonable expectation for any other tool to be able to understand it, and you deserve whatever XSS hell you dig yourself into. 🤣😬

Yes I think some basic built-in template support would be a great idea. My only concern would be if actually supporting template syntax happens to be more complicated than we expect. Note that go's template allows overriding the template start/end tags {{, }} to whatever you want.

@zengabor
Copy link

For inspiration, I’m now using html-minifier, which may be slower than this library, but it works excellently with my templates if using --ignore-custom-fragments '/{{[{]?.*?[}]?}}/'. Maybe you could consider something similar and put the responsibility on the users?

@tdewolff
Copy link
Owner

tdewolff commented Oct 30, 2023

After 8 years I've finally come around to implement this, thanks for the patience!

In order to use this, you should try:

m.Add("text/html", &html.Minifier{
    TemplateDelims: html.GoTemplateDelims,
})

@infogulch
Copy link

That's great! If I'm reading correctly the strategy is to skip minifying syntax elements that contain the template delims, is that right?

@tdewolff
Copy link
Owner

Exactly, there isn't anything meaningful to do when templating gets involved, as anything can happen. Potentially we could minify text and attribute values surrounding the inner templates, but not sure if it's worth it...

@infogulch
Copy link

You mean like:

<option value="0" {{ if eq .Type 0 }}selected   style="  display:   inline ; "   {{ end }}>Foo</option>

<option value="0" {{ if eq .Type 0 }}selected style="display:inline;"{{ end }}>Foo</option>

@tdewolff
Copy link
Owner

Exactly, the output will be:

<option value=0 {{ if eq .type 0 }}selected style=display:inline {{ end }}>Foo

That is, the selected isn't minified, the style is (the parser doesn't look at the content of the templates, and is template language agnostic)

@infogulch
Copy link

we could minify text and attribute values surrounding the inner templates

To clarify, you mean that content ~1 token away from a template section is skipped and not minified.

In this case:

<option value="0" {{ if eq .Type 0 }}selected   style="  color:   green ; "   {{ end }}>Foo</option>

... is minified to:

<option value=0 {{ if eq .Type 0 }}selected style=color:green {{ end }}>Foo

In particular, selected is not minified because it's immediately adjacent to a template section (though there's nothing to minify in this case), and just after the style retains one space token because there was a space in the original text. This seems reasonable to me. (Not to talk in circles, I'm just trying to understand.)

What about a case with indentation whitespace?

		<ol id="feeds-list">
			<li><a href="/">All Feeds</a></li>
			{{- range .Feeds}}
			{{- block "feed" .}}
			<li>
				<a href="/feed/{{.id}}/">
					{{- if .image}}<img src="{{.image}}">{{- else}}<span>{{.title | substr 0 1 | upper}}</span>{{end}}
					{{- .title -}}
				</a>
			</li>
			{{- end}}
			{{- end}}
		</ol>

I'm guessing it will just collapse the LF+indentation into just LF, like this:

<ol id="feeds-list">
<li><a href="/">All Feeds</a></li>
{{- range .Feeds}}
{{- block "feed" .}}
<li>
<a href="/feed/{{.id}}/">
{{- if .image}}<img src="{{.image}}">{{- else}}<span>{{.title | substr 0 1 | upper}}</span>{{end}}
{{- .title -}}
</a>
</li>
{{- end}}
{{- end}}
</ol>

(Of course Go's use of - to strip whitespace next to a template section would complicate the final rendered output, but I think we can focus on what minify does to the template text before being parsed by html/template.)

@tdewolff
Copy link
Owner

You are correct. Basically <option value="0" {{ if eq .Type 0 }}selected style=" color: green ; " {{ end }}>Foo</option> is seen as <option value="0" {{***}}selected style=" color: green ; " {{***}}>Foo</option>, and thus we do minify the style attribute.

In your second example, it will disable minifying when the content has a template, so <a href="/feed/{{.id}}/"> Click here {{.title | substr 0 1 | upper}} </a> and <script> var a = '{{.}}';</script> will both keep their content as-is.

What I meant was that we could replace {{***}} with some unique identifier, minify, and replace the identifier with the original template. This would allow us to minify <script> var a = '{{.}}';</script> to e.g. <script> var a = 'OWHJNM728';</script> => <script>var a='OWHJNM728'</script> => <script>var a='{{.}}'</script>. This is a bit complicate however...

@infogulch
Copy link

Thank you for clarifying, that all makes sense.

I agree that replacing with an identifier could be a bit messy. It might be cleaner if you replace it with a single character like A and kept track of the positions of deleted characters (I think minify only deletes characters, is that right?), then project through the relocations to replace back. This might be more accurate, though probably not less complicated.

@tdewolff
Copy link
Owner

tdewolff commented Nov 2, 2023

True, or use the \uFFFC Unicode object replacement character for that. The problem is that after replacing the templates e.g. in a piece of JavaScript, we then send it to the JS minifier which expect valid JS. It isn't trivial to create a unique identifier or replacement character that results in valid JS and cannot collide with existing definitions in the script. The same for any other mimetype/minifier...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants