Git Product home page Git Product logo

Comments (30)

ahmedshuhel avatar ahmedshuhel commented on June 18, 2024 1

So, we went back to html-minifier to use it's regexp based option,ingnoreCustomFragments like before. We are now making the regexp lazy : [/\${[\s\S]*?}/]. This will work for all general cases but will fall short for something like:

<div> ${ obj | map: { propName: x, append: 3 } } </div>

We (developers) can then use <!--- htmlmin:ignore --> to make the template parser safe like:

<!--- htmlmin:ignore -->
<div> ${ obj | map: { propName: x, append: 3 } } </div>
<!-- htmlmin:ignore --> 

This is a feature html-minifer provides natively.

This is not the ideal solution. But, we want to "fix" it on the html-minifier side (under a feature flag) as the library (html-minifier) is used in html-loader in Webpack and in Aurelia CLI besides bundler and the problem is universal in that respect. We want to be consistent across the solutions.

Will push a release out sometime tomorrow. Thank for all the suggestions and patience.

from bundler.

JeroenVinke avatar JeroenVinke commented on June 18, 2024

In the CLI we changed the regex to be lazy which fixed a similar error: aurelia/cli@b15199f#diff-67b223207feed0a4fe6155b8c200651dR23

If parsing is more resilient I'd like to consider using that instead. Do you have some examples for which we need to parse the file?

from bundler.

thomas-darling avatar thomas-darling commented on June 18, 2024

Making the expression lazy does make it less likely to break things, but it won't really fix it.
The thing is, with lazy matching, you can miss part of the expression (gray part is what it would match):

${value ? "}" : foo<bar}

${value | converter1: { a: 42 }| converter2: foo<bar}

I know those are kinda contrived examples, but we did encounter such issues in several places in our projects, which is why the gulp-translate plugin now uses a proper parser :-)

from bundler.

ahmedshuhel avatar ahmedshuhel commented on June 18, 2024

@thomas-darling Sorry about the bad release. Let's go back to using the previous version. We will let you know once we fix that problem.

from bundler.

 avatar commented on June 18, 2024

I'm +1-ning this. Its nice you fixed #159 for me but now we are getting the exact same erros as @thomas-darling haha

from bundler.

ahmedshuhel avatar ahmedshuhel commented on June 18, 2024

@MaartenThomassen @thomas-darling Could you try v0.6.4 out and let us know if it fixes the issue.

Realised it's basically a paren balancing problem and regex is not particularly good at that. Replaced html-minifier with minimize as html-minifirer doesn't provide any other option then regex to handle the issue. Minimize on the other hand supports template literal fragments out of the box and also provides this nice plugin api for granular control.

from bundler.

thomas-darling avatar thomas-darling commented on June 18, 2024

I don't believe this will resolve the issue - in the minimize readme it explicitly says:

Minimize does not parse inline PHP or raw template files. Templates are not valid HTML and this is outside the scope of the minimize. The output of the templaters should be parsed and minified.

And it's build on htmlparser2, which is also what I use in the gulp-translate plugin referenced in the description of this issue - it will break when presented with certain binding expressions.

The only way to solve this, is to parse the expressions in the document and replace them with parser-safe placeholders, before the HTML parser sees it - and then, after the minification, you can inject the expressions back into the minified document by replacing those placeholders.

It's also not just a simple balancing problem, as you may have a } inside a string, as in one of my examples above. The quotes delimiting a string is also not just a balancing problem, as you may have escaped quotes in there - "This quote does not end the string \"".

The gulp-translate code I referenced in the issue description is pretty much a drop-in solution for this - and you're very welcome to use it if you want :-)

from bundler.

ahmedshuhel avatar ahmedshuhel commented on June 18, 2024

@thomas-darling Well, I ran some tests that included your examples with the escaped quote one( \"), It seems to work.

Aurelia template is valid html (well, almost), hence htmlparse2 should theoretically be able parse it. It may not be able to parse inline php or other templates like jade or the likes but that's fine. We are not solving a general problem here. It is good enough if it can minimize Aurelia template.

Having said that, if this approach does not work, I will gladly use the solution you referenced.
Thanks for the effort you put in here to explain the problem.

from bundler.

weagle08 avatar weagle08 commented on June 18, 2024

EDIT: SORRY, this was with version 0.6.0. Update to v0.6.4 and it seems to fix the problem!
THANKS!

I'm seeing this as well with:

<div class="form-group ${hasValidationRun == false ? '' : (isValid == true ? 'has-success' : (optional == true ? 'has-warning' : 'has-error'))}">

cause the following:

[09:46:35] 'bundle' errored after 8.81 s
[09:46:35] Error on fetch for app/components/validated-input/validated-input.html!github:systemjs/[email protected] at file:///C:/dev/FDS-Client/src/www/app/components/validated-input/validated-input.html!file:///C:/dev/FDS-Client/src/www/jspm_packages/github/systemjs/[email protected]
Error: Parse Error: <div class="form-group 3citq0spuv70 </span></div></template> at new HTMLParser (C:\dev\FDS-Client\src\node_modules\html-minifier\src\htmlparser.js:236:13) at minify (C:\dev\FDS-Client\src\node_modules\html-minifier\src\htmlminifier.js:861:3)

from bundler.

thomas-darling avatar thomas-darling commented on June 18, 2024

Here's an example of a case that breaks:

var htmlparser = require("htmlparser2")

var parser = new htmlparser.Parser({
    onopentag: function(name, attribs){
        console.log("open: "+name);
    },
    ontext: function(text){
        console.log("text: " + text);
    },
    onclosetag: function(name){
        console.log("close: "+name);
    }
});
parser.write('<div> A ${ foo <= bar ? 1 : 2} B </div> C');
parser.end();

Run that on https://npm.runkit.com/htmlparser2 and you get this:

open: div
text: A ${ foo 
open: =
text: C
close: =
close: div

Note that the text B is gone, probably because it was parsed as an attribute, a closing tag for = has been auto-inserted, and other content is missing and moved around. When this is serialized again, it will result in, among other things, an unwanted </=>, which will break things - and that's just one way this can silently break the templates.

--

Or the same example with minimize:

var Minimize = require('minimize');

var result = new Minimize()
  .parse("<template><div> A ${ foo <= bar ? 1 : 2} B </div> C</template>");

Running that on https://npm.runkit.com/minimize produces this:

<template><div>A ${ foo<= 1 bar ? : 2} b < div>C</=></div></template>

That's one messed up template!
So, you definitely need a full parser here :-)

from bundler.

thomas-darling avatar thomas-darling commented on June 18, 2024

@ahmedshuhel Any news on this? :-)

from bundler.

EisenbergEffect avatar EisenbergEffect commented on June 18, 2024

@ahmedshuhel Can we get a fix out for this please?

from bundler.

ahmedshuhel avatar ahmedshuhel commented on June 18, 2024

@thomas-darling Yeah, I can see the problem here. And, the sad part is that the template below is fine:
<template><div> A ${ foo < bar ? 1 : 2} B </div> C</template>
I removed the = from your example template that fails. It's just when we add the = after < htmlparser2 panics. Implementing a full parser is a non trivial job and it's hard to digest the fact that we have to implement one just because of a =.
I am currently evaluating the repo (gulp-translate) you linked to see if/how we can use that. Also, htmlparser2 has a thing called handler. By default it uses domhandler. There is also a FeedHandler impl that inherits form domhandler. May be and I am not sure, we can get away with this by implementing an AureliaHandler that teaches htmlparser2 how to handle/parse Aurelia template.

from bundler.

gheoan avatar gheoan commented on June 18, 2024

This can be worked around by disabling minification or by using the innerText attribute:

<div inner-text=" A ${ foo <= bar ? 1 : 2} B "></div> C

Also, if there is a custom HTML parser to be made maybe the aurelia templating engine could be used to remove the aurelia syntax from the template, minify the the remaining html, then add the aurelia syntax back.

from bundler.

thomas-darling avatar thomas-darling commented on June 18, 2024

@ahmedshuhel Yeah, implementing a full parser is no fun, but I don't think there's any way around it.
Might actually make sense to add this capability of making templates parser-safe to the core of Aurelia - might be needed there too, and it would keep the implementation of the syntax in one place...

Keep in mind that <= is not the only thing that will cause an issue. You will encounter the same issue if you have something that looks like an HTML tag or entity code inside a string in a binding expression, or if you have something like ${a<b?"foo":"bar"} - the use of spacing around < is a question of code style, and the lack of spacing should not results in invalid templates after bundling.

I don't think there's any reasonable way to deal with this using a handler - and even if that was a possibility, you would now have taken a strong dependency on htmlparser2, which means you can't swap out the minifier later. With the extract-parse-inject approach I recommend, you are free to process the templates using any parser or minifier you want, as they will already be standards compliant and parser-safe when the parser sees them.

Also note that the parser required to extract the expressions might not necessarily need to parse the expressions into expression trees. If you look at the code for gulp-translate, you can see that it just need to replace the expressions with placeholders - and for that, you just need to deal with balancing curlies, quotes, and the escape character inside a pair of quotes.
Or at least so I thought when I wrote that code...

Thinking further about this, it becomes clear that it can actually still break...
You could have this template, which would actually trip up my extraction code in gulp-translate:

<div>${ not an expression</div><span>still not an expression }</span>

After extraction, this would result in HTML that looks like <div>${0}</span>, while the expression array would look like ["${ not an expression</div><span>still not an expression }"] - and that's obviously wrong and not parser safe. However, I think this could reasonably be considered an unsupported edge case, for which the workaround would be to simply HTML encode the $ character as &#36;, such that the ${ won't be mistaken as the start of an expression.

So yeah, an expression parser is clearly needed to do this safely - and now I'm actually wondering to what extend this might be an issue in Aurelia itself, when running in the browser? It just relies on the parser in the browser to turn a raw template string into a DOM tree, right?
In that case, I'm pretty sure that can fail too, in similar ways...

@EisenbergEffect, maybe you have some input on this?

from bundler.

ahmedshuhel avatar ahmedshuhel commented on June 18, 2024

@thomas-darling For parsing expressions we may be able to use the parser used in Aurelia Templating. So even if it has bugs like the one you mentioned above, it will be consistent and solving there would solve it here. But, I am a bit unsure about the extract-parse-inject approach here in minification context. After extraction we will pass the template to minifier and get a minified version of it and we don't have any control (except for the options that minimize supports) over this and how it will look like eventually. The tags in the template may/will shift from their initial position, it may remove redundant tags attributes etc. So, I am bit unsure how it will treat the placeholders we left for the extracted template literals before minification. Will that be consistent and valid html after we insert them back replacing the placeholders. In case of gulp-translate, the structure of the template does not change, I suppose. Which is much predictable. I don't have the data to support this yet but again we my still leak one or two edge cases like this.

I am wondering whither we should take the parsing route? Is it that big of an issue that deserves this much of effort? As @gheoan mentioned there are way arounds for cases like this. (In addition to his suggestion one could extract a function in view model and use that instead of conditional ternary operator)

So, yeah, there is a judgement call to make here for @EisenbergEffect .

from bundler.

thomas-darling avatar thomas-darling commented on June 18, 2024

Regarding reusing the Aurelia parser, it's not enough for the bugs to be consistent, nor is it achievable. If the minifier gets invalid HTML, it will mess it up by inserting closing tags, etc. - and then after we inject expressions and potentially rouge tags after minification, the parser in the browser will be even more confused than the minifier.

That said, I'm 100% confident my approach in gulp-translate works for minification too. The minifier won't remove or change text nodes (other than collapsing white space), so if we just replace ${some expression} with e.g. ${0}, then there should be no problem restoring those after minification - only way it could possibly fail, is if the minifier were to change the text ${0} to something else, and in that case, it would be a pretty broken minifier as that is perfectly valid text content, and as short as it gets.

The only edge case here, is the minor issue I mentioned in my previous comment - but I'd consider it highly unlikely that anyone actually need the text ${ in a template, and if you do, then chances are you will need to HTML encode that $ character anyway, as Aurelia itself will otherwise get all confused too.

To make my position on this crystal clear, I don't see any viable option here, other than the one I'm suggesting - and if this is not handled properly, we will be forced to either fork or abandon aurelia-bundler and aurelia-cli, as unexpected and silent template corruption is, in our view, simply unacceptable.

from bundler.

thomas-darling avatar thomas-darling commented on June 18, 2024

Just to clarify: Reusing the parser in Aurelia, if that is technically feasible, which I doubt, might help ensure it at least fails in the same scenarios, as those in which Aurelia itself would fail. However, in doing so, it might cause the failure in Aurelia to be different from what it would otherwise be. I'm also not confident that consistency can be acheived, as I'd be concerned about things such as this line, which I think might actually suffer from some of the same issues/bugs we are discussing here:

https://github.com/aurelia/templating/blob/f57cff98ee7ce0d06064d73de7601f7af8aaf3d1/src/view-compiler.js#L80

There's no guarantee the parser in the browser behaves the same as htmlparser2 when presented with invalid syntax, and thus also no guarantee that the bugs will be the same.

I'd also argue that while consistency is nice, trying to reuse the Aurelia parser for this is probably not worth the effort. We really only need to deal with a tiny subset of the syntax here, and it's pretty simple to implement a minimal parser for that, as illustrated in gulp-translate:

https://github.com/thomas-darling/gulp-translate/blob/master/src/core/template-language/implementations/aurelia/aurelia-template-language.ts

And just to drive the point home, here's the complete solution to this issue, using that parser:

import { AureliaTemplateLanguage } from "./aurelia-template-language";
import * as htmlmin from "htmlmin";

const templateLanguage = new AureliaTemplateLanguage();

export function minify(template)
{
    const expressions = [];
    const standardHtml = templateLanguage.toStandardHtml(template, expressions);
    const minifiedHtml = htmlmin(standardHtml);
    const minifiedTemplate = templateLanguage.toTemplateHtml(minifiedHtml, expressions);

    return minifiedTemplate;
}

It's that simple, so can we please just use this now? :-)

from bundler.

JeroenVinke avatar JeroenVinke commented on June 18, 2024

@thomas-darling I appreciate all the effort and thinking you've put into this. Please understand that you've been thinking about this problem a lot longer than @ahmedshuhel and me, and that we need time to get on the same page that you're on.

I'd like to add a piece of information to this discussion, which is that webpack has the same issue:

image

That's with this webpack config. Html-loader uses html-minifier internally.

Question is, how are we going to solve this problem there? I don't think we can transform HTML when a loader takes care of minification.

from bundler.

ahmedshuhel avatar ahmedshuhel commented on June 18, 2024

@JeroenVinke Here is one possible solution that we may consider besides @thomas-darling 's proposal.

  1. html encode all "sensitive" symbols (e.g. <, > , = etc) inside template literals (${...})
  2. Pass it to minifier and get the minified version
  3. decode the symbol back to original.

This, I reckon is safer than extract-parse-insert approach since parser in minifier will treat them as a valid text and/or attributes and won't have a chance to alter the semantics.

Thoughts? @JeroenVinke @thomas-darling

from bundler.

thomas-darling avatar thomas-darling commented on June 18, 2024

I don't really see how that would be any different, unfortunately.

You still need to determine where the expression begins and ends, in order to encode only the characters within the expressions - and to do that, you need to parse those expressions, exactly like I do.
What's more, you now need to parse them again after minification, as you can no longer do a simple regex replace like I do - with my solution, the extraction guarantees that all placeholders have been replaced with placeholders of a known format, which can be reliably matched using the regex /\${(\d+)}/g.

Also, the minifier might actually decide to decode those encoded characters back into single characters to save bytes, if it determines it would be safe to do so - and that may break things in some scenarios, one of which would be if I have an expression such as ${foo == '&lt;' ? "Less than" : "More than"} - if the minifier is allowed to decode that character code, then that comparison will break.
It may be an edge case, but that doesn't make it any less wrong :-)

from bundler.

JeroenVinke avatar JeroenVinke commented on June 18, 2024

How would we do this for webpack?

Also, if this is valid HTML, isn't this a problem that should be resolved in the minifier(s)?

from bundler.

thomas-darling avatar thomas-darling commented on June 18, 2024

Aurelia templates, and templates for just about every other SPA framework, are not valid HTML - and that's perfectly ok, although it does introduce challenges when you need to parse them, as in this case.

You can't really expect a minifier to understand every templating language out there, so this is a problem we need to fix on our end.

Regarding webpack, I don't have quite enough experience with that, to say what the best solution would be for that - but you are right that it also suffers from the same issue.

from bundler.

gheoan avatar gheoan commented on June 18, 2024

One of the core advantage of Aurelia over just about every other SPA framework is its standards-compliant component model. Last time I checked Aurelia encourges template syntax that is not valid acording to WHATWG's HTML Standard. I might be wrong, but if it's true it might introduces issues with linting, minifying, 3rd party frameworks and libraries, code editors, etc.

from bundler.

thomas-darling avatar thomas-darling commented on June 18, 2024

Yes, Aurelia templates can, like most other templating languages, cause issues with third party tools - but it's impossible to avoid that if you also want support for string literals and comparison operators inside your binding expressions.

Of course, if you want your templates to be fully compliant, then just don't use the < operator or string literals that look like HTML, such as "<foo", in your binding expressions - then you're all good.

However, those things can be very useful, and as they are part of the Aurelia binding language, they should at least work with the officially supported build tools.

To make working with third party tools easier, it might be useful to make the code for extracting and injecting the expressions available as an NPM package, and maybe a gulp plugin, used like:

gulp.src("**/*.html")
    .pipe(aureliaTemplate.toStandardHtml())
    .pipe(somePluginThatParsesHtml())
    .pipe(aureliaTemplate.toTemplateHtml())
    .pipe(dest("dist"))

from bundler.

thomas-darling avatar thomas-darling commented on June 18, 2024

Any news on this?

from bundler.

bigopon avatar bigopon commented on June 18, 2024

This was the latest move #166

from bundler.

ahmedshuhel avatar ahmedshuhel commented on June 18, 2024

v0.7.0 released.

from bundler.

bigopon avatar bigopon commented on June 18, 2024

Wow this is a nice solution 👍

from bundler.

thomas-darling avatar thomas-darling commented on June 18, 2024

It would make sense to add support for binding languages in htmlmin, as it is a pretty universal problem.
However, I doubt they would want to add the Aurelia-specific expression parser code to their library, so I'm guessing such support would look quite similar to the solution I've been pushing for - html would just need an option to set the ITemplateLanguage implementation to be used, and then we would plug in something similar to the gulp-translate code I've been referring to.

Do we have an issue for getting support for this into htmlmin?
I can't find it in their repository, and while wrapping things in comments may work, it is very far from ideal - those comments will likely not be added until something has already broken, and that's too late.

from bundler.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.