
clean-html's Issues

CSS and JavaScript wrapping

Example input:

    <script type="text/javascript">
        console.log('I prefer the concrete, the graspable, the proveable.');
    </script>

Expected output:

    <script type="text/javascript">
        console.log('I prefer the concrete, the graspable, the proveable.');
    </script>

Actual output:

    <script type="text/javascript">
              console.log('I prefer the
      concrete, the graspable, the
      proveable.');
        </script>

The same issue applies to both style and script tags. Neither should be wrapped.

Modernize the library (with TypeScript)

Current problems

This is a great library. However, it still has some downsides:

  • I cannot really use it in a modern TypeScript project without TypeScript yelling at me because it has no types at all.
  • It still uses CommonJS.
  • It uses a very outdated version of htmlparser2.

Suggested changes

  • Modernizing the codebase using Vite (with TypeScript support).
  • Supporting CommonJS, UMD, and ESM together.
  • Upgrading htmlparser2 to the newest version.

@dave-kennedy What do you think of these problems and suggested changes?

Conditional comments not handled

Example input:

<!DOCTYPE html>
<!--[if IEMobile 7 ]> <html lang="en_US" class="no-js iem7"> <![endif]-->
<!--[if lt IE 7]> <html class="ie6 lt-ie10 lt-ie9 lt-ie8 lt-ie7 no-js" lang="en_US"> <![endif]-->
<!--[if IE 7]>    <html class="ie7 lt-ie10 lt-ie9 lt-ie8 no-js" lang="en_US"> <![endif]-->
<!--[if IE 8]>    <html class="ie8 lt-ie10 lt-ie9 no-js" lang="en_US"> <![endif]-->
<!--[if IE 9]>    <html class="ie9 lt-ie10 no-js" lang="en_US"> <![endif]-->

Expected output:

<!DOCTYPE html>
  <!--[if IEMobile 7 ]> <html lang="en_US" class="no-js iem7"> <![endif]-->
  <!--[if lt IE 7]> <html class="ie6 lt-ie10 lt-ie9 lt-ie8 lt-ie7 no-js" lang="en_US"> <![endif]-->
  <!--[if IE 7]> <html class="ie7 lt-ie10 lt-ie9 lt-ie8 no-js" lang="en_US"> <![endif]-->
  <!--[if IE 8]> <html class="ie8 lt-ie10 lt-ie9 no-js" lang="en_US"> <![endif]-->
  <!--[if IE 9]> <html class="ie9 lt-ie10 no-js" lang="en_US"> <![endif]-->

Actual output:

<!DOCTYPE html>
  <!--[if IEMobile 7 ]> <html lang="en_US" class="no-js iem7"> <![endif]-->
    <!--[if lt IE 7]> <html class="ie6 lt-ie10 lt-ie9 lt-ie8 lt-ie7 no-js" lang="en_US"> <![endif]-->
      <!--[if IE 7]> <html class="ie7 lt-ie10 lt-ie9 lt-ie8 no-js" lang="en_US"> <![endif]-->
        <!--[if IE 8]> <html class="ie8 lt-ie10 lt-ie9 no-js" lang="en_US"> <![endif]-->
          <!--[if IE 9]> <html class="ie9 lt-ie10 no-js" lang="en_US"> <![endif]-->

Because the html tag is only closed once, there's a bunch of extra indentation at the end of the file.

img tag gets removed when using remove-empty-tags

When I use the remove-empty-tags option with any tag, img always gets removed.

<p><img alt="" src="https://github.githubassets.com/images/modules/dashboard/copilot/bg.jpg"></p>

This happens whether or not the tag is self-closing, and even if I use picture instead of img: they always get removed. It seems like a whitelist is applied under the hood.

Any idea @dave-kennedy ?

Multiline comment wrapping

Example input:

    <!--
        I prefer the concrete, the graspable, the proveable.
    -->

Expected output (when wrap = 40):

    <!-- I prefer the concrete, the
        graspable, the proveable. -->

Actual output:

    <!--
          I prefer the concrete, the
    graspable, the proveable.
        -->

Why are script and style tags removed?

They are really important to me.

import $ from 'jquery'
import { clean } from 'clean-html'

const $code = $('code.api-preview')
const badHtml = $code.text()
const opts = {
    'add-break-around-tags': ['figure', 'iframe', 'a', 'img', 'time', 'address', 'details', 'script', 'ul', 'ol', 'li'],
    'wrap': 0
}

clean(badHtml, opts, (clearHtml) => {
    $('code.api-preview').text(clearHtml)
        .css('font-size', '14px')
        .css('font-family', 'Hack, "DejaVu Sans Mono", Menlo, Consolas, "Liberation Mono", Monaco, "Lucida Console", monospace')
        .css('line-height', '20px')
})

As a workaround, I now have to rename the script and style tags to ruby and dialog before cleaning, and rename them back afterwards, like this:

const $code = $('code')
// Global regexes rename every occurrence; a string pattern would only replace the first match
const badHtml = $code.text()
    .replace(/<script/g, '<ruby')
    .replace(/\/script>/g, '/ruby>')
    .replace(/<style/g, '<dialog')
    .replace(/\/style>/g, '/dialog>')

const opts = {
    'add-break-around-tags': [
        'figure', 'iframe', 'a', 'img', 'time', 'address', 'details', 'script', 'ul', 'ol', 'li', 'ruby', 'dialog'
    ],
    'wrap': 0
}

clean(badHtml, opts, (clearHtml) => {
    // Rename every placeholder tag back (global regexes, not just the first match)
    const newClearHtml = clearHtml.replace(/<ruby/g, '<script')
        .replace(/\/ruby>/g, '/script>')
        .replace(/<dialog/g, '<style')
        .replace(/\/dialog>/g, '/style>')

    $('code.api-preview').text(newClearHtml)
        .css('font-size', '14px')
        .css('font-family', 'Hack, "DejaVu Sans Mono", Menlo, Consolas, "Liberation Mono", Monaco, "Lucida Console", monospace')
        .css('line-height', '20px')
})

Could this removal be made configurable?

Working with patterns

Hi,

The library is cool; I just need it to be a bit more functional. From what I understand, it currently does not provide a way to apply remove-attributes, remove-tags, etc. by pattern. For example, I want to remove all tags that match the pattern /abc-[a-z0-9]+/i.

Is it possible to have a feature like that?
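
Until something like this exists in the library, a pre-processing step is one possible workaround. The sketch below is only an illustration: stripTagsMatching is a hypothetical helper, not part of clean-html. It removes the matching tags themselves and keeps their inner content, and because it is regex-based it will not cope with a ">" inside an attribute value.

```javascript
// Hypothetical helper (not part of clean-html): removes every opening,
// closing, or self-closing tag whose name matches `pattern`, keeping the
// content between the tags. Intended to run before the HTML is cleaned.
function stripTagsMatching(html, pattern) {
    // Anchor the pattern so it has to match the whole tag name, and drop
    // the g/y flags so repeated test() calls are not stateful.
    const wholeName = new RegExp(
        `^(?:${pattern.source})$`,
        pattern.flags.replace(/[gy]/g, '')
    );
    // Matches any tag and captures its name.
    const anyTag = /<\/?([a-zA-Z][\w:-]*)(?:\s[^<>]*)?\/?>/g;
    return html.replace(anyTag, (match, tagName) =>
        wholeName.test(tagName) ? '' : match
    );
}

const input = '<div><abc-x1 id="a">hi</abc-x1><p>ok</p></div>';
console.log(stripTagsMatching(input, /abc-[a-z0-9]+/i));
```

The result can then be passed to clean as usual.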

Maximum call stack exceeded

Thanks for the excellent tool. I find that when I paste more than two dozen or so lines, I get a maximum call stack exceeded error. Any thoughts on fixing this?

Thanks again!

Return value

I noticed that the main clean function in index.js has no return value, which makes it very difficult to compose in a chain of operations. I tried for hours to get my project going before I realized this was happening, and I solved it by simply adding return html at the end of the clean function in index.js. Is there any reason this would break the program? I would like to submit this as a pull request, but I am not sure whether there are any compatibility issues. I think not.
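
Without patching index.js, one way to make the function composable is to wrap the callback in a Promise. This is a minimal sketch: cleanAsync is a hypothetical helper, not part of clean-html, and it assumes only the clean(html, options, callback) call shape used in the snippets on this page. The clean function is passed in as an argument so the sketch does not depend on how the library is imported.

```javascript
// Hypothetical wrapper (not part of clean-html): turns a callback-style
// cleanFn(html, options, callback) into a function that returns a Promise.
function cleanAsync(cleanFn, html, options = {}) {
    // Anything thrown by cleanFn inside the executor rejects the Promise.
    return new Promise((resolve) => cleanFn(html, options, resolve));
}

// Usage with a stand-in for clean; pass the real function instead, e.g.
//   const { clean } = require('clean-html');
const fakeClean = (html, options, callback) => callback(html.trim());

cleanAsync(fakeClean, '  <p>hi</p>  ')
    .then((cleaned) => cleaned.toUpperCase())
    .then((result) => console.log(result));
```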

No hanging indent

Example input:

    <meta property="og:description" content="DuckDuckGo is the search engine that doesn't track you. We protect your search history from everyone – even us!">

Expected output:

    <meta property="og:description" content="DuckDuckGo is the search engine that doesn't track you. We protect your
        search history from everyone – even us!">

Actual output:

    <meta property="og:description" content="DuckDuckGo is the search engine that doesn't track you. We protect your
    search history from everyone – even us!">

<script> gets stripped from <head>

I have no idea why that made sense, and I don't see any way to change it. That is a shame: as much as I love prettier, it takes 3 seconds to run on content that this library handles in under 0.1 s, so I'd much rather use this.

Remove pre/post whitespace within style attributes

Currently, clean-html transforms styles with whitespace into a clean wrapped set of lines, e.g.

<div style="
        font-family: -apple-system, system-ui, BlinkMacSystemFont, 'Segoe UI', Roboto;
        height: 100dvh;
        display: flex;
        flex-direction: column; 
        align-items: center;
        justify-content: center;
        color: #435d7b;
        background-color: #d0b641;
        gap: 20px;
      ">...</div>

to:

<div style=" font-family: -apple-system, system-ui, BlinkMacSystemFont, 'Segoe UI', Roboto; height: 100dvh; display:
        flex; flex-direction: column; align-items: center; justify-content: center; color: #435d7b; background-color:
        #d0b641; gap: 20px; ">
...
</div>

However, it leaves a space before the first style and after the last style. Would it be possible to strip these, i.e.:

<div style="font-family: -apple-system, system-ui, BlinkMacSystemFont, 'Segoe UI', Roboto; height: 100dvh; display:
        flex; flex-direction: column; align-items: center; justify-content: center; color: #435d7b; background-color:
        #d0b641; gap: 20px;">
...
</div>
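
In the meantime, the leading and trailing whitespace could be stripped in a post-processing step on the cleaned output. The sketch below is an assumption, not library behaviour: trimStyleAttributes is a hypothetical helper, and it relies on a regular expression that expects quoted style values.

```javascript
// Hypothetical helper (not part of clean-html): trims whitespace at the
// start and end of every quoted style attribute value.
function trimStyleAttributes(html) {
    // Captures the ` style=` prefix, the quote character, and the value up
    // to the next matching quote (values may span several lines).
    const styleAttr = /(\sstyle\s*=\s*)(["'])([\s\S]*?)\2/gi;
    return html.replace(
        styleAttr,
        (match, prefix, quote, value) => `${prefix}${quote}${value.trim()}${quote}`
    );
}

console.log(trimStyleAttributes('<div style=" color: #435d7b; gap: 20px; ">...</div>'));
```

Whitespace inside the value, including the line breaks introduced by wrapping, is left untouched.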

Allow more than one input file

Hello!
You have an excellent project!
I'm writing to ask whether I can apply clean-html to all of my files at once rather than to one file at a time. If so, which command should I use? I already tried clean-html **/*.html --in-place, but it didn't work.
I am looking forward to your reply.
Sincerely.
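
One possible workaround that does not depend on the command line accepting several files is a small Node script. This is only a sketch under stated assumptions: findHtmlFiles and cleanFilesInPlace are hypothetical helpers, not part of clean-html, and the clean function is assumed to have the clean(html, options, callback) shape used elsewhere on this page and to call back exactly once with the cleaned string.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: recursively collects every .html file under `dir`.
function findHtmlFiles(dir) {
    const found = [];
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
        const full = path.join(dir, entry.name);
        if (entry.isDirectory()) {
            found.push(...findHtmlFiles(full));
        } else if (entry.isFile() && entry.name.endsWith('.html')) {
            found.push(full);
        }
    }
    return found;
}

// Hypothetical helper: overwrites each file with the output of
// cleanFn(html, options, callback).
function cleanFilesInPlace(files, cleanFn, options = {}) {
    for (const file of files) {
        const html = fs.readFileSync(file, 'utf8');
        cleanFn(html, options, (cleaned) => fs.writeFileSync(file, cleaned));
    }
}

// Example (assumes clean-html is installed and a ./site directory exists):
//   const { clean } = require('clean-html');
//   cleanFilesInPlace(findHtmlFiles('./site'), clean);
```

Since the files are overwritten, it is worth running this on a copy or a committed working tree first.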
