mdevils / html-entities

Fastest HTML entities encode/decode library

License: MIT License

TypeScript 100.00%

html-entities html-special-characters printable-characters ascii-characters

html-entities's Issues

Not terminated entities not supported

Hi 👋 ! I am currently trying to get a better understanding of the differences in the benchmarks between HTML entity libraries. I found a bug in html-entities along the way, hope this is useful:

Entities that are part of a string are currently not supported. Eg. &uumlber should be decoded as über. html-entities currently leaves it unchanged.

Option to prevent double encoding

Hello,

First of all thanks to everyone that has made this lib possible.

This is not a bug report but rather a feature suggestion.

I'm using this lib to import data from a third party into an old database that only supports ISO-8859-1.

I was using it like encode(<text>, {mode: 'nonAscii'}).

But I hit an issue: it turns out that the third party already uses entities for some characters. This means that I ended up with &amp;#39; whenever there was a &#39; entity, for example.

So I thought it'd be nice to have a preventDoubleEncoding option (only with a better name), to prevent encoding the ampersand whenever it's already part of an entity. E.g.:

encode('you & me', {mode: 'nonAscii', preventDoubleEncoding: true}); -> returns you &amp; me
encode('you & me', {mode: 'nonAscii', preventDoubleEncoding: false}); -> returns you &amp; me
encode('you &amp; me', {mode: 'nonAscii', preventDoubleEncoding: true}); -> returns you &amp; me
encode('you &amp; me', {mode: 'nonAscii', preventDoubleEncoding: false}); -> returns you &amp;amp; me
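
A rough sketch of what such an option could do, assuming "already part of an entity" means "followed by a well-formed name or numeric reference ending in a semicolon". The helper name escapeBareAmpersands is hypothetical and not part of html-entities; note that it also leaves unknown but well-formed names such as &foo; untouched.

```javascript
// Hypothetical helper, not part of html-entities: escapes "&" only when it
// does not already start a well-formed entity such as "&amp;" or "&#39;".
const BARE_AMPERSAND = /&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/g;

function escapeBareAmpersands(text) {
  return text.replace(BARE_AMPERSAND, '&amp;');
}

console.log(escapeBareAmpersands('you & me'));     // you &amp; me
console.log(escapeBareAmpersands('you &amp; me')); // you &amp; me
console.log(escapeBareAmpersands('it&#39;s'));     // it&#39;s
```

The remaining non-ASCII characters could then be encoded in a second pass that never touches "&".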

Audit vulnerabilities detected in the htmlentities project on Tag: v2.3.2

Issue: We detected vulnerable dependencies in your project by using the command “npm audit”:

npm audit report

glob-parent <5.1.2
Severity: moderate
Regular expression denial of service - https://npmjs.com/advisories/1751
fix available via npm audit fix
node_modules/glob-parent

hosted-git-info <2.8.9 || >=3.0.0 <3.0.8
Severity: moderate
Regular Expression Denial of Service - https://npmjs.com/advisories/1677
fix available via npm audit fix
node_modules/hosted-git-info

lodash <4.17.21
Severity: high
Command Injection - https://npmjs.com/advisories/1673
fix available via npm audit fix
node_modules/lodash

y18n <3.2.2||=4.0.0||>=5.0.0 <5.0.5
Severity: high
Prototype Pollution - https://npmjs.com/advisories/1654
fix available via npm audit fix
node_modules/y18n

4 vulnerabilities (2 moderate, 2 high)

To address all issues, run:
npm audit fix

Questions: We are conducting a research study on vulnerable dependencies in open-source JS projects. We are curious:

Will you fix the vulnerabilities mentioned above? (Yes/No), and why?:
Do you have any additional comments? (If so, please write it down):

For any publication or research report based on this study, we will share all responses from developers in an anonymous way. Both your projects and personal information will be kept confidential.

Description: Many popular NPM packages have been found vulnerable and may carry significant risks [1]. Developers are recommended to monitor and avoid the vulnerable versions of the library. The vulnerabilities have been identified and reported by other developers, and their descriptions are available in the npm registry [2].

Steps to reproduce:

Go to the root folder of the project where the package.json file is located
Execute “npm audit”
Look at the list of vulnerabilities reported

Suggested Solution: Npm has introduced the “npm audit fix” command to fix the vulnerabilities. Execute the command to apply remediation to the dependency tree.

References:
[1] 2019. 10 npm Security Best Practices. https://snyk.io/blog/ten-npm-security-best-practices/.
[2] 2021. npm-audit. https://docs.npmjs.com/cli/v7/commands/npm-audit.

html-entities doesn't decode html breaks '<br />'

Is there a way to add a custom carriage return for breaks? I'm not sure why, but the self-closed break isn't decoding. Is there a setting for this? I've checked the docs and don't have any answers. Thanks!

Invalid code point

When decoding the following string: In today&#2013266066;s digital marketplace

It results in this error:

RangeError: Invalid code point 2013266066
  at Object.fromCodePoint (<anonymous>)
  at /Users/sjlu/Code/interseller/node_modules/html-entities/lib/html5-entities.js:27:49
  at String.replace (<anonymous>)
  at Html5Entities.decode (/Users/sjlu/Code/interseller/node_modules/html-entities/lib/html5-entities.js:16:20)

node --version
v12.18.3
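
For illustration, a minimal sketch of a guard around String.fromCodePoint, which throws a RangeError for values above 0x10FFFF. The function names are hypothetical, and substituting U+FFFD for invalid values is an assumption about the desired behavior, not what the library does.

```javascript
// Hypothetical helper: maps a numeric reference to a character, falling back
// to U+FFFD for values String.fromCodePoint would reject and for surrogate halves.
function codePointToString(codePoint) {
  const isSurrogate = codePoint >= 0xd800 && codePoint <= 0xdfff;
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff || isSurrogate) {
    return '\uFFFD';
  }
  return String.fromCodePoint(codePoint);
}

// Decodes decimal references only, to keep the sketch short.
function decodeDecimalReferences(text) {
  return text.replace(/&#([0-9]+);/g, (_, digits) => codePointToString(Number(digits)));
}

console.log(decodeDecimalReferences('In today&#2013266066;s digital marketplace'));
```

With this guard the string above decodes without throwing; the out-of-range reference becomes the replacement character.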

Comparison with existing libraries

Hi, thanks for publishing this library!

How does it compare with @mathiasbynens's he library, which is more popular on GitHub and NPM?

How about @fb55's entities? That library wins in a performance comparison with this one.

readme on npm

The readme doesn't render too well on npm -- https://www.npmjs.org/package/html-entities

see the line with console.log(entities.encode('<>"'&©®')); -- the html entities are not output as html entities, but rendered as the same as the input

don't know if there's something you could do about this~

How to decode latin chars ?

Example:

&Aacute;
&Acirc;
&Eacute;
&Ntilde;
&Uacute;
&Ocirc;

Chars definition: http://www.madore.org/~david/computers/unicode/htmlent.html

console.log(entities.decode("&Aacute;")); //prints &Aacute;

Thanks

named-references - not available in 2.3.3 npm install?

In 2.3.3 on NPM, you will see a named-references.js file present in the (beta) package view.

When doing an npm install html-entities@2.3.3, it does not include the named-references.js file.

ERROR in ./node_modules/html-entities/lib/index.js 14:25-54
Module not found: Error: Can't resolve './named-references' in '\node_modules\html-entities\lib'

It seems like this file is not being included with NPM?

completely broken

Try running one of your usage examples and you will find it's pretty broken, man.

TypeError: Object function Html5Entities() {} has no method 'decode'

Also, why is it so weird to use, compared to all other node.js modules? Why not have it like request.js and be able to do something like:

var entities = require('thislib');
entities.decode(str);

Why do I need to create objects?

Can't import the named export 'decode' from non EcmaScript module (only default export is available)

I am using the html-entities package from npm and trying to import the decode method in my nodejs application. I have updated to the latest package v2.3.3

I have tried plenty of different ways of importing, but none seem to work; each gives a different error:

import { decode } from 'html-entities'
let result = decode('test')

This gives the error:

Can't import the named export 'decode' from non EcmaScript module (only default export is available)

When trying with the code:

const { decode } = require('html-entities')
It gives me the error:

This file is being treated as an ES module because it has a '.js' file extension and 'D:\giveaday\Give-a-Day\backend\package.json' contains "type": "module". To treat it as a CommonJS script, rename it to use the '.cjs' file extension.

And ultimately, I tried using the old way (v1):

import htmlEntities from 'html-entities'
let result = htmlEntities.decode('test')

and then it returns the error:

ERROR html_entities__WEBPACK_IMPORTED_MODULE_4__.decode is not a function

I've also tried the solution where they say to configure webpack by adding the following code in the nuxt.config.mjs:

config.module.rules.push({
    test: /\.js$/,
    include: /node_modules/,
    type: 'javascript/auto',
  })

However, this did not work either. (the nuxt.config.mjs is located in my /frontend/ folder, while the library is being imported in my /shared/ folder)

doesn't decode any HTML entities at all

Hi, thanks for sharing this, but I couldn't get it to decode anything at all.

var Entities = require('html-entities').AllHtmlEntities;
var entities = new Entities();
console.log(entities.decode('Hello+%22I+think%22+%27therefore+I+am%27+%26+this+is+%7C+fun'));

Shows this in the console:
Hello+%22I+think%22+%27therefore+I+am%27+%26+this+is+%7C+fun

Running version: [email protected] node_modules/html-entities
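
Worth noting: the sample string looks percent-encoded (URL/form encoding, with + for spaces) rather than HTML-entity-encoded, so an entity decoder would have nothing to decode. A small sketch using the built-in decodeURIComponent; the wrapper name decodeFormEncoded is made up for this example.

```javascript
// Hypothetical wrapper: form encoding uses "+" for spaces, which
// decodeURIComponent does not translate, so convert those first.
function decodeFormEncoded(text) {
  return decodeURIComponent(text.replace(/\+/g, ' '));
}

const input = 'Hello+%22I+think%22+%27therefore+I+am%27+%26+this+is+%7C+fun';
console.log(decodeFormEncoded(input));
// Hello "I think" 'therefore I am' & this is | fun
```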

safari 12 \ud835 error

There's an error: SyntaxError: Invalid character '\ud835' in safari 12.1.2

Are there any breaking changes in the new 2.x.x version?

Hi,

I've been looking through the repository but haven't found any information about it. Is it safe to upgrade from v.1.4.0 to the new version?

encode function encodes 0 to empty string

Code to reproduce:

import { encode } from 'html-entities'
console.log(encode(0))

running

import { encode } from 'html-entities'
console.log(typeof(encode(0)))

gives console output string
running

import { encode } from 'html-entities'
console.log(encode(0).length)

gives console output 0
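
A possible caller-side workaround, assuming the empty result comes from the input being treated as falsy (that cause is a guess, not confirmed here). Both encodeValue and the stand-in encoder escapeSpecial are hypothetical; in real code the second argument would be the library's encode.

```javascript
// Hypothetical caller-side guard: convert everything except null/undefined
// to a string before handing it to the encoder.
function encodeValue(value, encodeFn) {
  const text = value === null || value === undefined ? '' : String(value);
  return encodeFn(text);
}

// Stand-in encoder used only to keep this sketch self-contained.
const escapeSpecial = (text) =>
  text.replace(/[&<>"']/g, (ch) => `&#${ch.charCodeAt(0)};`);

console.log(encodeValue(0, escapeSpecial));       // 0
console.log(encodeValue('<b>', escapeSpecial));   // &#60;b&#62;
```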

How to prevent encoding < and > like it was done in v1.4 ?

Hi,
In v1, Html5Entities didn't encode < and >; in v2, <br /> is replaced by &lt;br /&gt;.
I have been able to tweak encode option to match v1 behavior for special character, but was not able to succeed with bracket :'(

Here is a working test case in v1:

const { Html5Entities } = require('html-entities');

describe('html5 encode with v1', () => {
  it.each`
    input                        | expectation
    ${'€'}                       | ${'&#8364;'}
    ${'’'}                       | ${'&#8217;'}
    ${'é'}                       | ${'é'}
    ${'&'}                       | ${'&'}
    ${'<br />'}                  | ${'<br />'}
    ${'<a href="toto">woot</a>'} | ${'<a href="toto">woot</a>'}
  `('should transform $input into $expectation', ({ input, expectation }) => {
    const result = Html5Entities.encodeNonASCII(input);
    expect(result).toBe(expectation);
  });
});

Here is a failing equivalent test in v2:

const { encode } = require('html-entities');

describe('html5 encode with v2', () => {
  it.each`
    input                        | expectation
    ${'€'}                       | ${'&#8364;'}                 // OK
    ${'’'}                       | ${'&#8217;'}                 // OK
    ${'é'}                       | ${'é'}                       // KO, result is &#233;
    ${'&'}                       | ${'&'}                       // KO, result is &amp;
    ${'<br />'}                  | ${'<br />'}                  // KO, result is &lt;br /&gt;
    ${'<a href="toto">woot</a>'} | ${'<a href="toto">woot</a>'} // KO, result is &lt;a href=&quot;toto&quot;&gt;woot&lt;/a&gt;
  `('should transform $input into $expectation', ({ input, expectation }) => {
    const result = encode(input, { level: 'xml', mode: 'nonAscii' });
    expect(result).toBe(expectation);
  });
});

I guess I can work around the change for é, but I would expect &, < and > not to be encoded, as they are ASCII characters and I'm encoding with the nonAscii option.
Any idea? Is this a bug, or a new way of doing things in this library, so that I would have to tackle the issue a little higher in my stack?
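
As a stopgap, the expectation above can be approximated without the library. This is only a sketch: encodeNonAsciiOnly is a hypothetical name, and unlike the v1 test it also encodes é (as &#233;), because it treats everything above U+007F as non-ASCII.

```javascript
// Hypothetical helper: replaces every code point above U+007F with a decimal
// numeric reference and leaves all ASCII characters, including & < > ", alone.
function encodeNonAsciiOnly(text) {
  // The "u" flag makes the character class match whole code points,
  // so characters outside the BMP are not split into surrogate halves.
  return text.replace(/[^\x00-\x7F]/gu, (ch) => `&#${ch.codePointAt(0)};`);
}

console.log(encodeNonAsciiOnly('€'));                       // &#8364;
console.log(encodeNonAsciiOnly('<a href="toto">woot</a>')); // unchanged
```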

Module '"html-entities"' has no exported member 'encode'.

Hi, I was following the docs and encountered the following error while importing encode from html-entities:

Module '"html-entities"' has no exported member 'encode'.

After exploring the definition file index.d.ts from html-entities, I found out that neither encode nor decode is exported directly.

Importing Html5Entities works:

Are the docs outdated?

Kind regards,
Firmino

Error using html-entities with rollup

Likely something wrong with my config, but this is the error I get when trying to use rollup with html-entities version "^4.1.4"

[!] RollupError: "decode" is not exported by "node_modules/html-entities/lib/index.js"

Not decoding é

With import {decode} from 'html-entities',
decode does not decode &eacute; (é).
The complete string is &eacute;
Is it possible to fix that?

Thanks

assigning decode(someString) to something that expects a string causes es-lint typescript errors "Unsafe assignment of an any value." and "Unsafe call of an any typed value.

I haven't used many imported libraries with typescript so it is possible that I am making a fundamental typescript error around package imports

To be clear the app compiles and runs correctly, and the entities are in fact being stripped out. That said from my experience of other typed languages I know to take 'any' type seriously.

my use case is fairly simple essentially I am receiving from json a bunch of strings with html entities and working to process them to strings that do not have html entities:

import { decode } from "html-entities";

// this interface describes the json
interface Question {
  category: string;
  type: string;
  difficulty: string;
  // this is the type of the string with HTML entities in the JSON
  question: string;
  correct_answer: string;
  incorrect_answers: string[];
}

//this is the converted json the app expects
export interface Card {
  category: string;
  //this is the type of the string that should receive the results of decode
  question: string;
  answer: boolean;
  hasBeenAnswered: boolean;
  chosenAnswer?: boolean;
}


const questionToCard = ({
  category,
  question,
  correct_answer,
}: Question): Card => ({
  category: category,
  // the following call to decode works but throws 2 typescript-eslint errors
  question: decode(question),
  answer: correct_answer === "True",
  hasBeenAnswered: false,
})

These are the two errors thrown by typescript-eslint:
https://github.com/typescript-eslint/typescript-eslint/blob/v4.16.1/packages/eslint-plugin/docs/rules/no-unsafe-assignment.md

https://github.com/typescript-eslint/typescript-eslint/blob/v4.16.1/packages/eslint-plugin/docs/rules/no-unsafe-call.md

I have read the documentation and searched elsewhere for references but I haven't seen a note about how to use this library with typescript without introducing any type into my codebase.

Have I made some error in importing or using the library? Or is this eslint error to be expected with standard use?

encodeNonASCII is actually encodeNonExtendedASCII

http://en.wikipedia.org/wiki/Extended_ASCII

would be nice to have a real encodeNonASCII as well as an encodeNonUTF8.

I will try to add this when I get a chance.

Does this package work to check the validity or check if the html is fully formed?

e.g. Something like

isValid("<div><ul><li></ul></div>"); => false
isValid("<div><ul><li></li></ul></div>"); => true

VirusTotal - 1 security vendor flagged this URL as malicious

The npm package was flagged by VirusTotal as malicious.

URL: https://registry.npmjs.org/html-entities/-/html-entities-2.3.3.tgz

https://www.virustotal.com/gui/url/cbbd8234ebcece557a5e9c8f657d94c7fbc3f21125bccf8a26634fcd6b5d14a7

360 anti-virus warns this is virus.

360 anti-virus warns this is virus.
Type
virus.js.qexvmc.1070

\frontend\node_modules\html-entities\lib\html5-entities.js

MD5:
e23b7edbddd7c994e4c67e9bdf97c4aa

converting `&nbsp;` fails jest test

surprisingly, this conversion fails. not sure why, it looks quite fine to me. 🤔 could it be the ascii character itself?

expect(decode('first&nbsp;second')).toEqual('first second')

the equality looks suspicious though, but that's the log on terminal

    Expected: "first second"
    Received: "first second"

macOS 10.15.7, html-entities 2.3.2, ts-jest 25.2.1
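
A likely explanation, sketched below: &nbsp; decodes to U+00A0 (no-break space), which prints like an ordinary space in the terminal but is a different character from U+0020. The variable names are illustrative only.

```javascript
// What decoding "first&nbsp;second" is expected to yield: U+00A0, not U+0020.
const decoded = 'first\u00A0second';
const expected = 'first second';

console.log(decoded === expected); // false: the two "spaces" differ

// Normalizing no-break spaces before comparing makes the strings equal.
const normalized = decoded.replace(/\u00A0/g, ' ');
console.log(normalized === expected); // true
```

Comparing against 'first\u00A0second' in the test would be the stricter alternative.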

Entities is not a constructor

I got this error
Entities is not a constructor

Decoding entities from a state component - react native

I'm saving values within a react native state component and mapping through to return the value to the ui, but the values I wish to decode are not being decoded. Is there an example on how this works within react native?

const [freeMedia, setFreeMedia] = useState();

{freeMedia && freeMedia.length > 0
  ? freeMedia.map((uploads) => {
      return (
        <>
          {decode(uploads.videotitle)}
        </>
      );
    })
  : null}

Please add "&NewLine;" entity

Please add &NewLine; entity.
Sorry for such a short text, but I do not know how to argue for it better :)

Left " and Right " Double Quotes don't get decoded

Single and double CloseCurlyQuote entities aren’t properly encoded

The problem is with ’ (i.e. &rsquo;) and ” (i.e. &rdquo;), which are encoded as &CloseCurlyQuote; and &CloseCurlyDoubleQuote;, respectively.

See this test case

UPDATE: It seems that the issue is specific to the Html5Entities class. When I repeated the test using require('html-entities').Html4Entities, the encoding worked (see updated test case above).

Unable to encode 𝌆

Hi Folks,

I'm not entirely sure what is special about this character, but I have been trying to encode any character so that it can be stored in a non-UTF DB for a legacy system and this one doesn't seem to encode. I'm sure we'll not use this specific character, but I'm worried that a range of characters may not be picked up.

I have tried both nonAsciiPrintable and nonAscii modes. Am I doing something wrong?

Thanks for your help
Ash

PS. It also didn't encode with he

Should XmlEntities.encodeNonUTF also replace "&" with "&amp;"?

It seems it's doing it "behind my back". I am already encoding the illegal XML characters myself and wanted to just make sure we don't end up with non-UTF characters. & seems pretty UTF to me 😄

HTML chars without ending ; still decoded

The .decode() function is too aggressive as it decodes even incomplete HTML characters like in this example:

var url = 'http://some-url.de/?param1=value1&lang=en';
console.log(require('html-entities').AllHtmlEntities.decode(url));

yields

http://some-url.de/?param1=value1⟨=en

This means, &lang (probably a not so rare URL parameter) is decoded as if it was &lang;, the character ⟨.
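
For comparison, a minimal sketch of a strict decoder that only accepts named entities terminated by a semicolon. The table NAMED is a tiny illustrative subset and decodeStrict is a hypothetical name, not the library's API.

```javascript
// Tiny illustrative table; a real decoder would use the full entity list.
const NAMED = { amp: '&', lt: '<', gt: '>', lang: '\u27E8' };

// Hypothetical strict decoder: a name is decoded only when followed by ";".
function decodeStrict(text) {
  return text.replace(/&([a-zA-Z][a-zA-Z0-9]*);/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(NAMED, name) ? NAMED[name] : match
  );
}

const url = 'http://some-url.de/?param1=value1&lang=en';
console.log(decodeStrict(url) === url);   // true: "&lang" without ";" is left alone
console.log(decodeStrict('a &lang; b'));  // a ⟨ b
```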

bug of the function val,html and text to parse content of html tag

Test case as follows:

const cheerio = require('cheerio');
const fs = require('fs');
const Entities = require('html-entities').XmlEntities;
const entities = new Entities();
let $ = cheerio.load(fs.readFileSync('./test.html').toString());
/**
----file: test.html ----

<!DOCTYPE html>
<html>
<head>
</head>
<body>
    <textarea style="display:none">{"bug":"景点<1km"}</textarea>
</body>
</html>

**/
let html = $('body > textarea').html();
let text = $('body > textarea').text();
let val = $('body > textarea').val();
// I want test {"bug":"景点<1km"}, but it return: {"bug":"
console.log(html);
console.log(text);
console.log(val);
console.log(entities.decode(html));

I can't get what i want :

 {"bug":"景点<1km"}

I think the char "<" should not be processed in a string

JSON.parse may be faster for namedReferences object

Not sure if this still holds up, but in 2019 it was apparently the case that JSON.parse would be faster for objects >10kb in size: https://v8.dev/blog/cost-of-javascript-2019
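
Roughly what that would look like, using a made-up three-entry subset rather than the real table; whether it is actually faster for this library would need to be measured.

```javascript
// Illustrative subset only; the real table is far larger. The object is
// built from a JSON string literal instead of an object literal.
const namedReferences = JSON.parse('{"amp":"&","lt":"<","gt":">","quot":"\\""}');

console.log(namedReferences.lt);   // <
console.log(namedReferences.quot); // "
```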

typo in XmlEntities.prototype.encodeNonASCII

s/lenght/length/g

webpack issue 'require function is used in a way in which dependencies cannot be statically extracted'

Hi,

I'm using html-entities in a UI application bundled by webpack. When I updated from 1.3.1 to 2.0.2, webpack started reporting this warning:

WARNING in ./node_modules/html-entities/lib/index.js 13:93-94
Critical dependency: require function is used in a way in which dependencies cannot be statically extracted
 @ ./example/src/index.jsx 1:182-216

can you help on this issue?

Regards
stheine

ES6 module syntax in docs

Docs could use a code sample with ES6 module syntax. I am happy to submit a PR with this if you'd like to add it.

Incorrect Performance comparison

See fb55/entities#377 (comment)

Here is what I see with this project's benchmark using current versions:

Common

    Initialization / Load speed

        #1: html-entities: 
        #2: entities x 4,276,293 ops/sec ±2.67% (81 runs sampled)
        #3: he x 3,099,406 ops/sec ±2.48% (87 runs sampled)

HTML5

    Encode test

        #1: entities.encodeNonAsciiHTML x 923,610 ops/sec ±0.12% (96 runs sampled)
        #2: html-entities.encode - html5, nonAscii x 678,397 ops/sec ±0.07% (97 runs sampled)
        #3: html-entities.encode - html5, nonAsciiPrintable x 631,622 ops/sec ±3.44% (96 runs sampled)
        #4: he.encode x 165,394 ops/sec ±0.09% (98 runs sampled)

    Decode test

        #1: entities.decodeHTML x 1,186,228 ops/sec ±0.15% (100 runs sampled)
        #2: entities.decodeHTMLStrict x 1,154,976 ops/sec ±0.16% (99 runs sampled)
        #3: html-entities.decode - html5, strict x 551,112 ops/sec ±0.11% (101 runs sampled)
        #4: html-entities.decode - html5, body x 533,743 ops/sec ±0.19% (100 runs sampled)
        #5: html-entities.decode - html5, attribute x 531,052 ops/sec ±2.71% (97 runs sampled)
        #6: he.decode x 320,768 ops/sec ±0.19% (99 runs sampled)

HTML4

    Encode test

        #1: entities.encodeNonAsciiHTML x 883,385 ops/sec ±0.21% (98 runs sampled)
        #2: html-entities.encode - html4, nonAscii x 604,980 ops/sec ±2.90% (95 runs sampled)
        #3: html-entities.encode - html4, nonAsciiPrintable x 584,515 ops/sec ±0.25% (99 runs sampled)

    Decode test

        #1: entities.decodeHTML x 1,357,417 ops/sec ±0.15% (100 runs sampled)
        #2: entities.decodeHTMLStrict x 1,326,669 ops/sec ±0.17% (98 runs sampled)
        #3: html-entities.decode - html4, strict x 572,823 ops/sec ±0.16% (96 runs sampled)
        #4: html-entities.decode - html4, body x 563,574 ops/sec ±0.12% (99 runs sampled)
        #5: html-entities.decode - html4, attribute x 553,911 ops/sec ±2.58% (94 runs sampled)

XML

    Encode test

        #1: entities.encodeXML x 949,544 ops/sec ±0.56% (97 runs sampled)
        #2: html-entities.encode - xml, nonAscii x 771,887 ops/sec ±0.17% (97 runs sampled)
        #3: html-entities.encode - xml, nonAsciiPrintable x 711,548 ops/sec ±3.61% (94 runs sampled)

    Decode test

        #1: entities.decodeXML x 1,608,430 ops/sec ±0.14% (100 runs sampled)
        #2: html-entities.decode - xml, body x 700,270 ops/sec ±0.17% (98 runs sampled)
        #3: html-entities.decode - xml, strict x 696,913 ops/sec ±0.14% (100 runs sampled)
        #4: html-entities.decode - xml, attribute x 689,648 ops/sec ±0.14% (101 runs sampled)

Escaping

    Escape test

        #1: entities.escapeUTF8 x 2,962,869 ops/sec ±0.14% (98 runs sampled)
        #2: he.escape x 1,943,677 ops/sec ±0.19% (97 runs sampled)
        #3: html-entities.encode - xml, specialChars x 1,854,260 ops/sec ±0.40% (99 runs sampled)
        #4: entities.escape x 938,272 ops/sec ±0.15% (97 runs sampled)

Formatting Error

Code

var cart_product_name = '&#34Royal OG&#34 (Private Reserve) NEW'
cart_product_name = require('html-entities').AllHtmlEntities.decode(cart_product_name)

INPUT: '&#34Royal OG&#34 (Private Reserve) NEW'
OUTPUT: '' OG" (Private Reserve) NEW'

Why is "Royal" being completely wiped out?

Add map file?

Hi,

Thanks for creating this awesome library. I was wondering if it was possible to add a map file for this repo.

The "timesbar" entity is not decoded correctly

There is a decoding bug that impacts the ⨱ (timesbar) entity

We have the following string

Kun&timesbar;alas Heritage Site/Conservancy

It should be decoded as "Kun⨱alas Heritage Site/Conservancy" but instead it gets decoded as "Kun×bar;alas Heritage Site/Conservancy"
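
The output suggests the shorter legacy entity &times was matched before the full name was considered. A sketch of the ordering that avoids this, with a two-entry table and hypothetical names (FULL, LEGACY_NO_SEMICOLON, decodeNamed); it is not the library's implementation.

```javascript
// Illustrative tables: all names, plus the legacy names that may appear
// without a trailing semicolon.
const FULL = { times: '\u00D7', timesbar: '\u2A31' };
const LEGACY_NO_SEMICOLON = ['times'];

function decodeNamed(text) {
  return text.replace(/&([a-zA-Z][a-zA-Z0-9]*)(;?)/g, (match, name, semi) => {
    // 1. Prefer the complete, semicolon-terminated name.
    if (semi && Object.prototype.hasOwnProperty.call(FULL, name)) return FULL[name];
    // 2. Only then fall back to a legacy prefix without a semicolon.
    const prefix = LEGACY_NO_SEMICOLON.find((p) => name.startsWith(p));
    return prefix ? FULL[prefix] + name.slice(prefix.length) + semi : match;
  });
}

console.log(decodeNamed('Kun&timesbar;alas Heritage Site/Conservancy'));
// Kun⨱alas Heritage Site/Conservancy
```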

CodePoints encoding and supplementary plane support

IMHO the cow emoji (🐄) should be encoded as a numeric character reference such as &#128004;.
However, it is not. Is this supported? If my assumption is wrong, please correct me.

see test case
$ node html/html-encode
[ '🐄', '🐄', '🐄' ]
[ '🐄', '🐄', '🐄' ]
[ '��', '��', '��' ]
[ '��', '��', '��' ]
Assertion failed

var entities = new(require('html-entities').AllHtmlEntities)();
let cows=[
"🐄",
"\u{1f404}",
"\uD83D\uDC04"];

console.log(cows);
console.log(cows.map((c)=>entities.encode(c)));
console.log(cows.map((c)=>entities.encodeNonASCII(c)));
console.log(cows.map((c)=>entities.encodeNonUTF(c)));

console.assert(entities.encodeNonUTF(cows[1])==="&#128004;")

http://www.amp-what.com/unicode/search/cow
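
For reference, a sketch of how a surrogate pair can be combined into a single code point before encoding, which is the step that seems to be missing. encodeAstral is a hypothetical helper that only handles characters outside the BMP.

```javascript
// Hypothetical helper: encodes characters outside the BMP as one decimal
// numeric reference by combining the high and low surrogate.
function encodeAstral(text) {
  return text.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g, (pair) => {
    const high = pair.charCodeAt(0);
    const low = pair.charCodeAt(1);
    const codePoint = (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
    return `&#${codePoint};`;
  });
}

console.log(encodeAstral('\uD83D\uDC04')); // &#128004;
```

In current JavaScript, pair.codePointAt(0) gives the same number without the manual arithmetic.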

Rewrite decoder

The decoder has multiple issues that must be addressed

- the code is inconsistent
- typo with Æ (PR #37)
- semi-colon shouldn't be always optional
  - this makes this package unusable in many applications (#19)
  - semi-colon is only optional for HTML 3.2 entities. Semi-colon is required for entities added afterward
  - the regEx is also broken. The decoder would fail with "&ampsomething"
- Html4Entities doesn't decode "&QUOT;><" but Html5Entities does. We should accept those special cases in quirk mode
- doesn't accept numeric character reference starting with "&#X" even though "&#X" and "&#x" are both valid in HTML 4 and HTML 5 specification.

I am doing a complete rewrite of the decoder. The decoder would use incremental parser instead of regEx, and it would have a quirk mode and a strict mode.
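
To illustrate just the last point in the list, here is a small regex-based sketch that accepts both "&#x" and "&#X". It is not the incremental parser described above, and decodeNumericReferences is a hypothetical name.

```javascript
// Hypothetical helper: decodes decimal references and hexadecimal references
// with either a lowercase or an uppercase "x" prefix.
function decodeNumericReferences(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references untouched instead of throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericReferences('&#X41;&#x42;&#67;')); // ABC
```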

Possible optimization

https://github.com/mdevils/node-html-entities/blob/dc08bde42ee6468d60ca617061b0b37b2edc45ca/lib/html5-entities.js#L166

Couldn't you put the output of this in a JSON file? As far as I can see this information is static, so shouldn't have to be calculated each time.

If you think it's a good idea I can get a PR together.

How to migrate from v1.x to v2.x

Hi!

I just updated the package, but now I'm getting the error "Entities is not a constructor".

In version 1.x, the README said to create the instance like this:

const entities = new Entities();

But in the newest version that constructor is not available. How can I upgrade the package? Thanks!

It seems that the current npm version (1.0.10) of the module doesn't decode the &quote; symbol to ".

Regards,
Nikolay Tsenkov
