mdevils / html-entities
Fastest HTML entities encode/decode library
License: MIT License
Hi 👋! I am currently trying to get a better understanding of the differences in the benchmarks between HTML entity libraries. I found a bug in html-entities along the way, hope this is useful:
Entities that are part of a string are currently not supported. E.g. &uumlber should be decoded as über. html-entities currently leaves it unchanged.
Hello,
First of all thanks to everyone that has made this lib possible.
This is not a bug report but rather a feature suggestion.
I'm using this lib to import data from a third party into an old database that only supports ISO-8859-1.
I was using it like encode(<text>, {mode: 'nonAscii'}).
But I hit an issue, as it turns out that the third party already uses entities for some characters. This means that I ended up with &amp;#39; whenever there was a &#39; entity, for example.
So I thought it'd be nice to have a preventDoubleEncoding option (only with a better name), to prevent encoding the ampersand whenever it's already part of an entity. E.g.:
encode('you &amp; me', {mode: 'nonAscii', preventDoubleEncoding: true});
-> returns you &amp; me
encode('you &amp; me', {mode: 'nonAscii', preventDoubleEncoding: false});
-> returns you &amp;amp; me
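One way to picture the requested behaviour is a pre-pass that escapes only those ampersands that do not already start an entity. The sketch below is a standalone illustration, not part of html-entities: `encodeBareAmpersands` is a hypothetical helper, and it only checks the syntactic shape of an entity, so something like `&foo;` is left alone even though no such entity exists.

```javascript
// Hypothetical helper, not an html-entities API.
// Escapes "&" unless it already begins something shaped like an entity:
// a named reference (&amp;), a decimal one (&#39;) or a hex one (&#x27;).
const BARE_AMPERSAND = /&(?![a-zA-Z][a-zA-Z0-9]*;|#[0-9]+;|#[xX][0-9a-fA-F]+;)/g;

function encodeBareAmpersands(text) {
  return text.replace(BARE_AMPERSAND, '&amp;');
}

console.log(encodeBareAmpersands('you & me'));      // you &amp; me
console.log(encodeBareAmpersands('you &amp; me'));  // you &amp; me
console.log(encodeBareAmpersands('it&#39;s fine')); // it&#39;s fine
```

Running such a pass over the third-party text first, and encoding only the non-ASCII characters afterwards, would avoid the double encoding described above.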
Issue: We detected vulnerable dependencies in your project by using the command “npm audit”:
glob-parent <5.1.2
Severity: moderate
Regular expression denial of service - https://npmjs.com/advisories/1751
fix available via npm audit fix
node_modules/glob-parent
hosted-git-info <2.8.9 || >=3.0.0 <3.0.8
Severity: moderate
Regular Expression Denial of Service - https://npmjs.com/advisories/1677
fix available via npm audit fix
node_modules/hosted-git-info
lodash <4.17.21
Severity: high
Command Injection - https://npmjs.com/advisories/1673
fix available via npm audit fix
node_modules/lodash
y18n <3.2.2||=4.0.0||>=5.0.0 <5.0.5
Severity: high
Prototype Pollution - https://npmjs.com/advisories/1654
fix available via npm audit fix
node_modules/y18n
4 vulnerabilities (2 moderate, 2 high)
To address all issues, run:
npm audit fix
Questions: We are conducting a research study on vulnerable dependencies in open-source JS projects. We are curious:
For any publication or research report based on this study, we will share all responses from developers in an anonymous way. Both your projects and personal information will be kept confidential.
Description: Many popular NPM packages have been found vulnerable and may carry significant risks [1]. Developers are recommended to monitor and avoid the vulnerable versions of the library. The vulnerabilities have been identified and reported by other developers, and their descriptions are available in the npm registry [2].
Steps to reproduce:
Suggested Solution: Npm has introduced the “npm audit fix” command to fix the vulnerabilities. Execute the command to apply remediation to the dependency tree.
References:
[1] 2019. 10 npm Security Best Practices. https://snyk.io/blog/ten-npm-security-best-practices/.
[2] 2021. npm-audit. https://docs.npmjs.com/cli/v7/commands/npm-audit.
Is there a way to add a custom carriage return for breaks? I'm not sure why, but the self-closed break isn't decoding. Is there a setting for this? I've checked the docs and don't have any answers. Thanks!
When decoding the following string: In today�s digital marketplace
It results in this error:
RangeError: Invalid code point 2013266066
at Object.fromCodePoint (<anonymous>)
at /Users/sjlu/Code/interseller/node_modules/html-entities/lib/html5-entities.js:27:49
at String.replace (<anonymous>)
at Html5Entities.decode (/Users/sjlu/Code/interseller/node_modules/html-entities/lib/html5-entities.js:16:20)
node --version
v12.18.3
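The stack trace points at String.fromCodePoint, which throws a RangeError for any value above 0x10FFFF, and 2013266066 is far beyond that. A minimal sketch of a guard follows; `decodeNumericReference` is a hypothetical name, not the library's code. It substitutes U+FFFD for NUL, surrogates and out-of-range values, which is what HTML parsers do for such references.

```javascript
// Hypothetical helper, not taken from html-entities.
// Decodes a single numeric character reference such as &#39; or &#x27;.
function decodeNumericReference(reference) {
  const match = /^&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));$/.exec(reference);
  if (!match) return reference; // not a numeric reference, leave untouched

  const codePoint = match[1] !== undefined
    ? parseInt(match[1], 16)
    : parseInt(match[2], 10);

  const outOfRange = codePoint > 0x10FFFF;
  const isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
  if (codePoint === 0 || outOfRange || isSurrogate) {
    return '\uFFFD'; // replacement character instead of a RangeError
  }
  return String.fromCodePoint(codePoint);
}

console.log(decodeNumericReference('&#8217;'));              // ’
console.log(decodeNumericReference('&#2013266066;') === '\uFFFD'); // true
```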
Hi, thanks for publishing this library!
How does it compare with @mathiasbynens's he library, which is more popular on GitHub and NPM?
How about @fb55's entities? That library wins in a performance comparison with this one.
The readme doesn't render too well on npm -- https://www.npmjs.org/package/html-entities
see the line with console.log(entities.encode('<>"'&©®')); -- the html entities are not output as html entities, but rendered as the same as the input
don't know if there's something you could do about this~
Example:
Á
Â
É
Ñ
Ú
Ô
Chars definition: http://www.madore.org/~david/computers/unicode/htmlent.html
console.log(entities.decode("Á")); //prints Á
Thanks
In 2.3.3 on NPM, you will see a named-references.js file present in the (beta) package view.
When doing an npm install html-entities@2.3.3, it does not include the named-references.js file.
ERROR in ./node_modules/html-entities/lib/index.js 14:25-54
Module not found: Error: Can't resolve './named-references' in '\node_modules\html-entities\lib'
It seems like this file is not being included with NPM?
Try running one of your example usages and you will find it's pretty broken, man.
TypeError: Object function Html5Entities() {} has no method 'decode'
Also, why is it so weird to use, compared to all other node.js modules? Why not have it like request.js and be able to do something like:
var entities = require('thislib');
entities.decode(str);
why i need to create objects??
I am using the html-entities package from npm and trying to import the decode method in my nodejs application. I have updated to the latest package v2.3.3
I have tried plenty of different import methods, but none seem to work; each of them gives a different error:
import { decode } from 'html-entities'
let result = decode('test')
This gives the error:
Can't import the named export 'decode' from non EcmaScript module (only default export is available)
When trying with the code:
const { decode } = require('html-entities')
It gives me the error:
This file is being treated as an ES module because it has a '.js' file extension and 'D:\giveaday\Give-a-Day\backend\package.json' contains "type": "module". To treat it as a CommonJS script, rename it to use the '.cjs' file extension.
And ultimately, I tried using the old way (v1):
import htmlEntities from 'html-entities'
let result = htmlEntities.decode('test')
and then it returns the error:
ERROR html_entities__WEBPACK_IMPORTED_MODULE_4__.decode is not a function
I've also tried the solution where they say to configure webpack by adding the following code in the nuxt.config.mjs:
config.module.rules.push({
test: /\.js$/,
include: /node_modules/,
type: 'javascript/auto',
})
However, this did not work either. (the nuxt.config.mjs is located in my /frontend/ folder, while the library is being imported in my /shared/ folder)
Hi, thanks for sharing this, but I couldn't get it to decode anything at all.
var Entities = require('html-entities').AllHtmlEntities,
    entities = new Entities();
console.log(entities.decode('Hello+%22I+think%22+%27therefore+I+am%27+%26+this+is+%7C+fun'));
Shows this in the console:
Hello+%22I+think%22+%27therefore+I+am%27+%26+this+is+%7C+fun
Running version: [email protected] node_modules/html-entities
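The input in this report is percent-encoded (the form-urlencoded style, where `+` stands for a space), not HTML entities, so an entity decoder has nothing to replace. A small sketch using only built-ins shows the decoding that matches this format; `decodeFormValue` is a hypothetical helper name.

```javascript
// Hypothetical helper: decodes a form-urlencoded value.
// "+" means space in this format, so convert it before percent-decoding.
function decodeFormValue(value) {
  return decodeURIComponent(value.replace(/\+/g, ' '));
}

const input = 'Hello+%22I+think%22+%27therefore+I+am%27+%26+this+is+%7C+fun';
console.log(decodeFormValue(input));
// Hello "I think" 'therefore I am' & this is | fun
```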
There's an error: SyntaxError: Invalid character '\ud835' in Safari 12.1.2
Hi,
I've been looking through the repository but haven't found any information about it. Is it safe to upgrade from v.1.4.0 to the new version?
Code to reproduce:
import { encode } from 'html-entities'
console.log(encode(0))
running
import { encode } from 'html-entities'
console.log(typeof(encode(0)))
gives console output string
running
import { encode } from 'html-entities'
console.log(encode(0).length)
gives console output 0
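As a workaround sketch for the behaviour reported above, input can be coerced to a string before it reaches the encoder. `withStringInput` is a hypothetical wrapper, and `reportedEncode` is only a stand-in that mimics the reported result (falsy input yields an empty string); neither is html-entities code.

```javascript
// Hypothetical wrapper, not part of html-entities.
// Coerces the first argument to a string; null and undefined become ''.
function withStringInput(encodeFn) {
  return (value, options) =>
    encodeFn(value === null || value === undefined ? '' : String(value), options);
}

// Stand-in that mimics the behaviour reported above: falsy input -> ''.
const reportedEncode = (text) => (text ? String(text).replace(/&/g, '&amp;') : '');

const safeEncode = withStringInput(reportedEncode);
console.log(JSON.stringify(reportedEncode(0))); // ""
console.log(JSON.stringify(safeEncode(0)));     // "0"
```

In an application, the real `encode` import would be passed to `withStringInput` in place of the stand-in.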
Hi,
In v1, Html5Entities didn't encode < and >; in v2, <br /> is replaced by &lt;br /&gt;.
I have been able to tweak encode option to match v1 behavior for special character, but was not able to succeed with bracket :'(
Here is a working test case in v1:
const { Html5Entities } = require('html-entities');
describe('html5 encode with v1', () => {
it.each`
input | expectation
${'€'} | ${'&#8364;'}
${'’'} | ${'&#8217;'}
${'é'} | ${'é'}
${'&'} | ${'&'}
${'<br />'} | ${'<br />'}
${'<a href="toto">woot</a>'} | ${'<a href="toto">woot</a>'}
`('should transform $input into $expectation', ({ input, expectation }) => {
const result = Html5Entities.encodeNonASCII(input);
expect(result).toBe(expectation);
});
});
Here is a failing equivalent test in v2:
const { encode } = require('html-entities');
describe('html5 encode with v2', () => {
it.each`
input | expectation
${'€'} | ${'&#8364;'} // OK
${'’'} | ${'&#8217;'} // OK
${'é'} | ${'é'} // KO, result is &#233;
${'&'} | ${'&'} // KO, result is &amp;
${'<br />'} | ${'<br />'} // KO, result is &lt;br /&gt;
${'<a href="toto">woot</a>'} | ${'<a href="toto">woot</a>'} // KO, result is &lt;a href=&quot;toto&quot;&gt;woot&lt;/a&gt;
`('should transform $input into $expectation', ({ input, expectation }) => {
const result = encode(input, { level: 'xml', mode: 'nonAscii' });
expect(result).toBe(expectation);
});
});
I guess I can work around the change for é, but I would expect &, < or > not to be encoded, as they are ASCII characters and I'm encoding with the nonAscii option.
Any idea? Is this a bug or a new way to do things in this library and i would have to tackle the issue a little higher in my stack?
Hi, I was following the docs and encountered the following error while importing encode from html-entities:
Module '"html-entities"' has no exported member 'encode'.
After exploring the definition file index.d.ts from html-entities, I found out that neither encode nor decode is exported directly.
Importing Html5Entities works:
Are the docs outdated?
Kind regards,
Firmino
Likely something wrong with my config, but this is the error I get when trying to use rollup with html-entities version "^4.1.4"
[!] RollupError: "decode" is not exported by "node_modules/html-entities/lib/index.js"
with import {decode} from 'html-entities'
decode does not decode é (é). The complete string is &eacute;
Is it possible to fix that?
Thanks
I haven't used many imported libraries with typescript so it is possible that I am making a fundamental typescript error around package imports
To be clear the app compiles and runs correctly, and the entities are in fact being stripped out. That said from my experience of other typed languages I know to take 'any' type seriously.
my use case is fairly simple essentially I am receiving from json a bunch of strings with html entities and working to process them to strings that do not have html entities:
import { decode } from "html-entities";
// this interface describes the json
interface Question {
category: string;
type: string;
difficulty: string;
//this is the type of the string with html entities in json
question: string;
correct_answer: string;
incorrect_answers: string[];
}
//this is the converted json the app expects
export interface Card {
category: string;
//this is the type of the string that should receive the results of decode
question: string;
answer: boolean;
hasBeenAnswered: boolean;
chosenAnswer?: boolean;
}
const questionToCard = ({
category,
question,
correct_answer,
}: Question): Card => ({
category: category,
// the following call to decode works but throws 2 typescript-eslint errors
question: decode(question),
answer: correct_answer === "True",
hasBeenAnswered: false,
})
These are the two errors thrown by typescript-eslint:
https://github.com/typescript-eslint/typescript-eslint/blob/v4.16.1/packages/eslint-plugin/docs/rules/no-unsafe-assignment.md
I have read the documentation and searched elsewhere for references but I haven't seen a note about how to use this library with typescript without introducing any type into my codebase.
Have I made some error in importing or using the library? Or is this eslint error to be expected with standard use?
http://en.wikipedia.org/wiki/Extended_ASCII
would be nice to have a real encodeNonASCII as well as a encodeNonUTF8.
I will try to add this when I get a chance.
e.g. Something like
isValid("<div><ul><li></ul></div>"); => false
isValid("<div><ul><li></li></ul></div>"); => true
The npm package was flagged by VirusTotal as malicious.
URL: https://registry.npmjs.org/html-entities/-/html-entities-2.3.3.tgz
https://www.virustotal.com/gui/url/cbbd8234ebcece557a5e9c8f657d94c7fbc3f21125bccf8a26634fcd6b5d14a7
360 anti-virus warns this is virus.
Type
virus.js.qexvmc.1070
\frontend\node_modules\html-entities\lib\html5-entities.js
MD5:
e23b7edbddd7c994e4c67e9bdf97c4aa
surprisingly, this conversion fails. not sure why, it looks quite fine to me. 🤔 could it be the ascii character itself?
expect(decode('first&nbsp;second')).toEqual('first second')
the equality looks suspicious though, but that's the log on terminal
Expected: "first second"
Received: "first second"
macOS 10.15.7, html-entities 2.3.2, ts-jest 25.2.1
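The two strings in the log look identical because a no-break space (U+00A0) prints like an ordinary space (U+0020). `&nbsp;` decodes to U+00A0, so a comparison against a literal typed with the space bar fails. The sketch below uses no library; the `decoded` value is written out by hand as the expected decoder output.

```javascript
// What an HTML entity decoder is expected to yield for 'first&nbsp;second'.
const decoded = 'first\u00A0second';

console.log(decoded === 'first second');         // false: U+00A0 is not U+0020
console.log(decoded.charCodeAt(5).toString(16)); // a0

// Either compare against the no-break space explicitly...
console.log(decoded === 'first\u00A0second');    // true

// ...or normalize no-break spaces before comparing.
const normalized = decoded.replace(/\u00A0/g, ' ');
console.log(normalized === 'first second');      // true
```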
I got this error
Entities is not a constructor
I'm saving values within a react native state component and mapping through to return the value to the ui, but the values I wish to decode are not being decoded. Is there an example on how this works within react native?
const [freeMedia, setFreeMedia] = useState();
{freeMedia && freeMedia.length > 0
? freeMedia.map((uploads) => {
return(
<>
{decode(uploads.videotitle)}
</>
)
})
: null}
Please add 

entity.
Sorry for such short text, but I do not know how to argument it better :)
The problem is with ’ (i.e. ’) and ” (i.e. ”), which are encoded as ’ and ”, respectively.
See this test case
UPDATE: It seems that the issue is specific to the Html5Entities class. When I repeated the test using require('html-entities').Html4Entities, the encoding worked (see updated test case above).
Hi Folks,
I'm not entirely sure what is special about this character, but I have been trying to encode any character so that it can be stored in a non-UTF DB for a legacy system and this one doesn't seem to encode. I'm sure we'll not use this specific character, but I'm worried that a range of characters may not be picked up.
I have tried both nonAsciiPrintable and nonAscii modes. Am I doing something wrong?
Thanks for your help
Ash
PS. It also didn't encode with he
It seems it's doing it "behind my back". I am already encoding the illegal XML characters myself and wanted to just make sure we don't end up with non-UTF characters. & seems pretty UTF to me 😄
The .decode() function is too aggressive, as it decodes even incomplete HTML characters, like in this example:
var url = 'http://some-url.de/?param1=value1&lang=en';
console.log(require('html-entities').AllHtmlEntities.decode(url));
yields
http://some-url.de/?param1=value1⟨=en
This means &lang (probably a not so rare URL parameter) is decoded as if it were &lang;, the character ⟨.
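A stricter decoder would only replace a named reference when it is terminated by a semicolon. The sketch below illustrates that rule with a tiny hand-written table; `STRICT_NAMED` and `decodeStrict` are hypothetical names, and the table is nowhere near the full HTML entity list.

```javascript
// Hypothetical, deliberately tiny entity table for illustration only.
const STRICT_NAMED = { amp: '&', lt: '<', gt: '>', quot: '"', lang: '\u27E8' };

// Replaces &name; only when the terminating semicolon is present
// and the name is known; everything else is left untouched.
function decodeStrict(text) {
  return text.replace(/&([a-zA-Z][a-zA-Z0-9]*);/g, (whole, name) =>
    Object.prototype.hasOwnProperty.call(STRICT_NAMED, name)
      ? STRICT_NAMED[name]
      : whole
  );
}

const url = 'http://some-url.de/?param1=value1&lang=en';
console.log(decodeStrict(url) === url); // true: "&lang" has no semicolon
console.log(decodeStrict('a &lang; b')); // a ⟨ b
```

The benchmark output further down lists a "strict" decode variant for html-entities, which suggests later versions offer a comparable mode; check the README for the exact option name.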
Test case as follows:
const cheerio = require('cheerio');
const fs = require('fs');
const Entities = require('html-entities').XmlEntities;
const entities = new Entities();
let $ = cheerio.load(fs.readFileSync('./test.html').toString());
/**
----file: test.html ----
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<textarea style="display:none">{"bug":"景点<1km"}</textarea>
</body>
</html>
**/
let html = $('body > textarea').html();
let text = $('body > textarea').text();
let val = $('body > textarea').val();
// I want test {"bug":"景点<1km"}, but it return: {"bug":"
console.log(html);
console.log(text);
console.log(val);
console.log(entities.decode(html));
{"bug":"景点<1km"}
Not sure if this still holds up, but in 2019 it was apparently the case that JSON.parse would be faster for objects >10kb in size: https://v8.dev/blog/cost-of-javascript-2019
s/lenght/length/g
Hi,
I'm using html-entities
in a ui application, generated by webpack. when I update from 1.3.1 to 2.0.2, webpack complains with the error message:
WARNING in ./node_modules/html-entities/lib/index.js 13:93-94
Critical dependency: require function is used in a way in which dependencies cannot be statically extracted
@ ./example/src/index.jsx 1:182-216
can you help on this issue?
Regards
stheine
Docs could use a code sample with ES6 module syntax. I am happy to submit a PR with this if you'd like to add it.
See fb55/entities#377 (comment)
Here is what I see with this project's benchmark using current versions:
Common
Initialization / Load speed
#1: html-entities:
#2: entities x 4,276,293 ops/sec ±2.67% (81 runs sampled)
#3: he x 3,099,406 ops/sec ±2.48% (87 runs sampled)
HTML5
Encode test
#1: entities.encodeNonAsciiHTML x 923,610 ops/sec ±0.12% (96 runs sampled)
#2: html-entities.encode - html5, nonAscii x 678,397 ops/sec ±0.07% (97 runs sampled)
#3: html-entities.encode - html5, nonAsciiPrintable x 631,622 ops/sec ±3.44% (96 runs sampled)
#4: he.encode x 165,394 ops/sec ±0.09% (98 runs sampled)
Decode test
#1: entities.decodeHTML x 1,186,228 ops/sec ±0.15% (100 runs sampled)
#2: entities.decodeHTMLStrict x 1,154,976 ops/sec ±0.16% (99 runs sampled)
#3: html-entities.decode - html5, strict x 551,112 ops/sec ±0.11% (101 runs sampled)
#4: html-entities.decode - html5, body x 533,743 ops/sec ±0.19% (100 runs sampled)
#5: html-entities.decode - html5, attribute x 531,052 ops/sec ±2.71% (97 runs sampled)
#6: he.decode x 320,768 ops/sec ±0.19% (99 runs sampled)
HTML4
Encode test
#1: entities.encodeNonAsciiHTML x 883,385 ops/sec ±0.21% (98 runs sampled)
#2: html-entities.encode - html4, nonAscii x 604,980 ops/sec ±2.90% (95 runs sampled)
#3: html-entities.encode - html4, nonAsciiPrintable x 584,515 ops/sec ±0.25% (99 runs sampled)
Decode test
#1: entities.decodeHTML x 1,357,417 ops/sec ±0.15% (100 runs sampled)
#2: entities.decodeHTMLStrict x 1,326,669 ops/sec ±0.17% (98 runs sampled)
#3: html-entities.decode - html4, strict x 572,823 ops/sec ±0.16% (96 runs sampled)
#4: html-entities.decode - html4, body x 563,574 ops/sec ±0.12% (99 runs sampled)
#5: html-entities.decode - html4, attribute x 553,911 ops/sec ±2.58% (94 runs sampled)
XML
Encode test
#1: entities.encodeXML x 949,544 ops/sec ±0.56% (97 runs sampled)
#2: html-entities.encode - xml, nonAscii x 771,887 ops/sec ±0.17% (97 runs sampled)
#3: html-entities.encode - xml, nonAsciiPrintable x 711,548 ops/sec ±3.61% (94 runs sampled)
Decode test
#1: entities.decodeXML x 1,608,430 ops/sec ±0.14% (100 runs sampled)
#2: html-entities.decode - xml, body x 700,270 ops/sec ±0.17% (98 runs sampled)
#3: html-entities.decode - xml, strict x 696,913 ops/sec ±0.14% (100 runs sampled)
#4: html-entities.decode - xml, attribute x 689,648 ops/sec ±0.14% (101 runs sampled)
Escaping
Escape test
#1: entities.escapeUTF8 x 2,962,869 ops/sec ±0.14% (98 runs sampled)
#2: he.escape x 1,943,677 ops/sec ±0.19% (97 runs sampled)
#3: html-entities.encode - xml, specialChars x 1,854,260 ops/sec ±0.40% (99 runs sampled)
#4: entities.escape x 938,272 ops/sec ±0.15% (97 runs sampled)
Hi,
Thanks for creating this awesome library. I was wondering if it was possible to add a map file for this repo.
There is a decoding bug that impacts the ⨱ (timesbar) entity
We have the following string
Kun&timesbar;alas Heritage Site/Conservancy
It should be decoded as "Kun⨱alas Heritage Site/Conservancy" but instead it gets decoded as "Kun×bar;alas Heritage Site/Conservancy"
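The output above is what happens when the legacy, semicolon-less `&times` is matched before the longer `&timesbar;`. A sketch of longest-match lookup follows, assuming a tiny hand-written table (`LONGEST_NAMED`) and a hypothetical function name; it is not the library's implementation.

```javascript
// Hypothetical mini table. HTML defines both "times" (legacy, no semicolon)
// and "times;", as well as the longer "timesbar;".
const LONGEST_NAMED = {
  'times': '\u00D7',
  'times;': '\u00D7',
  'timesbar;': '\u2A31',
};

// Tries the longest candidate first and only then falls back to shorter
// prefixes, so "&timesbar;" is never split into "&times" + "bar;".
function decodeLongestMatch(text) {
  return text.replace(/&([a-zA-Z][a-zA-Z0-9]*;?)/g, (whole, name) => {
    for (let end = name.length; end > 0; end--) {
      const candidate = name.slice(0, end);
      if (Object.prototype.hasOwnProperty.call(LONGEST_NAMED, candidate)) {
        return LONGEST_NAMED[candidate] + name.slice(end);
      }
    }
    return whole; // unknown reference, leave as-is
  });
}

console.log(decodeLongestMatch('Kun&timesbar;alas Heritage Site/Conservancy'));
// Kun⨱alas Heritage Site/Conservancy
```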
IMHO the cow should be encoded as a single numeric reference for 🐄 (U+1F404); however it is not. Is this supported? If my assumption is wrong, please correct me.
see test case
$ node html/html-encode
[ '🐄', '🐄', '🐄' ]
[ '🐄', '🐄', '🐄' ]
[ '��', '��', '��' ]
[ '��', '��', '��' ]
Assertion failed
var entities = new(require('html-entities').AllHtmlEntities)();
let cows=[
"🐄",
"\u{1f404}",
"\uD83D\uDC04"];
console.log(cows);
console.log(cows.map((c)=>entities.encode(c)));
console.log(cows.map((c)=>entities.encodeNonASCII(c)));
console.log(cows.map((c)=>entities.encodeNonUTF(c)));
console.assert(entities.encodeNonUTF(cows[1])==="🐄")
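The difference comes down to iterating over UTF-16 code units versus Unicode code points. Characters outside the Basic Multilingual Plane, such as U+1F404, occupy two code units (a surrogate pair), so a loop based on charCodeAt emits two references. The sketch below contrasts both approaches with hypothetical helper names; it is not the library's encoder.

```javascript
// Hypothetical helpers for illustration; not html-entities code.

// Iterates by code point: for...of walks a string by Unicode characters.
function encodeNonAsciiByCodePoint(text) {
  let result = '';
  for (const character of text) {
    const codePoint = character.codePointAt(0);
    result += codePoint > 0x7F ? `&#${codePoint};` : character;
  }
  return result;
}

// Iterates by UTF-16 code unit: astral characters become two references.
function encodeNonAsciiByCodeUnit(text) {
  let result = '';
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    result += unit > 0x7F ? `&#${unit};` : text[i];
  }
  return result;
}

console.log(encodeNonAsciiByCodePoint('\u{1F404}')); // &#128004;
console.log(encodeNonAsciiByCodeUnit('\u{1F404}'));  // &#55357;&#56324;
```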
The decoder has multiple issues that must be addressed
I am doing a complete rewrite of the decoder. The decoder would use incremental parser instead of regEx, and it would have a quirk mode and a strict mode.
Couldn't you put the output of this in a JSON file? As far as I can see this information is static, so shouldn't have to be calculated each time.
If you think it's a good idea I can get a PR together.
Hi!
I just updated the package, but now I'm getting the error "Entities is not a constructor".
In version 1.x, the README said to declare the function like this:
const entities = new Entities();
But in the newest version that constructor is not available. How can I upgrade the package? Thanks!
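Other reports on this page import `encode` and `decode` directly from the v2 package, so the usual migration is to call those functions instead of constructing an object. Where many call sites still expect `new Entities()`, a thin shim can bridge the gap. The sketch below is an assumption-laden illustration: `createLegacyEntities` is a hypothetical helper, and the stand-in functions exist only so the example runs without the package installed.

```javascript
// Hypothetical compatibility shim, not shipped with html-entities.
// Wraps function-style encode/decode in a v1-style class.
function createLegacyEntities({ encode, decode }) {
  return class Entities {
    encode(text) { return encode(text); }
    decode(text) { return decode(text); }
  };
}

// In an application this would be:
//   const Entities = createLegacyEntities(require('html-entities'));
// Stand-ins so the sketch runs on its own:
const standIn = {
  encode: (text) => text.replace(/&/g, '&amp;').replace(/</g, '&lt;'),
  decode: (text) => text.replace(/&lt;/g, '<').replace(/&amp;/g, '&'),
};

const Entities = createLegacyEntities(standIn);
const entities = new Entities();
console.log(entities.decode('a &lt; b')); // a < b
```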
Almost every character is successfully decoded, but ’ => ' is still not decoded.
You can confirm on this page (decode online) that ’ is equivalent to '
Same happens with dagger
https://www.npmjs.com/package/html-entities/v/latest is listed as 2.3.3 (published 4 days ago).
The latest version (release/tag) listed on this repository, 2.3.2, is from last year.
Hi,
Thanks for the awesome module.
I am generating JavaScript and I use the Handlebars templating engine. Handlebars html-encodes everything and I needed to get rid of all special symbol encodings.
It seems that the current npm version (1.0.10) of the module doesn't decode the &quote; symbol to ".
Regards,
Nikolay Tsenkov