Comments (8)
updated testcase
Original comment by [email protected]
on 1 Feb 2013 at 12:15
Attachments:
from java-html-sanitizer.
Entering into http://html5.validator.nu/ the first example
<p>123<p>abcdefg</p>456</p>
gives
Error: No p element in scope but a p end tag seen.
From line 1, column 24; to line 1, column 27
efg</p>456</p>↩
because the </p> at the end doesn't close a tag. The second <p> closes the
first <p> per HTML5 parsing rules.
http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inbody says
"""
A start tag whose tag name is one of: "address", "article", "aside",
"blockquote", "center", "details", "dialog", "dir", "div", "dl", "fieldset",
"figcaption", "figure", "footer", "header", "hgroup", "main", "menu", "nav",
"ol", "p", "section", "summary", "ul"
If the stack of open elements has a p element in button scope, then act as if
an end tag with the tag name "p" had been seen.
Insert an HTML element for the token.
"""
which means that when a <p> is seen inside a <p>, an implicit </p> is seen, so
<p>123<p>abcdefg</p>456</p>
is equivalent to
<p>123</p><p>abcdefg</p>456
which is what the HTML sanitizer produces.
By understanding browser tag nesting rules, the sanitizer avoids a lot of
ambiguity in HTML, and can produce output that will be consistently and safely
interpreted by a variety of browsers.
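The implicit-close rule quoted above can be sketched in a few lines of Java. This is only a toy illustration, not the sanitizer's actual implementation: the class name `ImplicitParagraphClose` and its `normalize` method are hypothetical, it recognizes nothing but `<p>`, `</p>` and plain text, and it simply drops a stray `</p>` so that its output matches the sanitizer output quoted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of java-html-sanitizer.
final class ImplicitParagraphClose {
    // Toy tokenizer: only <p>, </p> and runs of text are recognized.
    private static final Pattern TOKEN = Pattern.compile("<p>|</p>|[^<]+");

    static String normalize(String html) {
        StringBuilder out = new StringBuilder();
        boolean pOpen = false; // is a <p> currently open?
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String tok = m.group();
            if (tok.equals("<p>")) {
                // A <p> start tag while a <p> is open acts as if </p> had been seen first.
                if (pOpen) {
                    out.append("</p>");
                }
                out.append("<p>");
                pOpen = true;
            } else if (tok.equals("</p>")) {
                if (pOpen) {
                    out.append("</p>");
                    pOpen = false;
                }
                // Otherwise the end tag closes nothing; this sketch drops it.
            } else {
                out.append(tok); // text passes through unchanged
            }
        }
        if (pOpen) {
            out.append("</p>"); // close a paragraph left open at end of input
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints <p>123</p><p>abcdefg</p>456
        System.out.println(normalize("<p>123<p>abcdefg</p>456</p>"));
    }
}
```

Because the second `<p>` closes the first, the final `</p>` in the input has nothing left to close, which is the error the validator reports.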
----
Sanitizers.BLOCKS.sanitize("<div><meta/><p>abcdefg</p></div>")
should not produce
"<div><meta/><p>abcdefg</p></div>"
since <meta> is not a block tag, and is not even allowed in the body.
----
Marking this bug invalid. Please reopen if you feel this was in error.
Original comment by [email protected]
on 2 Feb 2013 at 6:19
- Changed state: Invalid
The paragraph case was only added to illustrate the sanitizer's behaviour.
The meta tag, however, is a real problem for us: Thunderbird generates markup
like "<blockquote><meta></blockquote>" all the time, and we have to display it
correctly for our users. This is hard because the sanitizer restructures the
surrounding markup when it removes the meta tag. I just want the sanitizer to
remove the meta tag and leave the rest untouched, which is currently not
possible.
Please reopen as I'm not allowed to...
Kind regards
Matthias
Original comment by [email protected]
on 4 Feb 2013 at 12:30
Reopened.
Is the problem that you're doing something like
PolicyFactory policy = new HtmlPolicyBuilder()
    .allowCommonBlockElements()
    .allowElements("meta")
    .toFactory();
String htmlSnippet = "<blockquote><meta></blockquote>";
String sanitized = policy.sanitize(htmlSnippet);
System.out.println(sanitized);
and you get
<blockquote></blockquote></body><meta />
?
Original comment by [email protected]
on 5 Feb 2013 at 9:42
- Changed state: New
Closing for lack of response. Re the attached test case:
> assertEquals("<p>123<p>abcdefg</p>456</p>",
>     Sanitizers.BLOCKS.sanitize("<p>123<p>abcdefg</p>456</p>"));
the test golden is invalid. <p> tags do not nest in HTML.
> assertEquals("<div><meta/><p>abcdefg</p></div>",
>     Sanitizers.BLOCKS.sanitize("<div><meta/><p>abcdefg</p></div>"));
is also invalid since <p> tags cannot be direct children of <div> elements.
You can white-list <meta> elements if you like using a custom policy, but
<meta> is not a block element so should not be allowed by Sanitizers.BLOCKS.
Original comment by [email protected]
on 24 Jul 2013 at 4:00
- Changed state: WontFix
Hello everyone!
We see similar behaviour in this case:
assertEquals("<h1>TEXT</h1>",
Sanitizers.BLOCKS.sanitize("<H1><center>TEXT</H1>"));
For this one the result is:
<h1></h1>TEXT
instead of:
<h1>TEXT</h1>
But test case:
assertEquals("<h1>TEXT</h1>",
Sanitizers.BLOCKS.sanitize("<H1></center>TEXT</H1>"));
works as expected:
<h1>TEXT</h1>
What's wrong with the first one?
I would appreciate your feedback on this case.
Original comment by [email protected]
on 29 Sep 2014 at 7:08
#6, filed as
https://code.google.com/p/owasp-java-html-sanitizer/issues/detail?id=33
Original comment by [email protected]
on 1 Oct 2014 at 12:46
Version: r239.
I am facing a similar issue with <font> wrapping a <div>.
The <div>, along with its content, is moved outside the <font>.
As a result, the <font> styling is not applied to the content of the <div>.
Sample code snippet:
PolicyFactory policy = new HtmlPolicyBuilder()
    .allowCommonBlockElements()
    .allowElements("font")
    .allowAttributes("face", "size").onElements("font")
    .toFactory();
String htmlSnippet = "<font face=\"Calibri\" size=\"2\"><div>Hi Hari</div></font>";
String sanitized = policy.sanitize(htmlSnippet);
Original:
<font face="Calibri" size="2"><div>Hi Hari</div></font>
Sanitized:
<font face="Calibri" size="2"></font><div>Hi Hari</div>
Can this issue also be covered by the issue above?