owasp / java-html-sanitizer
Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.
License: Other
What steps will reproduce the problem?
1. new HtmlPolicyBuilder
2. .allowElements("hr")
3. HtmlSanitizer.sanitize("<hr />", policy);
What is the expected output? What do you see instead?
expected - <hr />
instead - <hr>
What version of the product are you using? On what operating system?
r99
Please provide any additional information below.
For browsers the output <hr> is correct. However, it is not usable if we need
some additional XML processing of the output.
Original issue reported on code.google.com by [email protected]
on 18 Sep 2012 at 6:59
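For readers who need XML-friendly output today, one possible workaround is to post-process the sanitized string. The sketch below is not part of the sanitizer: the class name VoidElementCloser and the list of void elements are hypothetical, and it assumes the sanitized output never contains a raw '<' or '>' inside attribute values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the sanitizer: rewrites void elements
// such as <hr> into the XML-style <hr /> form after sanitization.
class VoidElementCloser {
    // Assumed list of void elements; extend it to match your policy.
    private static final Pattern VOID_TAG = Pattern.compile(
        "<(br|hr|img|col|wbr)\\b([^<>]*?)\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    static String selfCloseVoidElements(String html) {
        Matcher m = VOID_TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group(1) is the tag name, group(2) the attributes (possibly empty).
            String tag = "<" + m.group(1) + m.group(2) + " />";
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(selfCloseVoidElements("<p>a<br>b</p><hr>"));
    }
}
```

Tags that are already self-closed are left in the same form, so the helper can be applied more than once without changing the result.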
http://www.w3.org/html/wg/drafts/srcset/w3c-srcset/ describes an extension
attribute to HTML <img> elements that allows multiple annotated URLs.
Make sure the URL protocol policy applies to all of them.
Original issue reported on code.google.com by [email protected]
on 21 Jan 2014 at 3:59
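To illustrate what "applies to all of them" could mean, here is a minimal sketch that splits a srcset value into its candidate URLs and checks each scheme against an allow-list. It is not the sanitizer's implementation: the class SrcsetProtocolCheck and its methods are hypothetical, the comma split is naive (it would mis-split URLs that themselves contain commas, such as data: URLs), and a real filter would also have to handle whitespace and control characters inside the scheme.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: check every URL in a srcset value, not just the first.
class SrcsetProtocolCheck {
    // Assumed allow-list for this example.
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList("http", "https"));

    // Naive split: candidates are comma-separated, and the URL is the first
    // whitespace-delimited token of each candidate ("a.png 2x" -> "a.png").
    static List<String> candidateUrls(String srcset) {
        List<String> urls = new ArrayList<>();
        for (String candidate : srcset.split(",")) {
            String trimmed = candidate.trim();
            if (!trimmed.isEmpty()) {
                urls.add(trimmed.split("\\s+")[0]);
            }
        }
        return urls;
    }

    static boolean hasAllowedProtocol(String url) {
        int colon = url.indexOf(':');
        if (colon < 0) {
            return true; // no scheme: treated as a relative URL
        }
        for (int i = 0; i < colon; i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                return true; // the colon is in the path, query or fragment
            }
        }
        return ALLOWED.contains(url.substring(0, colon).toLowerCase(Locale.ROOT));
    }

    static boolean allCandidatesAllowed(String srcset) {
        for (String url : candidateUrls(srcset)) {
            if (!hasAllowedProtocol(url)) {
                return false;
            }
        }
        return true;
    }
}
```

Rejecting the whole attribute when any candidate fails is the conservative choice; dropping only the offending candidate would be an alternative design.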
See standalone JUnit test attached.
Briefly:
"<select>\n" +
"<option>A</option>\n" +
"<option>B</option>\n" +
"</select>\n"
will sanitize just fine into:
"<select>" +
"<option>A</option>" +
"<option>B</option>" +
"</select>\n"
but
"<select>" +
"<option>A</option> \n" + // <-- notice the space before the newline
"<option>B</option> \n" + // <-- notice the space before the newline
"</select>\n";
produces this mangled result:
"<select><option>A</option></select> \n" +
"<option>B</option> \n" +
"\n"
Original issue reported on code.google.com by [email protected]
on 3 May 2014 at 1:08
What steps will reproduce the problem?
1. Initialize a sanitizer as Sanitizers.BLOCKS.and(Sanitizers.FORMATTING).
2. Attempt to sanitize the string "<em>Emphasized</em>". This strips off the
<em> tags.
What is the expected output? What do you see instead?
I would expect to see <em>Emphasized</em>. I see Emphasized instead.
What version of the product are you using? On what operating system?
r239. Any OS
Please provide any additional information below.
Some HTML programmers consider em and strong to be legacy and obsolete.
However, the HTML standard still supports them. Additionally, the OWASP
Sanitizer supports the strong tag but not the em tag. If strong is supported,
em should be too.
Original issue reported on code.google.com by [email protected]
on 9 Jul 2014 at 5:06
I am using r239 on Windows 8. When I sanitize the text below:
<span style=\"color:rgb(72, 72, 72); font-family:helveticaneue\"> <span>my </span> list of style names or a </span>
the outer span tag is not closed in the right place. This is the text I got after sanitization:
<span style="color:rgb(72, 72, 72); font-family:helveticaneue"> my </span> list of style names or a
What steps will reproduce the problem?
Sanitizing the string below with ExamplesTest.java causes a StackOverflowError (with
r173-EbayPolicy-based code), likely due to a very deep regular expression tree.
I ran into this while sanitizing a large email collection, and verified it with a clean r176 by
adding a test to ExamplesTest.java. If I shorten the image data sufficiently, I
no longer get the stack overflow.
1) testDataImage(org.owasp.html.ExamplesTest)java.lang.StackOverflowError
at java.util.regex.Pattern$6.isSatisfiedBy(Pattern.java:4763)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
[...repeats nothing more useful at bottom of list...]
The test is (String all on one line):
public final void testDataImage() {
String input="<a class=\"atc_s addthis_button_compact\" style=\"background-image:url(data:image/gif;base64,R0lGODlhMgCaAPf/AO5ZOuRpTfXSyvro5Pz08uCEb+ShkeummPOMdPOqmdZdQvbEud1UNuqmlvqbhchNMfnTyvXPxu6pmtFkTeJ6Y9JxXPR8YNRSNvyunP3Mwd1zW+iCau9dPtBOMe69sutwVPGGbuFcPvmRefqAZuhiQ/rJv9pFJf3h2up0WttCItmBbv3r5/26q/bc1v3XzvHOx8tML/nf2cpAI+hfQf3GufVbOvyrmvCkk/7r5vnm4t+BbPism/apmN9DI+hNLrkrD/x4V98+HeFBIPt0U/x2VflaOfdwT/lzUvhYN/p2VfhWNd47Gvx5WPtyUflcO+JEI/VsS/FjQvRpSPtwT/tuTebm5vpmRfppSPlePeRIJ8XFxdRQNPpsS8I9IYyMjOhQL+xYN+ZMK/liQflgP+1dO+pUM//188RBJZGRkff39/718vFZOcA/JtVXO8VEKPvVzeuhkPynk/ermsdHK/7Yz/ehjvi2p9Z4ZPre2NRDJPVkRPzz8d9NLdR1YNM2F/SkksBBKOdXOfaHbdhsVPFbO81cQs1UOO2HcPNTM/708dd7Zt1GJuFRMd1JKfBiQvyNcuF1XfqOd+6gj+SDbPuJb/re1/TUzeReP+mHce1zWeOdjthYPOaSgO2LdOyllPFzWPezpPe7rfVzVO6Hb/vz8f7f2eGJdOKMee54XeBiR/Z5XeNlSfyVfPJtT++ikf7i3POrnPGun9+VhOVSMvzo499/as1EJup5YPvUy+abi+tjRPNrS91aPfCCafyxn/SCaNVZPeu7r+B+ad1WOOuypOm3rPt6W/OikPvq5vCmlt6Qf/ygivp2V/OTfPrb1eKfkP7Z0OyBaeibiuieju+ciu2ejeKCbOatn+2YhOFVN+eRfedWNvWplvLRyuF4Xs5kTdlOL+3Eu+1mR/zp5fzq5dVMLtJTOPvDtvKgjvy1pe9yVchFKeafj+WYh/nBs/HIvv3Ctd9YONdVOM5BI8pXP/jUzOqKdOiMd9toUN1iSONuU8HBwYmJifX19f///////yH/C1hNUCBEYXRhWE1QPD94cGFja2V0IGJlZ2luPSLvu78iIGlkPSJXNU0wTXBDZWhpSHpyZVN6TlRjemtjOWQiPz4gPHg6eG1wbWV0YSB4bWxuczp4PSJhZG9iZTpuczptZXRhLyIgeDp4bXB0az0iQWRvYmUgWE1QIENvcmUgNS4wLWMwNjEgNjQuMTQwOTQ5LCAyMDEwLzEyLzA3LTEwOjU3OjAxICAgICAgICAiPiA8cmRmOlJERiB4bWxuczpyZGY9Imh0dHA6Ly93d3cudzMub3JnLzE5OTkvMDIvMjItcmRmLXN5bnRheC1ucyMiPiA8cmRmOkRlc2NyaXB0aW9uIHJkZjphYm91dD0iIiB4bWxuczp4bXBNTT0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL21tLyIgeG1sbnM6c3RSZWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20veGFwLzEuMC9zVHlwZS9SZXNvdXJjZVJlZiMiIHhtbG5zOnhtcD0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wLyIgeG1wTU06T3JpZ2luYWxEb2N1bWVudElEPSJ4bXAuZGlkOkEyRDgxODE2MkMyMDY4MTE4NzFGRDNDMzU5QkE3OTE3IiB4bXBNTTpEb2N1bWVudElEPSJ4bXAuZGlkOkMzMjA1M0I4QkM4RjExRTBCRDBEQkE0MTlGMTc4MDZGIiB4bXBNTTpJbnN0YW5jZUlEPSJ4bXAuaWlkOjlFQzFFMTZFQkM4RDExRTBCRDBEQkE0MTlGMTc4MDZGIiB4bXA6Q3JlYXRvclRvb2w9IkFkb2JlIFBob3Rvc
2hvcCBDUzUuMSBNYWNpbnRvc2giPiA8eG1wTU06RGVyaXZlZEZyb20gc3RSZWY6aW5zdGFuY2VJRD0ieG1wLmlpZDpDMjFGMUIwQjMyMjA2ODExODcxRkQzQzM1OUJBNzkxNyIgc3RSZWY6ZG9jdW1lbnRJRD0ieG1wLmRpZDpBMkQ4MTgxNjJDMjA2ODExODcxRkQzQzM1OUJBNzkxNyIvPiA8L3JkZjpEZXNjcmlwdGlvbj4gPC9yZGY6UkRGPiA8L3g6eG1wbWV0YT4gPD94cGFja2V0IGVuZD0iciI/PgH//v38+/r5+Pf29fTz8vHw7+7t7Ovq6ejn5uXk4+Lh4N/e3dzb2tnY19bV1NPS0dDPzs3My8rJyMfGxcTDwsHAv769vLu6ubi3trW0s7KxsK+urayrqqmop6alpKOioaCfnp2cm5qZmJeWlZSTkpGQj46NjIuKiYiHhoWEg4KBgH9+fXx7enl4d3Z1dHNycXBvbm1sa2ppaGdmZWRjYmFgX15dXFtaWVhXVlVUU1JRUE9OTUxLSklIR0ZFRENCQUA/Pj08Ozo5ODc2NTQzMjEwLy4tLCsqKSgnJiUkIyIhIB8eHRwbGhkYFxYVFBMSERAPDg0MCwoJCAcGBQQDAgEAACH5BAEAAP8ALAAAAAAyAJoAAAj/AP+h8EGwoMGDCBMqXOgDxb8ZUphInEixosWLGDMykTLDB5CPIEOKHEmypEmQBImoXMmypcuXMGOuJDikps2bOHPq3MnTJsEmQJv4G0rUX9CjSJMqXXqU4JSnRaM+nWKMxtBXy6Y8gvZoqtevYL8SpEK2KJWiCsjSMIPNlCUC6lj5u7eLrN27ePOSJcilb1EuaPvSQaZBnAUJGkT487DCTBwulOj4M8OCSxw6vvwxYzH5cV8uBK+IHiq69FBgoh0MPcHCAgnF7+zFIKArw4kAufxFw+AvBixiODo18IeqNEEryItaKdoGuRUEHgj4I6Aqkr8CIeT422QF0wEX13f4/xvEi1y9A9ob6EF+PDnR5USbW8EQydElaf4aWC/giHcbeGbYkcB12rWhhxkCSJBAAgXMwJ4PYkQYVVEXRIgDHp+IMYI/7Rzijw4c2ODPBf4kIw81H/IwohgZCJDKBxAcwkGEBI1h4xgTjniBjSMMQBQuF4DwYYgjpuMPBKX4c4qKO44wzlAlDMOBjQRhYaWVW2Sp5RZXYkEIJLUosAUDWACwBQBYBMLlLfgwwMCZalr5pQZbZHMlQU7kqeeefOq5BgA19LnnGoEK6sQaa+xJUBGMNuroo5BGKumkjRKExKWYZqrpppx26immPsyAiBKklmrqqaimquqqSiAyw0MMxf8q66yv/lPID7jmquuuvPbq668/FPIPGyGcZOyxyIbAxg9JNOvss9BGK+201DqL6xHYZqvtttx26+232eJqxLjklmvuueimqy65uELhLhQ5vivvvPTWa6+8uEqhb45D6SuFKOes5oAUgrggiL8IJ6xwwrhG4bBZRRnisDtqaKNCNwQEgMB1Mzjs8ccgh+xwww8TBRhREkfxxgCDXIJCA4NsHAw5atQRxS9v+KOGHVHU8YZ4rdihs80e40rG0aSVdsVQDxzdzFB4gPLBMKP4E441LewRTwmVKCCLP95w408LnlwzziQG+KPP0WTgCsbbyhUFw9tgbFAMKf7s8YGQimz/cYM/cwtjAAT+KPL3BB0MIIABfz+zzdu4liF53ETNIfkfvTBSjjL+aBKNP3cw8oc/c4SSSCxwgO4K6bMk8gI7cMBxBziS4/rF7fz648bttLSwyheZ+KMMBf70wccxuvszzTqcFC+J7l8s8II5+URAAR+34xrG9mHkeMYZ23+QA1ERnEF8H42g488ZsPgTgTP+qFDN+mGIP9QCtjSyPa5Z9N9/FwAMYBf8l4VFTKACD+iCDLJggi6YIAt5GCA+6CEDGTgwgv0z4De6MA//4eoJIAyhCEcYwhSkoAckFOEJUwhCE4oQV
0KIoQxnSMMa2vCGOJQhroLAwx768IdADKIQ/4fYwx8Awg9LSKISl8jEJjrxiVBcgh8AMSxgWfGKWGTDP/6hhX148YtgDKMYx0jGMu5DC1zsR+7WyMY2FqUfXXSjHOc4IS/S8Y5ztCMe97hGPfLxj1HxIyAHKchB/rGQhtwjIhN5x0UyMo/7eOQhIylJRVKyko28JCYhuclMdpKOjvxkjkIpykBqspS5IyUqh6LKVbYSla8sZSxFOctP1rKTt9xkLjG5y0r2UpJnTMMqc5cGNHbRjGVU4xuRSUY0bvGZ0IymNLe4D2X6ox/7mKY2t8lNblbzmtnspjjHuc1IhpOc6EznP86pzmiigR/wjKc850nPetrznvxAwz+8UP+FfvjznwANqEAHStCC9qMKXuCHQRfK0Ib6E54OjahE/wnRiVqUoRW9qEYHmtGNevShCv2oSDsqUo2StKQWPSlKJarSlTq0pS7FaEhjOlGY0rSgNr0pR2eq04bmtKcA/SlQQTpUn/K0qAQV6lCVClSm9tSpOoXqTaVKU6rG1KouxepKtYpSfvATqQVF6D7xic9+ArQKZLWnF9oZTX6Y9aD8YKtc29pPtM71rs90a1zxyte98vWZx2SmYL/oTHRqwZqPhGM6T/lIdoqTsYx0bDchm0jJehOXi8UsOb+py80i1pfj5KxmvflZXoa2tMD0rC3RKVrQsna0oYXtY2U7WdplclMLwqxkMdUZ2MEKtrDofGdahytPfabzqzEVKzqPutV0Mrerzo1qdKc63aou961XJadbnzrO7XK3m979Lnixa93ukjer15XuctWrXfZ2173ifG5J/TpO5LpUuehMKHH3u1ZxBgQAOw==);\"></a>";
String sanitized = EbayPolicyExample.POLICY_DEFINITION.sanitize(input);
System.out.println(sanitized);
}
What is the expected output? What do you see instead?
Sanitized HTML with this style monstrosity removed or passed.
What version of the product are you using? On what operating system?
r173-based code, repeated with test case in clean r176.
$ java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) Server VM (build 20.45-b01, mixed mode)
Linux 3.2.0-49-generic-pae #75-Ubuntu SMP
Attached gz version of smallest HTML fragment that causes error (image data can
be shortened further, but attached is with original image data).
Thanks!
Fred
PS:
FWIW -
in StylingPolicy.html, the '.' in the below regexp should probably be escaped.
However, doing so does not help with above problem (also doesn't cause new
failures):
private static final Pattern NON_NEGATIVE_LENGTH = Pattern.compile(
"(?:0|[1-9][0-9]*)([.][0-9]+)?(ex|[ecm]m|v[hw]|p[xct]|in|%)?");
I also thought that in CssGrammar.java the URL_CHARS regexp should require at least
one character (I didn't check against the actual CSS grammar, though) by changing '*' into
'+', but that did not eliminate the problem either (it also doesn't cause new
failures):
// url chars ({url_special_chars}|{nonascii}|{escape})+
String URL_CHARS = "(?:"
+ url_special_chars + "|" + nonascii + "|" + escape + ")+";
Original issue reported on code.google.com by [email protected]
on 16 Jul 2013 at 2:51
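The stack trace above alternates between Branch, GroupHead, Loop and GroupTail frames, which is the shape java.util.regex produces when a repeated group contains an alternation: each repetition adds stack frames, so a long enough input can exhaust the stack. The sketch below uses two simplified, hypothetical patterns (not the library's actual CSS patterns) that accept the same strings, one written as an alternation and one as a single character class. Whether the first form actually overflows depends on the JVM and its stack size, so the demo reports the outcome instead of assuming it.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo, not the sanitizer's patterns.
class RegexDepthDemo {
    // Repeated group with an alternation: matched recursively, one level
    // of recursion per repetition.
    static final Pattern ALTERNATION = Pattern.compile("(?:[a-z]|[0-9])*");
    // Same language as one character class: matched in a simple loop.
    static final Pattern CHAR_CLASS = Pattern.compile("[a-z0-9]*");

    // Returns "match", "no match", or "stack overflow".
    static String tryMatch(Pattern pattern, String input) {
        try {
            return pattern.matcher(input).matches() ? "match" : "no match";
        } catch (StackOverflowError e) {
            return "stack overflow";
        }
    }

    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String longInput = repeat('a', 200000);
        System.out.println("alternation: " + tryMatch(ALTERNATION, longInput));
        System.out.println("char class:  " + tryMatch(CHAR_CLASS, longInput));
    }
}
```

Merging alternatives into one character class, or using a possessive quantifier as OFFSITE_URL in the eBay example policy does, are the usual ways to keep matching depth independent of input length. Whether that alone would fix this particular report is not established here.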
I cannot say whether this is a bug; the policy we configured may be wrong.
If the string contains the HTML entity "&nbsp;", then after sanitizing it shows up as an empty
space and "&nbsp;" is not returned, whereas other entities such as "&lt;" and "&gt;" are shown
correctly after sanitizing.
Example:
final String test = "&nbsp;&gt;";
final PolicyFactory policy = Sanitizers.FORMATTING.and(
Sanitizers.BLOCKS).and(Sanitizers.STYLES);
final String safeHTML = policy.sanitize(test);
System.out.println("Before:" +test);
System.out.println("After:" +safeHTML);
Result:
-------
Before:&nbsp;&gt;
After: &gt;
We actually need "&nbsp;" to be preserved after sanitizing, so can you provide guidance on how
to achieve this?
Thx in advance!
Kr,
Urvish
Original issue reported on code.google.com by [email protected]
on 23 May 2014 at 1:17
There are known Style Attribute XSS attacks like:
<DIV STYLE="color: red; width: expression(alert('XSS')); background-image:
url('expression.png') ">
Or
<DIV STYLE="background-image: url(javascript:alert('XSS')); border-image:
url(images/javascript.png) 30 round round;">
And I need to sanitize the HTML to this:
<DIV STYLE="color: red; background-image: url('expression.png') ">
Or
<DIV STYLE="border-image: url(images/javascript.png) 30 round round;">
Does this library cover such cases?
Original issue reported on code.google.com by [email protected]
on 19 Jun 2013 at 1:02
What steps will reproduce the problem?
String css = "font-family:'Arial','sans-serif'";
StylingPolicy stylingPolicy = new StylingPolicy(CssSchema.DEFAULT);
stylingPolicy.sanitizeCssProperties(css);
What is the expected output? What do you see instead?
Expected: font-family:'arial' , 'sans-serif'
Actual: font-family:'arial' ,
What version of the product are you using? On what operating system?
svn trunk (r227) on Centos 6.5
Please provide any additional information below.
sanitizeCssProperties() works properly when 'sans-serif' is unquoted. It looks
like quotedString in StylingPolicy.java doesn't allow for '-' in the font name.
Attached is a patch with a potential fix.
Original issue reported on code.google.com by [email protected]
on 31 Mar 2014 at 3:43
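The attached patch is not reproduced here. As an illustration only, a quoted-name pattern that tolerates hyphens could look like the sketch below. The class QuotedFontName and its pattern are hypothetical and are not the quotedString logic in StylingPolicy.java. They accept a single- or double-quoted token made of ASCII letters, digits, spaces and hyphens.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the code in StylingPolicy.java.
class QuotedFontName {
    // A quoted token whose body contains only ASCII letters, digits,
    // spaces and hyphens. Quotes must be balanced and of the same kind.
    static final Pattern QUOTED_FONT = Pattern.compile(
        "'[\\p{Alnum} \\-]+'|\"[\\p{Alnum} \\-]+\"");

    static boolean isQuotedFontName(String token) {
        return QUOTED_FONT.matcher(token).matches();
    }

    public static void main(String[] args) {
        System.out.println(isQuotedFontName("'sans-serif'"));
        System.out.println(isQuotedFontName("'Arial'"));
    }
}
```

Keeping the allowed character set narrow means that characters with meaning in CSS or HTML, such as ';', '(' or '<', cannot pass through inside a quoted name.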
What steps will reproduce the problem?
1. Consider this HTML:
<table style="color: rgb(0, 0, 0); font-family: Arial, Geneva, sans-serif;">
<tbody>
<tr>
<th>Column One</th><th>Column Two</th>
</tr>
<tr>
<td align="center" style="background-color: rgb(255, 255, 254);"><font
size="2">Size 2</font></td>
<td align="center" style="background-color: rgb(255, 255, 254);"><font
size="7">Size 7</font></td>
</tr>
</tbody>
</table>
If you display this in a browser, all the text inside the table renders in a
sans-serif font.
2. Sanitize that HTML with allowStyling(). Some of the style attributes are
moved to a font tag. This is the output:
<table>
<font face="Arial, Geneva, sans-serif" style="color:#000">
<tbody>
<tr>
<th>Column One</th>
<th>Column Two</th>
</tr>
<tr>
<td align="center"><font style="background-color:#fffffe"><font size="2">Size
2</font></font></td>
<td align="center"><font style="background-color:#fffffe"><font size="7">Size
7</font></font></td>
</tr>
</tbody>
</font>
</table>
If you view this in a browser, the table text is now rendered in serif instead
of sans-serif.
What is the expected output? What do you see instead?
I think this is the expected output, given the design of the library. However,
I question whether transforming style attributes by adding a font tag is really
the "right" thing to do. Besides changing how the HTML is rendered, the font
tag is deprecated in HTML 4.0 and is not supported in HTML 5. If the code is
able to generate sanitized style attributes for use in the font tag, why not
put those same style attributes in the original style attribute (in this case,
on the table element)?
What version of the product are you using? On what operating system?
r135 on Windows 7, with Java 6.
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 1 Feb 2013 at 4:47
The following code:
StringBuilder retVal = new StringBuilder();
PolicyFactory policyFactory = new HtmlPolicyBuilder().allowElements("b", "i", "br", "p").allowWithoutAttributes("span").toFactory();
HtmlStreamRenderer renderer = HtmlStreamRenderer.create(retVal,
new Handler<String>() {
public void handle(String x) {
throw new AssertionError(x);
}
});
HtmlSanitizer.sanitize("<span>foo</span>", policyFactory.apply(renderer));
Returns "foo", not "<span>foo</span>"
What steps will reproduce the problem?
<p><span class="application-font-size-14"><span style="color: rgb(40, 40, 40);"
class="application-font-name-arial">Lorem ipsum dolor sit amet, adipiscing
elit. In scelerisque condimentum. </span>Phasellus molestie hendrerit
augue.</span></p><p><span class="application-bold application-font-size-14">In
eget arcu at fermentum tortor:</span></p><ul
class="application-ul-disc"><li><span style="color: rgb(40, 40, 40);"
class="application-font-name-arial application-font-size-14">Sapien sed
fermentum </span></li><li><span style="color: rgb(40, 40, 40);"
class="application-font-name-arial application-font-size-14">Tellus consectetur
sit amet</span></li><li><span style="color: rgb(40, 40, 40);"
class="application-font-name-arial application-font-size-14">Sed interdum
ligula nec </span></li></ul><ul class="application-ul-disc"><br></ul><p><span
style="color: rgb(40, 40, 40);" class="application-font-name-arial
application-font-size-14">Vestibulum ultricies, arcu neque euismod ipsum, id
tempor sem ante quis sem.</span></p><p><span>Donec mi ipsum, pretium sit amet
interdum quis, egestas et justo.</span></p>
This becomes:
<p><span class="application-font-size-14"><span style="color:rgb( 40 , 40 , 40
)" class="application-font-name-arial">Lorem ipsum dolor sit amet, adipiscing
elit. In scelerisque condimentum. </span>Phasellus molestie hendrerit
augue.</span></p><p><span class="application-bold application-font-size-14">In
eget arcu at fermentum tortor:</span></p><ul
class="application-ul-disc"><li><span style="color:rgb( 40 , 40 , 40 )"
class="application-font-name-arial application-font-size-14">Sapien sed
fermentum </span></li><li><span style="color:rgb( 40 , 40 , 40 )"
class="application-font-name-arial application-font-size-14">Tellus consectetur
sit amet</span></li><li><span style="color:rgb( 40 , 40 , 40 )"
class="application-font-name-arial application-font-size-14">Sed interdum
ligula nec </span></li></ul><ul class="application-ul-disc"><li><br
/></li></ul><p><span style="color:rgb( 40 , 40 , 40 )"
class="application-font-name-arial application-font-size-14">Vestibulum
ultricies, arcu neque euismod ipsum, id tempor sem ante quis
sem.</span></p><p><span>Donec mi ipsum, pretium sit amet interdum quis, egestas
et justo.</span></p>
Which has a closing li that becomes </li> in the sanitized version.
What is the expected output? What do you see instead?
The expected output would be </li>
What version of the product are you using? On what operating system?
'com.googlecode.owasp-java-html-sanitizer:owasp-java-html-sanitizer:r239'
I'm on Ubuntu 14.04, but this is also being run on Windows Server 2008
Please provide any additional information below.
The way I'm using the sanitizer is to run the content through a policy, apply
some manipulations to the before and after versions as necessary, and compare
them to see whether anything has been removed. The reason I'm doing this is
that my requirement is to block XSS, not to store the sanitized content. If
there were a way to check whether something was removed, or to skip cleaning
up the HTML, that would be helpful, in addition to not messing with the </li>
tag.
Here is the policy that I'm using currently:
package com.affinnova.platform.util;
// Copyright (c) 2011, Mike Samuel
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
//
// Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
// Neither the name of the OWASP nor the names of its contributors may
// be used to endorse or promote products derived from this software
// without specific prior written permission.
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
// POSSIBILITY OF SUCH DAMAGE.
import com.google.common.base.Predicate;
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;
import java.util.regex.Pattern;
/**
* Based on the
* <a href="http://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project#Stage_2_-_Choosing_a_base_policy_file">AntiSamy EBay example</a>.
* <blockquote>
* eBay (http://www.ebay.com/) is the most popular online auction site in the
* universe, as far as I can tell. It is a public site so anyone is allowed to
* post listings with rich HTML content. It's not surprising that given the
* attractiveness of eBay as a target that it has been subject to a few complex
* XSS attacks. Listings are allowed to contain much more rich content than,
* say, Slashdot- so it's attack surface is considerably larger. The following
* tags appear to be accepted by eBay (they don't publish rules):
* {@code <a>},...
* </blockquote>
*/
public class JavaHtmlSanitizerPolicy {
// Some common regular expression definitions.
// The 16 colors defined by the HTML Spec (also used by the CSS Spec)
private static final Pattern COLOR_NAME = Pattern.compile(
"(?:aqua|black|blue|fuchsia|gray|grey|green|lime|maroon|navy|olive|purple"
+ "|red|silver|teal|white|yellow)");
// HTML/CSS Spec allows 3 or 6 digit hex to specify color
private static final Pattern COLOR_CODE = Pattern.compile(
"(?:#(?:[0-9a-fA-F]{3}(?:[0-9a-fA-F]{3})?))");
private static final Pattern NUMBER_OR_PERCENT = Pattern.compile(
"[0-9]+%?");
private static final Pattern PARAGRAPH = Pattern.compile(
"(?:[\\p{L}\\p{N},'\\.\\s\\-_\\(\\)]|&[0-9]{2};)*");
private static final Pattern HTML_ID = Pattern.compile(
"[a-zA-Z0-9\\:\\-_\\.]+");
// force non-empty with a '+' at the end instead of '*'
private static final Pattern HTML_TITLE = Pattern.compile(
"[\\p{L}\\p{N}\\s\\-_',:\\[\\]!\\./\\\\\\(\\)&]*");
private static final Pattern HTML_CLASS = Pattern.compile(
"[a-zA-Z0-9\\s,\\-_]+");
private static final Pattern ONSITE_URL = Pattern.compile(
"(?:[\\p{L}\\p{N}\\\\\\.\\#@\\$%\\+&;\\-_~,\\?=/!]+|\\#(\\w)+)");
private static final Pattern OFFSITE_URL = Pattern.compile(
"\\s*(?:(?:ht|f)tps?://|mailto:)[\\p{L}\\p{N}]"
+ "[\\p{L}\\p{N}\\p{Zs}\\.\\#@\\$%\\+&;:\\-_~,\\?=/!\\(\\)]*+\\s*");
private static final Pattern NUMBER = Pattern.compile(
"[+-]?(?:(?:[0-9]+(?:\\.[0-9]*)?)|\\.[0-9]+)");
private static final Pattern NAME = Pattern.compile("[a-zA-Z0-9\\-_\\$]+");
private static final Pattern ALIGN = Pattern.compile(
"(?i)center|left|right|justify|char");
private static final Pattern VALIGN = Pattern.compile(
"(?i)baseline|bottom|middle|top");
private static final Predicate<String> COLOR_NAME_OR_COLOR_CODE
= new Predicate<String>() {
public boolean apply(String s) {
return COLOR_NAME.matcher(s).matches()
|| COLOR_CODE.matcher(s).matches();
}
};
private static final Predicate<String> ONSITE_OR_OFFSITE_URL
= new Predicate<String>() {
public boolean apply(String s) {
return ONSITE_URL.matcher(s).matches()
|| OFFSITE_URL.matcher(s).matches();
}
};
private static final Pattern HISTORY_BACK = Pattern.compile(
"(?:javascript:)?\\Qhistory.go(-1)\\E");
private static final Pattern ONE_CHAR = Pattern.compile(
".?", Pattern.DOTALL);
public static final PolicyFactory POLICY_DEFINITION = new HtmlPolicyBuilder()
.allowAttributes("id").matching(HTML_ID).globally()
.allowAttributes("class").matching(HTML_CLASS).globally()
.allowAttributes("lang").matching(Pattern.compile("[a-zA-Z]{2,20}"))
.globally()
.allowAttributes("title").matching(HTML_TITLE).globally()
.allowStyling()
.allowAttributes("align").matching(ALIGN).onElements("p")
.allowAttributes("for").matching(HTML_ID).onElements("label")
.allowAttributes("color").matching(COLOR_NAME_OR_COLOR_CODE)
.onElements("font")
.allowAttributes("face")
.matching(Pattern.compile("[\\w;, \\-]+"))
.onElements("font")
.allowAttributes("size").matching(NUMBER).onElements("font")
.allowAttributes("href").matching(ONSITE_OR_OFFSITE_URL)
.onElements("a")
.allowStandardUrlProtocols()
.allowAttributes("nohref").onElements("a")
.allowAttributes("name").matching(NAME).onElements("a")
.allowAttributes(
"onfocus", "onblur", "onclick", "onmousedown", "onmouseup")
.matching(HISTORY_BACK).onElements("a")
.requireRelNofollowOnLinks()
.allowAttributes("src").matching(ONSITE_OR_OFFSITE_URL)
.onElements("img")
.allowAttributes("name").matching(NAME)
.onElements("img")
.allowAttributes("alt").matching(PARAGRAPH)
.onElements("img")
.allowAttributes("border", "hspace", "vspace").matching(NUMBER)
.onElements("img")
.allowAttributes("border", "cellpadding", "cellspacing")
.matching(NUMBER).onElements("table")
.allowAttributes("bgcolor").matching(COLOR_NAME_OR_COLOR_CODE)
.onElements("table")
.allowAttributes("background").matching(ONSITE_URL)
.onElements("table")
.allowAttributes("align").matching(ALIGN)
.onElements("table")
.allowAttributes("noresize").matching(Pattern.compile("(?i)noresize"))
.onElements("table")
.allowAttributes("background").matching(ONSITE_URL)
.onElements("td", "th", "tr")
.allowAttributes("bgcolor").matching(COLOR_NAME_OR_COLOR_CODE)
.onElements("td", "th")
.allowAttributes("abbr").matching(PARAGRAPH)
.onElements("td", "th")
.allowAttributes("axis", "headers").matching(NAME)
.onElements("td", "th")
.allowAttributes("scope")
.matching(Pattern.compile("(?i)(?:row|col)(?:group)?"))
.onElements("td", "th")
.allowAttributes("nowrap")
.onElements("td", "th")
.allowAttributes("height", "width").matching(NUMBER_OR_PERCENT)
.onElements("table", "td", "th", "tr", "img")
.allowAttributes("align").matching(ALIGN)
.onElements("thead", "tbody", "tfoot", "img",
"td", "th", "tr", "colgroup", "col")
.allowAttributes("valign").matching(VALIGN)
.onElements("thead", "tbody", "tfoot",
"td", "th", "tr", "colgroup", "col")
.allowAttributes("charoff").matching(NUMBER_OR_PERCENT)
.onElements("td", "th", "tr", "colgroup", "col",
"thead", "tbody", "tfoot")
.allowAttributes("char").matching(ONE_CHAR)
.onElements("td", "th", "tr", "colgroup", "col",
"thead", "tbody", "tfoot")
.allowAttributes("colspan", "rowspan").matching(NUMBER)
.onElements("td", "th")
.allowAttributes("span", "width").matching(NUMBER_OR_PERCENT)
.onElements("colgroup", "col")
.allowElements(
"a", "label", "noscript", "h1", "h2", "h3", "h4", "h5", "h6", "p", "i", "b", "u", "strong", "em", "small", "big", "pre", "code",
"cite", "samp", "sub", "sup", "strike", "center", "blockquote", "hr", "br", "col", "font", "map", "span", "div", "img",
"ul", "ol", "li", "dd", "dt", "dl", "tbody", "thead", "tfoot", "table", "td", "th", "tr", "colgroup", "fieldset", "legend", "abbr",
"acronym", "address", "article", "aside", "basefont", "bdi", "bdo", "big", "caption", "colgroup", "del", "dfn", "dir", "font", "figcaption",
"figure", "footer", "header", "hgroup", "ins", "mark", "menu", "nav", "q", "s", "section", "style",
"tt", "var", "wbr")
.allowWithoutAttributes(
"a", "label", "noscript", "h1", "h2", "h3", "h4", "h5", "h6", "p", "i", "b", "u", "strong", "em", "small", "big", "pre", "code",
"cite", "samp", "sub", "sup", "strike", "center", "blockquote", "hr", "br", "col", "font", "map", "span", "div", "img",
"ul", "ol", "li", "dd", "dt", "dl", "tbody", "thead", "tfoot", "table", "td", "th", "tr", "colgroup", "fieldset", "legend", "abbr",
"acronym", "address", "article", "aside", "basefont", "bdi", "bdo", "big", "caption", "colgroup", "del", "dfn", "dir", "font", "figcaption",
"figure", "footer", "header", "hgroup", "ins", "mark", "menu", "nav", "q", "s", "section", "style",
"tt", "var", "wbr")
.toFactory();
}
Original issue reported on code.google.com by [email protected]
on 6 Feb 2015 at 9:41
Hi, great work, but I've noticed something strange in the mvn repository and hope you can help, and possibly update your release version numbers.
If you look in mvn repository you can see a release this year:
http://mvnrepository.com/artifact/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20150501.1
But the page is recommending that a newer release is available from last year !?
http://mvnrepository.com/artifact/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/r239
And the OWASP page for the project suggests last year's version is the latest:
https://www.owasp.org/index.php/OWASP_Java_HTML_Sanitizer_Project
Please advise whether the 2015 one is a release (or just a beta/alpha unstable thing).
If it is a release, then please change your versioning of the pom to ensure that the latest version is recognised as the latest version by maven tooling. This might be due to ascii string comparison of the version number: ie
'r' > '2'
"r239" > "20150501.1"
I'm not sure what impact this would have, for example if automatic versioning picks the latest release or the Maven enforcer warns about the latest one: by Maven version conventions these tools might treat the 2014 release, not the 2015 one, as the latest (this might need testing). Users attempting to use the latest version may therefore get trapped on an old one.
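The string-comparison hypothesis can be checked with plain Java. The sketch below, with a hypothetical class VersionOrderDemo, only shows how the two version strings order under character-by-character comparison. It does not reproduce Maven's own version comparison rules, which are more elaborate, and it says nothing about how any particular tool or website sorts versions.

```java
// Hypothetical demo of plain lexicographic ordering of the two versions.
class VersionOrderDemo {
    // Returns whichever string sorts later under String.compareTo.
    static String lexicographicallyLater(String a, String b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        // 'r' (code 114) sorts after '2' (code 50), so "r239" wins on the
        // first character and the rest of the strings is never examined.
        System.out.println(lexicographicallyLater("r239", "20150501.1"));
    }
}
```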
The project version in the pom is 1.1-SNAPSHOT, but a release of 1.1 exists on Maven Central. If the release of 1.1 is valid on Maven Central, then the version in the pom should be 1.2-SNAPSHOT.
What steps will reproduce the problem?
1. include owasp-java-html-sanitizer.jar in the classpath but not guava.jar
2. run the code:
PolicyFactory policy = Sanitizers.FORMATTING;
logger.debug("Policy is " + policy);
What is the expected output? What do you see instead?
Expect either a thrown exception or the debug line to be printed
However, the debug line is never printed, the code just seems to hang
What version of the product are you using? On what operating system?
r198, Ubuntu Linux LTS 12
Please provide any additional information below.
This isn't a big problem, the setup instructions do after all say that
guava.jar is necessary but for troubleshooting purposes, shouldn't there be
some descriptive way of reporting the missing dependency?
Original issue reported on code.google.com by [email protected]
on 25 Jul 2013 at 10:53
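One way an application could surface a missing dependency early is to probe for a known class before touching the sanitizer. The sketch below is an assumption-laden illustration: the class DependencyCheck and its methods are hypothetical, and com.google.common.base.Predicate is used as the probe only because it is a Guava class imported by the example policies. A missing class at link time raises NoClassDefFoundError, which is an Error and not an Exception, so code that only catches Exception would not report it. Whether that explains the apparent hang in this report is not confirmed.

```java
// Hypothetical helper for reporting a missing dependency descriptively.
class DependencyCheck {
    // Looks the class up without initializing it.
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Fails fast with a readable message instead of a late linkage error.
    static void requireClass(String className, String hint) {
        if (!isClassPresent(className)) {
            throw new IllegalStateException(
                "Missing dependency: " + className + " (" + hint + ")");
        }
    }

    public static void main(String[] args) {
        boolean guava = isClassPresent("com.google.common.base.Predicate");
        System.out.println("Guava on classpath: " + guava);
    }
}
```

Calling requireClass("com.google.common.base.Predicate", "add guava.jar to the classpath") at startup would turn a silent failure into an explicit one.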
Not a real bug. Nothing to see here.
Original issue reported on code.google.com by [email protected]
on 18 Aug 2011 at 11:56
MGupta provided the below on the group list
"""
I'm trying to use the default policy and have observed following two issues.
1. <span> is not allowed
2. <br> is returned as <br > (note a space before the end tag)
For #1, I tried using a custom policy with allowElements("span") and it still
didn't work.
I then tried allowAttributes("id").globally().
This allowed me to use something like this ... <span id="abc">some text</span>
But I want to use <span> with NO attributes.
I even tried .allowWithoutAttributes("span"), but it did not work.
-----
public static final PolicyFactory POLICY_DEFINITION = new HtmlPolicyBuilder()
.allowAttributes("id", "class").globally()
.allowAttributes("href", "target").onElements("a")
.allowWithoutAttributes("span", "div")
.allowElements("a", "span", "div","input", "textarea")
.toFactory();
public static String sanitizeWithDefaultPolicy(String htmlString){
return Sanitizers.FORMATTING
.and(Sanitizers.BLOCKS)
.and(Sanitizers.IMAGES)
.and(Sanitizers.STYLES)
.and(POLICY_DEFINITION)
.sanitize(htmlString);
}
"""
Original issue reported on code.google.com by [email protected]
on 10 Feb 2014 at 11:36
Sanitizing:
<a href="ftp://site.com:user@host/file.txt">click here</a>
the '@' is replaced by '&#64;'. However, the href and src attribute values are
URLs, not HTML text, so I believe the '@' should be left unencoded, or if
anything be URL-encoded.
Another context where I run into this is sanitizing email html content. It
sometimes points to attached images using a cid: (rfc2392) URL, eg:
<img src="[email protected]">
What version of the product are you using? On what operating system?
r173, linux, java 6.
Thanks!
Fred
Original issue reported on code.google.com by [email protected]
on 8 Jun 2013 at 8:26
If I specify a custom set of allowed URL protocols different from the set
"http", "https" and "mailto", some URLs are not handled correctly.
E.g. for the input "<img src=\"http://canaries.org/canary.png\">" the policy
builder
new HtmlPolicyBuilder()
.allowElements("img")
.allowAttributes("src").onElements("img")
.allowUrlProtocols("http")
returns an empty string, but should return the unmodified input value.
I have attached a patch containing an additional test case that shows the issue
and a fix for it in the class FilterUrlByProtocolAttributePolicy.
Original issue reported on code.google.com by [email protected]
on 26 Mar 2012 at 10:12
Attachments:
What steps will reproduce the problem?
1. Install from Maven
Source code:
PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.BLOCKS); // error happens here
String safeHTML = policy.sanitize("<table>asdf</table>"); // never gets to this line
What is the expected output? What do you see instead?
Sanitized output. Getting the following error instead:
Aug 22, 2014 9:28:49 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [Jersey Web Application] in context with
path [/asdf] threw exception [org.glassfish.jersey.server.ContainerException:
java.lang.NoClassDefFoundError: org/owasp/html/Sanitizers] with root cause
java.lang.NoClassDefFoundError: org/owasp/html/Sanitizers
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:195)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:387)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:331)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:103)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:297)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1028)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:372)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:344)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.filters.CorsFilter.handleNonCORS(CorsFilter.java:439)
at org.apache.catalina.filters.CorsFilter.doFilter(CorsFilter.java:178)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1070)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
What version of the product are you using? On what operating system?
OSX 10.9.4 using version r239. Tomcat v7.0.55. Java SE 1.7.0_67.
Original issue reported on code.google.com by [email protected]
on 22 Aug 2014 at 1:38
[email protected] says
We have a similar behavior in this case:
assertEquals("<h1>TEXT</h1>",
Sanitizers.BLOCKS.sanitize("<H1><center>TEXT</H1>"));
For this one the result is:
<h1></h1>TEXT
instead of:
<h1>TEXT</h1>
But test case:
assertEquals("<h1>TEXT</h1>",
Sanitizers.BLOCKS.sanitize("<H1></center>TEXT</H1>"));
works as expected:
<h1>TEXT</h1>
What's wrong with the first one?
Original issue reported on code.google.com by [email protected]
on 1 Oct 2014 at 12:46
I tried installing java-html-sanitizer using the example from the maven.md file.
Maven had problems downloading some old versions.
Changing the version to
<version>[r239,)</version>
solved the problem.
This is only a minor issue, but it should help out others.
There is no existing documentation on how to allow data: and tel: URLs. Add an
example that demonstrates how HtmlPolicyBuilder.allowUrlProtocols can be used
with data: and tel:.
Original issue reported on code.google.com by [email protected]
on 21 Jan 2014 at 4:04
[deleted issue]
What steps will reproduce the problem?
1. HTML source contains the following link: <a href="HTTP://some.site.org/">Link</a>
2. the input is filtered by applying the following simplified policy:
.allowAttributes("href").onElements("a").allowStandardUrlProtocols().allowElements("a").toFactory();
What is the expected output?
The link remains intact.
What do you see instead?
The link is removed, even though "http" is an allowed protocol as per policy.
What version of the product are you using?
r215
Additional information:
Internet Standard STD 66 [1] states:
[...] An implementation should accept uppercase letters as equivalent
to lowercase in scheme names (e.g., allow "HTTP" as well as "http" [...]
Changing the compare functionality in
src/org/owasp/html/FilterUrlByProtocolAttributePolicy.java will result in the
desired outcome:
@@ -77,7 +77,7 @@
}
break protocol_loop;
case ':':
- if (!protocols.contains(s.substring(0, i))) { return null; }
+ if (!protocols.contains(s.substring(0, i).toLowerCase())) { return null; }
break protocol_loop;
}
}
[1]: http://tools.ietf.org/html/std66#section-3.1
Original issue reported on code.google.com by [email protected]
on 12 Feb 2014 at 11:06
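The idea behind the patch above (normalize the scheme before the allowlist lookup) can be illustrated in isolation. The class below is a simplified sketch, not the library's FilterUrlByProtocolAttributePolicy; `SchemeAllowlist` is a hypothetical name. It uses `Locale.ROOT` for lower-casing, which is an addition to the reporter's patch so that the result does not depend on the default locale.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical, simplified scheme filter; not the library's implementation. */
final class SchemeAllowlist {
  private final Set<String> allowed = new HashSet<>();

  SchemeAllowlist(String... schemes) {
    for (String s : schemes) {
      allowed.add(s.toLowerCase(Locale.ROOT));
    }
  }

  /** Returns url if it has no scheme or an allowed scheme; otherwise null. */
  String filter(String url) {
    for (int i = 0; i < url.length(); i++) {
      char c = url.charAt(i);
      if (c == '/' || c == '?' || c == '#') {
        // Path, query or fragment starts before any ':', so there is no scheme.
        return url;
      }
      if (c == ':') {
        // Scheme names are case-insensitive, so compare in lower case.
        String scheme = url.substring(0, i).toLowerCase(Locale.ROOT);
        return allowed.contains(scheme) ? url : null;
      }
    }
    return url; // no ':' at all, so no scheme
  }
}
```

With `new SchemeAllowlist("http", "https", "mailto")`, the link `HTTP://some.site.org/` from the report is kept, while a `javascript:` URL is rejected regardless of letter case.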
What steps will reproduce the problem?
1. Pass an input string with ' or " - for example: <div>And he said,
"Hello."</div>
2. ' or " characters come back encoded
What is the expected output? What do you see instead?
I would expect that quotes within text nodes don't get encoded.
What version of the product are you using? On what operating system?
r209, Linux
Please provide any additional information below.
I already saw issue 15:
http://code.google.com/p/owasp-java-html-sanitizer/issues/detail?id=15
To answer the question that wasn't answered in that issue - "How is this
causing problems though?" - it causes a problem in rich text editors.
We expect that the user can enter text in a rich text editor; this includes
quotes. When that data gets stored and returned again in another/the same
page, they should see the ' or " they entered, not the encoded version of that
string.
Original issue reported on code.google.com by [email protected]
on 9 Sep 2013 at 3:34
Exploit: "Cause an exception, crash, or inf. loop in the sanitizer that causes
it to fail to provide service or consume inordinate resources for an input of
that size."
What steps will reproduce the problem?
1.Enter <a href="http://x '">t</a> with a large number of spaces
between the x and the '
2. 15kb of spaces -> 2s execution time. 60kb of spaces -> 171s execution time
What is the expected output? What do you see instead?
Expected: Worst case execution time is at most O(n) for an input of size n (or
execution time limited to mitigate an attack on server resources)
Observed: Worst case execution time is O(n^2) for an input of size n
What version of the product are you using? On what operating system?
As currently on http://canyouxssthis.com/HTMLSanitizer/reflect (no version
specified)
Please provide any additional information below.
Not sure if this is "in bounds" since it is a problem with the configuration of
the sanitizer rather than the sanitizer per se. However, for the sanitizer to
be effective for others who want to use it, they should receive thoroughly
tested configurations along with it.
The cause for this defect is the regex defined in
Pattern OFFSITE_URL =
Pattern.compile("(\\s)*((ht|f)tp(s?)://|mailto:)[\\p{L}\\p{N}][\\p{L}\\p{N}\\p{Zs}\\.\\#@\\$%\\+&;:\\-_~,\\?=/!\\(\\)]*(\\s)*");
Two sequential character classes are starred and not disjoint. Specifically:
[\\p{Zs}]*(\\s)*
Original issue reported on code.google.com by [email protected]
on 27 Feb 2014 at 7:39
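One possible mitigation for the overlap described above is to make the first starred class possessive (`*+`), so the regex engine never retries different split points between it and the trailing `(\\s)*`. The sketch below assumes the pattern is applied with `matches()`; `OffsiteUrlPatterns` is a hypothetical holder class, and this is not necessarily the fix the project adopted.

```java
import java.util.regex.Pattern;

/** Hypothetical holder for the two pattern variants. */
final class OffsiteUrlPatterns {
  private OffsiteUrlPatterns() {}

  // The pattern as quoted in the report: the starred class contains \p{Zs},
  // which overlaps with the trailing (\s)* on the plain space character.
  static final Pattern ORIGINAL = Pattern.compile(
      "(\\s)*((ht|f)tp(s?)://|mailto:)[\\p{L}\\p{N}]"
          + "[\\p{L}\\p{N}\\p{Zs}\\.\\#@\\$%\\+&;:\\-_~,\\?=/!\\(\\)]*(\\s)*");

  // Same pattern with a possessive quantifier on the class: once it has
  // consumed a run of characters it does not give any of them back.
  static final Pattern POSSESSIVE = Pattern.compile(
      "(\\s)*((ht|f)tp(s?)://|mailto:)[\\p{L}\\p{N}]"
          + "[\\p{L}\\p{N}\\p{Zs}\\.\\#@\\$%\\+&;:\\-_~,\\?=/!\\(\\)]*+(\\s)*");
}
```

For whole-string matching the two variants should accept the same inputs: whatever remains after the longest run of class characters is a suffix of what remained under any shorter split, so it still consists only of whitespace.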
[deleted issue]
It looks like latest official release is r223 [1], but Google Code provides
r226 (and r223 is completely missing). It is kinda confusing. Also would it be
possible to tag official releases in svn?
Thanks
[1]: http://repo1.maven.org/maven2/com/googlecode/owasp-java-html-sanitizer/owasp-java-html-sanitizer/
Original issue reported on code.google.com by [email protected]
on 4 Mar 2014 at 8:10
I had hijacked another issue and was asked to create a new one :) After writing
several tests, it's simpler than I thought
What steps will reproduce the problem?
1. Pass an input string with a ' or " in it
2. Comes back escaped as &#39; or &#34;
What is the expected output? What do you see instead?
I expect my input to come back with the ' or " in it.
What version of the product are you using? On what operating system?
Using version r164 on Mac mountain lion
Please provide any additional information below.
The code is quite basic:
HtmlPolicyBuilder builder = new HtmlPolicyBuilder();
PolicyFactory factory = builder.toFactory();
String sanitized = factory.sanitize(input);
return sanitized;
Original issue reported on code.google.com by [email protected]
on 24 Jun 2013 at 4:55
What steps will reproduce the problem?
1. <a href="http://demo.testfire.net">CLICK HERE</a>
2. click on CLICK HERE
3.
What is the expected output? What do you see instead?
It should filter out HTML tags. In this context, it accepts the <a> tag and the href
attribute, which is used to specify a link address. So, by giving the above
input and clicking CLICK HERE, the browser goes to the malicious link specified in the href
attribute, hence leading to HTML injection/XSS attacks.
What version of the product are you using? On what operating system?
OS-Windows XP
Version-1.5.2
Please provide any additional information below.
vulnerable to html injection attacks
Original issue reported on code.google.com by [email protected]
on 11 Jan 2014 at 5:21
What steps will reproduce the problem?
HTML before sanitizing
<span style="font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";color:#505050">
HTML after sanitizing
<span style="font-size:9pt;font-family:'trebuchet ms' ,;color:#505050">
I already read in other issues, why the sans-serif font will be dropped. This
would be fine, but there is a "," left after removing the font.
Firefox struggles with this "," and will not use any of the provided Fonts.
So the expected Output is:
<span style="font-size:9pt;font-family:'trebuchet ms' ;color:#505050">
When removing the "," Firefox renders the page as expected.
What version of the product are you using? On what operating system?
r239, Windows 8.1, Firefox 32.0.3
No Issue in Internet Explorer and Chrome.
Original issue reported on code.google.com by [email protected]
on 26 Oct 2014 at 12:21
Hey there,
I wanted to use this sanitization library to help detect issues with HTML input. I noticed that when I have a closing tag in my input with no opening tag, the sanitizer will take care of it but will not emit an event in the HtmlChangeListener.
I think it might be related to this as well:
#40
Would greatly appreciate if this closing tag with no opening tag could be emitted as an event to be captured in my implementation of the HtmlChangeListener.
Thank you
test case:
@Test
public void testSpace() {
String text = "L&nbsp;&nbsp;&nbsp;L";
assertEquals(text, Sanitizers.FORMATTING.sanitize(text));
}
why:
1)
An &nbsp; is something different than a space. When I get &nbsp; from my rich text editor I want to preserve it. In the above example, if I added the sanitized text into an HTML page it would look like L L instead of L&nbsp;&nbsp;&nbsp;L.
2)
i want to know if the user added something wrong and present an error:
if (!StringUtils.equals(text, clean)) {
addFieldError("wrong input! please check cleaned text");
}
I don't want this to happen after a space replacement.
todo:
remove the &nbsp; to space part or make it optional.
work around:
replace the &nbsp; by spaces, do the sanitizing and checking, and re-add the &nbsp;
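For the second point (flagging input only when the sanitizer changed something meaningful), one option is to normalize non-breaking spaces on both sides before comparing. The sketch below is JDK-only and does not call the sanitizer; `NbspTolerantCompare` and its methods are hypothetical names, and treating `&nbsp;`, `&#160;` and U+00A0 as equivalent to a plain space is an assumption about what the application considers harmless.

```java
/** Hypothetical helper; not part of java-html-sanitizer. */
final class NbspTolerantCompare {
  private NbspTolerantCompare() {}

  /** Maps the common spellings of a non-breaking space to a plain space. */
  static String normalizeNbsp(String html) {
    return html
        .replace("&nbsp;", " ")
        .replace("&#160;", " ")
        .replace('\u00A0', ' ');
  }

  /** True if the two strings differ in more than how nbsp is written. */
  static boolean changedBeyondNbsp(String original, String sanitized) {
    return !normalizeNbsp(original).equals(normalizeNbsp(sanitized));
  }
}
```

The field error from the report would then be raised only when `changedBeyondNbsp(text, clean)` is true, where `clean` is the sanitizer's output.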
When i sanitize links containing sub elements these elements will be moved outside the a element.
<a href=\"https://www.xyz.com\"><div>Button text</div></a>
will result in:
<a href="https://www.xyz.com" rel="nofollow"></a><div>Button text</div>
Can you please help me with this issue?
Example code:
String html = "<a href=\"https://www.xyz.com\"><div>Button text</div></a>";
PolicyFactory policy = new HtmlPolicyBuilder()
.allowElements("a", "div")
.allowUrlProtocols("https")
.allowAttributes("href").onElements("a")
.requireRelNofollowOnLinks()
.toFactory();
String safeHTML = policy.sanitize(html);
Thank you for a very useful tool!
I'm trying to deal with the result of content pasted from Excel spreadsheets
into an HTML text area. I'm interested in preserving as much style info as
possible. When I sanitize this, the font-family part of the style attribute
misses fonts, depending on the way the font-family values are quoted.
Specifically, with "font-family: Arial,serif", the 'serif' is missing iff it
was quoted in the style attribute. It looks like this does not happen to the
first font listed, and happens to the second font listed if it is "sans-serif"
but not if it is "Verdana" (std-font vs other handling?).
It does not seem to matter which quotes are used, ie style="font-family:
'a','b'" yields the same results as style='font-family: "a","b"'.
With an allowStyling() eBay policy (which allows the span element), sanitize the
following. These are all font-family declarations with 2 fonts and an irrelevant
tag. What differs within each set of 4 is which font(s) is/are quoted. Sets 1, 3
and 4 use single quotes outside and double quotes inside; this is reversed in
set 2 without effect. Sets 1 and 2 use Arial, sans-serif; set 3 uses Arial,
Verdana; set 4 uses serif, sans-serif.
<span style='font-family:Arial,sans-serif;mso-fareast-language:EN-GB'>..</span>
<span
style='font-family:"Arial",sans-serif;mso-fareast-language:EN-GB'>..</span>
<span
style='font-family:Arial,"sans-serif";mso-fareast-language:EN-GB'>..</span>
<span
style='font-family:"Arial","sans-serif";mso-fareast-language:EN-GB'>..</span>
<span style="font-family:Arial,sans-serif;mso-fareast-language:EN-GB">..</span>
<span
style="font-family:'Arial',sans-serif;mso-fareast-language:EN-GB">..</span>
<span
style="font-family:Arial,'sans-serif';mso-fareast-language:EN-GB">..</span>
<span
style="font-family:'Arial','sans-serif';mso-fareast-language:EN-GB">..</span>
<span style='font-family:Arial,Verdana;mso-fareast-language:EN-GB'>..</span>
<span style='font-family:"Arial",Verdana;mso-fareast-language:EN-GB'>..</span>
<span style='font-family:Arial,"Verdana";mso-fareast-language:EN-GB'>..</span>
<span style='font-family:"Arial","Verdana";mso-fareast-language:EN-GB'>..</span>
<span style='font-family:serif,sans-serif;mso-fareast-language:EN-GB'>..</span>
<span
style='font-family:"serif",sans-serif;mso-fareast-language:EN-GB'>..</span>
<span
style='font-family:serif,"sans-serif";mso-fareast-language:EN-GB'>..</span>
<span
style='font-family:"serif","sans-serif";mso-fareast-language:EN-GB'>..</span>
The output is:
<span style="font-family:"Arial",sans-serif">..</span>
<span style="font-family:"Arial",sans-serif">..</span>
<span style="font-family:"Arial"">..</span>
<span style="font-family:"Arial"">..</span>
<span style="font-family:"Arial",sans-serif">..</span>
<span style="font-family:"Arial",sans-serif">..</span>
<span style="font-family:"Arial"">..</span>
<span style="font-family:"Arial"">..</span>
<span style="font-family:"Arial","Verdana"">..</span>
<span style="font-family:"Arial","Verdana"">..</span>
<span style="font-family:"Arial","Verdana"">..</span>
<span style="font-family:"Arial","Verdana"">..</span>
<span style="font-family:serif,sans-serif">..</span>
<span style="font-family:"serif",sans-serif">..</span>
<span style="font-family:serif">..</span>
<span style="font-family:"serif"">..</span>
What is the expected output? What do you see instead?
I expected to see all the fonts listed.
What version of the product are you using? On what operating system?
r164, Java-1.6, JUnit.
Thank you!
Fred
Original issue reported on code.google.com by [email protected]
on 11 May 2013 at 8:08
A policy that uses a permissive attribute policy because there is an element policy that
can be confused.
We should prevent attributes with duplicate names from making it to an element policy to prevent element policy authors from being confused. The DOM model for element already assumes that there is at most one value for any given (namespace/local-name) pair so we lose no generality by restricting the output to have at most one attribute with a given name.
To ease migration it would be useful if java-html-sanitizer could import AntiSamy XML policy files.
Johannes Lichtenberger writes
I have the following policy:
/**
* Allow media elements/attributes.
*/
public static final PolicyFactory MEDIA = new HtmlPolicyBuilder().allowElements("video", "audio", "source")
.allowAttributes("controls", "width", "height").onElements("video").allowAttributes("controls")
.onElements("audio").allowAttributes("src", "type").onElements("source").allowTextIn("video", "audio")
.toFactory();
and the HTML content I want to sanitize (all whitelisted content) is:
<p><video controls="controls" width="300" height="150">
<source src="media/video/small.webm" type="video/webm" />
<source src="media/video/small.mp4" type="video/mp4" />
<source src="media/video/small.ogv" type="video/ogg" />
<source src="media/video/small.3gp" type="video/3gp" />
Your browser does not support the video tag.</video></p>
But it seems character content within the video-element is never permitted (contents-member field is 0, probably it should be != 0?). Should be valid to have an alternative text I guess.
Per
https://groups.google.com/d/topic/owasp-java-html-sanitizer-support/LJFuNLa4T_8/
discussion
<ul>
<li>asdf</li>
<ul>
<li>adfasdf</li>
</ul>
</ul>
is getting sanitized into:
<ul>
<li>asdf</li>
</ul>
<ul>
<li>adfasdf</li>
</ul>
instead of what Jon Stevens expects:
<ul>
<li>asdf</li>
<li>
<ul>
<li>adfasdf</li>
</ul>
</li>
</ul>
Jim points out that the input is misnested and
Line 5, Column 6: document type does not allow element "UL" here; assuming
missing "LI" start-tag
The tag balancer does not insert the missing LI start-tag.
Original issue reported on code.google.com by [email protected]
on 23 Oct 2012 at 3:34
Once a protocol is allowed, policy authors often want to place additional
constraints: e.g. a data protocol with an image/... mime-type for use with <img
src>, or a tel: protocol that contains a valid telephone number.
Right now, policy authors are tempted to do
allowUrlProtocols("data", "https", "http", "mailto")
allowAttributes("src").matching(Pattern.compile("^(data:image/(gif|png|jpeg)[,;]|http|https|mailto|//)", Pattern.CASE_INSENSITIVE))
which requires duplicative effort.
We should provide good alternatives to writing regular expressions to match
URLs as it is error prone.
Perhaps a URL policy that recognizes structure in URLs.
Original issue reported on code.google.com by [email protected]
on 21 Jan 2014 at 4:09
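To make the "URL policy that recognizes structure" idea concrete, here is a rough JDK-only sketch that splits off the scheme once and then applies a per-scheme check, instead of one combined regular expression. Everything here is hypothetical: `StructuredUrlCheck` is not a library class, the accepted image types mirror the regex quoted above, and the telephone-number shape is a loose assumption rather than a full tel: URI validator.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of java-html-sanitizer. */
final class StructuredUrlCheck {
  // Assumption: only these image types are wanted in data: URLs.
  private static final Pattern DATA_IMAGE = Pattern.compile(
      "image/(gif|png|jpeg)[,;].*", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
  // Assumption: optional '+', a digit, then digits and common separators.
  private static final Pattern TEL = Pattern.compile("\\+?[0-9][0-9().\\-]{1,30}");

  private StructuredUrlCheck() {}

  /** Returns true for absolute URLs whose scheme-specific part looks valid. */
  static boolean isAllowed(String url) {
    int colon = url.indexOf(':');
    if (colon <= 0) {
      return false; // this sketch handles absolute URLs only
    }
    String scheme = url.substring(0, colon).toLowerCase(Locale.ROOT);
    String rest = url.substring(colon + 1);
    switch (scheme) {
      case "http":
      case "https":
        return rest.startsWith("//");
      case "mailto":
        return !rest.isEmpty();
      case "data":
        return DATA_IMAGE.matcher(rest).matches();
      case "tel":
        return TEL.matcher(rest).matches();
      default:
        return false;
    }
  }
}
```

The scheme list lives in one place (the `switch`), so allowing a new protocol and constraining its contents become a single change rather than two that must be kept in sync.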
[deleted issue]
> What steps will reproduce the problem?
Execute the attached testcase
> What is the expected output? What do you see instead?
When sanitizing, the sanitizer moves inner elements out of their parent under
certain circumstances (see examples in testcase).
I don't want the sanitizer to change the markup but to remove all contents that
are not allowed.
> What version of the product are you using? On what operating system?
r135 / linux
Original issue reported on code.google.com by [email protected]
on 1 Feb 2013 at 12:10
Attachments:
robinhouston reports
"""
a NUL byte anywhere in the submitted code will cause the output to be blank.
"""
Original issue reported on code.google.com by [email protected]
on 10 Oct 2011 at 11:41
The version released yesterday, 20151202.1, has a new class AutoCloseableHtmlStreamRenderer that implements the java.lang.AutoCloseable interface, defined in org/owasp/html/HtmlStreamRenderer.java. This change makes the library depend at runtime on JRE 1.7 or greater; it throws a NoClassDefFoundError if used with lower versions.
Is this a mistake, or will the library require JRE 1.7 from this version on?
vytah said
"""
OK, I didn't circumvent the protection, but I managed to crash Firefox 8 and
make it unusable until I restarted it in safe mode.
My input was about 20000×<div> (opening only, no closing)
"""
Original issue reported on code.google.com by [email protected]
on 10 Oct 2011 at 9:20
Version: r239
Description: em-Tag is missing in allowCommonInlineFormattingElements()
if the strong-Tag is in allowCommonInlineFormattingElements(), em-Tag should be
also included.
Original issue reported on code.google.com by [email protected]
on 1 Dec 2014 at 2:02
> I'm trying to sanitize the HTML generated by a WYSIWYG editor
> (http://hackerwins.github.io/summernote/), but sanitize() is cleaning
> all the HTML tags. I'm doing this:
>
> PolicyFactory sanitizer =
> Sanitizers.FORMATTING.and(Sanitizers.BLOCKS.and(Sanitizers.STYLES.and(Sanitizers.LINKS)))
> sanitizer.sanitize(unsafeHtml)
>
> Source string:
> "<span style="font-weight: bold; text-decoration: underline;
> background-color: yellow;">aaaaaaaaaaaaaaaaaaaaaaa</span>"
>
> Result:
> aaaaaaaaaaaaaaaaaaaaaaa
>
> Am I doing something wrong? From what I've read, the standard sanitizers
> should be enough in this case
This looks like a bug. Sanitizers.STYLES doesn't work as advertised,
so the style="..." attribute is rejected out of hand, and <span> is
one of the elements that is, by default, stripped when it has no
attributes.
I'm looking into a fix and will respond to this thread when I know more.
I repeated the problem using:
PolicyFactory sanitizer = Sanitizers.FORMATTING
.and(Sanitizers.BLOCKS)
.and(Sanitizers.STYLES)
.and(Sanitizers.LINKS);
String input = "<span style=\"font-weight: bold;"
+ " text-decoration: underline; background-color: yellow;\""
+ ">aaaaaaaaaaaaaaaaaaaaaaa</span>";
String got = sanitizer.sanitize(input);
String want = input;
assertEquals(want, got);
Original issue reported on code.google.com by [email protected]
on 30 Apr 2014 at 7:04
Please, move project to Github. Project Hosting on Google Code will close on
January 25th, 2016.
Original issue reported on code.google.com by [email protected]
on 12 Apr 2015 at 12:31
I would like to allow users to provide HTML code such as:
<a data-target="..." data-action="..." data-profile="...">...</a>
Basically, any number of attributes on a tag (a in this example) that begin with data-. Is it possible to do this with HtmlPolicyBuilder (or otherwise)?