Comments (8)
This is the expected output, but obviously, if the font tag is shuttled into
some non-row content abyss, then it could be improved.
The design decision was made that way because CSS is such a large attack
surface that anything we can do to limit attackers' ability to manipulate CSS
leaves us in a more secure position.
Are there elements besides tables where this is a problem?
If the styling handler detected that it was inside a table, thead, tbody,
tfoot, rowgroup, colgroup, or tr element, and instead inserted the <font>
element inside the contained <td> and <th> elements, would that help?
So instead of
<table>
<font face="Arial, Geneva, sans-serif" style="color:#000">
<tbody>
<tr>
<th>Column One</th>
<th>Column Two</th>
</tr>
<tr>
<td align="center"><font style="background-color:#fffffe"><font size="2">Size 2</font></font></td>
<td align="center"><font style="background-color:#fffffe"><font size="7">Size 7</font></font></td>
</tr>
</tbody>
</font>
</table>
you would get
<table style="color:#000">
<tbody>
<tr>
<th><font face="Arial, Geneva, sans-serif">Column One</font></th>
<th><font face="Arial, Geneva, sans-serif">Column Two</font></th>
</tr>
<tr>
<td align="center"><font style="background-color:#fffffe" face="Arial, Geneva, sans-serif" size="2">Size 2</font></td>
<td align="center"><font style="background-color:#fffffe" face="Arial, Geneva, sans-serif" size="7">Size 7</font></td>
</tr>
</tbody>
</table>
Original comment by [email protected]
on 2 Feb 2013 at 6:25
from java-html-sanitizer.
White-listing a set of fonts from
http://webdesign.about.com/od/fonts/qt/web-safe-fonts.htm could keep the size
of the sanitized output much closer to the size of the input for large tables,
but I'm loath to do anything that makes some fonts work with tables and others
not.
Original comment by [email protected]
on 2 Feb 2013 at 11:23
from java-html-sanitizer.
I'm actually not that much of an HTML expert; this is just the problem I saw in
my first test. I assume there will be other constructs that are similarly
problematic, but I'm not in a position to enumerate them.
I still think you could preserve some style attributes without increasing the
attack surface. Currently when you see
style="color: rgb(0, 0, 0); font-family: Arial, Geneva, sans-serif;"
you parse that into some data structures that are subsequently used in a font
tag. I'm suggesting that instead of emitting the font tag, you could remove the
user-supplied style attributes, and replace it with a new style attribute that
you generate from known constructs. So in my example, you would get:
<table style="color:#000; font-family: Arial, Geneva, sans-serif;">
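A minimal sketch of that suggestion, not the sanitizer's actual code: the user-supplied style value is parsed into declarations, only properties on a small allow-list with values matching a strict pattern are kept, and a fresh style value is generated from what survives. The class name, the method name `rebuildStyle`, and the value patterns are all hypothetical and far narrower than real CSS.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

class StyleRebuildSketch {
    // Hypothetical allow-list: property name -> pattern the lowercased value must fully match.
    static final Map<String, Pattern> ALLOWED = new LinkedHashMap<>();
    static {
        // Hex colors, bare keywords such as "red", or rgb(r, g, b) with plain integers.
        String color = "#[0-9a-f]{3}|#[0-9a-f]{6}|[a-z]+"
            + "|rgb\\(\\s*\\d{1,3}\\s*,\\s*\\d{1,3}\\s*,\\s*\\d{1,3}\\s*\\)";
        ALLOWED.put("color", Pattern.compile(color));
        ALLOWED.put("background-color", Pattern.compile(color));
        // Comma-separated family names that start with a letter (an assumption of this sketch).
        String family = "[a-z][a-z0-9 -]*";
        ALLOWED.put("font-family", Pattern.compile(family + "(\\s*,\\s*" + family + ")*"));
    }

    /** Returns a newly generated style value containing only recognized declarations. */
    static String rebuildStyle(String userStyle) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (String decl : userStyle.split(";")) {
            int colon = decl.indexOf(':');
            if (colon < 0) {
                continue; // not a "property: value" pair
            }
            String prop = decl.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = decl.substring(colon + 1).trim();
            Pattern p = ALLOWED.get(prop);
            if (p != null && p.matcher(value.toLowerCase(Locale.ROOT)).matches()) {
                kept.put(prop, value); // a later duplicate overrides an earlier one
            }
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : kept.entrySet()) {
            if (out.length() > 0) {
                out.append("; ");
            }
            out.append(e.getKey()).append(": ").append(e.getValue());
        }
        return out.toString();
    }
}
```

Under these assumptions, anything the allow-list does not recognize (unknown properties, url(...) or expression(...) values) is dropped rather than repaired, so the emitted attribute only ever contains text that matched a known pattern.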
I'm concerned that any other approach is likely to impact rendering in ways
that aren't obvious to end users. It might not be a big deal if someone is
typing markup into a wiki, but if your app allows someone to paste in a chunk
of HTML (copied from a web page, or perhaps even a word processing
application), then users will expect the rendering to look as similar to the
original as possible.
Original comment by [email protected]
on 4 Feb 2013 at 8:08
from java-html-sanitizer.
> if your app allows someone to paste in a chunk of HTML (copied from a web page, or perhaps even a word processing application), then users will expect the rendering to look as similar to the original as possible.
agreed
> instead of emitting the font tag, you could remove the user-supplied style attributes, and replace it with a new style attribute that you generate from known constructs
I do something like that already, just not for font names.
http://code.google.com/p/owasp-java-html-sanitizer/source/browse/trunk/src/tests/org/owasp/html/StylingPolicyTest.java
http://code.google.com/p/owasp-java-html-sanitizer/source/browse/trunk/src/main/org/owasp/html/StylingPolicy.java#320
Only font family, align, and style are put on the <font> tag. The latter two
are easy to whitelist.
http://www.w3.org/TR/CSS21/fonts.html#value-def-family-name says
> Font family names must either be given quoted as strings, or unquoted as a sequence of one or more identifiers. This means most punctuation characters and digits at the start of each token must be escaped in unquoted font family names.
I'll see if I can come up with a white-list of generic font names (e.g.
sans-serif), and then any non-generic font name that contains only ASCII
alphanumerics and spaces gets quoted and put in a CSS style attribute. Anything
with punctuation, like the examples below from the CSS spec, I'll either reject
or maybe shove in a <font face>.
font-family: Ahem!, sans-serif;
font-family: test@foo, sans-serif;
font-family: #POUND, sans-serif;
font-family: Hawaii 5-0, sans-serif;
I'll test whether vendor-prefixed ones like -webkit-small-control survive
quoting. Allowing untrusted code to spoof OS controls might enable trusted-path
violations anyway.
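The filtering rule described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code that landed in the sanitizer: the class and method names are hypothetical, the generic list is the CSS 2.1 set of generic families, and names with punctuation are simply rejected rather than routed to a <font face>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class FontFamilySketch {
    // CSS 2.1 generic families; they must stay unquoted to keep their special meaning.
    static final Set<String> GENERIC = new HashSet<>(
        Arrays.asList("serif", "sans-serif", "cursive", "fantasy", "monospace"));

    /** Returns a filtered font-family value, or "" if no family name survives. */
    static String sanitizeFontFamily(String value) {
        List<String> kept = new ArrayList<>();
        // Naive split: a quoted name containing a comma falls apart into
        // pieces with stray quotes, which the check below then rejects.
        for (String raw : value.split(",")) {
            String name = raw.trim();
            boolean quoted = false;
            if (name.length() >= 2) {
                char first = name.charAt(0);
                char last = name.charAt(name.length() - 1);
                if ((first == '"' || first == '\'') && last == first) {
                    quoted = true;
                    name = name.substring(1, name.length() - 1).trim();
                }
            }
            name = name.replaceAll("\\s+", " ");
            String lower = name.toLowerCase(Locale.ROOT);
            if (!quoted && GENERIC.contains(lower)) {
                kept.add(lower); // generic keyword: emit unquoted
            } else if (name.matches("[A-Za-z0-9]+( [A-Za-z0-9]+)*")) {
                kept.add("'" + name + "'"); // plain name: emit quoted
            }
            // Anything else (punctuation, vendor prefixes) is dropped.
        }
        return String.join(", ", kept);
    }
}
```

With this sketch, `sanitizeFontFamily("Arial, Geneva, sans-serif")` yields `'Arial', 'Geneva', sans-serif`, while each of the four spec examples above reduces to just `sans-serif`, and `-webkit-small-control` is dropped because of its hyphens.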
Original comment by [email protected]
on 5 Feb 2013 at 10:37
- Changed state: Accepted
from java-html-sanitizer.
http://code.google.com/p/owasp-java-html-sanitizer/source/detail?r=147 fixes
this issue. Let me know if that works for you and I'll push a release to Maven
Central.
Original comment by [email protected]
on 12 Feb 2013 at 7:14
from java-html-sanitizer.
Thanks Mike, this does in fact fix the specific problem I reported.
(I'm still concerned that the fidelity of the transformed HTML will be
insufficient unless a lot more CSS constructs are supported, though.)
Original comment by [email protected]
on 2 Apr 2013 at 10:26
from java-html-sanitizer.
Great. I'll make sure this is on Maven Central.
Re CSS, my current (full-time) project involves generating sanitizers (and
other tools) from grammars annotated with schema constraints so hopefully
https://code.google.com/p/noinject/source/browse/mlsrc/test-files/san/css/grammar.g
will soon serve as the basis for a more flexible way to sanitize CSS. That
grammar is very drafty and written against an obsolete version of the CSS3
spec, but the general shape will probably remain the same.
Original comment by [email protected]
on 5 Apr 2013 at 4:26
from java-html-sanitizer.
r198 includes a significant rewrite of the CSS sanitizer, which now recognizes a
larger set of CSS properties and no longer introduces <font> elements, so it
should work well with tables.
Original comment by [email protected]
on 24 Jul 2013 at 3:55
- Changed state: Fixed
from java-html-sanitizer.