Comments (2)
Hi! Thanks for asking this question.
This code snippet uses #scrub_fragment, which does two things:
- parse the fragment into a Nokogiri::HTML::DocumentFragment
- sanitize that DocumentFragment
Let's separate these two operations to see what's going on ...
Loofah.fragment("<hello message").children
# => [#<Nokogiri::XML::Element:0x2bc name="hello" attributes=[#<Nokogiri::XML::Attr:0x2d0 name="message">]>]
Interesting: Nokogiri parses that fragment into a <hello></hello> element. Why is that? Nokogiri (actually, libxml2) treats this as a "markup error" and tries to fix it:
Loofah.fragment("<hello message").errors
# =>
# [#<Nokogiri::XML::SyntaxError: 1:27: ERROR: Tag hello invalid>,
# #<Nokogiri::XML::SyntaxError: 1:27: ERROR: Couldn't find end of Start Tag hello>]
If your intention is to have this string interpreted as a "text node" that equals <hello message, you should be aware that a bare < in an HTML text node is considered malformed, and you should use &lt; instead. You may want to consider HTML-escaping anything that's a text node before passing it to Loofah:
CGI.escapeHTML("<hello message")
# => "&lt;hello message"
Loofah.fragment(CGI.escapeHTML("<hello message"))
# => #(DocumentFragment:0x3d4 { name = "#document-fragment", children = [ #(Text "<hello message")] })
Loofah.fragment(CGI.escapeHTML("<hello message")).to_html
# => "&lt;hello message"
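The escaping step can also be tried on its own, without Loofah. This is a minimal sketch that uses only Ruby's standard cgi library; the variable names raw and escaped are illustrative and not part of any API.

```ruby
require "cgi"

raw = "<hello message"

# Escape the markup-significant characters so the string reads as plain text.
escaped = CGI.escapeHTML(raw)
puts escaped  # prints: &lt;hello message

# Unescaping recovers the original string, so nothing is lost by escaping.
puts CGI.unescapeHTML(escaped) == raw  # prints: true
```

Because the escaped string no longer contains a bare <, an HTML parser has no tag to repair and treats the whole value as text.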
The <hello> element is being removed by the Strip scrubber. The documentation says:
+:strip+ removes unknown/unsafe tags
Is <hello></hello> a known and safe tag? Let's look at the code:
loofah/lib/loofah/scrubbers.rb
Lines 96 to 102 in 369a54f
which calls html5lib_sanitize:
Lines 103 to 106 in 369a54f
which calls allowed_element?:
loofah/lib/loofah/html5/scrub.rb
Lines 16 to 18 in 369a54f
which uses ALLOWED_ELEMENTS_WITH_LIBXML2, basically this allowlist, which hello is not a member of:
loofah/lib/loofah/html5/safelist.rb
Lines 49 to 144 in 369a54f
If we use something in the list instead, like audio, we see Loofah keeps it around:
Loofah.fragment("<audio message").scrub!(:strip).to_html
# => "<audio></audio>"
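The allowlist lookup described above can be pictured with a small standalone sketch. This is not Loofah's code: TOY_ALLOWED_ELEMENTS and toy_allowed_element? are hypothetical names, and the three-entry set stands in for the much longer list in safelist.rb.

```ruby
require "set"

# Hypothetical three-element allowlist, for illustration only. Loofah's real
# list is far longer and lives in loofah/lib/loofah/html5/safelist.rb.
TOY_ALLOWED_ELEMENTS = Set["a", "audio", "p"].freeze

# Hypothetical helper: true when the element name is on the toy allowlist.
def toy_allowed_element?(name)
  TOY_ALLOWED_ELEMENTS.include?(name)
end

puts toy_allowed_element?("audio")  # prints: true
puts toy_allowed_element?("hello")  # prints: false
```

Under this picture, a scrubber like Strip only has to ask one question per element: is its name in the set? Anything that is not, such as hello, gets removed.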
I hope that makes sense!
Thank you @flavorjones for an amazing explanation of underlying code.