Comments (6)
Use the default policy, p := bluemonday.UGCPolicy(). Once you have sanitised the input, the structure and form of the HTML should be consistent.
If I were being quick and dirty, that is the point at which I would consider reading the document as a string, checking its prefix and suffix for the outer HTML tags, and replacing them with nothing.
However, that's not recommended: ideally you shouldn't do any processing after sanitisation has been performed. In my own code I pre-process the input, which leads to a consistent structure before sanitisation. At that point I strip the wrapper out and sanitise just the fragment that is the body:
https://github.com/microcosm-cc/microcosm/blob/master/models/markdown.go#L80-L88
const htmlCruft = `<html><head></head><body>`

// The treewalking leaves behind a stub root node
if bytes.HasPrefix(src, []byte(htmlCruft)) {
	src = src[len([]byte(htmlCruft)):]
}

// Scrub the generated HTML of anything nasty
// NOTE: This *MUST* always be the last thing to avoid introducing a
// security vulnerability
src = SanitiseHTML(src)
Where SanitiseHTML is just my wrapper for bluemonday:
https://github.com/microcosm-cc/microcosm/blob/master/models/sanitise.go#L34-L45
var (
	textPolicy     = bluemonday.StripTagsPolicy()
	htmlPolicy     = bluemonday.UGCPolicy()
	initHTMLPolicy bool
)

// SanitiseHTML sanitizes HTML
// Leaving a safe set of HTML intact that is not going to pose an XSS risk
func SanitiseHTML(b []byte) []byte {
	if !initHTMLPolicy {
		htmlPolicy.RequireNoFollowOnLinks(false)
		htmlPolicy.RequireNoFollowOnFullyQualifiedLinks(true)
		htmlPolicy.AddTargetBlankToFullyQualifiedLinks(true)
		initHTMLPolicy = true
	}
	return htmlPolicy.SanitizeBytes(b)
}
Oh, and if you're wondering what my pre-processing was: I linkify all @ and + mentions of other users. That required building and modifying an HTML document, which has the side effect of both balancing the HTML tree and producing consistent output: https://github.com/microcosm-cc/microcosm/blob/master/models/mentions.go#L55
Answer: with pre-processing and a consistent structure, it's very safe and easy to strip the wrapper out with plain string processing before you sanitise. It is also possible with post-processing, though that isn't recommended.
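For anyone who wants to try the "strip first, sanitise last" ordering in isolation, here is a minimal sketch using only the standard library. The function name stripThenSanitise and the sanitise parameter are made up for this example; in real code the sanitiser would be something like the SanitiseHTML wrapper quoted above. Only the htmlCruft prefix comes from the original snippet.

```go
package main

import (
	"bytes"
	"fmt"
)

// htmlCruft is the wrapper prefix quoted in the comment above.
const htmlCruft = `<html><head></head><body>`

// stripThenSanitise removes the known wrapper prefix (if present) and then
// runs the sanitiser as the very last step. Both the function name and the
// sanitise parameter are hypothetical; sanitise stands in for a real
// sanitiser such as a bluemonday policy.
func stripThenSanitise(src []byte, sanitise func([]byte) []byte) []byte {
	// TrimPrefix returns src unchanged when the prefix is absent.
	src = bytes.TrimPrefix(src, []byte(htmlCruft))
	return sanitise(src)
}

func main() {
	// identity is a placeholder sanitiser that only keeps the sketch runnable.
	identity := func(b []byte) []byte { return b }

	out := stripThenSanitise([]byte(htmlCruft+`<p>hello</p>`), identity)
	fmt.Println(string(out)) // prints <p>hello</p>
}
```

Passing the sanitiser in as a function keeps the ordering explicit: nothing touches the bytes after the sanitiser has run.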
from bluemonday.
@buro9 thanks for your help!
btw: how would you turn this code into a single standalone executable where I could pass the allowed attributes / tags via args?
from bluemonday.
You could encapsulate everything that is a policy as a JSON file and treat that as a configuration to be loaded by a flag, and then construct the policy and execute it against either stdin or a file input.
from bluemonday.
Can you help me a little? In particular, how would you map/assign the JSON values to a policy?
Right now I am passing two comma-separated lists as args.
This is where I am right now (using https://github.com/alecthomas/kingpin):
var (
	tags       = kingpin.Arg("tags", "Comma separated tags/elements to allow").String()
	attributes = kingpin.Arg("attributes", "Comma separated tag-attributes to allow").String()
)

func main() {
	kingpin.Parse()
	// Local names differ from the package-level flags to avoid shadowing them
	allowedTags := strings.Split(*tags, ",")
	allowedAttrs := strings.Split(*attributes, ",")
	p := bluemonday.UGCPolicy()
	p.AllowAttrs(allowedAttrs...).Globally()
	p.AllowElements(allowedTags...)
	// (the rest of main was not included in the post)
}
from bluemonday.
That is not what I would do. The args would be massive.
I'd have a single arg, that was the path to a JSON file.
The JSON file should be structured such that you can loop through and construct the policy.
That's it.
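To make "structured such that you can loop through" a bit more concrete, here is one possible layout, sketched with the standard library only. The JSON field names, the policyConfig and attrRule types, and the describe helper are all invented for this example; bluemonday does not define a config format. describe just returns a line per rule so the sketch runs without the library; in a real tool each step would be a call on the policy, such as AllowDocType, AllowElements, or AllowAttrs(...).OnElements(...) / .Globally().

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// policyConfig is a hypothetical file layout for a sanitiser policy.
type policyConfig struct {
	AllowDocType bool       `json:"allowDocType"`
	Elements     []string   `json:"elements"`
	Attributes   []attrRule `json:"attributes"`
}

// attrRule allows Names on Elements, or on every element when Elements is
// empty.
type attrRule struct {
	Names    []string `json:"names"`
	Elements []string `json:"elements"`
}

// parsePolicyConfig decodes the JSON policy file contents.
func parsePolicyConfig(data []byte) (policyConfig, error) {
	var cfg policyConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// describe walks the config in the order a policy would be built and returns
// one human-readable line per rule. A real tool would call the policy
// builder at each step instead of appending a string.
func describe(cfg policyConfig) []string {
	var steps []string
	if cfg.AllowDocType {
		steps = append(steps, "allow doctype")
	}
	if len(cfg.Elements) > 0 {
		steps = append(steps, "allow elements: "+strings.Join(cfg.Elements, ", "))
	}
	for _, r := range cfg.Attributes {
		scope := "globally"
		if len(r.Elements) > 0 {
			scope = "on " + strings.Join(r.Elements, ", ")
		}
		steps = append(steps, "allow attrs "+strings.Join(r.Names, ", ")+" "+scope)
	}
	return steps
}

func main() {
	raw := []byte(`{
		"allowDocType": true,
		"elements": ["html", "head", "title", "body"],
		"attributes": [
			{"names": ["xmlns"], "elements": ["html"]},
			{"names": ["lang"]}
		]
	}`)

	cfg, err := parsePolicyConfig(raw)
	if err != nil {
		fmt.Println("invalid policy:", err)
		return
	}
	for _, step := range describe(cfg) {
		fmt.Println(step)
	}
}
```

Keeping attribute rules as a list of {names, elements} pairs means per-element and global rules share one loop, which is about as simple as the mapping gets.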
from bluemonday.
okay sounds good, will follow that approach.
given this html:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Demystifying Email Design</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
</head>
</html>
How would you allow head/meta?
Right now I am trying this, but meta is always removed:
tags := []string{"html", "head", "meta", "title", "body"}
attributes := []string{"xmlns"}
p := bluemonday.NewPolicy()
p.AllowDocType(true)
p.AllowNoAttrs()
p.AllowAttrs(attributes...).Globally()
p.AllowElements(tags...)
and result:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
Demystifying Email Design
</head>
</html>
from bluemonday.