For the cloudflare_worker, we might be able to use HTMLRewriter to avoid growing the bundle size. It's supposed to be efficient, since it's built on the lol-html library. (Not sure; just an idea.)
from sxg-rs.
I'm not going to finish this before my vacation, so I'm saving state:
#68 is worse than I'd thought. The segfault happens even on `HtmlRewriter::new()`. So I think we have a few options (in rough order of preference):
- Duplicate the HTML-processing code. In cloudflare_worker, use the JS HTMLRewriter API; in fastly_compute, use the Rust HtmlRewriter. Is the duplication reasonably small/manageable?
- Fix #68. Are the install instructions still reasonable?
- See if we can use another HTML parser like html5ever or (via FFI) LazyHTML. Does this avoid the segfault? Are the bundle size and performance OK?
- Use a hacky regex to parse the link tags, and validate that the hrefs & params look OK (e.g. that they don't break the browser's Link header parser). Leave a comment in the README that things that look like tags inside script, noscript, template, or maybe foreign contexts like svg/math might cause a spurious prefetch.
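The validation step in the last option could look roughly like the minimal Rust sketch below. It is deliberately conservative, under the assumption that rejecting a link only costs a missed prefetch: hrefs are limited to printable ASCII without characters that could end the `<...>` URI or split a directive, `as` must be an HTTP token, and other param values must be safe inside a quoted-string. The function names are hypothetical, not taken from sxg-rs.

```rust
/// Hypothetical helper (not from sxg-rs). Assumption: being conservative is
/// fine, since a rejected link only means a missed prefetch.
/// Checks that an href can sit inside `<...>` in a Link header.
fn is_safe_href(href: &str) -> bool {
    !href.is_empty()
        && href.bytes().all(|b| {
            // Printable ASCII only: rejects spaces, control characters, and raw
            // non-ASCII bytes (which would need percent-encoding first).
            b.is_ascii_graphic()
                // `>` would end the URI early and `,` separates directives;
                // `<`, `"` and `\` are rejected to keep parsing unambiguous.
                && !matches!(b, b'<' | b'>' | b',' | b'"' | b'\\')
        })
}

/// Checks that a value can be emitted as a bare HTTP token (`tchar`),
/// e.g. the value of `as`.
fn is_token(value: &str) -> bool {
    !value.is_empty()
        && value
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b"!#$%&'*+-.^_`|~".contains(&b))
}

/// Checks that a value can be wrapped in a quoted-string: tab or printable
/// ASCII only (quotes and backslashes are escaped at serialization time).
fn is_safe_quoted_value(value: &str) -> bool {
    value.bytes().all(|b| b == b'\t' || (b' '..=b'~').contains(&b))
}
```

For example, `is_safe_href("/a.js>;rel=prefetch")` is false, so an attempt to smuggle an extra directive through the href is dropped rather than emitted.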
As for what the HTML parser does: it converts `<link rel=preload as=foo>` tags into Link header directives. It should also add the parameters crossorigin, media, imagesrcset, and imagesizes if specified.
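A minimal Rust sketch of serializing one such directive, assuming the href and values were already validated and the set of params to carry over was already decided elsewhere. `link_directive` and `quoted_string` are hypothetical helpers, not sxg-rs APIs. Every param value is emitted as a quoted-string, since imagesrcset and imagesizes contain spaces and commas that a bare token can't hold.

```rust
/// Hypothetical helper: serializes one Link header directive for a
/// `<link rel=preload as=...>` element. `params` holds the extra attributes
/// to carry over (e.g. imagesrcset, imagesizes), already filtered and validated.
fn link_directive(href: &str, as_value: &str, params: &[(&str, &str)]) -> String {
    let mut out = format!("<{}>;rel=preload;as={}", href, as_value);
    for (name, value) in params {
        out.push(';');
        out.push_str(name);
        out.push('=');
        // Always quote: srcset-style values contain spaces and commas.
        out.push_str(&quoted_string(value));
    }
    out
}

/// Wraps a value in an HTTP quoted-string, backslash-escaping `"` and `\`.
fn quoted_string(value: &str) -> String {
    let mut q = String::with_capacity(value.len() + 2);
    q.push('"');
    for c in value.chars() {
        if c == '"' || c == '\\' {
            q.push('\\');
        }
        q.push(c);
    }
    q.push('"');
    q
}
```

For example, `link_directive("/hero.jpg", "image", &[("imagesizes", "50vw")])` returns `</hero.jpg>;rel=preload;as=image;imagesizes="50vw"`.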
For character encoding, we have a choice. We can either:
- Mandate that the document is UTF-8: verify that it has a `<meta>` tag declaring so, and that it doesn't have a UTF-16 BOM.
- Try to detect the character encoding the way Chromium does (as it's the only SXG-supporting browser engine right now, and the likely model for future browser behavior per whatwg/html#6962).
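For the first option, the check itself is small. Here is a minimal Rust sketch, assuming the `<meta>` scan happens elsewhere and its result is passed in as a flag (a hypothetical interface; `sniff_bom` and `satisfies_utf8_mandate` are illustrative names). The byte patterns are the ones used by the Encoding Standard's BOM sniffing.

```rust
/// Byte order marks recognized by the Encoding Standard's BOM sniffing.
#[derive(Debug, PartialEq, Eq)]
enum Bom {
    Utf8,
    Utf16Be,
    Utf16Le,
}

/// Returns the BOM at the start of the body, if any.
fn sniff_bom(body: &[u8]) -> Option<Bom> {
    if body.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some(Bom::Utf8)
    } else if body.starts_with(&[0xFE, 0xFF]) {
        Some(Bom::Utf16Be)
    } else if body.starts_with(&[0xFF, 0xFE]) {
        Some(Bom::Utf16Le)
    } else {
        None
    }
}

/// Hypothetical check for the "mandate UTF-8" option. Assumption: the caller
/// has already scanned for a `<meta>` declaring UTF-8. A UTF-16 BOM fails the
/// check because BOM sniffing overrides the `<meta>` declaration.
fn satisfies_utf8_mandate(body: &[u8], meta_declares_utf8: bool) -> bool {
    let utf16_bom = matches!(sniff_bom(body), Some(Bom::Utf16Be) | Some(Bom::Utf16Le));
    meta_declares_utf8 && !utf16_bom
}
```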
Link dump:
- HTMLMetaCharsetParser::CheckForMetaCharset
- Encoding sniffing spec that roughly matches what Gecko does, but not Blink or WebKit.
- EncodingFromMetaAttributes: I think we should port this directly whether we mandate UTF-8 or do full chardet. https://encoding.spec.whatwg.org/#concept-encoding-get contains the strings to look for; remember to ASCII-lowercase and strip HTML whitespace.
- For sniffing from byte patterns: https://github.com/hsivonen/chardetng (Gecko, Rust) or https://github.com/google/compact_enc_det (Chromium, C++). The Rust lib may differ slightly from Chromium behavior, but I think the edge case is rare enough (misdetected character encoding that still produces a valid but incorrect link tag) and the impact is small enough (prefetching an unneeded same-origin URL into the cache) that this is fine.
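The label lookup from the encoding spec link (strip HTML whitespace, ASCII-lowercase, then match against the label table) could be sketched in Rust as below. Only the UTF-8 row of the Encoding Standard's label table is included, which is enough for the mandate-UTF-8 option; a full port would need the whole table. `normalize_label` and `label_is_utf8` are hypothetical names.

```rust
/// Labels that the Encoding Standard maps to UTF-8 (only this row of the
/// label table is included in this sketch).
const UTF8_LABELS: [&str; 6] = [
    "unicode-1-1-utf-8",
    "unicode11utf8",
    "unicode20utf8",
    "utf-8",
    "utf8",
    "x-unicode20utf8",
];

/// Strips leading/trailing ASCII whitespace (tab, LF, FF, CR, space, the same
/// set HTML uses) and ASCII-lowercases the label.
fn normalize_label(label: &str) -> String {
    label
        .trim_matches(|c: char| c.is_ascii_whitespace())
        .to_ascii_lowercase()
}

/// Returns true if a charset label resolves to UTF-8.
fn label_is_utf8(label: &str) -> bool {
    let normalized = normalize_label(label);
    UTF8_LABELS.contains(&normalized.as_str())
}
```

Interior whitespace is not stripped, so a label like `utf 8` does not match, in line with the exact-match lookup in the spec.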
Possible test set dump:
- https://github.com/html5lib/html5lib-tests/tree/master/encoding
- https://github.com/web-platform-tests/wpt/tree/master/encoding-detection
- https://github.com/web-platform-tests/wpt/tree/master/html/syntax/parsing
Advice dump:
- Remember that `rel` can contain multiple values (whitespace-separated).
- BOM sniffing overrides `<meta>`, so we must check for a BOM either way.
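A minimal Rust sketch of the `rel` point, using the standard library's `split_ascii_whitespace` (which splits on the same five ASCII whitespace characters as HTML) and an ASCII case-insensitive comparison, since link types are case-insensitive. `rel_has_preload` is a hypothetical name.

```rust
/// Returns true if a `rel` attribute value contains the `preload` keyword.
/// `rel` is a whitespace-separated token list compared ASCII
/// case-insensitively, so `rel="stylesheet Preload"` counts but
/// `rel="preloaded"` does not.
fn rel_has_preload(rel: &str) -> bool {
    rel.split_ascii_whitespace()
        .any(|token| token.eq_ignore_ascii_case("preload"))
}
```

Matching whole tokens matters for the regex option in particular: a substring search for `preload` would also match unrelated values like `preloaded`.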
The remaining task is to implement this on fastly_compute.
Oh, also remaining is to process additional attributes besides {rel,href,as}. Here are link's supported attributes.
For certain, we should support imagesrcset and imagesizes, as I know these are supported by SXG subresources. I'm not sure about crossorigin and referrerpolicy; we should see what Chromium (and/or the spec) does. I'm pretty sure media is not supported, because a prefetch doesn't construct a document frame. The others don't seem relevant.
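The attribute decisions in the comment above could be encoded as a small policy table, so that the two undecided attributes stay visibly gated until Chromium's behavior is checked. A minimal Rust sketch; `ParamPolicy`, `param_policy`, and `forwarded_params` are hypothetical names, and only the attributes named in that comment are listed.

```rust
/// How a `<link>` attribute (beyond rel, href and as) is treated when
/// building the Link header.
#[derive(Debug, PartialEq, Eq)]
enum ParamPolicy {
    Forward,
    Undecided,
    Drop,
}

fn param_policy(attr: &str) -> ParamPolicy {
    match attr.to_ascii_lowercase().as_str() {
        // Known to be supported for SXG subresources.
        "imagesrcset" | "imagesizes" => ParamPolicy::Forward,
        // Still to be checked against Chromium and/or the spec.
        "crossorigin" | "referrerpolicy" => ParamPolicy::Undecided,
        // A prefetch doesn't construct a document frame to evaluate it in.
        "media" => ParamPolicy::Drop,
        // The remaining attributes don't seem relevant.
        _ => ParamPolicy::Drop,
    }
}

/// Keeps only the attributes marked Forward, preserving document order.
fn forwarded_params<'a>(attrs: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    attrs
        .iter()
        .filter(|(name, _)| param_policy(name) == ParamPolicy::Forward)
        .copied()
        .collect()
}
```

Once the crossorigin and referrerpolicy question is settled, moving them to `Forward` or `Drop` is a one-line change.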
Implementing for fastly_compute should be easy now that process_html.rs exists.