google / sxg-rs
A set of tools for generating signed exchanges at serve time.
License: Apache License 2.0
The current implementation does not update the content-length header after performing the SXG transformation (i.e. calling process_html). Should we either set the header to the correct value or just remove it?
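One way the first option could look is sketched below. It is only an illustration: HeaderList and fix_content_length are hypothetical names, not part of sxg_rs, and the headers are modelled as plain (name, value) pairs rather than the crate's own type.

```rust
/// Headers as ordered (name, value) pairs; a stand-in for the crate's own header type.
pub type HeaderList = Vec<(String, String)>;

/// Drops any stale content-length entry and records the length of the
/// transformed body instead. (Hypothetical helper, not an sxg_rs API.)
pub fn fix_content_length(headers: &mut HeaderList, body: &[u8]) {
    headers.retain(|(name, _)| !name.eq_ignore_ascii_case("content-length"));
    headers.push(("content-length".to_string(), body.len().to_string()));
}
```

Removing the header instead would be the same retain call without the push.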
To minimize conflict with the origin's URL namespace, the cloudflare_worker could only respond to reserved_path URLs when requested on the workers.dev domain. Does fastly_compute have a similar special-purpose domain? If not, this could be an optional behavior.
Per this TODO:
As a performance optimization, maybe start with a Content-Length sized buffer and resize exponentially if necessary. Alternatively, use the limitBytes() transformer in streamFrom, and construct a flyweight Response object here in order to call arrayBuffer().
The code in WebAssembly uses a global variable to store SxgWorker.
Sometimes developers need to handle multiple domains, and therefore need multiple instances of SxgWorker.
We can make a change to export the constructor SxgWorker::new to WebAssembly.
In addition to checking for private or no-store, the worker should back off if no-cache or max-age=0 is present, because these directives alone don't provide the guarantee that the document is OK to be cached and reused. (no-cache without 304s is equivalent to no-store.)
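A minimal sketch of such a check, assuming the cache-control value is available as a single string. The function name should_back_off is hypothetical, and the parser is deliberately naive: it splits on commas and does not handle commas inside quoted strings.

```rust
/// Returns true if the cache-control value carries a directive after which
/// the worker should skip signing. (Hypothetical helper; naive comma split.)
pub fn should_back_off(cache_control: &str) -> bool {
    cache_control.split(',').any(|directive| {
        let directive = directive.trim().to_ascii_lowercase();
        // Split "name=value" directives; bare directives have no value.
        let (name, value) = match directive.split_once('=') {
            Some((n, v)) => (n.trim(), Some(v.trim().trim_matches('"'))),
            None => (directive.as_str(), None),
        };
        match name {
            "private" | "no-store" | "no-cache" => true,
            "max-age" => value.and_then(|v| v.parse::<u64>().ok()) == Some(0),
            _ => false,
        }
    })
}
```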
For instance, links with an unescaped | in the target URL are invalid per RFC 8288, and don't meet Google SXG Cache requirements. There may be other cases like that.
The code should be formatted by rustfmt. A GitHub action should be added to check the format.
Update forward_to_origin_server to return an Err if the Authorization request header is present.
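As a rough illustration, the guard could be a small check run before forwarding. The function name and the (name, value) header representation are assumptions made for this sketch; the String error mirrors the Result<_, String> style used elsewhere in the codebase.

```rust
/// Fails if the incoming request carries an Authorization header.
/// (Hypothetical helper; headers are modelled as (name, value) pairs.)
pub fn check_no_authorization(request_headers: &[(String, String)]) -> Result<(), String> {
    let has_authorization = request_headers
        .iter()
        .any(|(name, _)| name.eq_ignore_ascii_case("authorization"));
    if has_authorization {
        return Err("request has an Authorization header; refusing to fetch for SXG".to_string());
    }
    Ok(())
}
```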
#57 limited its solution of #13 to same-origin preloads for efficiency's sake:
To address this problem, we could mitigate the cost by caching a per-origin boolean saying whether the origin supports SXG, with an expiry of, say, 1h. On cache expiry or miss, process the link as if the origin supports SXG and update the cache accordingly.
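The idea can be sketched with an in-memory map; a real worker would more likely use its KV store or HTTP cache. SxgSupportCache and its methods are hypothetical names, and the clock is passed in explicitly so the expiry logic is easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Per-origin "supports SXG" flag with a fixed time-to-live.
/// (Hypothetical type; an in-memory stand-in for a KV store.)
pub struct SxgSupportCache {
    ttl: Duration,
    entries: HashMap<String, (bool, Instant)>,
}

impl SxgSupportCache {
    pub fn new(ttl: Duration) -> Self {
        SxgSupportCache { ttl, entries: HashMap::new() }
    }

    /// Records the outcome of the latest attempt for this origin.
    pub fn put(&mut self, origin: &str, supports_sxg: bool, now: Instant) {
        self.entries.insert(origin.to_string(), (supports_sxg, now));
    }

    /// Returns the cached flag, or None on a miss or an expired entry.
    pub fn get(&self, origin: &str, now: Instant) -> Option<bool> {
        self.entries.get(origin).and_then(|(supports, stored_at)| {
            if now.saturating_duration_since(*stored_at) < self.ttl {
                Some(*supports)
            } else {
                None
            }
        })
    }

    /// On a miss or expiry, optimistically treat the origin as supporting SXG.
    pub fn should_try_sxg(&self, origin: &str, now: Instant) -> bool {
        self.get(origin, now).unwrap_or(true)
    }
}
```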
Opposite to #87, we could enable a tag annotation like:
<div data-not-in-sxg>...</div>
which deletes the tag and its descendants when rendered in an SXG.
Create a reverse proxy server in Rust, similar to Web Packager Server, as a wrapper around the sxg_rs library. This could be run as a typical server or as a service on Google Cloud Run (see docs).
The worker should process the Link header before signing, in order to make it compatible with Google SXG Cache requirements. In particular, it should:
- [...] preload and allowed-alt-sxg.
- [...] allowed-alt-sxg (DONE in #57 and #61)
We want to compute allowed-alt-sxg because authors won't have done this already. It is necessary to support prefetching subresources from webpkgcache.com by way of subresource substitution.
For each preload, the worker needs to:
- [...] header-integrity cache in the KV store.
- [...] header-integrity per this definition and store it in the cache.
- [...] cache-control header.
The KV store minimizes the number of backend fetches caused by this feature.
- [...] header-integrity
Split off to #26. DONE in #36.
The worker should also eliminate frequently changing response headers that don't affect the semantics of the SXG (e.g. Date). Resources for researching which headers to eliminate:
- https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Response_fields
- https://datatracker.ietf.org/doc/html/rfc7230 and friends
Then the sentence at the end of #4 can be reverted.
In forward_to_origin_server, modify the Accept header so that application/signed-exchange;v=b3 has a lower q score than text/html, as recommended for requests not preferring SXG.
This helps for services configured in a loopback mode, like browser -> frontend -> sxg-rs -> frontend. This is only an issue for fastly_compute right now, because Cloudflare Workers doesn't run on loopback requests.
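A sketch of the header rewrite, under the assumption that text/html keeps a q-value above the one chosen here (0.1 is an arbitrary illustrative value). The function name deprioritize_sxg is hypothetical, and the parsing is a naive split on commas and semicolons.

```rust
/// Rewrites every application/signed-exchange media range in an Accept
/// header to carry a low q-value, leaving other ranges untouched.
/// (Hypothetical helper; 0.1 is an illustrative choice.)
pub fn deprioritize_sxg(accept: &str) -> String {
    accept
        .split(',')
        .map(|range| {
            let range = range.trim();
            let mut params = range.split(';').map(str::trim);
            let media_type = params.next().unwrap_or("");
            if !media_type.eq_ignore_ascii_case("application/signed-exchange") {
                return range.to_string();
            }
            // Keep parameters such as v=b3, drop any existing q, then append the low q.
            let mut out = media_type.to_string();
            for param in params.filter(|p| !p.to_ascii_lowercase().starts_with("q=")) {
                out.push(';');
                out.push_str(param);
            }
            out.push_str(";q=0.1");
            out
        })
        .collect::<Vec<_>>()
        .join(",")
}
```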
Currently they are loaded only once at init (cloudflare_worker, fastly_compute). The SxgWorker API should allow the PEMs to change, but also still allow caching between requests (shouldn't need to re-parse the same PEM on every request).
Headers looks at multiple cache-control headers to determine when to abandon signing, but not to determine the signature duration. We should allow these CDN-specific cache-control headers to override the cache-control max-age as specified.
Specifications for CDN-Cache-Control and Surrogate-Control.
Before building the SXG, verify that signature and signed_headers are <= their corresponding max lengths, so that the resulting SXG can be parsed.
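A minimal sketch of that pre-check follows. The two limits are my reading of the application/signed-exchange b3 format (16 KiB for the signature field, 512 KiB for the signed headers) and should be confirmed against the spec before use; check_lengths and the constant names are hypothetical.

```rust
/// Assumed parser limits from the b3 format; verify against the spec.
pub const MAX_SIG_LENGTH: usize = 16 * 1024;
pub const MAX_HEADER_LENGTH: usize = 512 * 1024;

/// Rejects inputs that a conforming SXG parser would refuse to read.
/// (Hypothetical helper, not an sxg_rs API.)
pub fn check_lengths(signature: &[u8], signed_headers: &[u8]) -> Result<(), String> {
    if signature.len() > MAX_SIG_LENGTH {
        return Err(format!(
            "signature is {} bytes, max is {}",
            signature.len(),
            MAX_SIG_LENGTH
        ));
    }
    if signed_headers.len() > MAX_HEADER_LENGTH {
        return Err(format!(
            "signed_headers is {} bytes, max is {}",
            signed_headers.len(),
            MAX_HEADER_LENGTH
        ));
    }
    Ok(())
}
```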
This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.
These updates have all been created already. Click a checkbox below to force a retry/rebase of any.
- [...] (anyhow, async-trait, base64, byte-strings, chrono, ciborium, clap, ctrlc, der-parser, fastly, form_urlencoded, futures, getrandom, http, hyper, hyper-rustls, hyper-tls, js-sys, log, log-fastly, lol_html, lru, nom, once_cell, p256, pem, percent-encoding, regex, rustls, rustls-pemfile, serde, serde-wasm-bindgen, serde_json, serde_yaml, sha1, sha2, thiserror, tokio-rustls, tokio-test, toml, url, warp, wasm-bindgen, wasm-bindgen-futures, web-sys, wrangler, x509-parser)
- [...] (@cloudflare/workers-types, @types/jasmine, @types/node, @types/node-fetch, commander, dompurify, esbuild, fastify, glob, jasmine-core, karma, karma-chrome-launcher, node-fetch, tslib, typescript)
- [...] (actions/cache, actions/checkout, actions/setup-go, actions/setup-node)
- [...] (@cloudflare/workers-types, @types/dompurify, @types/jasmine, @types/jsdom, @types/node, commander, dompurify, glob, gts, jasmine-core, jsdom, puppeteer, typescript)
- [...] (clap, der-parser, http, hyper, lol_html, pem, rustls-pemfile)
cloudflare_worker/Cargo.toml
console_error_panic_hook 0.1.7
wasm-bindgen 0.2.83
distributor/Cargo.toml
anyhow 1.0.66
base64 0.13.1
byte-strings 0.2.2
ciborium 0.2.0
clap 3.2.23
form_urlencoded 1.1.0
futures 0.3.25
http 0.2.8
hyper-rustls 0.23.2
hyper-trust-dns 0.5.0
hyper 0.14.23
lazy_static 1.4.0
nom 7.1.1
percent-encoding 2.2.0
regex 1.7.0
rustls 0.20.7
rustls-pemfile 1.0.1
sha2 0.10.6
thiserror 1.0.37
tls-listener 0.5.1
tokio 1.23.0
tokio-rustls 0.23.4
url 2.3.1
fastly_compute/Cargo.toml
anyhow 1.0.66
async-trait 0.1.59
base64 0.13.1
fastly ^0.8.9
http 0.2.8
log 0.4.17
log-fastly 0.8.9
pem 1.1.0
serde 1.0.149
serde_yaml 0.9.14
tokio 1.23.0
url 2.3.1
http_server/Cargo.toml
anyhow 1.0.66
async-trait 0.1.59
clap 3.2.23
fs2 0.4.3
futures 0.3.25
http 0.2.8
hyper-rustls 0.23.2
hyper-tls 0.5.0
hyper-trust-dns 0.5.0
hyper 0.14.23
lazy_static 1.4.0
lru 0.8.1
rand 0.8.5
serde_yaml 0.9.14
tokio 1.23.0
url 2.3.1
assert_matches 1.5.0
sxg_rs/Cargo.toml
anyhow 1.0.66
async-trait 0.1.59
base64 0.13.1
chrono 0.4.23
der-parser 8.1.0
futures 0.3.25
getrandom 0.2.8
http 0.2.8
js-sys 0.3.60
lol_html 0.3.1
nom 7.1.1
once_cell 1.16.0
pem 1.1.0
p256 0.11.1
serde 1.0.149
serde-wasm-bindgen 0.4.5
serde_json 1.0.89
serde_yaml 0.9.14
sha1 0.10.5
sha2 0.10.6
tokio 1.23.0
url 2.3.1
wasm-bindgen 0.2.83
wasm-bindgen-futures 0.4.33
web-sys 0.3.60
x509-parser 0.14.0
tokio-test 0.4.2
tools/Cargo.toml
anyhow 1.0.66
async-trait 0.1.59
base64 0.13.1
clap 3.2.23
ctrlc 3.2.3
der-parser 7.0.0
http 0.2.8
hyper 0.14.23
hyper-tls 0.5.0
pem 1.1.0
regex 1.7.0
serde 1.0.149
serde_json 1.0.89
serde_yaml 0.9.14
toml 0.5.9
tokio 1.23.0
url 2.3.1
warp 0.3.3
wrangler 1.19.13
.github/workflows/code-style.yml
actions/checkout v3
actions/cache v3
actions-rs/cargo v1
actions-rs/cargo v1
actions-rs/cargo v1
actions/checkout v3
actions/setup-node v3
.github/workflows/integration-tests.yml
actions/checkout v3
actions/checkout v3
actions-rs/toolchain v1
actions/checkout v3
actions/setup-go v3
actions-rs/toolchain v1
actions/checkout v3
actions/cache v3
.github/workflows/unit-tests.yml
actions/checkout v3
actions/cache v3
actions-rs/toolchain v1
actions-rs/cargo v1
actions-rs/cargo v1
actions/checkout v3
actions/setup-node v3
cloudflare_worker/worker/package.json
@cloudflare/workers-types 3.18.0
@types/node 16.18.11
esbuild 0.16.14
glob 8.0.3
gts 3.1.1
tslib 2.4.1
typescript 4.9.4
node >=16.0.0
playground/package.json
commander 9.4.1
dompurify 2.4.2
fastify 4.11.0
jsdom 20.0.3
node-fetch 3.3.0
puppeteer 17.1.3
@types/dompurify 2.4.0
@types/jsdom 20.0.1
@types/node 17.0.18
@types/node-fetch 2.6.2
esbuild 0.16.14
gts 3.1.1
tslib 2.4.1
typescript 4.9.4
node >=16.0.0
typescript_utilities/package.json
@types/jasmine 4.3.1
esbuild 0.16.14
gts 3.1.1
jasmine-core 4.5.0
karma 6.4.1
karma-chrome-launcher 3.1.1
karma-jasmine 5.1.0
PromoteLinkTagsToHeaders requires a non-empty as=. However, an empty destination is useful e.g. for preloading XHRs.
Add a wrapper around sxg_rs to integrate with the AWS Lambda Rust runtime. We can use the AWS Rust SDK to integrate with other services e.g. storage for OCSP/certs.
signature_duration looks at cache-control only. Look at expires too, per the freshness lifetime spec. This requires an HTTP-date parser.
Offer a way to opt individual link tags out of being converted into link headers, e.g. with a data-no-sxg-header attribute.
If specified, it would replace the host for the fallbackURL being signed. Otherwise, the current logic would apply (replace with html_host, or else don't replace).
Unlike html_host, signed_host would not affect where the URL is being fetched from.
Some SxgWorker functions take arguments of trait object Fetcher, Signer (and HttpCache in #61). However, the implementations of these traits usually do not change during the lifetime of a worker. Hence we can move these trait objects into the SxgWorker member variables.
Changes would be like
pub struct SxgWorker {
    ..
+   runtime: Runtime,
}
+ pub struct Runtime {
+     // Boxed, since at most the last field of a struct may be unsized.
+     signer: Box<Mutex<dyn Signer>>,
+     fetcher: Box<Mutex<dyn Fetcher>>,
+     cache: Box<Mutex<dyn Cache>>,
+ }
impl SxgWorker {
-   pub async fn fetch_ocsp_from_ca<F: fetcher::Fetcher>(&self, fetcher: F) -> Vec<u8>;
+   pub async fn fetch_ocsp_from_ca(&self) -> Vec<u8>;
}
By default, the worker should strip uncached headers, plus set-cookie and strict-transport-security, from the response if they are present. Reject if any other stateful headers are present. It should be safe to strip these two stateful headers: they are common, and websites tend to be resilient to them not working (e.g. because cookie blockers are common).
(It's possible that websites tend to be resilient to other stateful headers not working, but I don't have data/intuition on that. Given they're comparatively rare, we can err on the safe side without much loss of utility.)
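The proposed default could be sketched as follows. The function name is hypothetical, and OTHER_STATEFUL is an illustrative subset of stateful header names rather than the authoritative list from the SXG spec.

```rust
/// Stateful headers that are silently dropped before signing.
const STRIPPED: [&str; 2] = ["set-cookie", "strict-transport-security"];

/// Illustrative subset of other stateful headers; consult the SXG spec for the full list.
const OTHER_STATEFUL: [&str; 4] = [
    "authentication-info",
    "clear-site-data",
    "public-key-pins",
    "www-authenticate",
];

/// Strips the two common stateful headers and rejects the response if any
/// other stateful header is present. (Hypothetical helper.)
pub fn strip_or_reject(
    headers: Vec<(String, String)>,
) -> Result<Vec<(String, String)>, String> {
    let mut kept = Vec::new();
    for (name, value) in headers {
        let lower = name.to_ascii_lowercase();
        if STRIPPED.contains(&lower.as_str()) {
            continue;
        }
        if OTHER_STATEFUL.contains(&lower.as_str()) {
            return Err(format!("stateful header {} is present", lower));
        }
        kept.push((name, value));
    }
    Ok(kept)
}
```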
Present behavior (IIUC), is:
Options (in order of preference):
- [...] reject_stateful_headers into strip_stateful_headers and the value is a list (default ['set-cookie', 'strict-transport-security']). Only if this is easy to do.
- [...] reject_stateful_headers: false, and document that it only covers set-cookie and strict-transport-security.
- [...] reject_stateful_headers into a tri-state (none, all, or recommended).
In addition to checking cache-control, the worker should check cdn-cache-control, cloudflare-cdn-cache-control, and surrogate-control.
This more accurately captures whether the document is OK to be cached.
A rare error recently occurred in the ACME integration test, but the same test passed on re-run.
The cause of this flakiness might be the underlying MockFetcher.
Add configuration parameters similar to webpkgserver to support ACME renewal of a certificate. It's not necessary to support all verification methods (DNS/HTTP/ALPN); one is sufficient (whichever is automatable).
This should include some support for monitoring that the certificate is still valid. Here are some ideas:
- [...] curl | openssl command the user could run as a cron job.
The Headers struct uses HashMap to hold key-value pairs. However, headers can have the same key more than once. For conforming headers, we can join the values into one string, but that does not work for all headers, especially when their values can contain commas. Thus, sxg-rs should use a multi-map (e.g. HeaderMap in http).
I can help replace the current implementation if you think it makes sense.
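To make the difference concrete, here is a small standard-library sketch of a multi-map keyed by lowercase header name. MultiHeaders is a hypothetical type used only for illustration; the suggestion above is to adopt an existing multi-map such as HeaderMap rather than write one.

```rust
use std::collections::HashMap;

/// Case-insensitive header multi-map: each name maps to every value seen,
/// in insertion order. (Hypothetical type for illustration.)
#[derive(Default)]
pub struct MultiHeaders {
    map: HashMap<String, Vec<String>>,
}

impl MultiHeaders {
    /// Adds a value without overwriting earlier values for the same name.
    pub fn append(&mut self, name: &str, value: &str) {
        self.map
            .entry(name.to_ascii_lowercase())
            .or_default()
            .push(value.to_string());
    }

    /// Returns all values for a name, or an empty slice if it is absent.
    pub fn get_all(&self, name: &str) -> &[String] {
        self.map
            .get(&name.to_ascii_lowercase())
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

A set-cookie value with an Expires date contains a comma, so joining two such values with ", " could not be split back apart reliably; keeping them as separate entries avoids that.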
Per the SXG spec and the HTTP/1.1 spec, the worker should parse the Connection header and remove any headers it lists before signing.
This would go in get_signed_headers_bytes.
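A sketch of the filtering step, with headers modelled as (name, value) pairs. The function name remove_connection_headers is hypothetical; how this would be wired into get_signed_headers_bytes is not shown.

```rust
/// Removes the Connection header and every header it names.
/// (Hypothetical helper; headers are modelled as (name, value) pairs.)
pub fn remove_connection_headers(headers: Vec<(String, String)>) -> Vec<(String, String)> {
    // Collect the lowercase names listed in any Connection header.
    let mut hop_by_hop: Vec<String> = vec!["connection".to_string()];
    for (name, value) in &headers {
        if name.eq_ignore_ascii_case("connection") {
            hop_by_hop.extend(
                value
                    .split(',')
                    .map(|token| token.trim().to_ascii_lowercase())
                    .filter(|token| !token.is_empty()),
            );
        }
    }
    headers
        .into_iter()
        .filter(|(name, _)| !hop_by_hop.contains(&name.to_ascii_lowercase()))
        .collect()
}
```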
The worker currently hardcodes the DigiCert responder URL.
The worker should add a config param for the responder URL, as recommended.
If possible, it should also pull the correct value automatically from the AuthorityInfoAccess extension of the leaf certificate (golang example), overriding the config param.
Addresses two issues:
kSignedExchangeEnabledAcceptHeaderForPrefetch is used for <link rel=(preload|prefetch)> even outside of SXG context. This can interact with https://crbug.com/1180441 resulting in broken subresources for pages that use that tag and run this worker.
To avoid this, the worker should serve SXG if and only if its q-value is 1, until https://crbug.com/1243065 is fixed.
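The q-value test could be sketched like this. The function name sxg_q_is_one is hypothetical and the Accept parsing is a naive split on commas and semicolons; a media range without an explicit q parameter counts as q=1, per the usual HTTP default.

```rust
/// Returns true only if the Accept header lists
/// application/signed-exchange;v=b3 with a q-value of 1.
/// (Hypothetical helper; naive parsing.)
pub fn sxg_q_is_one(accept: &str) -> bool {
    accept.split(',').any(|range| {
        let mut parts = range.split(';').map(str::trim);
        let media_type = parts.next().unwrap_or("");
        if !media_type.eq_ignore_ascii_case("application/signed-exchange") {
            return false;
        }
        let mut version_ok = false;
        let mut q = 1.0_f32; // q defaults to 1 when absent
        for param in parts {
            if let Some((key, value)) = param.split_once('=') {
                match key.trim().to_ascii_lowercase().as_str() {
                    "v" => version_ok = value.trim().trim_matches('"') == "b3",
                    "q" => q = value.trim().parse().unwrap_or(0.0),
                    _ => {}
                }
            }
        }
        version_ok && q >= 1.0
    })
}
```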
Introducing a dependency on some of the functions in encoding_rs causes a segfault in wasm-opt before version 102. However, wrangler projects with type = "rust" depend on wasm-pack 0.10.0, and wasm-pack 0.10.0 depends on wasm-opt 78.
Rather than waiting for both of these dependencies to update in sequence, we should consider switching to type = "javascript". Either:
- [...] TargetType::Rust and reproduce those in toml, or
- [...] --type rust. This would require eliminating the TypeScript and using workers-rs instead, which might have downsides -- e.g. performance loss from DOM<->Rust bridge? Or no async_std/tokio.
I lean towards the first. It seems easier and less risky, and doesn't preclude doing the second one later.
Currently cargo run -p config-generator is an interactive process. This is inconvenient because the user has to re-enter everything or press enter again to confirm everything.
We need to create a new source-of-truth file, which is read by config-generator to generate WranglerConfig and SxgConfig.
Both cloudflare_worker and playground are written in TypeScript. Their duplicated code should be moved into a single folder.
For instance if the HTML contains:
<template data-sxg-only>
<script>log("visit-from-sxg=true")</script>
</template>
then when generating an SXG, rewrite it to:
<script>log("visit-from-sxg=true")</script>
Other possible spellings include <template class=sxg-only> or <script type=text/sxg-only> or <!--[if SXG]>, but the above seems the most general and least likely to collide with existing pages. (But that should be confirmed before implementing.)
Enable web publishers to distinguish SXG visits from unsigned visits in their analytics. This could help them track overall usage, or compare performance metrics between the two cohorts. Examples of feature requests for a similar tool (CF ASX):
The compiled cloudflare_worker wasm is currently around 1.2MB. The opt-level and lto tricks didn't reduce that. It's clearly possible to make small wasm binaries. Investigate the easiest change possible to make this one smaller.
The twiggy command identified table[0] as the main culprit. I think that might have to do with the use of JS callbacks? Judging by the MDN article and Lin Clark's article.
Things I didn't try:
format!
panic
no_main
no_std
Articles I didn't read:
The js_signer mod formats the signature as asn1 in the Rust code, because the current TypeScript implementation does not do the asn1 formatting.
However, there is a possibility that some JavaScript signers already include the asn1 formatting. We should provide the flexibility to disable the asn1 formatting in the Rust code.
- [...] html_host). Improves DX by removing the worker_host option from config.yaml.
- [...] .well-known/sxg-certs/... and register sxg-certs with IANA by filing an issue on well-known-uris. Reduces chance of conflict with other content.
- [...] cert to the web-safe padded base64 encoding of the cert-sha256. Improves interaction with intermediary caches that don't have content-addressing.
We are using both async-std and tokio at the same time. We need to select one.
Eliminate the TS warnings emitted by rollup, either by rolling back #114 and setting a <3 version constraint on @cloudflare/workers-types, or by fixing forward somehow (if easy).
Also, change unit-tests.yml to run rollup with --failAfterWarnings so that this is caught by the CI in the future.
Perhaps this article has a clue on how to fix forward. Otherwise, it seems not to be a high-severity bug; the Cloudflare Workers runtime is supposed to be backwards-compatible so it's just a change in the .d.ts files I guess.
Netlify supports serverless Rust. It appears from that blogpost that Netlify serverless functions run on AWS Lambda, so there may be significant overlap with #251 (e.g. in integrating w/ the KV API for certs/OCSP).
Most functions return Result<_, String>. We should make a change to use an Error struct rather than a simple String.
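As a sketch of what a dedicated error type could look like, using only the standard library. The variant names and the parse_content_length example are hypothetical and chosen for illustration; they are not the crate's actual error categories.

```rust
use std::fmt;

/// Hypothetical error type replacing Result<_, String>.
#[derive(Debug, PartialEq)]
pub enum SxgError {
    /// A backend or subresource fetch failed.
    Fetch(String),
    /// The signer could not produce a signature.
    Sign(String),
    /// A header value could not be parsed.
    InvalidHeader { name: String },
}

impl fmt::Display for SxgError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SxgError::Fetch(msg) => write!(f, "fetch failed: {}", msg),
            SxgError::Sign(msg) => write!(f, "signing failed: {}", msg),
            SxgError::InvalidHeader { name } => write!(f, "invalid {} header", name),
        }
    }
}

impl std::error::Error for SxgError {}

/// Example of a function returning the typed error instead of a String.
pub fn parse_content_length(value: &str) -> Result<u64, SxgError> {
    value.trim().parse().map_err(|_| SxgError::InvalidHeader {
        name: "content-length".to_string(),
    })
}
```

Callers can then match on the variant instead of inspecting message text.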
Example failure: https://github.com/google/sxg-rs/runs/4049740271?check_suite_focus=true
I narrowed this down to the second chunk > maxSize test. Not sure if it's a test or prod bug yet.
Opposite to #139, we could create a new link relation like
<link rel=sxg-preload as=... href=...>
which is converted to a Link header for SXG subresource prefetching, but not used for normal HTML. It would be registered here per the HTML spec.
Preferably, it may be possible to do this already like:
<template data-sxg-only><link rel=preload as=... href...></template>
We should verify this adds an SXG preload header but does not preload otherwise.
The PromoteLinkTagsToHeaders processor doesn't skip template contents. It probably should skip template contents except when the template is data-sxg-only.
For #13 to be effective, the header-integrity value for a given URL should remain relatively stable. The worker should eliminate frequently changing response headers that don't affect the semantics of the SXG (e.g. Date). Resources for researching which headers to eliminate:
https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Response_fields
https://datatracker.ietf.org/doc/html/rfc7230 and friends
Rather than setting the SXG lifetime to a fixed 6 days, the worker should set it to min(7 days, whatever origin cache headers say).
I think that's Cache-Control: s-maxage, else Cache-Control: max-age, else Expires. But worth rereading the RFC to be sure.
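A partial sketch of that rule, covering only the two Cache-Control directives; the Expires fallback is left out because it needs an HTTP-date parser. The function name and the naive comma-splitting parser are assumptions made for this illustration.

```rust
use std::time::Duration;

/// Upper bound on the signature lifetime (7 days).
const MAX_SXG_LIFETIME: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Picks s-maxage if present, else max-age, and caps the result at 7 days.
/// Returns None if neither directive is usable; the Expires fallback is not
/// implemented here. (Hypothetical helper.)
pub fn signature_duration(cache_control: &str) -> Option<Duration> {
    let mut s_maxage = None;
    let mut max_age = None;
    for directive in cache_control.split(',') {
        if let Some((name, value)) = directive.split_once('=') {
            let seconds = value.trim().trim_matches('"').parse::<u64>().ok();
            match name.trim().to_ascii_lowercase().as_str() {
                "s-maxage" => s_maxage = seconds,
                "max-age" => max_age = seconds,
                _ => {}
            }
        }
    }
    s_maxage
        .or(max_age)
        .map(|seconds| Duration::from_secs(seconds).min(MAX_SXG_LIFETIME))
}
```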
As a follow-on to #13, we can also parse <link rel=preload> tags in the HTML and convert them to Link headers. This makes the SXG subresource feature easier to use, as editing HTML is easier than setting custom HTTP headers.
If the attempt to fetch the header-integrity fails, don't add the preload.
#41 runs the unit tests for Rust & TypeScript. Add to this, something that tests that they interact correctly with each other and with the Cloudflare Worker environment. Ideas:
End-to-end tests:
- wrangler dev and curl localhost:8787 | dump-signedexchange -verify
Integration tests for individual TS functions:
- wrangler dev: https://www.paolotagliaferri.com/test-cloudflare-workers-with-jest-wrangler-travis/
HeaderIntegrityFetcher computes the header integrity of subresources in two steps: fetching the subresources and computing the integrity.
The compute_integrity method takes an unsigned subresource as input, but the fetch_subresource method uses an SXG-preferring header.
This gives an incorrect header integrity when the back-end server supports the SXG format, for example, when using sxg-playground to test a website that already enables the Cloudflare worker.