sxg-rs's Introduction

sxg-rs

sxg-rs is a set of tools for generating signed exchanges at serve time.

These tools enable sites to be prefetched from Google Search in order to improve their Largest Contentful Paint, one of the Core Web Vitals.

For other technology stacks, see this list of SXG tools.

Next steps

After installing, take the following steps.

Verify and monitor

After installing, you may want to verify and monitor the results.

HTML processing

The worker contains some HTML processors. To activate them, explicitly label the character encoding as UTF-8, either via:

Content-Type: text/html;charset=utf-8

or via:

<meta charset=utf-8>
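The header-based check can be sketched as follows. This is a minimal illustration, not the worker's actual code: the function name `is_utf8_html` is hypothetical, and the sketch only inspects the Content-Type header, not a meta tag in the body.

```rust
/// Returns true if a Content-Type header value labels the body as UTF-8 HTML.
/// Hypothetical helper for illustration; not part of the sxg-rs API.
pub fn is_utf8_html(content_type: &str) -> bool {
    let mut parts = content_type.split(';');
    // The media type comes first; parameters follow, separated by ';'.
    let media_type = parts.next().unwrap_or("").trim();
    if !media_type.eq_ignore_ascii_case("text/html") {
        return false;
    }
    // Look for a charset parameter whose value is utf-8 (case-insensitive,
    // optionally quoted).
    parts.any(|param| {
        let mut kv = param.splitn(2, '=');
        let name = kv.next().unwrap_or("").trim();
        let value = kv.next().unwrap_or("").trim().trim_matches('"');
        name.eq_ignore_ascii_case("charset") && value.eq_ignore_ascii_case("utf-8")
    })
}
```

With this sketch, `text/html;charset=utf-8` qualifies, while a bare `text/html` or a different charset does not.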

Preload subresources

LCP can be further improved by instructing Google Search to prefetch render-critical subresources for the page.

Same-origin

Add a preload link tag to the page, such as:

<link rel=preload as=image href="/foo.png">

sxg-rs will automatically convert these link tags into Link headers as needed for SXG subresource substitution. This uses a form of subresource integrity that includes HTTP headers. sxg-rs tries to ensure a static integrity value by stripping many noisy HTTP headers (like Date) for signed subresources, but you may need to list additional ones in the strip_response_headers config param.
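To make the shape of the resulting header concrete, here is a rough sketch that builds a preload link and its matching allowed-alt-sxg link for one subresource. The function name `sxg_preload_links` and the integrity value in the usage note are hypothetical, and the exact serialization sxg-rs emits may differ.

```rust
/// Builds the pair of Link header values used for SXG subresource
/// substitution: one `preload` and one matching `allowed-alt-sxg`.
/// Hypothetical helper for illustration; not part of the sxg-rs API.
pub fn sxg_preload_links(url: &str, as_type: &str, header_integrity: &str) -> String {
    format!(
        "<{url}>;rel=preload;as={as_type},<{url}>;rel=allowed-alt-sxg;header-integrity=\"{hi}\"",
        url = url,
        as_type = as_type,
        hi = header_integrity
    )
}
```

For example, `sxg_preload_links("https://example.com/foo.png", "image", "sha256-abc")` yields two comma-separated links for the same absolute URL, the second carrying the header-integrity value.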

To confirm it is working, run:

$ go install github.com/WICG/webpackage/go/signedexchange/cmd/dump-signedexchange@latest
$ dump-signedexchange -uri "$HTML_URL" -payload=false | grep Link

and verify that there is a rel=allowed-alt-sxg whose header-integrity matches the output of:

$ dump-signedexchange -uri "$SUBRESOURCE_URL" -headerIntegrity

If you have any same-origin preload tags that should not be converted into headers, add the data-sxg-no-header attribute to them.

Cross-origin

SXG preloading requires that the subresource is also an SXG. This worker assumes only same-origin resources are SXG, so its automatic logic is limited to those. You can manually support cross-origin subresources by adding the appropriate Link header as specified.

SXG-only behavior

There are two syntaxes for behavior that happens only when the page is viewed as an SXG. If you write:

<script data-issxg-var>window.isSXG=false</script>

then its inner content will be replaced by window.isSXG=true in an SXG. This could be used as a custom dimension by which to slice web analytics, or as a cue to fetch a fresh CSRF token.

If you write:

<template data-sxg-only>...</template>

then in an SXG, its inner content will be "unwrapped" out of the template and thus activated, and when non-SXG it will be deleted. Since SXGs can't Vary by Cookie, this could be used to add lazy-loaded personalization to the SXG, while not adding unnecessary bytes to the non-SXG. It could also be used to add SXG-only subresource preloads.

Preview in Chrome

Optionally, preview the results in the browser:

  • In development, set Chrome flags to allow the certificate.
  • Use an extension such as ModHeader to set the Accept header to text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3 (equivalent to what Googlebot sends).
  • Explore the results in the DevTools Network tab.

sxg-rs's People

Contributors

antiphoton, banaag, caoboxiao, dependabot[bot], oliy, quangio, renovate-bot, twifkak

sxg-rs's Issues

Set config generator to be non-interactive

Currently cargo run -p config-generator is an interactive process. This is inconvenient because the user has to re-enter every value, or press Enter again to confirm each one, on every run.

We need to create a new source-of-truth file, which is read by config-generator to generate WranglerConfig and SxgConfig.

Process the Link header for validity & subresource preloading

1. Strip invalid link headers (DONE in #43)

The worker should process the Link header before signing, in order to make it compatible with Google SXG Cache requirements. In particular, it should:

  1. Drop all links except preload and allowed-alt-sxg.
  2. Drop all links with an unsupported param name.
  3. Drop all but the first 20 preloads.
  4. Convert URLs from relative to absolute.
  5. For any preloads without a matching allowed-alt-sxg, compute and add one.
  6. (Let's not bother with imagesrcset parsing; I think we can let that be the user's responsibility.)
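Steps 1 to 3 above can be sketched as a single filter pass. This is an illustration under stated assumptions, not the worker's implementation: the `Link` struct, the `filter_links` function, and the `ALLOWED_PARAMS` allow-list are all hypothetical, and steps 4 and 5 (absolute URLs and computing allowed-alt-sxg) are left out.

```rust
/// A parsed Link header entry. Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Link {
    pub uri: String,
    pub params: Vec<(String, String)>, // e.g. ("rel", "preload")
}

const MAX_PRELOADS: usize = 20;

// Assumed allow-list of param names; check the SXG cache requirements
// for the authoritative set.
const ALLOWED_PARAMS: [&str; 7] = [
    "as", "header-integrity", "media", "rel", "imagesrcset", "imagesizes", "crossorigin",
];

fn rel(link: &Link) -> Option<&str> {
    link.params
        .iter()
        .find(|(k, _)| k.as_str() == "rel")
        .map(|(_, v)| v.as_str())
}

/// Keeps only preload and allowed-alt-sxg links with supported params,
/// and at most the first 20 preloads.
pub fn filter_links(links: Vec<Link>) -> Vec<Link> {
    let mut preloads = 0;
    links
        .into_iter()
        .filter(|link| {
            // Step 2: drop links carrying any unsupported param name.
            let all_supported = link
                .params
                .iter()
                .all(|(k, _)| ALLOWED_PARAMS.iter().any(|p| *p == k.as_str()));
            if !all_supported {
                return false;
            }
            match rel(link) {
                // Step 3: count preloads and keep only the first 20.
                Some("preload") => {
                    preloads += 1;
                    preloads <= MAX_PRELOADS
                }
                Some("allowed-alt-sxg") => true,
                // Step 1: everything else is dropped.
                _ => false,
            }
        })
        .collect()
}
```

Filtering in one pass preserves the original order of the links, which matters because "the first 20" depends on it.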

2. Add allowed-alt-sxg (DONE in #57 and #61)

We want to compute allowed-alt-sxg because authors won't have done this already. It is necessary to support prefetching subresources from webpkgcache.com by way of subresource substitution.

For each preload, the worker needs to:

  1. Look up its URL in a header-integrity cache in the KV store.
  2. If missing, fetch the URL, compute its header-integrity per this definition and store it in the cache.
    1. For a first version, set an expiry of 1 day.
    2. Ideally, use an expiry matching the subresource's cache-control header.

The KV store minimizes the number of backend fetches caused by this feature.
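The lookup-or-compute flow with a fixed expiry might look like the sketch below. It uses an in-memory HashMap as a stand-in for the KV store, and `IntegrityCache`, `get_or_compute`, and the `compute` callback (which represents "fetch the URL and compute its header-integrity") are hypothetical names. The current time is passed in explicitly so the expiry logic is easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-memory stand-in for the header-integrity cache in the KV store.
pub struct IntegrityCache {
    ttl: Duration,
    entries: HashMap<String, (String, Instant)>,
}

impl IntegrityCache {
    pub fn new(ttl: Duration) -> Self {
        IntegrityCache { ttl, entries: HashMap::new() }
    }

    /// Returns the cached header-integrity for `url` if it has not expired;
    /// otherwise calls `compute` once and stores the result with a new expiry.
    pub fn get_or_compute<F>(&mut self, url: &str, now: Instant, compute: F) -> String
    where
        F: FnOnce(&str) -> String,
    {
        if let Some((value, expires_at)) = self.entries.get(url) {
            if now < *expires_at {
                return value.clone();
            }
        }
        let value = compute(url);
        self.entries
            .insert(url.to_string(), (value.clone(), now + self.ttl));
        value
    }
}
```

With `IntegrityCache::new(Duration::from_secs(86_400))`, repeated lookups within a day reuse the stored value and only the first one triggers a backend fetch.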

3. Stabilize header-integrity

Split off to #26. DONE in #36.

The worker should also eliminate frequently changing response headers that don't affect the semantics of the SXG (e.g. Date). Resources for researching which headers to eliminate:
- https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Response_fields
- https://datatracker.ietf.org/doc/html/rfc7230 and friends

4. Fix docs. (PENDING in #61)

Then the sentence at the end of #4 can be reverted.

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Detected dependencies

cargo
cloudflare_worker/Cargo.toml
  • console_error_panic_hook 0.1.7
  • wasm-bindgen 0.2.83
distributor/Cargo.toml
  • anyhow 1.0.66
  • base64 0.13.1
  • byte-strings 0.2.2
  • ciborium 0.2.0
  • clap 3.2.23
  • form_urlencoded 1.1.0
  • futures 0.3.25
  • http 0.2.8
  • hyper-rustls 0.23.2
  • hyper-trust-dns 0.5.0
  • hyper 0.14.23
  • lazy_static 1.4.0
  • nom 7.1.1
  • percent-encoding 2.2.0
  • regex 1.7.0
  • rustls 0.20.7
  • rustls-pemfile 1.0.1
  • sha2 0.10.6
  • thiserror 1.0.37
  • tls-listener 0.5.1
  • tokio 1.23.0
  • tokio-rustls 0.23.4
  • url 2.3.1
fastly_compute/Cargo.toml
  • anyhow 1.0.66
  • async-trait 0.1.59
  • base64 0.13.1
  • fastly ^0.8.9
  • http 0.2.8
  • log 0.4.17
  • log-fastly 0.8.9
  • pem 1.1.0
  • serde 1.0.149
  • serde_yaml 0.9.14
  • tokio 1.23.0
  • url 2.3.1
http_server/Cargo.toml
  • anyhow 1.0.66
  • async-trait 0.1.59
  • clap 3.2.23
  • fs2 0.4.3
  • futures 0.3.25
  • http 0.2.8
  • hyper-rustls 0.23.2
  • hyper-tls 0.5.0
  • hyper-trust-dns 0.5.0
  • hyper 0.14.23
  • lazy_static 1.4.0
  • lru 0.8.1
  • rand 0.8.5
  • serde_yaml 0.9.14
  • tokio 1.23.0
  • url 2.3.1
  • assert_matches 1.5.0
sxg_rs/Cargo.toml
  • anyhow 1.0.66
  • async-trait 0.1.59
  • base64 0.13.1
  • chrono 0.4.23
  • der-parser 8.1.0
  • futures 0.3.25
  • getrandom 0.2.8
  • http 0.2.8
  • js-sys 0.3.60
  • lol_html 0.3.1
  • nom 7.1.1
  • once_cell 1.16.0
  • pem 1.1.0
  • p256 0.11.1
  • serde 1.0.149
  • serde-wasm-bindgen 0.4.5
  • serde_json 1.0.89
  • serde_yaml 0.9.14
  • sha1 0.10.5
  • sha2 0.10.6
  • tokio 1.23.0
  • url 2.3.1
  • wasm-bindgen 0.2.83
  • wasm-bindgen-futures 0.4.33
  • web-sys 0.3.60
  • x509-parser 0.14.0
  • tokio-test 0.4.2
tools/Cargo.toml
  • anyhow 1.0.66
  • async-trait 0.1.59
  • base64 0.13.1
  • clap 3.2.23
  • ctrlc 3.2.3
  • der-parser 7.0.0
  • http 0.2.8
  • hyper 0.14.23
  • hyper-tls 0.5.0
  • pem 1.1.0
  • regex 1.7.0
  • serde 1.0.149
  • serde_json 1.0.89
  • serde_yaml 0.9.14
  • toml 0.5.9
  • tokio 1.23.0
  • url 2.3.1
  • warp 0.3.3
  • wrangler 1.19.13
github-actions
.github/workflows/code-style.yml
  • actions/checkout v3
  • actions/cache v3
  • actions-rs/cargo v1
  • actions-rs/cargo v1
  • actions-rs/cargo v1
  • actions/checkout v3
  • actions/setup-node v3
.github/workflows/integration-tests.yml
  • actions/checkout v3
  • actions/checkout v3
  • actions-rs/toolchain v1
  • actions/checkout v3
  • actions/setup-go v3
  • actions-rs/toolchain v1
  • actions/checkout v3
  • actions/cache v3
.github/workflows/unit-tests.yml
  • actions/checkout v3
  • actions/cache v3
  • actions-rs/toolchain v1
  • actions-rs/cargo v1
  • actions-rs/cargo v1
  • actions/checkout v3
  • actions/setup-node v3
npm
cloudflare_worker/worker/package.json
  • @cloudflare/workers-types 3.18.0
  • @types/node 16.18.11
  • esbuild 0.16.14
  • glob 8.0.3
  • gts 3.1.1
  • tslib 2.4.1
  • typescript 4.9.4
  • node >=16.0.0
playground/package.json
  • commander 9.4.1
  • dompurify 2.4.2
  • fastify 4.11.0
  • jsdom 20.0.3
  • node-fetch 3.3.0
  • puppeteer 17.1.3
  • @types/dompurify 2.4.0
  • @types/jsdom 20.0.1
  • @types/node 17.0.18
  • @types/node-fetch 2.6.2
  • esbuild 0.16.14
  • gts 3.1.1
  • tslib 2.4.1
  • typescript 4.9.4
  • node >=16.0.0
typescript_utilities/package.json
  • @types/jasmine 4.3.1
  • esbuild 0.16.14
  • gts 3.1.1
  • jasmine-core 4.5.0
  • karma 6.4.1
  • karma-chrome-launcher 3.1.1
  • karma-jasmine 5.1.0

Process cross-origin links for subresource preloading

#57 limited its solution of #13 to same-origin preloads for efficiency's sake:

  • Traffic costs for same-origin subrequests are probably cheaper than cross-origin.
  • Most cross-origin hrefs won't have corresponding SXGs, so the preload processing would likely fail (and thus be in vain), while same-origin hrefs would likely have corresponding SXGs because sxg-rs is usually run on the whole origin.

To mitigate the cost, we could cache a per-origin boolean saying whether the origin supports SXG, with an expiry of, say, 1 hour. On cache expiry or miss, process the link as if the origin supports SXG and update the cache accordingly.
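A possible shape for that per-origin cache is sketched below. The names `OriginSupportCache`, `should_process`, and `record` are hypothetical, and an in-memory map stands in for whatever store the worker would use. The key design point is that a miss or an expired entry is treated optimistically.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Remembers, per origin, whether a recent attempt found SXG support.
pub struct OriginSupportCache {
    ttl: Duration,
    entries: HashMap<String, (bool, Instant)>,
}

impl OriginSupportCache {
    pub fn new(ttl: Duration) -> Self {
        OriginSupportCache { ttl, entries: HashMap::new() }
    }

    /// Whether a cross-origin preload for `origin` should be processed.
    /// On a miss or an expired entry, assume the origin supports SXG.
    pub fn should_process(&self, origin: &str, now: Instant) -> bool {
        match self.entries.get(origin) {
            Some((supports, expires_at)) if now < *expires_at => *supports,
            _ => true,
        }
    }

    /// Records the outcome of processing a link for `origin`.
    pub fn record(&mut self, origin: &str, supports_sxg: bool, now: Instant) {
        self.entries
            .insert(origin.to_string(), (supports_sxg, now + self.ttl));
    }
}
```

An origin recorded as unsupported is skipped until its entry expires, after which one link is processed again to refresh the answer.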

Reduce payload buffer size in cloudflare_worker

Per this TODO:

As a performance optimization, maybe start with a Content-Length sized buffer and resize exponentially if necessary. Alternatively, use the limitBytes() transformer in streamFrom, and construct a flyweight Response object here in order to call arrayBuffer().
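The first option could be sketched roughly as below. The real code in question lives in the worker's TypeScript, so this Rust version only illustrates the idea; `collect_body` and its parameters are hypothetical. It relies on Vec growing with amortized cost when the advertised length turns out to be too small.

```rust
/// Collects body chunks into one buffer, giving up once `max_bytes`
/// would be exceeded. Hypothetical helper for illustration.
pub fn collect_body<'a, I>(
    chunks: I,
    content_length: Option<usize>,
    max_bytes: usize,
) -> Option<Vec<u8>>
where
    I: IntoIterator<Item = &'a [u8]>,
{
    // Start at the advertised size, capped at the limit, so that a truthful
    // Content-Length needs no reallocation.
    let initial = content_length.unwrap_or(0).min(max_bytes);
    let mut body = Vec::with_capacity(initial);
    for chunk in chunks {
        if body.len() + chunk.len() > max_bytes {
            return None; // Payload too large to sign.
        }
        // If the capacity is exhausted, Vec reallocates with amortized growth.
        body.extend_from_slice(chunk);
    }
    Some(body)
}
```

Capping the initial capacity at `max_bytes` keeps a bogus or hostile Content-Length from forcing a large allocation up front.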

Don't respond to .sxg/* when request host is html_host

To minimize conflict with the origin's URL namespace, the cloudflare_worker could only respond to reserved_path URLs when requested on the workers.dev domain. Does fastly_compute have a similar special-purpose domain? If not, this could be an optional behavior.

Enable authors to specify SXG-only tags

For instance if the HTML contains:

<template data-sxg-only>
  <script>log("visit-from-sxg=true")</script>
</template>

then when generating an SXG, rewrite it to:

<script>log("visit-from-sxg=true")</script>

Other possible spellings include <template class=sxg-only> or <script type=text/sxg-only> or <!--[if SXG]>, but the above seems the most general and least likely to collide with existing pages. (But that should be confirmed before implementing.)

Motivation

Enable web publishers to distinguish SXG visits from unsigned visits in their analytics. This could help them track overall usage, or compare performance metrics between the two cohorts. Examples of feature requests for a similar tool (CF ASX):

Optimize binary size

The compiled cloudflare_worker wasm is currently around 1.2MB. The opt-level and lto tricks didn't reduce that. It's clearly possible to make small wasm binaries. Investigate the easiest change possible to make this one smaller.

The twiggy command identified table[0] as the main culprit. I think that might have to do with the use of JS callbacks, judging by the MDN article and Lin Clark's article.

Things I didn't try:

  • get rid of format!
  • abort on panic
  • no_main
  • no_std

Articles I didn't read:

Renew certificates using ACME

Add configuration parameters similar to webpkgserver to support ACME renewal of a certificate. It's not necessary to support all verification methods (DNS/HTTP/ALPN); one is sufficient (whichever is automatable).

This should include some support for monitoring that the certificate is still valid. Here are some ideas:

  1. Return an HTTP error (or JS exception?) when the certificate is expired, so it can show up in Cloudflare analytics.
  2. Document some curl | openssl command the user could run as a cron job.
  3. Let the user configure a webhook URL to be pinged when the certificate is expired.

Add Netlify binding

Netlify supports serverless Rust. It appears from that blogpost that Netlify serverless functions run on AWS Lambda, so there may be significant overlap with #251 (e.g. in integrating w/ the KV API for certs/OCSP).

Content-Length header after transformation

The current implementation does not update the content-length header after performing the SXG transformation (i.e. calling process_html). Should we set the header to the correct value, or just remove it?

Switch cloudflare_worker/wrangler.toml from type = "rust" to type = "javascript"

Introducing a dependency on some of the functions in encoding_rs causes a segfault in wasm-opt before version 102. However, wrangler projects with type = "rust" depend on wasm-pack 0.10.0 and wasm-pack 0.10.0 depends on wasm-opt 78.

Rather than waiting for both of these dependencies to update in sequence, we should consider switching to type = "javascript". Either:

I lean towards the first. It seems easier and less risky, and doesn't preclude doing the second one later.

Set fetcher signer as worker member variables

Some SxgWorker functions take arguments of trait object Fetcher, Signer, (and HttpCache in #61). However, the implementations of these traits usually do not change during the lifetime of a worker. Hence we can move these trait objects into the SxgWorker member variables.

Changes would be like

  pub struct SxgWorker {
    ..
+   runtime: Runtime,
  }
+ pub struct Runtime {
+   // Trait objects are unsized, so each one is boxed inside its Mutex.
+   signer: Mutex<Box<dyn Signer>>,
+   fetcher: Mutex<Box<dyn Fetcher>>,
+   cache: Mutex<Box<dyn Cache>>,
+ }

  impl SxgWorker {
-     pub async fn fetch_ocsp_from_ca<F: fetcher::Fetcher>(&self, fetcher: F) -> Vec<u8>;
+     pub async fn fetch_ocsp_from_ca(&self) -> Vec<u8>;
  }

Offer link preload opt-in

Opposite to #139, we could create a new link relation like

<link rel=sxg-preload as=... href=...>

which is converted to a Link header for SXG subresource prefetching, but not used for normal HTML. It would be registered here per the HTML spec.

Preferably, it may be possible to do this already like:

<template data-sxg-only><link rel=preload as=... href=...></template>

We should verify this adds an SXG preload header but does not preload otherwise.

The PromoteLinkTagsToHeaders processor doesn't skip template contents. It probably should skip template contents except when the template is data-sxg-only.

Strip some stateful headers and reject others

By default, the worker should strip uncached headers plus set-cookie and strict-transport-security from the response if they are present, and reject the response if any other stateful headers are present. It should be safe to strip these two stateful headers because they are common, and websites tend to be resilient to them not working (e.g. because cookie blockers are common).

(It's possible that websites tend to be resilient to other stateful headers not working, but I don't have data/intuition on that. Given they're comparatively rare, we can err on the safe side without much loss of utility.)

Present behavior (IIUC), is:

Options (in order of preference):

  • Change reject_stateful_headers into strip_stateful_headers and the value is a list (default ['set-cookie', 'strict-transport-security']). Only if this is easy to do.
  • Make this the new behavior for reject_stateful_headers: false, and document that it only covers set-cookie and strict-transport-security.
  • Change reject_stateful_headers into a tri-state (none, all, or recommended).
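The first option (a configurable strip list, with rejection for any remaining stateful header) might behave like the sketch below. `apply_stateful_policy` and `Outcome` are hypothetical names, and `STATEFUL` is an abbreviated, assumed list used only for illustration; the real list of stateful headers is longer.

```rust
/// Result of applying the policy to a response's headers.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Headers that remain after stripping; the response may be signed.
    Sign(Vec<(String, String)>),
    /// Name of the stateful header that caused the rejection.
    Reject(String),
}

// Abbreviated, assumed list of stateful header names (lowercase).
const STATEFUL: [&str; 5] = [
    "authentication-info",
    "clear-site-data",
    "set-cookie",
    "strict-transport-security",
    "www-authenticate",
];

/// Strips headers named in `strip`, then rejects if any other stateful
/// header remains.
pub fn apply_stateful_policy(headers: Vec<(String, String)>, strip: &[&str]) -> Outcome {
    let mut kept = Vec::new();
    for (name, value) in headers {
        let lower = name.to_ascii_lowercase();
        // The strip list takes precedence over rejection.
        if strip.iter().any(|s| s.eq_ignore_ascii_case(&lower)) {
            continue;
        }
        if STATEFUL.iter().any(|s| *s == lower) {
            return Outcome::Reject(lower);
        }
        kept.push((name, value));
    }
    Outcome::Sign(kept)
}
```

Passing `["set-cookie", "strict-transport-security"]` as the strip list gives the proposed default; an empty list reproduces strict rejection of every stateful header.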

Add end-to-end tests to CI

#41 runs the unit tests for Rust & TypeScript. Add to this something that tests that they interact correctly with each other and with the Cloudflare Worker environment. Ideas:

End-to-end tests:

Integration tests for individual TS functions:

Make the cert-url same-origin and content-addressed

  • Change cert-url to be hosted same-origin with the signed URL (on html_host). Improves DX by removing the worker_host option from config.yaml.
  • Change path to .well-known/sxg-certs/... and register sxg-certs with IANA by filing an issue on well-known-uris. Reduces chance of conflict with other content.
  • Change basename from cert to the web-safe padded base64 encoding of the cert-sha256. Improves interaction with intermediary caches that don't have content-addressing.
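The proposed basename can be sketched with a small hand-written encoder. This is an illustration only: `base64url_padded` and `cert_path` are hypothetical names, the SHA-256 digest is assumed to be computed elsewhere (for example with the sha2 crate the project already depends on), and the leading slash in the path is an assumption.

```rust
// URL-safe base64 alphabet: '-' and '_' replace '+' and '/'.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encodes bytes as web-safe base64 with '=' padding.
pub fn base64url_padded(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Builds a content-addressed cert path from the cert's SHA-256 digest.
pub fn cert_path(cert_sha256: &[u8; 32]) -> String {
    format!("/.well-known/sxg-certs/{}", base64url_padded(cert_sha256))
}
```

Because the basename is derived from the certificate's digest, a renewed certificate gets a new URL, so an intermediary cache cannot serve a stale certificate under the new name.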

Offer link preload opt-out

Offer a way to opt individual link tags out of being converted into link headers, e.g. with a data-no-sxg-header attribute.

Add signed_host config param

If specified, it would replace the host for the fallbackURL being signed. Otherwise, the current logic would apply (replace with html_host, or else don't replace).

Unlike html_host, signed_host would not affect where the URL is being fetched from.

Set SXG expiry from inner cache headers

Rather than setting sxg lifetime to a fixed 6 days, the worker should set it to min(7 days, whatever origin cache headers say).

I think that's Cache-Control: s-maxage, else Cache-Control: max-age, else Expires. But worth rereading the RFC to be sure.
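A sketch of that precedence, covering only the two Cache-Control directives, is shown below. The function names are hypothetical; falling back to the current 6 days when no usable directive is present is an assumption; and the Expires fallback as well as directives such as no-store or private are deliberately not handled here.

```rust
const MAX_LIFETIME_SECS: u64 = 7 * 24 * 60 * 60;
// Assumed fallback: the fixed 6-day lifetime currently in use.
const DEFAULT_LIFETIME_SECS: u64 = 6 * 24 * 60 * 60;

/// Extracts the numeric value of a Cache-Control directive such as max-age.
fn directive_secs(cache_control: &str, name: &str) -> Option<u64> {
    cache_control.split(',').find_map(|directive| {
        let mut kv = directive.splitn(2, '=');
        let key = kv.next()?.trim();
        let value = kv.next()?.trim().trim_matches('"');
        if key.eq_ignore_ascii_case(name) {
            value.parse().ok()
        } else {
            None
        }
    })
}

/// SXG lifetime in seconds: s-maxage, else max-age, else the default,
/// capped at 7 days.
pub fn sxg_lifetime_secs(cache_control: Option<&str>) -> u64 {
    let from_origin = cache_control.and_then(|cc| {
        directive_secs(cc, "s-maxage").or_else(|| directive_secs(cc, "max-age"))
    });
    from_origin
        .unwrap_or(DEFAULT_LIFETIME_SECS)
        .min(MAX_LIFETIME_SECS)
}
```

For instance, `public, max-age=3600, s-maxage=600` gives 600 seconds because s-maxage takes precedence, while `max-age=31536000` is capped at 7 days.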

Fix TypeScript warnings

Eliminate the TS warnings emitted by rollup, either by rolling back #114 and setting a <3 version constraint on @cloudflare/workers-types, or by fixing forward somehow (if easy).

Also, change unit-tests.yml to run rollup with --failAfterWarnings so that this is caught by the CI in the future.

Perhaps this article has a clue on how to fix forward. Otherwise, it seems not to be a high-severity bug; the Cloudflare Workers runtime is supposed to be backwards-compatible so it's just a change in the .d.ts files I guess.

Current implementation will strip headers with same key

The Headers struct uses a HashMap to hold key-value pairs. However, a response can contain several headers with the same key. For conforming headers, we can join the values into one comma-separated string, but that does not work for all headers, especially those whose values can themselves contain commas. Thus, sxg-rs should use a multi-map (e.g. HeaderMap in http).
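To illustrate the difference, here is a minimal multi-map built on the standard library. `MultiHeaders` is a hypothetical type for this sketch only; the suggestion above is to use HeaderMap from the http crate rather than to hand-roll one.

```rust
use std::collections::HashMap;

/// Minimal multi-map: each lowercase header name maps to all of its values.
#[derive(Default)]
pub struct MultiHeaders {
    map: HashMap<String, Vec<String>>,
}

impl MultiHeaders {
    /// Adds a value without overwriting earlier values for the same name.
    pub fn append(&mut self, name: &str, value: &str) {
        self.map
            .entry(name.to_ascii_lowercase())
            .or_default()
            .push(value.to_string());
    }

    /// Returns every value recorded for `name`, in insertion order.
    pub fn get_all(&self, name: &str) -> &[String] {
        self.map
            .get(&name.to_ascii_lowercase())
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

Two Set-Cookie headers stay separate here, whereas inserting them into a `HashMap<String, String>` would keep only the last one, and joining them with commas would be ambiguous because an Expires attribute already contains a comma.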

I can help replace the current implementation if you think it makes sense.

Header integrity fetcher should not prefer SXG for subresources

HeaderIntegrityFetcher computes the header integrity of subresources in two steps: fetching the subresource and computing its integrity.

The compute_integrity method takes an unsigned subresource as input, but the fetch_subresource method sends an SXG-preferring Accept header.

This gives an incorrect header integrity when the back-end server supports the SXG format, for example, when using sxg-playground to test a website that already has the Cloudflare worker enabled.
