
webpackager's Issues

Make ResourceCache scalable

IIUC, webpkgserver always runs with an in-memory resource cache that is unbounded and has no expiration or eviction. (Expired entries are never deleted from memory unless they are replaced.) The cache persists for the life of the server.

This is OK only for a site with a small number of resources. We need to add some configuration parameters to make this work for larger sites. Some rough ideas:

  • Max size of the cache (including 0 to disable). For starters, we can do it by # of entries, but # of bytes would be a nice addition.
  • Option to use the file-based cache as a backend to the memory cache, so it can be shared between replicas. This would reduce # of fetches.
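The first idea (a cap on the number of entries, with 0 disabling the cache) could be sketched roughly as below. This is only an illustration under assumptions: `BoundedCache` and its methods are hypothetical names, not part of webpackager's ResourceCache API, and least-recently-used eviction is one possible policy among several.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// entry is stored in each list element so eviction can find the map key.
type entry struct {
	key   string
	value []byte
}

// BoundedCache is a hypothetical LRU cache capped by number of entries.
// A capacity of 0 (or less) disables caching entirely.
type BoundedCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewBoundedCache(capacity int) *BoundedCache {
	return &BoundedCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached value and marks it as recently used.
func (c *BoundedCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put stores a value, evicting the least recently used entry when full.
func (c *BoundedCache) Put(key string, value []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.capacity <= 0 {
		return // caching disabled
	}
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Len reports the number of entries currently held.
func (c *BoundedCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.order.Len()
}

func main() {
	c := NewBoundedCache(2)
	c.Put("a", []byte("1"))
	c.Put("b", []byte("2"))
	c.Get("a")              // "a" becomes most recently used
	c.Put("c", []byte("3")) // evicts "b"
	_, ok := c.Get("b")
	fmt.Println(ok, c.Len())
}
```

A byte-based limit would need each entry to carry its size and the eviction loop to run until the total fits, which is why counting entries is the simpler starting point.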

Option for webpkgserver to relay custom headers

Hi!

We have a simple firewall that blocks non-browser requests to our websites by checking the User-Agent request header. Requests from webpkgserver get blocked because it sends the default Go HTTP client user agent.

We use Nginx to proxy_pass to webpkgserver. E.g.:

proxy_pass http://127.0.0.1:8080/priv/doc/https://example.com/foo.html;

We're wondering if it's possible for us to add a custom request header that gets relayed to https://example.com/foo.html, so we can use that header to unblock requests from webpkgserver.

With the config below:

proxy_pass    http://127.0.0.1:8080/priv/doc/https://example.com/foo.html;
proxy_set_header    X-Is-WebPackager    1;

we expect all requests from webpkgserver to https://example.com/foo.html to carry the X-Is-WebPackager: 1 header. However, the custom header is not present, so the request is blocked by the firewall.

Is it by design?

Document the CLI way to keep cert.cbor updated

The main README should:

  • Recommend regenerating the cert.cbor from cert.pem ~daily, so that it can pick up a fresh OCSP response.
  • Recommend renewing the cert.pem every ~80 days.
  • Recommend a tool for renewing with ACME. (Either verify that a popular CLI works with SXG certs, or write our own.)

Fix webpkgserver so it doesn't error out due to dummy OCSP response

When serving self-signed certs at localhost:8080/webpkg/cert/xxxx (where xxxx is the cert digest), webpkgserver reports an error while reading the dummy OCSP response in the cert cache:

  • asn1: structure error: tags don't match (16 vs {class:1 tag:4 length:117 isCompound:true}) {optional:false explicit:false application:false private:false defaultValue: tag: stringType:0 timeType:0 set:false omitEmpty:false} responseASN1 @2

Automate errcheck

errcheck found some missing error handling in #37. This project needs some suppressions for it to run cleanly:

errcheck -ignore 'Close|Fprintf|Remove|RemoveAll|Write' ./...

Option to preload heuristically-determined hero image

This would likely significantly boost the LCP improvement from prefetching. It would provide parity with AMP Packager's preloadimage, though needn't use the same set of heuristics.

I'm not familiar enough with the architecture of webpackager. Does this make more sense as a new Processor, a new HTMLTask, or something else?

Add option to remove high-entropy low-effect response headers before signing

webpkgserver will set a default lifetime of 1 day for JS resources and 7 days for others (src). However, any HTML that preloads JS is effectively 1-day, unless the publisher can refresh a JS SXG without updating its header-integrity.

GetFullHeader() should (default on, opt-out via toml config) remove any headers that are likely to change often, but don't affect the way the subresource is interpreted by the browser. The Date header comes to mind, but it's worth a cursory glance of the HTTP spec to unearth any others.

[webpkgserver] ignore third party subresources preloading

Our site preloads a third-party resource (AMP v0.js):

<link rel="preload" href="https://cdn.ampproject.org/lts/v0.js" as="script">

webpkgserver will attempt to fetch that resource, which will result in the following error.

2021/06/25 18:07:55 processing https://cdn.ampproject.org/lts/v0.js ...
2021/06/25 18:07:55 error with processing https://cdn.ampproject.org/lts/v0.js: fetch: URL doesn't match the fetch targets

Is there any way to ignore this preload?
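To make the request concrete, here is a minimal sketch of what skipping third-party preloads could look like. It assumes "third party" means a different host than the page itself, which is a simplification: webpkgserver's actual fetch-target rules are not reproduced here, and `sameHostPreloads` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// sameHostPreloads resolves each href against the page URL and keeps
// only those served from the page's own host. Unparseable hrefs are
// skipped rather than treated as fatal.
func sameHostPreloads(pageURL string, hrefs []string) ([]string, error) {
	page, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	var kept []string
	for _, href := range hrefs {
		ref, err := page.Parse(href) // resolves relative references
		if err != nil {
			continue
		}
		if ref.Host == page.Host {
			kept = append(kept, ref.String())
		}
	}
	return kept, nil
}

func main() {
	kept, err := sameHostPreloads("https://example.com/foo.html", []string{
		"https://cdn.ampproject.org/lts/v0.js", // third party: dropped
		"/style.css",                           // same host: kept
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(kept)
}
```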

Limit to 20 preloads

The Google SXG cache has a limit of 20 preloads. I don't see anything in the webpackager code that guarantees that requirement is met. (Did I miss it?) We should provide a [default enabled] option to do so.

This seems especially necessary given the optional HTMLTask that promotes preload tags to headers: preload tags are the more common form today, and they have no such limit.

500 error occurs when ocsp cache does not exist

I think webpkgserver should return a 404 error if the cache does not exist.
(Judging from the code here, that appears to be the intent.)
https://github.com/google/webpackager/blob/main/server/handler.go#L119-L122

In practice, however, ioutil.ReadFile fails with "No such file or directory", and webpkgserver responds with a 500 error.
https://github.com/google/webpackager/blob/main/certchain/certmanager/multicert_disk_cache.go#L115-L118

I'm not sure how certmanager.ErrNotFound is meant to be used, but what about returning it here?
https://github.com/google/webpackager/blob/main/certchain/certchainutil/certchainutil.go#L45-L48

URL doesn't match the fetch targets

Hello, I am getting the following issue when sending a request to the webpkgserver:

2021/09/17 16:29:24 Listening at [::]:80
2021/09/17 16:29:24 Successfully retrieved valid OCSP.
2021/09/17 16:29:26 processing https://www.perlego.com/book/1690290/criminal-law-pdf ...
2021/09/17 16:29:26 error with processing https://www.perlego.com/book/1690290/criminal-law-pdf: fetch: URL doesn't match the fetch targets

This is the webpkgserver.toml file I'm using:

[Listen]
Port = 80

[SXG.Cert]
PEMFile = '/www_perlego_com.pem'
KeyFile = '/server.key'
AllowTestCert = false

[SXG]
CertURLBase = 'https://perlego.com/'

[[Sign]]
Domain = 'perlego.com'

This is how I'm sending the request:

wget -v -d --header="Accept: application/signed-exchange;v=b3" localhost/priv/doc/https://www.perlego.com/book/1690290/criminal-law-pdf

I'm confused: from reading the source code, I gather that the error "URL doesn't match the fetch targets" is raised when the Domain field of the server configuration does not match the actual host of the request, but in this case I believe it does.

Thanks in advance,
Juan
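One thing that may be worth checking: the request targets www.perlego.com while the config lists perlego.com. The sketch below only illustrates how an exact host comparison would treat that pair. Whether webpkgserver compares hosts exactly is an assumption that is not confirmed here, and `hostMatchesExactly` is a hypothetical helper, not webpackager code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostMatchesExactly reports whether the URL's hostname equals domain,
// ignoring case. Subdomains such as "www." do NOT count as a match.
// This models an assumed matching rule, not webpkgserver's actual one.
func hostMatchesExactly(rawURL, domain string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	return strings.EqualFold(u.Hostname(), domain), nil
}

func main() {
	target := "https://www.perlego.com/book/1690290/criminal-law-pdf"
	for _, domain := range []string{"perlego.com", "www.perlego.com"} {
		ok, err := hostMatchesExactly(target, domain)
		if err != nil {
			panic(err)
		}
		fmt.Printf("Domain = %q matches: %v\n", domain, ok)
	}
}
```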
