google / webpackager
License: Apache License 2.0
Are there any plans to add a health check path to webpkgserver? We want to monitor the webpkgserver and use the k8s livenessProbe.
IIUC, webpkgserver always runs with an in-memory resource cache, which is unbounded and lacking expiration or eviction. (Expired entries are never deleted from memory unless they are replaced.) This cache persists for the life of the server.
This is OK only for a site with a small number of resources. We need to add some configuration parameters to make this work for larger sites. Some rough ideas:
A maximum cache size (0 to disable). For starters, we can limit it by number of entries, but a limit by number of bytes would be a nice addition.
Hi!
We have a simple firewall that blocks non-browser requests to our websites by checking the User-Agent request header. Requests from webpkgserver get blocked because it sends the User-Agent of Go's HTTP client.
We use Nginx to proxy_pass to webpkgserver. E.g.:
proxy_pass http://127.0.0.1:8080/priv/doc/https://example.com/foo.html;
We're wondering if it's possible to add a custom request header that gets relayed to https://example.com/foo.html so that we can use that header to unblock requests from webpkgserver.
With the config below:
proxy_pass http://127.0.0.1:8080/priv/doc/https://example.com/foo.html;
proxy_set_header X-Is-WebPackager 1;
we expect all requests from webpkgserver to https://example.com/foo.html to carry the X-Is-WebPackager: 1 header. However, the custom header is not present, so the request is blocked by the firewall.
Is this by design?
The main README should:
When serving self-signed certs from localhost:8080/webpkg/cert/xxxx (where xxxx = cert digest), webpkgserver returns an error while reading the dummy OCSP response in the cert cache:
This found some missing error handling in #37. This project needs some suppressions to run cleanly:
errcheck -ignore 'Close|Fprintf|Remove|RemoveAll|Write' ./...
This would likely significantly boost the LCP improvement from prefetching. It would provide parity with AMP Packager's preloadimage, though needn't use the same set of heuristics.
I'm not familiar enough with the architecture of webpackager. Does this make more sense as a new Processor, a new HTMLTask, or something else?
webpkgserver sets a default lifetime of 1 day for JS resources and 7 days for others (src). However, any HTML that preloads JS is effectively limited to 1 day, unless the publisher can refresh a JS SXG without updating its header-integrity.
GetFullHeader() should (default on, opt-out via toml config) remove any headers that are likely to change often but don't affect the way the subresource is interpreted by the browser. The Date header comes to mind, but it's worth a cursory glance at the HTTP spec to unearth any others.
Our site preloads a third-party resource (AMP v0.js).
<link rel="preload" href="https://cdn.ampproject.org/lts/v0.js" as="script">
webpkgserver will attempt to fetch that resource, which will result in the following error.
2021/06/25 18:07:55 processing https://cdn.ampproject.org/lts/v0.js ...
2021/06/25 18:07:55 error with processing https://cdn.ampproject.org/lts/v0.js: fetch: URL doesn't match the fetch targets
Is there any way to ignore this preload?
The Google SXG cache has a limit of 20 preloads. I don't see anything in the webpackager code that guarantees that requirement is met. (Did I miss it?) We should provide a [default enabled] option to do so.
Seems especially necessary given the optional HTMLTask to promote preload tags to headers, where the former are more prominent today, with no such limit.
prerequisites.yml only checks against Go 1.14. We should test multiple versions and, if possible, a rolling target that tracks the latest Go release.
I think webpkgserver should return a 404 error if the cache does not exist.
(Looking at the code here, that appears to be the intent.)
https://github.com/google/webpackager/blob/main/server/handler.go#L119-L122
In practice, however, ioutil.ReadFile fails with "No such file or directory", and webpkgserver returns a 500 error.
https://github.com/google/webpackager/blob/main/certchain/certmanager/multicert_disk_cache.go#L115-L118
I don't know how certmanager.ErrNotFound is used, but what about returning certmanager.ErrNotFound here?
https://github.com/google/webpackager/blob/main/certchain/certchainutil/certchainutil.go#L45-L48
Hello, I am getting the following error when sending a request to webpkgserver:
2021/09/17 16:29:24 Listening at [::]:80
2021/09/17 16:29:24 Successfully retrieved valid OCSP.
2021/09/17 16:29:26 processing https://www.perlego.com/book/1690290/criminal-law-pdf ...
2021/09/17 16:29:26 error with processing https://www.perlego.com/book/1690290/criminal-law-pdf: fetch: URL doesn't match the fetch targets
This is the webpkgserver.toml file I'm using:
[Listen]
Port = 80
[SXG.Cert]
PEMFile = '/www_perlego_com.pem'
KeyFile = '/server.key'
AllowTestCert = false
[SXG]
CertURLBase = 'https://perlego.com/'
[[Sign]]
Domain = 'perlego.com'
This is how I'm sending the request:
wget -v -d --header="Accept: application/signed-exchange;v=b3" localhost/priv/doc/https://www.perlego.com/book/1690290/criminal-law-pdf
I'm confused: from reading the source code, I gather that the error "URL doesn't match the fetch targets" is raised when the Domain field of the server configuration does not match the actual host of the request, but in this case I believe it does.
Thanks in advance,
Juan
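One thing worth checking in the configuration above: the requested URL uses the host www.perlego.com, while Domain is set to perlego.com. The sketch below compares the two under the assumption that the match is an exact hostname comparison; that assumption has not been verified against the webpackager source, and `hostMatchesDomain` is a hypothetical helper, not part of the project.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostMatchesDomain reports whether the hostname of rawURL equals domain
// exactly. Exact matching is an assumption made for this sketch.
func hostMatchesDomain(rawURL, domain string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	return u.Hostname() == domain, nil
}

func main() {
	target := "https://www.perlego.com/book/1690290/criminal-law-pdf"
	for _, domain := range []string{"perlego.com", "www.perlego.com"} {
		ok, err := hostMatchesDomain(target, domain)
		if err != nil {
			panic(err)
		}
		fmt.Printf("Domain=%q matches: %v\n", domain, ok)
	}
}
```

Under that assumption, only the www.perlego.com value matches the request shown above.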
Drop the Link header altogether unless it's a preload, as per https://github.com/google/webpackager/blob/master/docs/cache_requirements.md.
Since some Link headers are seen as semantic, I think it's worth creating a toml config for it (default on).