drufball / sw-content-indexing
Allowing service workers to inform browsers about the quality of their offline experiences.
Considering scenarios like add-to-homescreen and push notifications, where a little bit of data from the manifest is needed to perform an action, adding a long list of cached URLs would just increase the time needed to retrieve the manifest and perform that action.
Secondly, sites that let users selectively cache pages would not be able to surface that via the manifest.
I've tried writing out the response that the event/URL-polling-based approach might want from YouTube, and it seems to me that we probably want something like:

```js
[
  {
    url: "youtube.com/watch?playlist=123457",
    "content-type": "video playlist",
    title: "Funny YouTube videos",
    sources: [
      {
        "content-type": "video",
        title: "Rickroll",
        description: "...",
        length: 323.4
      },
      {
        "content-type": "video",
        title: "Charlie bit my finger",
        description: "...",
        length: 123.4
      }
    ]
  },
  {
    url: "youtube.com/watch?v=123456",
    "content-type": "video",
    title: "Charlie bit my finger",
    description: "...",
    length: 123.4
  },
  {
    url: "youtube.com/watch?v=123457",
    "content-type": "video",
    title: "Rickroll",
    description: "...",
    length: 323.4
  }
]
```
The current design assumes each entry has a list of sources, but I think the playlist example shows we also need "parent" metadata covering the playlist itself. For standalone videos the "parent" is the video itself, so its metadata should appear at the top level instead of in a `sources` section.
Alternatively, we could drop the sources idea (which allows for nested content) and have sites simply list their content individually. As the example above shows, the videos are indexed independently of their parent playlist, so we could always drop them from the parent playlist altogether.
Any thoughts? Are there other approaches we could apply? Does the approach of having separate parent metadata vs. child sources not make sense for some cases?
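For what it's worth, the flat alternative can be sketched as a transform over the structure above. This is only an illustration; `flattenIndex` is a hypothetical helper name, and the entry shape (`url`, `content-type`, `title`) just follows the example earlier in this thread:

```javascript
// Hypothetical helper: flatten a nested index so every piece of content
// is listed individually, dropping the nested `sources` arrays.
// Entry shape follows the example above: { url, "content-type", title, ... }.
function flattenIndex(entries) {
  const flat = [];
  for (const entry of entries) {
    const { sources, ...parent } = entry;
    flat.push(parent); // parent metadata stays at the top level
    for (const child of sources || []) {
      // children without their own url (as in the example above) can't be
      // indexed standalone, so only promote ones that have a url
      if (child.url) flat.push(child);
    }
  }
  return flat;
}
```

With this shape, dedup by `url` would be the site's responsibility, since a video may appear both standalone and inside a playlist.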
I don't think the manifest approach works here, because it's nearly impossible to be exhaustive about the pages that work offline. Take https://wiki-offline.jakearchibald.com/. Which pages work offline?
https://wiki-offline.jakearchibald.com/?07dbe58e-dda8-4afb-ae05-1d0704c04e6c
https://wiki-offline.jakearchibald.com/?12b508d6-495d-46aa-b93a-0120bc61f210
https://wiki-offline.jakearchibald.com/?7f068bd6-cda2-4910-8b26-73343c07d691
https://wiki-offline.jakearchibald.com/?80ba80b0-37d9-4eed-8336-9fd235753f8b
https://wiki-offline.jakearchibald.com/?1e4dc8f2-942b-4deb-93d6-1e19e0666757
https://wiki-offline.jakearchibald.com/?ad5a52d0-28fd-4dcb-9f29-53d54866c0f7
https://wiki-offline.jakearchibald.com/?0b829457-8ad8-49cf-bca2-7c1375eba7ad
https://wiki-offline.jakearchibald.com/?93e0dbed-bd96-46c3-8012-27a27eda39f9
https://wiki-offline.jakearchibald.com/?1b4ac5b6-52f5-4cf6-9423-c3268c7ccf69
https://wiki-offline.jakearchibald.com/?203113b6-16d3-43f9-bc2e-94c7fdc75c69
https://wiki-offline.jakearchibald.com/?aef4665e-ef2e-4a5b-abeb-78f5b259c5db
https://wiki-offline.jakearchibald.com/?ed24f9af-5256-44fb-9c9a-fafe5b77bae7
https://wiki-offline.jakearchibald.com/?77aa5803-ac46-47c3-bff0-fb96aead0e30
https://wiki-offline.jakearchibald.com/?4fe72214-90b9-447e-bdab-2df5b6a1899d
https://wiki-offline.jakearchibald.com/?d82b54d2-bbf9-434f-9d0a-967efb22f3a5
https://wiki-offline.jakearchibald.com/?e086c709-30ee-4cba-8008-9347c9efe6f5
https://wiki-offline.jakearchibald.com/?b97f6e06-36c0-4755-bee0-f5a3bda2649c
https://wiki-offline.jakearchibald.com/?4d6da876-612f-4215-92be-a46f12210ed8
https://wiki-offline.jakearchibald.com/?ae3a92a3-d7fc-4d57-b257-8f5c98f32eee
https://wiki-offline.jakearchibald.com/?3a69a789-a1cd-403f-9759-96514ce8366a
etc etc etc
But maybe this page doesn't work offline:
https://wiki-offline.jakearchibald.com/?page=20
It seems easier for the browser to query the offline capability for a given URL. The browser could trigger a SW event providing the following request:
```js
new Request(url, {
  method: 'HEAD',
  cache: 'force-cache'
});
```
The SW would `respondWith` a response if it could. The response would contain the content-type, so there's no need to provide that separately. In many cases, you'd be able to reuse your fetch event listener for this, since `fetch(event.request)` would not hit the network due to the cache rule.
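To make the probe concrete, the SW-side logic might look something like the sketch below. `probeOffline` is a hypothetical name, and `lookup` is a stand-in for `caches.match()` so the idea can be shown outside a real service worker; in a SW it would be `(url) => caches.match(url)`:

```javascript
// Sketch: answer the browser's offline-capability probe for one URL.
// `lookup` takes a URL and resolves to a cached Response-like object,
// or undefined on a cache miss.
async function probeOffline(url, lookup) {
  const response = await lookup(url);
  if (!response) return null; // not cached: the URL won't work offline
  return {
    url,
    // the cached response already carries the content-type,
    // so nothing extra needs to be declared by the site
    contentType: response.headers.get('content-type'),
  };
}
```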
> Considering scenarios like add-to-homescreen and push notifications, where a little bit of data from the manifest is needed to perform an action, adding a long list of cached URLs would just increase the time needed to retrieve the manifest and perform that action.
> Secondly, sites that let users selectively cache pages would not be able to surface that via the manifest.
Two thoughts.
```js
// Note: caches.match() returns a promise, so the check has to be async;
// calling every() directly on the promises would always be truthy.
const allRequestsInCache = async () => {
  const resources = performance.getEntriesByType("resource");
  const matches = await Promise.all(
    resources.map(r => caches.match(new Request(r.name)))
  );
  return matches.every(response => response !== undefined);
};
```
You should be able to determine whether the start_url is already cached, and whether the site's requests from the previous load are in the cache too.
I suppose one of my overriding thoughts is that it's not clear why we need this at all.
I think it's worth noting that we'd likely need to standardize the network response sites should return when polled for their resources. At the moment we've noted:
> No API additions required since the manifest is extensible
But I think that glosses over the fact that we've just pushed the standardization work from an IDL API to a JSON structure in a response.
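As a strawman for what that standardization would have to cover, even a minimal structural contract needs spelling out. `isValidIndexEntry` and `isValidIndex` below are hypothetical checks based on the example entries earlier in this thread, not an agreed format:

```javascript
// Hypothetical structural check for the polled JSON response.
// Following the earlier example, each entry needs a url, a content-type
// and a title; description, length and sources are treated as optional.
function isValidIndexEntry(entry) {
  return typeof entry === 'object' && entry !== null &&
    typeof entry.url === 'string' &&
    typeof entry['content-type'] === 'string' &&
    typeof entry.title === 'string';
}

function isValidIndex(entries) {
  return Array.isArray(entries) && entries.every(isValidIndexEntry);
}
```

Every question this strawman answers (which fields are required, what the types are, what browsers do with invalid entries) is exactly the standardization work the "manifest is extensible" framing hides.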
Why not just have the service worker say which URLs it can handle (online or offline)?
```js
addEventListener('getAccessibleUrls', function(offline) {
  if (offline) {
    if (caches.has("images-cache"))
      return ["https://test.org/images/**/small", ...];
    else
      return [];
  }
  return ["https://test.org/*"];
});
```
This would return a generated list of available paths in something like the gulp glob syntax, e.g. "test.org/images/*/".
The browser has access to the exact cache, so it should be able to auto-complete paths given this, and it also lets the SW declare other paths that work (i.e. paths whose content the SW generates itself).
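That implies the browser needs to match concrete cache keys against the returned patterns. Here is a minimal sketch of that matching, assuming gulp-style semantics where `*` stays within a path segment and `**` crosses segments (`globToRegExp` and `urlMatchesGlobs` are hypothetical names, not a proposed API):

```javascript
// Convert a gulp-style glob into a RegExp: `**` matches across path
// segments, `*` matches within a single segment.
function globToRegExp(glob) {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*\*/g, '\u0000')           // placeholder so ** survives the next step
    .replace(/\*/g, '[^/]*')              // *  : anything within one segment
    .replace(/\u0000/g, '.*');            // ** : anything, across segments
  return new RegExp(`^${escaped}$`);
}

// Would a given URL (e.g. a cache key) be covered by the SW's glob list?
function urlMatchesGlobs(url, globs) {
  return globs.some(glob => globToRegExp(glob).test(url));
}
```

One design question this surfaces: under these semantics "https://test.org/*" covers only single-segment paths, so the spec would need to pin down whether sites should return `*` or `**` for "everything under this origin".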
Do you have any ideas about how the information extracted from the API would be exposed in the browser's UI? A homescreen icon in effect lets a PWA expose a single URL that is likely to work offline. (And that page itself can obviously link to whatever resources it thinks may also be available offline.) How would the browser reveal the additional pages that work offline? Via the NTP (new tab page)? Something else?