sw-content-indexing's People

Contributors

drufball

sw-content-indexing's Issues

Manifest Size a Concern?

Considering scenarios like add-to-homescreen and push notifications, where a small amount of manifest data is needed to perform an action, adding a long list of cached URLs would increase the time it takes to retrieve the manifest and perform that action.

Secondly, sites that let users selectively cache pages would have no way to surface those choices via the manifest.
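For scale, here is a hedged sketch of what the manifest might look like under this proposal. The `offline_content` member name is an assumption for illustration, not something defined by the explainer or the Web App Manifest spec:

```javascript
// Hypothetical manifest shape; "offline_content" is an illustrative member
// name, not one defined by any spec.
const manifest = {
  name: "Example Video Site",
  start_url: "/",
  offline_content: [
    "/watch?v=123456",
    "/watch?v=123457",
    // ...potentially thousands more entries, all fetched on every manifest
    // read, even for unrelated actions like showing a notification
  ],
};

// Rough size cost: each cached URL adds its full serialized length in bytes.
const extraBytes = JSON.stringify(manifest.offline_content).length;
```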

Do we need sources for child content?

I've tried writing out the response that the event/URL-polling approach might want from YouTube, and it seems to me that we probably want something like:

[
  {
    "url": "youtube.com/watch?playlist=123457",
    "content-type": "video playlist",
    "title": "Funny YouTube videos",
    "sources": [
      {
        "content-type": "video",
        "title": "Rickroll",
        "description": "...",
        "length": 323.4
      },
      {
        "content-type": "video",
        "title": "Charlie bit my finger",
        "description": "...",
        "length": 123.4
      }
    ]
  },
  {
    "url": "youtube.com/watch?v=123456",
    "content-type": "video",
    "title": "Charlie bit my finger",
    "description": "...",
    "length": 123.4
  },
  {
    "url": "youtube.com/watch?v=123457",
    "content-type": "video",
    "title": "Rickroll",
    "description": "...",
    "length": 323.4
  }
]

The current design assumes each entry has a list of sources, but I think the playlist example shows we also need "parent" metadata covering the playlist itself. For standalone videos, the "parent" is the video itself, so its metadata should appear at the top level rather than in a 'sources' section.

Alternatively, we could drop the sources idea (which allows for nested content) and have sites simply list their content individually. As seen above, the videos are indexed independently of their parent playlist, so we could drop them from the playlist entry altogether.

Any thoughts? Are there other approaches we could apply? Does the approach of having a separate parent metadata vs child sources not make sense for some cases?
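To make the two options concrete, here is a hedged sketch. The field names follow the example response above, but the exact shape is my assumption, not the current design:

```javascript
// Sketch of the "parent metadata" shape: every entry describes itself at
// the top level, and "sources" is optional child metadata.
const entries = [
  {
    url: "youtube.com/watch?playlist=123457",
    "content-type": "video playlist",   // parent metadata: the playlist itself
    title: "Funny YouTube videos",
    sources: [                          // optional child metadata
      { "content-type": "video", title: "Rickroll", length: 323.4 },
      { "content-type": "video", title: "Charlie bit my finger", length: 123.4 },
    ],
  },
  {
    url: "youtube.com/watch?v=123457",  // standalone video: its own metadata
    "content-type": "video",            // sits at the top level, no sources
    title: "Rickroll",
    length: 323.4,
  },
];

// Under the "drop sources" alternative, children are simply the entries
// without a sources list, so the nesting may be redundant:
const standalone = entries.filter(e => !e.sources);
```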

Making the API per-URL

I don't think the manifest approach works here, because it's nearly impossible to be exhaustive about the pages that work offline. Take https://wiki-offline.jakearchibald.com/ - which pages work offline?

https://wiki-offline.jakearchibald.com/?07dbe58e-dda8-4afb-ae05-1d0704c04e6c
https://wiki-offline.jakearchibald.com/?12b508d6-495d-46aa-b93a-0120bc61f210
https://wiki-offline.jakearchibald.com/?7f068bd6-cda2-4910-8b26-73343c07d691
https://wiki-offline.jakearchibald.com/?80ba80b0-37d9-4eed-8336-9fd235753f8b
https://wiki-offline.jakearchibald.com/?1e4dc8f2-942b-4deb-93d6-1e19e0666757
https://wiki-offline.jakearchibald.com/?ad5a52d0-28fd-4dcb-9f29-53d54866c0f7
https://wiki-offline.jakearchibald.com/?0b829457-8ad8-49cf-bca2-7c1375eba7ad
https://wiki-offline.jakearchibald.com/?93e0dbed-bd96-46c3-8012-27a27eda39f9
https://wiki-offline.jakearchibald.com/?1b4ac5b6-52f5-4cf6-9423-c3268c7ccf69
https://wiki-offline.jakearchibald.com/?203113b6-16d3-43f9-bc2e-94c7fdc75c69
https://wiki-offline.jakearchibald.com/?aef4665e-ef2e-4a5b-abeb-78f5b259c5db
https://wiki-offline.jakearchibald.com/?ed24f9af-5256-44fb-9c9a-fafe5b77bae7
https://wiki-offline.jakearchibald.com/?77aa5803-ac46-47c3-bff0-fb96aead0e30
https://wiki-offline.jakearchibald.com/?4fe72214-90b9-447e-bdab-2df5b6a1899d
https://wiki-offline.jakearchibald.com/?d82b54d2-bbf9-434f-9d0a-967efb22f3a5
https://wiki-offline.jakearchibald.com/?e086c709-30ee-4cba-8008-9347c9efe6f5
https://wiki-offline.jakearchibald.com/?b97f6e06-36c0-4755-bee0-f5a3bda2649c
https://wiki-offline.jakearchibald.com/?4d6da876-612f-4215-92be-a46f12210ed8
https://wiki-offline.jakearchibald.com/?ae3a92a3-d7fc-4d57-b257-8f5c98f32eee
https://wiki-offline.jakearchibald.com/?3a69a789-a1cd-403f-9759-96514ce8366a
etc etc etc

But maybe this page doesn't work offline:

https://wiki-offline.jakearchibald.com/?page=20

It seems easier for the browser to query the offline capability for a given URL. The browser could trigger a SW event providing the following request:

new Request(url, {
  method: 'HEAD',
  cache: 'force-cache'
});

The SW would respondWith a response if it could. The response would contain the content-type, so there's no need to provide that separately. In many cases, you'd be able to reuse your fetch event listener for this, since fetch(event.request) would not hit the network due to the cache rule.
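A minimal runnable sketch of that round trip, with the Cache API stubbed as a Map so the control flow is visible outside a service worker. `handleProbe`, the stub, and the `?page=1` URL are all illustrative, not proposed API:

```javascript
// The Cache API stubbed as a Map; in a real SW this would be caches.match().
const cache = new Map([
  ["https://wiki-offline.jakearchibald.com/?page=1",
   { status: 200, headers: { "content-type": "text/html" } }],
]);

// The browser's probe would be:
//   new Request(url, { method: 'HEAD', cache: 'force-cache' })
// The SW answers only from cache; a miss means "not available offline".
function handleProbe(url) {
  return cache.get(url) ?? null;
}

const hit = handleProbe("https://wiki-offline.jakearchibald.com/?page=1");
// hit.headers["content-type"] already tells the browser what kind of
// content this is, so no separate content-type listing is needed.
```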

Reasoning for the API is not clear.

Two thoughts.

  1. This sounds like we are re-implementing the AppCache manifest on top of SW.
  2. Can't we determine this programmatically? I wrote a test harness for detecting this in the past:
var allRequestsInCache = () =>
  // caches.match() resolves asynchronously, so collect the promises first
  Promise.all(window.performance.getEntriesByType("resource")
    .map(r => caches.match(new Request(r.name))))
  .then(matches => matches.every(Boolean));

You should be able to determine whether the start_url is already cached, and whether the site's requests from the previous load are in the cache too.

I suppose one of my overriding thoughts is that it's not clear why we need this at all.

Note standardization work needed for manifest approach

I think it's worth noting that we'd likely need to standardize the network response sites should return when polled for their resources. At the moment we've noted:

No API additions required since the manifest is extensible

But I think that probably glosses over the fact that we've just pushed the standardization work from an IDL API to a JSON structure in a response.

Let the SW say which URLs are accessible

Why not just have the service worker say which URLs it can handle (online or offline)?

addEventListener('getAccessibleUrls', event => {  // hypothetical SW event
  event.respondWith(async () => {
    if (event.offline) {
      // caches.has() returns a Promise, so await it
      return (await caches.has("images-cache"))
        ? ["https://test.org/images/**/small" /* ... */]
        : [];
    }
    return ["https://test.org/*"];
  }());
});

which would return a generated list of available paths in something like the gulp glob syntax, e.g. "test.org/images/*/".

The browser has access to the exact cache, so it should be able to auto-complete paths given this, and it also lets the SW advertise other paths that work (i.e. generated content served by the SW).
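As a sketch of the auto-completion idea, a browser could compile such globs to regular expressions and match them against cached URLs. The pattern semantics (a single * stops at a path separator, ** does not) are an assumption, since the comment doesn't pin down the syntax:

```javascript
// Convert a gulp-style glob to a RegExp. Assumed semantics: "*" matches
// within one path segment, "**" matches across segments.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex chars
  const pattern = escaped
    .replace(/\*\*/g, "\u0000")   // placeholder so "**" survives the next step
    .replace(/\*/g, "[^/]*")      // single "*" stops at "/"
    .replace(/\u0000/g, ".*");    // "**" crosses path separators
  return new RegExp(`^${pattern}$`);
}

globToRegExp("https://test.org/images/**/small")
  .test("https://test.org/images/2023/07/small"); // true
```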

What would the UI look like?

Do you have any ideas about how the information extracted from the API would be exposed in the browser's UI? A homescreen icon in effect allows a PWA to expose a single URL that is likely to work offline. (And that page itself can obviously link to whatever resources it thinks may also be offline.) How would the browser reveal the additional pages that would work offline? Via the NTP? Something else?
