
Comments (9)

jacksonh commented on September 26, 2024

Yeah, I think there is something weird with this feed. This is what I get when I try to fetch it locally:

    * ALPN: server accepted http/1.1
    * Server certificate:
    *  subject: CN=rachelbythebay.com
    *  start date: May  7 00:00:00 2023 GMT
    *  expire date: Jun  5 23:59:59 2024 GMT
    *  subjectAltName: host "rachelbythebay.com" matched cert's "rachelbythebay.com"
    *  issuer: C=FR; ST=Paris; L=Paris; O=Gandi; CN=Gandi Standard SSL CA 2
    *  SSL certificate verify ok.
    > GET /w/atom.xml HTTP/1.1
    > Host: rachelbythebay.com
    > User-Agent: curl/7.87.0
    > Accept: */*
    > 
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 429 Too many requests
    < Date: Wed, 07 Feb 2024 02:38:46 GMT
    < Server: Apache/2.4.57
    < Retry-After: 86400
    < Transfer-Encoding: chunked
    < 
    * Connection #0 to host rachelbythebay.com left intact
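The 429 above carries `Retry-After: 86400`, i.e. the server asks the client not to come back for 24 hours. A fetcher that wants to honour that could turn the header into a "next allowed fetch" time. A minimal TypeScript sketch; `retryAfterToEpochMs` is a hypothetical helper, not omnivore code:

```typescript
// Hypothetical helper (not part of omnivore): convert a Retry-After value into
// the earliest time, in ms since the epoch, at which to retry the feed.
// Retry-After is either delta-seconds ("86400") or an HTTP-date.
function retryAfterToEpochMs(value: string | null, nowMs: number): number | null {
  if (value === null) return null;
  const trimmed = value.trim();
  // delta-seconds form: a non-negative integer count of seconds
  if (/^\d+$/.test(trimmed)) {
    return nowMs + Number(trimmed) * 1000;
  }
  // HTTP-date form; Date.parse yields NaN if the string cannot be parsed
  const dateMs = Date.parse(trimmed);
  return Number.isNaN(dateMs) ? null : dateMs;
}

// Example: the response above, taking its own Date header as "now".
const received = Date.parse("Wed, 07 Feb 2024 02:38:46 GMT");
const nextFetch = retryAfterToEpochMs("86400", received);
console.log(new Date(nextFetch!).toUTCString()); // "Thu, 08 Feb 2024 02:38:46 GMT"
```

Storing that time per feed URL would let a scheduler skip the feed until then instead of re-requesting it and extending the block.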

jacksonh commented on September 26, 2024

I guess this is why (quoting https://rachelbythebay.com/w/feed/):

Go back to RFC 1945, aka the HTTP/1.0 spec. It's from May 1996. It specifies the "If-Modified-Since" header. All you have to do is set it in your HTTP request to the server using the exact data that was supplied in the "Last-Modified" header when you last got an update. Don't try to interpret it. Don't try to hard-code it. Just keep it around as an opaque token and pass it back.

Alternatively, you can store the value from the "ETag" header, and return it in an "If-None-Match" header in your request. This should also not be interpreted - it's an opaque token, and should be returned just like it was given to you. Bear in mind that if it looks like it's wrapped in quotes "like this", then those quotes are part of the value! (And no, I didn't have anything to do with it acting that way.)

If nothing has changed, you'll get a 304 status in the response and can know that you're as caught up as you can possibly be.
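The conditional GET described in that quote can be sketched with the standard `fetch` API (Node 18+ or a browser). This assumes the two validators are persisted per feed between refreshes; `FeedValidators`, `conditionalHeaders` and `fetchIfChanged` are hypothetical names, not omnivore code:

```typescript
// Validators remembered from the previous successful fetch of a feed.
// Both are opaque: stored and sent back byte-for-byte, quotes included.
interface FeedValidators {
  etag?: string;
  lastModified?: string;
}

// Build the request headers for a conditional GET.
function conditionalHeaders(v: FeedValidators): Record<string, string> {
  const headers: Record<string, string> = {};
  if (v.etag !== undefined) headers["If-None-Match"] = v.etag;
  if (v.lastModified !== undefined) headers["If-Modified-Since"] = v.lastModified;
  return headers;
}

// Fetch a feed, returning null when the server answers 304 Not Modified.
async function fetchIfChanged(
  url: string,
  previous: FeedValidators,
): Promise<{ body: string; validators: FeedValidators } | null> {
  const res = await fetch(url, { headers: conditionalHeaders(previous) });
  if (res.status === 304) return null; // nothing new since the last fetch
  if (!res.ok) throw new Error(`Unexpected status ${res.status}`);
  return {
    body: await res.text(),
    // Keep the new validators exactly as received for the next request.
    validators: {
      etag: res.headers.get("ETag") ?? undefined,
      lastModified: res.headers.get("Last-Modified") ?? undefined,
    },
  };
}
```

On a `null` result the caller keeps its stored validators; otherwise it saves the returned `validators` alongside the parsed items.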

alexhumphreys commented on September 26, 2024

Interesting! How do ye handle etags and If-Modified-Since/If-None-Match headers?

I'm unfamiliar with this app, but I took a look and found rss-handler. It looks like it has lastFetchedTimestamps and lastFetchedChecksums, so maybe it's doing something with those? The axios.get call doesn't seem to pass many headers, though.

Guess there's also that 429 response. Are the omnivore requests sent by the server or the client? If it's the server and several people already have this feed, then I guess it could get rate limited?

jacksonh commented on September 26, 2024

Yeah, we don't send an If-Modified-Since header at the moment.

jacksonh commented on September 26, 2024

> Are the omnivore requests sent by the server or the client?

Yeah, they are sent by the server.

alexhumphreys commented on September 26, 2024

Is storing/passing ETags something ye'd accept a PR for? I could try to add it to rss-handler, assuming that's where the change should be made.

It might not help with the rate limiting, but it might get a little further than it currently does.

jacksonh commented on September 26, 2024

Yeah, passing the ETags makes sense, but I think the rate limit will still cause problems. Maybe I'm misunderstanding, but the rate limiting seems flawed to me and would require the entire feed to be cached. If user A adds the feed and we fetch it, then ten minutes later user B adds it, we'd want to fetch it again. Later, when the feeds are refreshed, we'd only make one request for both users, but that initial fetch for user B would be rate limited.
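The user-B case could be handled by keeping the last fetched copy keyed by feed URL rather than by subscription, and serving a new subscriber from it while it is recent enough. A minimal in-memory sketch; `FeedCache`, `maxAgeMs` and `bodyForNewSubscriber` are hypothetical names, and the fetch function is injected so nothing here assumes omnivore's internals:

```typescript
// Hypothetical per-feed cache keyed by feed URL (not by subscriber), so a
// second subscriber can be served the copy fetched for the first one.
interface CachedFeed {
  body: string;
  fetchedAtMs: number;
}

class FeedCache {
  private entries = new Map<string, CachedFeed>();
  private readonly maxAgeMs: number;

  constructor(maxAgeMs: number) {
    this.maxAgeMs = maxAgeMs;
  }

  // Return the cached body if it is no older than maxAgeMs, else undefined.
  getFresh(url: string, nowMs: number): string | undefined {
    const entry = this.entries.get(url);
    if (entry === undefined) return undefined;
    return nowMs - entry.fetchedAtMs <= this.maxAgeMs ? entry.body : undefined;
  }

  put(url: string, body: string, nowMs: number): void {
    this.entries.set(url, { body, fetchedAtMs: nowMs });
  }
}

// On subscribe: reuse a fresh copy if there is one, fetch only on a miss.
async function bodyForNewSubscriber(
  cache: FeedCache,
  url: string,
  fetchBody: (url: string) => Promise<string>,
  nowMs: number,
): Promise<string> {
  const cached = cache.getFresh(url, nowMs);
  if (cached !== undefined) return cached;
  const body = await fetchBody(url);
  cache.put(url, body, nowMs);
  return body;
}
```

With several server processes this would presumably need a shared store (for example a table keyed by feed URL) instead of an in-process `Map`.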

alexhumphreys commented on September 26, 2024

I tried debugging this by running omnivore locally with docker compose. I tracked the errors down to this call to parser.parseURL, which seems to be this parseURL function from the rss-parser library.

The good news is that this function seems to parse ETags if it finds one, so that info should already be available. (edit: or not 🙈 )

The bad news is that the error handling is kinda janky:

        else if (res.statusCode >= 300) {
          return reject(new Error("Status code " + res.statusCode));
        }

So if the status is 300 or above, say 429, it just swallows that info and rejects with an Error whose message is the string "Status code 429". The nice response object, with its .statusCode and probably some kind of error message, is lost. That means handling 429s would involve parsing that string for a status code, and even after that I'm not sure what a good way to handle it would be 😅

I think the current error handling on the omnivore side is to return "not found" if any error occurs in the parseFeed function.
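Until the HTTP request is split out, the only way to recover the status from `parser.parseURL` is to read it back out of the message, as described above. A sketch of that workaround next to the alternative of keeping the status on a dedicated error type; `statusFromMessage` and `HttpStatusError` are hypothetical helpers, not rss-parser or omnivore API:

```typescript
// rss-parser's parseURL rejects with new Error("Status code " + res.statusCode),
// so only the message string survives. This recovers the number from it.
function statusFromMessage(err: unknown): number | undefined {
  if (!(err instanceof Error)) return undefined;
  const match = /^Status code (\d{3})$/.exec(err.message);
  return match ? Number(match[1]) : undefined;
}

// When our own code makes the HTTP request, the status and the Retry-After
// header can be kept on the error instead of being flattened into a string.
class HttpStatusError extends Error {
  readonly status: number;
  readonly retryAfter: string | null;

  constructor(status: number, retryAfter: string | null) {
    super(`Status code ${status}`);
    this.name = "HttpStatusError";
    this.status = status;
    this.retryAfter = retryAfter;
  }
}
```

String matching breaks silently if the library ever changes its message, which is one argument for owning the request.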

alexhumphreys commented on September 26, 2024

I've started work on this here. I'm trying to split the HTTP requests that fetch the RSS feed from the parsing of the feed's content, similar to the approach the rss-parser package mentions here.

Making the HTTP requests separately makes it easy to grab the ETag header, or to act on 429 and other status codes, even if it's just to give the user better error messages.
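The split might look roughly like this sketch: the HTTP step decides what happened from the status code and headers before any XML is touched, and only an `ok` outcome is handed to the parser (rss-parser, for example, has a `parseString` method for that). `FetchOutcome`, `classify` and `fetchFeed` are hypothetical names, not taken from the branch:

```typescript
// Possible outcomes of the HTTP step, decided before any XML is parsed.
type FetchOutcome =
  | { kind: "ok"; body: string; etag: string | null }
  | { kind: "notModified" }
  | { kind: "rateLimited"; retryAfter: string | null }
  | { kind: "httpError"; status: number };

// Pure classification step, separate so it can be tested without a network.
function classify(status: number, headers: Headers, body: string): FetchOutcome {
  if (status === 304) return { kind: "notModified" };
  if (status === 429) return { kind: "rateLimited", retryAfter: headers.get("Retry-After") };
  if (status >= 200 && status < 300) return { kind: "ok", body, etag: headers.get("ETag") };
  return { kind: "httpError", status };
}

// HTTP step only; the body of an "ok" outcome is then passed to the feed parser.
async function fetchFeed(url: string): Promise<FetchOutcome> {
  const res = await fetch(url);
  return classify(res.status, res.headers, await res.text());
}
```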

I've also added a few extra GraphQL SubscribeErrorCodes to return the possible errors, so they can be acted upon.
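Once the status code survives the fetch, it can map to a specific error instead of a blanket "not found". The code names below are placeholders; the values actually added to omnivore's SubscribeErrorCode enum may differ:

```typescript
// Placeholder error codes; the real SubscribeErrorCode values may differ.
type SubscribeErrorCode = "NOT_FOUND" | "RATE_LIMITED" | "FORBIDDEN" | "UPSTREAM_ERROR";

// Map an HTTP status from the feed server to an error the client can act on.
function errorCodeForStatus(status: number): SubscribeErrorCode {
  switch (status) {
    case 404:
    case 410:
      return "NOT_FOUND";
    case 429:
      return "RATE_LIMITED";
    case 401:
    case 403:
      return "FORBIDDEN";
    default:
      return "UPSTREAM_ERROR";
  }
}
```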

This is becoming a fairly big change, and I'm not sure how to wire up the ETag handling correctly so that it gets passed along on feed refresh. So before I go any further, could you look it over, let me know whether it's the kind of thing you'd accept a patch for, and give me some advice on whether I'm going in the right direction?
