Comments (5)

hikavdh commented on September 24, 2024

The problem you are seeing is that the new detail pages from tvgids.nl are slow and can therefore time out. My current guess is that this happens because the pages are now created on demand. When I first tested these new pages, they often failed on the first request but always succeeded on the second.
I also see that part of the error message text fails to be generated. I will look into that.
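The "fails the first time, succeeds the second time" pattern suggests retrying a timed-out detail fetch once before giving up. Below is a minimal Python sketch, assuming a plain `urllib` fetch; `fetch_page`, `is_timeout` and `fetch_with_retry` are hypothetical helpers for illustration, not part of tvgrabpyAPI.

```python
import socket
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Download a single HTML document with a per-request time-out."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def is_timeout(exc: BaseException) -> bool:
    """True for read time-outs and for connect time-outs wrapped in URLError."""
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return True
    return isinstance(exc, urllib.error.URLError) and isinstance(
        exc.reason, (socket.timeout, TimeoutError)
    )


def fetch_with_retry(
    url: str,
    fetch: Callable[[str], str] = fetch_page,
    attempts: int = 2,
    delay: float = 1.0,
) -> str:
    """Retry only on time-outs, assuming the first request makes the server
    generate the page and a later request finds it ready."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:
            # Other errors (HTTP errors, parse errors, ...) are not retried,
            # and the last time-out is re-raised once attempts run out.
            if not is_timeout(exc) or attempt == attempts:
                raise
            time.sleep(delay)  # give the server time to finish the page
```

The default of two attempts mirrors the behaviour described above; the one-second delay is an arbitrary choice. Note that every page that times out still costs a full time-out plus a retry, so on a large fetch this can noticeably lengthen the total run time.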

from tvgrabpyapi.

hikavdh commented on September 24, 2024

There are alternative JSON detail pages, but they lack some of the information, such as season/episode data.

rsenden commented on September 24, 2024

Any news on this issue, or suggestions for work-arounds?

The global_timeout setting is set to 10 seconds, which seems a reasonable time for tvgids.nl to generate a page on demand. Does TVGrabAPI fetch only the actual details page, or also everything it links to (e.g. images, ads, ...)? If I manually navigate to an arbitrary detail page on tvgids.nl, the page loads very fast with Ghostery enabled, but takes much longer with Ghostery disabled (because the browser then has to load all the ads and such). So if the grabber also fetches all images and ads, that could explain a time-out.

Are you sure, though, that this issue is caused by time-outs, and not, for example, by tvgids.nl returning a temporary error page? I think it would help if the grabber reported more detail on errors like these: for example the full page source when the data could not be parsed, or a message saying that the page could not be fetched because of a time-out.
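The distinction asked about here (a time-out versus an HTTP error versus a page that arrived but could not be parsed) can be made explicit at fetch time. Below is a minimal Python sketch, assuming a `urllib`-based fetch; `FetchResult`, `http_get` and `fetch_details` are hypothetical names, not tvgrabpyAPI's actual API. It also shows that a plain HTTP request like this one downloads only the HTML document, not the images or ads a browser would load.

```python
import socket
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FetchResult:
    status: str                  # "ok", "timeout", "http_error", "network_error" or "parse_error"
    detail: str = ""             # short explanation suitable for a log line
    body: Optional[str] = None   # raw page source, kept only when parsing failed
    data: Any = None             # parsed result when status == "ok"


def http_get(url: str, timeout: float) -> str:
    """Download only the document itself; linked images and ads are not fetched."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_details(
    url: str,
    parse: Callable[[str], Any],
    get: Callable[[str, float], str] = http_get,
    timeout: float = 10.0,
) -> FetchResult:
    try:
        body = get(url, timeout)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 404 or 503).
        return FetchResult("http_error", f"HTTP {exc.code} for {url}")
    except urllib.error.URLError as exc:
        # Connection-level failure; a connect time-out is wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return FetchResult("timeout", f"connect timed out after {timeout}s: {url}")
        return FetchResult("network_error", f"{exc.reason} for {url}")
    except (socket.timeout, TimeoutError):
        # The connection was made, but reading the response took too long.
        return FetchResult("timeout", f"read timed out after {timeout}s: {url}")
    try:
        data = parse(body)
    except Exception as exc:
        # Keep the page source so an error page can be told apart from a layout change.
        return FetchResult("parse_error", f"{type(exc).__name__}: {exc}", body=body)
    return FetchResult("ok", data=data)
```

Logging `detail` for every failure, and `body` only for parse errors, keeps the log readable while still capturing the page source in the case where it matters.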

I have created a small test set-up with an empty cache database, only 2 channels enabled, fetching only 2 days, and all (detail) sources disabled apart from tvgids.nl. Indeed, on the first run I see some errors like the ones originally described, and on the second run (even after clearing the cache database) no errors are reported.

In my regular set-up, I run the grabber every morning at 8:20, and this morning it took a little over 2.5 hours to complete. Running the grabber twice is therefore not really an option, I think; it simply takes too long, and by the time the second run starts, tvgids.nl may already have expired the pages that were generated during the first run.

As for alternative detail sources, my set-up is dependent on information like season/episode, so I would prefer to fetch and combine as much information as possible from the available sources. So if at all possible, I would prefer to keep using the tvgids.nl HTML pages instead of JSON endpoints.
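Combining information from several sources, as described here, amounts to a field-by-field merge in priority order, so that a field missing from one source (for example season/episode in the JSON pages) can be filled in from another. Below is a minimal Python sketch; `merge_details` and the field names are hypothetical and not tvgrabpyAPI's actual merge logic.

```python
from typing import Dict, Iterable, Optional


def merge_details(records: Iterable[Optional[Dict[str, object]]]) -> Dict[str, object]:
    """Combine detail records from several sources, highest priority first.

    Each field is taken from the first source that supplies a non-empty value,
    so a lower-priority source only fills gaps and never overrides.
    """
    merged: Dict[str, object] = {}
    for record in records:
        if not record:
            continue  # the source failed (e.g. timed out) or returned nothing
        for key, value in record.items():
            if key in merged or value is None or value == "":
                continue
            merged[key] = value
    return merged
```

With this approach a timed-out HTML page simply contributes nothing, and the season/episode fields come from whichever remaining source provides them.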

However, if there is no good work-around for this issue, and if other sources provide episode information as well, it could make sense to switch to the JSON endpoints for tvgids.nl. So two questions:

  • Are there any other detail sources (my setup lists tvgids.tv, npo.nl, primo.eu) that provide episode information for the main Dutch channels (NPO, RTL, SBS, maybe BBC 1/2 and Een/Canvas)?
  • I see that https://github.com/tvgrabbers/sourcematching/blob/master/sources/source-tvgids.nl.json already defines detail2 for the JSON-based details. Is there any configuration setting that allows us to use this source by default, instead of the HTML-based details source?

hikavdh commented on September 24, 2024

Sorry for the slow response; I have been short on time lately, as I recently moved and my new house is taking up a lot of it. My workspace is also not yet fully up and running and is still temporary and limited.
When you test the pages with tv_grab_test_source.py, you get more details. On the first try it almost always fails with a time-out; trying again always succeeds. At first I did not realize what this meant for production use, but I now think I will have to switch it to the JSON pages. The HTML pages simply are not working.
In a few months, when I have things here better organized, I'll look deeper into ways to make use of the HTML data.

hikavdh commented on September 24, 2024

Last week I moved the detail fetches for tvgids.nl from the HTML pages to the JSON pages. My nightly fetch now takes less than half the time.