Comments (8)

allejok96 commented on August 12, 2024

Haha, okay, that is slow. Thanks for registering and bringing this to my attention.

By default the script checks the md5 checksum of files. This is probably what's going on. You should be able to turn it off with --no-checksum.
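As a rough illustration of why this costs so much, here is a minimal sketch of an existing-file check: a cheap size comparison, followed by a full MD5 pass over every byte only when checksumming is enabled. The names `md5sum`, `file_ok` and the `checksums` parameter are hypothetical, and the assumption that the size is still compared under --no-checksum is mine; this is not the actual jw-scripts code.

```python
import hashlib
import os


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos are never read into memory at once."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def file_ok(path: str, expected_size: int, expected_md5: str, checksums: bool = True) -> bool:
    """Hypothetical check: size first (cheap), full MD5 only when checksums is True."""
    if not os.path.exists(path) or os.path.getsize(path) != expected_size:
        return False
    if not checksums:
        # Assumed --no-checksum behaviour: trust the file if its size matches.
        return True
    # Expensive part: reads the whole file from disk and hashes it.
    return md5sum(path) == expected_md5
```

The size check reads only file metadata, while the MD5 branch reads every byte, which is where the time goes when thousands of videos are re-checked on every run.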

I don't know why it was fast before, though. Checksumming has been enabled by default since forever. It might have been off in 2017, but that would have been a bug (just for reference: 97e9c76#diff-d1acb20e5fa951d2afe925c2e3706ef8).

I think it checks all existing files every time... As far as I can remember, it has done this for a long time too. Unless it was broken... :) Or your catalogue has grown big.

I'll consider changing the default, since md5 is CPU- and disk-heavy. A simple file size check is enough most of the time.

If it's not that, we have a problem.

PS: Nope, there is no log file, I'm afraid. And Python debugging is a mess without a good IDE.

from jw-scripts.

allejok96 commented on August 12, 2024

I just saw --no-checksum isn't working... 🤦

allejok96 commented on August 12, 2024

I'll push this fix later. Until then, in arguments.py, please change line 135 from

add_predefined('--no-checksum', action='store_false', dest='checksum',

to

add_predefined('--no-checksum', action='store_false', dest='checksums',
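Why the one-word change matters: with argparse, a "negative" flag only works if it writes to the same attribute (`dest`) that the rest of the code reads. The sketch below uses plain `argparse` as a stand-in for the project's `add_predefined` helper, whose signature I don't know; it assumes the code reads `args.checksums` and that checksumming defaults to on.

```python
import argparse


def build_parser(no_checksum_dest: str) -> argparse.ArgumentParser:
    """Hypothetical plain-argparse stand-in for the two checksum flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--checksum', action='store_true', dest='checksums')
    # The negative flag must target the SAME dest, or it sets an unrelated attribute.
    parser.add_argument('--no-checksum', action='store_false', dest=no_checksum_dest)
    # Checksumming is on unless switched off (the old default).
    parser.set_defaults(checksums=True)
    return parser


# Buggy: --no-checksum sets args.checksum, but the code reads args.checksums.
buggy = build_parser('checksum').parse_args(['--no-checksum'])
print(buggy.checksums, buggy.checksum)  # True False

# Fixed: both flags share dest='checksums'.
fixed = build_parser('checksums').parse_args(['--no-checksum'])
print(fixed.checksums)  # False
```

argparse raises no error for the mismatch; it silently creates a second attribute, which is why the flag appeared to do nothing.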

roffikk commented on August 12, 2024

OK, I did some calculations, and 10 to 15 minutes seems reasonable for checking md5 sums if it's checking every file. I currently have over 170 GB of movies (1666 files). When I first used your tool, I downloaded about 100 GB of data, so it hasn't grown that much. I'm not sure which version I used before, but I'm quite sure it didn't take that long.
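The back-of-the-envelope version of that calculation, assuming every file is fully read and hashed and using decimal units (1 GB = 1000 MB):

```python
def implied_throughput_mb_s(total_gb: float, minutes: float) -> float:
    """Sustained read+hash rate (MB/s) needed to check total_gb in the given time."""
    return total_gb * 1000 / (minutes * 60)


total_gb, files = 170, 1666
for minutes in (10, 15):
    print(f"{minutes} min -> {implied_throughput_mb_s(total_gb, minutes):.0f} MB/s")
print(f"average file size: {total_gb * 1000 / files:.0f} MB")
# 10 min -> 283 MB/s
# 15 min -> 189 MB/s
# average file size: 102 MB
```

Roughly 190 to 280 MB/s of sustained reading is plausible for a run that is limited by disk reads and hashing rather than by anything else the script does.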

The patch works fine; the script is now close to instant after downloading the indexes.

I suppose you deleted one comment, but it explained why the script was deleting fully downloaded files, sometimes several times in a row. Somehow the checksum must have been wrong. Yesterday I put the file in the folder manually and it accepted it (even without turning off checksums).

I've got an idea. Maybe it's enough to check the md5 sum only for new and updated files, right after downloading them? Once a file is on the disk, not much can happen to it, so we can assume it's intact next time if its date and size haven't changed.
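A minimal sketch of that idea, assuming "date" means the file's modification time: remember each file's (size, mtime) after it passes a check, and only re-hash when that stamp changes. The function names are hypothetical, and the `verified` dict would need to be saved between runs (for example to a JSON file), which is left out here.

```python
import os


def stamp(path: str) -> tuple:
    """Cheap fingerprint of a file: size in bytes and modification time in ns."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def needs_md5_check(path: str, verified: dict) -> bool:
    """True if the file is new, or its size/mtime changed since it last passed a check."""
    return verified.get(path) != stamp(path)


def remember_verified(path: str, verified: dict) -> None:
    """Record the current stamp after the file's md5 has matched."""
    verified[path] = stamp(path)
```

Only `os.stat` is needed for files that are already known, so repeat runs touch metadata instead of reading gigabytes of video.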

allejok96 commented on August 12, 2024

You're absolutely right. Checking all files dates back to the days when I was trying to build a fully automated, headless Raspberry Pi that downloaded and played videos endlessly, and I had to deal with the risk of SD card corruption and buggy video players...

And since you say sometimes the sums are even incorrect, I think it's better to have it default to only checking size. It's pretty uncommon for a download to get corrupted. Web browsers don't check md5s, and I doubt even JW Library does, since it creates such a CPU load...

For a moment I thought you had reversed the order of points 2 and 3 in your first message, but when I saw you were right, I deleted my last message. But yeah, there's a lack of good info about what the script does and in what order... Even I had it mixed up.

I just realized another thing... The checksums are only checked for the files included in the request. This fact is not very obvious either. That means if you run --category LatestVideos it will only check those 10-20 files. But if you run --category VideoOnDemand, it will basically check everything.

So, here's the plan:

  • default to no checksum
  • add option --checksum=new for downloaded files only (plain --checksum will default to this)
  • add option --checksum=all for all indexed files (can run without --download, just for checking)
  • files with a bad checksum (and/or size?) won't be deleted; only a warning is displayed (slightly broken is better than nothing?)
  • add option --overwrite to re-download bad files. Don't delete if we won't download.
  • partially downloaded files that get completed and then fail checks (md5 optional) are always deleted and a fresh download is started
  • add more verbosity to the output so the order of operations becomes clearer (can be controlled with -q)

What do you think?
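The plan above can be sketched with plain argparse plus a small decision function. Everything here is hypothetical and follows the plan as written (with --overwrite); it is not how jw-scripts ended up implementing it. `nargs='?'` with `const='new'` is what makes a bare `--checksum` mean "new", while an absent flag leaves `None` (size check only).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # absent -> None (size only); --checksum -> 'new'; --checksum=all -> 'all'
    parser.add_argument('--checksum', nargs='?', const='new', choices=['new', 'all'], default=None)
    parser.add_argument('--overwrite', action='store_true')
    parser.add_argument('--download', action='store_true')
    return parser


def plan_for_failed_file(was_partial: bool, overwrite: bool, download: bool) -> str:
    """What to do with a file that failed its size/md5 check."""
    if was_partial:
        # A resumed download that fails checks is always discarded and restarted.
        return 'delete-and-restart'
    if overwrite and download:
        # Only delete when a fresh copy is actually going to replace it.
        return 'redownload'
    # Otherwise keep the slightly broken file and just warn.
    return 'warn'


args = build_parser().parse_args(['--checksum', '--download'])
print(args.checksum, plan_for_failed_file(False, args.overwrite, args.download))  # new warn
```

Keeping the "delete or not" decision in one function makes the "don't delete if we won't download" rule easy to see and to test.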

roffikk commented on August 12, 2024

Oh, that's great, but it seems like a lot of work for you. I particularly like the "Don't delete if we won't download" part, because right now files first get deleted and only later re-downloaded. Or not.

So, I'm for it, but maybe you can simplify this? Don't work too hard ;) There are more important things in life.

allejok96 commented on August 12, 2024

Thanks for your consideration and feedback. Don't worry, I'm on vacation :) But I can't go anywhere, because Corona, you know. It may sound like much work but I think it can be achieved with only a few changes to the download code.

allejok96 commented on August 12, 2024

The default behavior has now changed; the old one can be activated with --checksum and --fix-broken.

You were right, it was easier said than done, but that was because I discovered so many other bugs :)
Well, now they're fixed, I hope. Good thing people aren't using all the features.
