
Comments (17)

vxbinaca avatar vxbinaca commented on June 18, 2024 1

If you don't wipe the downloads, it will simply add them back to the archive file. It sucks but here's how I handle large channels ex: Healthy Addict.

Try printing video IDs using youtube-dl with the --get-id flag. The channel's video IDs will end up in a handy list.

youtube-dl --ignore-config --get-id https://www.youtube.com/user/healthyaddict > target-IDs.txt

(optionally you can use xargs to clean the output of youtube-dl to make the IDs in a list)

Take the channel in chunks of 10 videos to start; if that runs well, do 20, and so on. When you complete a block of uploads, remove it from the text file, and repeat until the file is empty. But with 557 gigs, holy shit are you sure Archive.org isn't throttling you? They have S3 limits.
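The chunking described above could be sketched roughly like this. It assumes the ID list is plain text with one video ID per line, as `--get-id` prints it; the helper names `read_ids` and `chunked` are made up for illustration and are not part of youtube-dl or tubeup.

```python
from typing import Iterator, List


def read_ids(text: str) -> List[str]:
    """Parse one-ID-per-line text, ignoring blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Example with 25 made-up IDs, split into batches of 10.
sample = "\n".join(f"id{n:02d}" for n in range(25))
batches = list(chunked(read_ids(sample), 10))
print([len(batch) for batch in batches])  # [10, 10, 5]
```

Each batch can then be handed to tubeup in turn, and the finished IDs dropped from the text file before moving on.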

A word on copyright takedowns:

"YouTube said: This video contains content from FranceTV mcn, who has blocked it on copyright grounds." means you can't get the video in any region, while "blocked it in your country" means you can use a VPS or VPN to get the video.

If the videos were blocked after you ripped them and you still have them in the downloads folder... eh... better make an item for all the files and upload the entire directory using internetarchive. But if you can re-rip the channel using tubeup, do that.

from tubeup.

antonizoon avatar antonizoon commented on June 18, 2024

@vxbinaca Maybe we should try to catch ExtractorErrors so we can note them down then continue. This way the script can state which urls failed at the end, instead of just crashing.
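A minimal sketch of that idea, not tubeup's actual code: loop over the URLs, record each failure, and report them all at the end. The `process` callable stands in for the download-and-upload step and `fake_process` is a made-up stub; in tubeup the `except` clause would presumably be narrowed to the extractor's error type.

```python
from typing import Callable, Iterable, List, Tuple


def run_all(urls: Iterable[str],
            process: Callable[[str], None]) -> List[Tuple[str, str]]:
    """Process every URL, collecting (url, reason) for each failure."""
    failures: List[Tuple[str, str]] = []
    for url in urls:
        try:
            process(url)
        except Exception as exc:  # assumption: narrow this in real code
            failures.append((url, str(exc)))
    return failures


def fake_process(url: str) -> None:
    """Hypothetical stand-in that fails on one URL."""
    if "blocked" in url:
        raise RuntimeError("blocked on copyright grounds")


failed = run_all(
    ["https://example.com/ok", "https://example.com/blocked"],
    fake_process,
)
for url, reason in failed:
    print(f"FAILED {url}: {reason}")
```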


vxbinaca avatar vxbinaca commented on June 18, 2024

@antonizoon @rudolphos Would '--ignore-errors' on youtube-dl side fix this?


vxbinaca avatar vxbinaca commented on June 18, 2024

@rudolphos I would suggest hand editing the archive file to delete everything after the last successful upload. Although if there was an error for copyright, that particular video ID isn't archived. Anything after it, it will get.


rudolphos avatar rudolphos commented on June 18, 2024

@vxbinaca

hand editing the archive file to delete everything after the last successful upload

There are multiple archive files, like 80 or more..
By delete everything you mean 1 video file and those other types of subtitles and jpegs ?

Just tried it on the latest uploaded item, but nothing happened. Tried it 2 times.

root@xxxxx:~# tubeup https://www.youtube.com/user/xxxx
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.12.12
[debug] Python version 3.5.2+ - Linux-4.8.0-30-generic-x86_64-with-Ubuntu-16.10-yakkety
[debug] exe versions: avconv 3.0.2-1ubuntu3, avprobe 3.0.2-1ubuntu3, ffmpeg 3.0.2-1ubuntu3, ffprobe 3.0.2-1ubuntu3
[debug] Proxy map: {}
[debug] Public IP address: 2001:xxx
root@xxxx:~# tubeup https://www.youtube.com/user/xxxx
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.12.12
[debug] Python version 3.5.2+ - Linux-4.8.0-30-generic-x86_64-with-Ubuntu-16.10-yakkety
[debug] exe versions: avconv 3.0.2-1ubuntu3, avprobe 3.0.2-1ubuntu3, ffmpeg 3.0.2-1ubuntu3, ffprobe 3.0.2-1ubuntu3
[debug] Proxy map: {}
[debug] Public IP address: 2001:xx
root@xxxxx:~#

This user didn't have any copyright errors. But the archiving process is just not working anymore: about 90% of the videos are downloaded, but only about 40% of them are uploaded. Basically nothing works anymore when I run tubeup on this particular channel.

tubeup folder has more video files than the total uploaded, so the ones that haven't been uploaded are not removed, but also not uploaded ...


vxbinaca avatar vxbinaca commented on June 18, 2024

It sounds like a lot of videos on this channel are copyright blocked in certain countries. Try using a VPS/VPN?


vxbinaca avatar vxbinaca commented on June 18, 2024

I mean, go into the archive file that records your progress ( ~/.tubeup/.ytdlarchive ), go look at your Archive.org uploads, and find the last successful complete upload:

https://archive.org/details/youtube-8HAMrXfTJng

Okay, take the video ID that's in the item identifier and open ~/.tubeup/.ytdlarchive in a text editor. Hit CTRL-F, find the line with 8HAMrXfTJng. Delete everything below that line.

Next, run rm ~/.tubeup/downloads/*

Now re-run your rip. It should pick back up after the last successful upload.
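If hand editing gets tedious, the truncation step could be scripted along these lines. This is only a sketch: it assumes the archive file holds one entry per line with the video ID as the last whitespace-separated field (the usual youtube-dl download-archive layout), and `keep_through` is a made-up helper name. Apart from 8HAMrXfTJng, the IDs below are placeholders.

```python
from typing import List


def keep_through(lines: List[str], video_id: str) -> List[str]:
    """Return the lines up to and including the entry for `video_id`."""
    for index, line in enumerate(lines):
        fields = line.split()
        if fields and fields[-1] == video_id:
            return lines[:index + 1]
    # Refuse to guess if the ID is missing, so nothing is lost by accident.
    raise ValueError(f"{video_id} not found; archive left unchanged")


archive = [
    "youtube AAAAAAAAAAA",
    "youtube 8HAMrXfTJng",
    "youtube BBBBBBBBBBB",
    "youtube CCCCCCCCCCC",
]
print(keep_through(archive, "8HAMrXfTJng"))
# ['youtube AAAAAAAAAAA', 'youtube 8HAMrXfTJng']
```

To apply it, read ~/.tubeup/.ytdlarchive into a list of lines, back the file up, and write the returned lines to it.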

Some tips:

  • Don't use wifi to do the rips; use wired ethernet. Fewer collisions and lost packets, and a much more stable connection.
  • Consider installing NSCD (sudo apt install nscd). It caches domain name lookups and closes off another avenue of errors.
  • If running ethernet is infeasible, use a VPS.


rudolphos avatar rudolphos commented on June 18, 2024

@vxbinaca Will try adding video ID.

Here's another problem I had:

  • I tried to archive just one video by running "tubeup videoid" - which is a video that had been previously downloaded when I downloaded the whole channel, and the archive process didn't work.
  • It didn't work.
  • I removed that video id from .ytdlarchive and it still didn't work...

"It sounds like a lot of videos on this channel are copyright blocked in certain countries"

I previously said that I also had the same problem when archiving a different channel that has no copyright blocks.

I'm not using wifi. I'm using putty ssh which connects to a linux VPS in data center.

Next, run rm ~/.tubeup/downloads/*
It's 557 GB ... Isn't there any other way ? I don't have any bandwidth limit, but still it took a lot of time to DL all that..

I have also reinstalled all the components of tubeup and python. Sometimes it fixed some archiving problems, but not anymore.


rudolphos avatar rudolphos commented on June 18, 2024

@vxbinaca I will try these methods tomorrow.

But with 557 gigs, holy shit are you sure Archive.org isn't throttling you?

So far so good, maybe they have changed the limits.. I've been archiving various stuff using tubeup since December 5th..


rudolphos avatar rudolphos commented on June 18, 2024

@vxbinaca I don't know how to do it, but the good way to check if it's been uploaded, would be to check if https://archive.org/details/youtube-VIDEOID exists and matches with the ID that's in .ytdlarchive file.

I guess one way would be to use Excel to merge the .ytdlarchive file lines with the archive.org URL and use a website checker to see if the page shows a 404 error.
After that, export all non-404 IDs to the .ytdlarchive file.
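The same check could be done without Excel, roughly as follows. This is a sketch resting on the assumption made above, that a missing item answers https://archive.org/details/youtube-VIDEOID with a 404; the helper names and the IDs in the example are made up, and the example swaps in a fake checker so that it makes no network requests.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def item_url(video_id: str) -> str:
    """Build the details URL for the item a video ID would map to."""
    return f"https://archive.org/details/youtube-{video_id}"


def url_exists(url: str) -> bool:
    """Send a real HEAD request; False on 404, other errors propagate."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=30):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def uploaded_ids(ids: Iterable[str],
                 exists: Callable[[str], bool] = url_exists) -> List[str]:
    """Keep only the IDs whose item page exists."""
    return [video_id for video_id in ids if exists(item_url(video_id))]


# Offline example: pretend only "aaa" and "ccc" were uploaded.
known = {item_url("aaa"), item_url("ccc")}
print(uploaded_ids(["aaa", "bbb", "ccc"], exists=lambda url: url in known))
# ['aaa', 'ccc']
```

Calling `uploaded_ids(ids)` without the `exists` argument performs one request per ID, so a pause between requests may be sensible for a list of several thousand.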

I am now manually checking those IDs in blocks of 100 in Notepad++: I bookmark a block, then check every 10th ID; if it doesn't exist I add another bookmark, and so on until I find an ID that exists on archive.org.

All the bookmarked lines I remove after, this way I have reduced from 5 k to 2 k lines in .ytdlarchive..

I'm using headmaster SEO to check if the item already exists.


Gonna try to export as CSV after and paste back into .ytdlarchive file


vxbinaca avatar vxbinaca commented on June 18, 2024

Just go to your user details page, look at the last successful upload that streams video, get its video ID from the item identifier, find it in your archive file, then just delete everything after that line.

It's janky but a quick way to make sure you don't re-upload things already up.

I've been doing 6 gig copies of live streams that are 8 hours long (1080p too) from YT to Archive. What in tarnation is 577 gigs out of curiosity?

Yeah leaving everything in your downloads folder just gets the videos re-added to the archive. It's a pain in the ass but nothing like manually creating items.


rudolphos avatar rudolphos commented on June 18, 2024

last successful upload

It's not just one channel, it's like 7 channels and 3 soundcloud accounts..

And I have archived partly some channels successfully at the beginning, then something happened and nothing worked for a few days. I started archiving other channels after the ones who failed... My whole list is messed up, it's not just from one uploader, there are 140 IDS from uploader X1, 700 from uploader X2 and again uploader X1 and so on + soundcloud..

What in tarnation is 577 gigs out of curiosity?

Various non-mainstream news videos with commentary + news segments. 2 k video podcasts, etc..

Yeah leaving everything in your downloads folder just gets the videos re-added to the archive

That sucks.. Too bad there's no option to delete account with all items. I would want start again..

EDIT:

Yeah, just tried to launch archiving with one particular channel and it just stopped after 1 video and I see that .ytdlarchive file was growing in size, it wrote +200 video IDs but nothing happened, so I closed the SSH.. Gonna remove some videos and re-download 500 GB (I hope YT don't ban me)... I actually see that by using IPv6 with youtube it kind of bypasses the ban youtube put on all OVH servers previously..


rudolphos avatar rudolphos commented on June 18, 2024

I was re-downloading everything and trying to run archiving on channel in one batch, but it wasn't successful because error cancelled everything..

:: Upload Finished. Item information:
Title: aaaa
Upload URL: http://archive.org/details/youtube-aaaa
:: Uploading /root/.tubeup/downloads/aaa...
2016-12-19 00:39:51,374 - internetarchive.item - INFO - uploaded aaa-aaa.annotations.xml to https://s3.us.archive.org/youtube-aaa/aaa.annotations.xml
2016-12-19 00:39:51,374 - internetarchive.item - INFO - aaaa.annotations.xml successfully uploaded to https://archive.org/download/youtube-aaa/aaa.annotations.xml and verified, deleting local copy
2016-12-19 00:39:56,824 - internetarchive.item - INFO - uploaded aaa.info.json to https://s3.us.archive.org/youtube-aaa/aa.info.json
2016-12-19 00:39:56,825 - internetarchive.item - INFO - aa.info.json successfully uploaded to https://archive.org/download/youtube-aa/aaaa.info.json and verified, deleting local copy
2016-12-19 00:40:07,525 - internetarchive.item - ERROR -  error uploading aaa.f251.webm.part to youtube-aaa, Uploaded content is unacceptable. - video file has improper extension, try one of these: .webm
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/internetarchive/item.py", line 625, in upload_file
    response.raise_for_status()
  File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 893, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Data for url: https://s3.us.archive.org/youtube-aaa/aaa.f251.webm.part

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tubeup", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.5/dist-packages/tubeup/__main__.py", line 272, in main
    identifier, meta = upload_ia(video, custom_meta=md)
  File "/usr/local/lib/python3.5/dist-packages/tubeup/__main__.py", line 219, in upload_ia
    item.upload(vid_files, metadata=meta, retries=30000, request_kwargs=dict(timeout=30000), delete=True)
  File "/usr/local/lib/python3.5/dist-packages/internetarchive/item.py", line 751, in upload
    request_kwargs=request_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/internetarchive/item.py", line 645, in upload_file
    raise type(exc)(error_msg, response=exc.response, request=exc.request)
requests.exceptions.HTTPError:  error uploading aaaa.f251.webm.part to youtube-aaaaa, Uploaded content is unacceptable. - video file has improper extension, try one of these: .webm
root@aaaa:~#

error:

error uploading videoid.f251.webm.part to youtube-videoid, Uploaded content is unacceptable. - video file has improper extension, try one of these: .webm

And running tubeup on the same channel doesn't work anymore.. Now I again have to remove 50 GBs and re-download to successfully archive the whole channel.
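The file in that error ends in .part, the suffix youtube-dl gives a download that has not finished. One possible workaround, sketched here and not something tubeup is confirmed to do, is to separate such leftovers from the complete files before uploading. `split_partial` is a made-up helper, and the file names are the placeholders from the log above.

```python
from typing import List, Tuple


def split_partial(names: List[str]) -> Tuple[List[str], List[str]]:
    """Split file names into (complete, partial) by the .part suffix."""
    complete = [name for name in names if not name.endswith(".part")]
    partial = [name for name in names if name.endswith(".part")]
    return complete, partial


files = ["aaa.annotations.xml", "aaa.info.json", "aaa.f251.webm.part"]
complete, partial = split_partial(files)
print(complete)  # ['aaa.annotations.xml', 'aaa.info.json']
print(partial)   # ['aaa.f251.webm.part']
```

Only the partial files would then need to be removed and re-downloaded, which is far less than clearing the whole downloads folder.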


root@aaaa:~# tubeup https://www.youtube.com/channel/channelid
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.12.15
[debug] Python version 3.5.2+ - Linux-4.8.0-30-generic-x86_64-with-Ubuntu-16.10-yakkety
[debug] exe versions: avconv 3.0.2-1ubuntu3, avprobe 3.0.2-1ubuntu3, ffmpeg 3.0.2-1ubuntu3, ffprobe 3.0.2-1ubuntu3
[debug] Proxy map: {}
[debug] Public IP address: 2001:aaaa
root@aaa:~#


vxbinaca avatar vxbinaca commented on June 18, 2024

You might want to read this: https://archive.org/help/derivatives.php


rudolphos avatar rudolphos commented on June 18, 2024

@vxbinaca Is there a way to skip .webm ? Because of this one problem I am unable to upload the rest of ~400 videos.. It just stops with .webm error

And it's weird that youtube-dl downloads videos as webms, it usually merges into mp4 or leaves default youtube mkv format..


vxbinaca avatar vxbinaca commented on June 18, 2024

@rudolphos Is the problem downloading or uploading the webm? If only the video stream is webm, then muxing should be followed by deletion of the video-only webm stream.

If the muxed video and audio are webm, then the uploads should work. I've uploaded in webm. Where is it failing?


vxbinaca avatar vxbinaca commented on June 18, 2024

@rudolphos Did you resolve this? I have a solution (no code update) if you haven't come up with anything. Otherwise I'm closing.

