Comments (17)
If you don't wipe the downloads, it will simply add them back to the archive file. It sucks, but here's how I handle large channels, e.g. Healthy Addict.
Try printing the video IDs using youtube-dl with the --get-id flag. The channel's IDs will end up in a handy list:
youtube-dl --ignore-config --get-id https://www.youtube.com/user/healthyaddict > target-IDs.txt
(Optionally, you can pipe the output through xargs to clean it up into a list of IDs.)
Take the channel in chunks of 10 videos to start; if that runs well, do 20, and so on. When you complete a block of uploads, remove it from the text file until it's empty. But with 557 gigs, holy shit, are you sure Archive.org isn't throttling you? They have S3 limits.
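That chunking workflow can be sketched in Python. This is just an illustration: chunk_ids is a made-up helper, not part of tubeup, and target-IDs.txt is the file from the --get-id command above.

```python
# Illustration only: chunk_ids is a made-up helper, not part of tubeup.
def chunk_ids(lines, size=10):
    """Drop blank lines, then yield successive batches of `size` video IDs."""
    ids = [line.strip() for line in lines if line.strip()]
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

# Usage sketch with the file produced by the --get-id command above:
# with open("target-IDs.txt") as f:
#     for batch in chunk_ids(f, size=10):   # bump to 20 once a batch runs clean
#         print("\n".join(batch))           # feed each batch to tubeup
```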
A word on copyright takedowns:
YouTube said: This video contains content from FranceTV mcn, who has blocked it on copyright grounds.
This means you can't get the video in the blocked regions. If it's blocked in your country, you can use a VPS or VPN to get the video.
If the videos were subsequently blocked after you ripped them and you still have them in the downloads folder... eh... better make an item for all the files and upload the entire directory using internetarchive. But if you can re-rip the channel using tubeup, do that.
from tubeup.
@vxbinaca Maybe we should try to catch ExtractorErrors so we can note them down then continue. This way the script can state which urls failed at the end, instead of just crashing.
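That collect-and-continue idea might look something like the sketch below. Here rip_video is a hypothetical stand-in for the real per-URL download call; actual code would catch youtube-dl's DownloadError/ExtractorError rather than the bare Exception used for illustration.

```python
# Sketch of collect-and-continue error handling; rip_video is a hypothetical
# stand-in for the real per-URL download call. Real code would catch
# youtube-dl's DownloadError/ExtractorError instead of bare Exception.
def rip_all(urls, rip_video):
    """Try every URL, noting failures instead of crashing on the first one.

    Returns a list of (url, error message) pairs for the URLs that failed.
    """
    failed = []
    for url in urls:
        try:
            rip_video(url)
        except Exception as exc:
            failed.append((url, str(exc)))
    # Report everything that failed at the end of the run.
    for url, msg in failed:
        print("FAILED: %s (%s)" % (url, msg))
    return failed
```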
@antonizoon @rudolphos Would '--ignore-errors' on youtube-dl side fix this?
@rudolphos I would suggest hand-editing the archive file to delete everything after the last successful upload. Although if there was an error for copyright, that particular video ID isn't archived; anything after it will get picked up.
hand editing the archive file to delete everything after the last successful upload
There are multiple archive files, like 80 or more..
By delete everything, do you mean one video file plus those other file types, the subtitles and JPEGs?
Just tried it on the latest uploaded item, but nothing happened. Tried it 2 times.
root@xxxxx:~# tubeup https://www.youtube.com/user/xxxx
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.12.12
[debug] Python version 3.5.2+ - Linux-4.8.0-30-generic-x86_64-with-Ubuntu-16.10-yakkety
[debug] exe versions: avconv 3.0.2-1ubuntu3, avprobe 3.0.2-1ubuntu3, ffmpeg 3.0.2-1ubuntu3, ffprobe 3.0.2-1ubuntu3
[debug] Proxy map: {}
[debug] Public IP address: 2001:xxx
root@xxxx:~# tubeup https://www.youtube.com/user/xxxx
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.12.12
[debug] Python version 3.5.2+ - Linux-4.8.0-30-generic-x86_64-with-Ubuntu-16.10-yakkety
[debug] exe versions: avconv 3.0.2-1ubuntu3, avprobe 3.0.2-1ubuntu3, ffmpeg 3.0.2-1ubuntu3, ffprobe 3.0.2-1ubuntu3
[debug] Proxy map: {}
[debug] Public IP address: 2001:xx
root@xxxxx:~#
This user didn't have any copyright errors, but the archiving process just isn't working anymore: videos are 90% downloaded but only about 40% of them uploaded. Basically nothing works anymore when I run tubeup on this particular channel.
The tubeup folder has more video files than the total uploaded, so the ones that haven't been uploaded are not removed, but also not uploaded...
It sounds like a lot of videos on this channel are copyright blocked in certain countries. Try using a VPS/VPN?
I mean, go into the archive file that records your progress (~/.tubeup/.ytdlarchive), go look at your Archive.org uploads, and find the last successful complete upload:
https://archive.org/details/youtube-8HAMrXfTJng
Okay, take the video ID (it's in the item identifier) and open ~/.tubeup/.ytdlarchive in a text editor. Hit CTRL-F, find the line with 8HAMrXfTJng, and delete everything below that line.
Next, run rm ~/.tubeup/downloads/*
Now re-run your rip. It should pick back up after the last successful upload.
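That hand edit can be scripted too. A rough sketch, assuming each archive-file line has youtube-dl's usual "youtube VIDEOID" form; 8HAMrXfTJng is the example ID from above, and trim_after is a made-up helper:

```python
import os

# Sketch of the manual trim: keep archive lines up to and including the last
# good upload. Assumes youtube-dl's usual "youtube VIDEOID" line format.
def trim_after(lines, video_id):
    """Return the lines up to and including the one naming `video_id`."""
    kept = []
    for line in lines:
        kept.append(line)
        if video_id in line.split():  # exact token match, not substring
            break
    return kept

# Usage sketch (path and ID from the comment above):
# path = os.path.expanduser("~/.tubeup/.ytdlarchive")
# with open(path) as f:
#     kept = trim_after(f.readlines(), "8HAMrXfTJng")
# with open(path, "w") as f:
#     f.writelines(kept)
```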
Some tips:
- Don't use wifi for the rips; use wired ethernet. Fewer collisions and lost packets make for a much more stable connection.
- Consider installing nscd (sudo apt install nscd); it caches domain-name lookups and closes off another avenue of errors.
- If running ethernet is infeasible, use a VPS.
@vxbinaca Will try adding video ID.
Here's another problem I had:
- I tried to archive just one video by running "tubeup videoid", a video that had already been downloaded when I ripped the whole channel, and the archive process didn't work.
- I removed that video ID from .ytdlarchive and it still didn't work...
"It sounds like a lot of videos on this channel are copyright blocked in certain countries"
I previously said that I also had the same problem when archiving a different channel that has no copyright blocks.
I'm not using wifi. I'm using PuTTY SSH to connect to a Linux VPS in a data center.
Next, run rm ~/.tubeup/downloads/*
It's 557 GB... Isn't there any other way? I don't have any bandwidth limit, but it still took a lot of time to download all that.
I have also reinstalled all the components of tubeup and python. Sometimes it fixed some archiving problems, but not anymore.
@vxbinaca I will try these methods tomorrow.
But with 557 gigs, holy shit are you sure Archive.org isn't throttling you?
So far so good, maybe they have changed the limits.. I've been archiving various stuff using tubeup since December 5th..
@vxbinaca I don't know how to do it, but a good way to check whether something has been uploaded would be to check whether https://archive.org/details/youtube-VIDEOID exists and matches the ID in the .ytdlarchive file.
I guess one way would be to use Excel to merge the .ytdlarchive lines with the archive.org URL and use a website checker to see if the page shows a 404 error. After that, export all non-404 IDs back to the .ytdlarchive file.
I am now manually checking those IDs in batches of 100 in Notepad++: I add a bookmark, check every 10th ID, and if it doesn't exist I add another bookmark, and so on until I find an ID that does exist on archive.org. Afterwards I remove all the bookmarked lines; this way I have reduced .ytdlarchive from 5k to 2k lines.
I'm using Headmaster SEO to check whether an item already exists.
Gonna try to export as CSV afterwards and paste it back into the .ytdlarchive file.
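That 404-checking idea could also be scripted instead of done in Excel. This is only a sketch, not tubeup functionality: head_ok is a hypothetical helper that does an HTTP HEAD (and needs network access), and verified_ids keeps only the archive lines whose item actually resolves, so the result could be written back to .ytdlarchive.

```python
import urllib.request

# Sketch only, not tubeup functionality. head_ok asks archive.org whether a
# details page resolves; verified_ids keeps only archive lines whose item
# actually exists.
def head_ok(url):
    """True if `url` answers an HTTP HEAD without an error (needs network)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10):
            return True
    except Exception:
        return False

def verified_ids(archive_lines, exists=head_ok):
    """Filter 'youtube VIDEOID' lines through an existence-check callable."""
    kept = []
    for line in archive_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed lines
        if exists("https://archive.org/details/youtube-%s" % parts[1]):
            kept.append(line)
    return kept
```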
Just go to your user details page, look at the last successful upload that streams video, get its video ID from the item identifier, find it in your archive file, then just delete everything after that line.
It's janky but a quick way to make sure you don't re-upload things already up.
I've been doing 6 gig copies of live streams that are 8 hours long (1080p too) from YT to Archive. What in tarnation is 557 gigs, out of curiosity?
Yeah leaving everything in your downloads folder just gets the videos re-added to the archive. It's a pain in the ass but nothing like manually creating items.
last successful upload
It's not just one channel, it's like 7 channels and 3 soundcloud accounts..
And I had archived some channels partly and successfully at the beginning; then something happened and nothing worked for a few days. I started archiving other channels after the ones that failed... My whole list is messed up. It's not just from one uploader: there are 140 IDs from uploader X1, 700 from uploader X2, then uploader X1 again, and so on, plus SoundCloud.
What in tarnation is 557 gigs, out of curiosity?
Various non-mainstream news videos with commentary, plus news segments, 2k video podcasts, etc.
Yeah leaving everything in your downloads folder just gets the videos re-added to the archive
That sucks... Too bad there's no option to delete an account with all its items. I'd want to start again.
EDIT:
Yeah, I just tried to launch archiving on one particular channel and it stopped after 1 video. I saw the .ytdlarchive file growing in size; it wrote 200+ video IDs but nothing actually happened, so I closed the SSH session. Gonna remove some videos and re-download 500 GB (I hope YT doesn't ban me)... I've actually noticed that using IPv6 with YouTube kind of bypasses the ban YouTube previously put on all OVH servers.
I was re-downloading everything and trying to run archiving on the channel in one batch, but it wasn't successful because an error cancelled everything.
:: Upload Finished. Item information:
Title: aaaa
Upload URL: http://archive.org/details/youtube-aaaa
:: Uploading /root/.tubeup/downloads/aaa...
2016-12-19 00:39:51,374 - internetarchive.item - INFO - uploaded aaa-aaa.annotations.xml to https://s3.us.archive.org/youtube-aaa/aaa.annotations.xml
2016-12-19 00:39:51,374 - internetarchive.item - INFO - aaaa.annotations.xml successfully uploaded to https://archive.org/download/youtube-aaa/aaa.annotations.xml and verified, deleting local copy
2016-12-19 00:39:56,824 - internetarchive.item - INFO - uploaded aaa.info.json to https://s3.us.archive.org/youtube-aaa/aa.info.json
2016-12-19 00:39:56,825 - internetarchive.item - INFO - aa.info.json successfully uploaded to https://archive.org/download/youtube-aa/aaaa.info.json and verified, deleting local copy
2016-12-19 00:40:07,525 - internetarchive.item - ERROR - error uploading aaa.f251.webm.part to youtube-aaa, Uploaded content is unacceptable. - video file has improper extension, try one of these: .webm
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/internetarchive/item.py", line 625, in upload_file
response.raise_for_status()
File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 893, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Data for url: https://s3.us.archive.org/youtube-aaa/aaa.f251.webm.part
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/tubeup", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.5/dist-packages/tubeup/__main__.py", line 272, in main
identifier, meta = upload_ia(video, custom_meta=md)
File "/usr/local/lib/python3.5/dist-packages/tubeup/__main__.py", line 219, in upload_ia
item.upload(vid_files, metadata=meta, retries=30000, request_kwargs=dict(timeout=30000), delete=True)
File "/usr/local/lib/python3.5/dist-packages/internetarchive/item.py", line 751, in upload
request_kwargs=request_kwargs)
File "/usr/local/lib/python3.5/dist-packages/internetarchive/item.py", line 645, in upload_file
raise type(exc)(error_msg, response=exc.response, request=exc.request)
requests.exceptions.HTTPError: error uploading aaaa.f251.webm.part to youtube-aaaaa, Uploaded content is unacceptable. - video file has improper extension, try one of these: .webm
root@aaaa:~#
error:
error uploading videoid.f251.webm.part to youtube-videoid, Uploaded content is unacceptable. - video file has improper extension, try one of these: .webm
And running tubeup on the same channel doesn't work anymore. Now I again have to remove 50 GB and re-download to successfully archive the whole channel.
root@aaaa:~# tubeup https://www.youtube.com/channel/channelid
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.12.15
[debug] Python version 3.5.2+ - Linux-4.8.0-30-generic-x86_64-with-Ubuntu-16.10-yakkety
[debug] exe versions: avconv 3.0.2-1ubuntu3, avprobe 3.0.2-1ubuntu3, ffmpeg 3.0.2-1ubuntu3, ffprobe 3.0.2-1ubuntu3
[debug] Proxy map: {}
[debug] Public IP address: 2001:aaaa
root@aaa:~#
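One possible workaround for the .webm.part failure is to clear leftover partial downloads before re-running, so only complete files are offered for upload. This is only a sketch: partial_files is a made-up helper, the downloads path comes from earlier in the thread, and whether deleting partials is safe depends on whether youtube-dl could have resumed them.

```python
import os

# Sketch: find youtube-dl's leftover partial downloads so they can be removed
# before re-running, instead of being offered to archive.org as *.part files.
def partial_files(directory):
    """Return sorted paths of *.part files in `directory`."""
    return [os.path.join(directory, name)
            for name in sorted(os.listdir(directory))
            if name.endswith(".part")]

# Usage sketch; deleting a .part forces a clean re-download, which may or may
# not be what you want if youtube-dl could have resumed it:
# for path in partial_files(os.path.expanduser("~/.tubeup/downloads")):
#     os.remove(path)
```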
You might want to read this: https://archive.org/help/derivatives.php
@vxbinaca Is there a way to skip .webm? Because of this one problem I am unable to upload the remaining ~400 videos. It just stops with the .webm error.
And it's weird that youtube-dl downloads videos as WebM; it usually merges into MP4 or leaves the default YouTube MKV format.
@rudolphos Is the problem downloading or uploading the webm? If only the video stream is webm, then muxing should be followed by deletion of the webm video-only stream.
If the muxed video and audio are webm, then the uploads should work; I've uploaded webm before. Where is it failing?
@rudolphos Did you resolve this? I have a solution (no code update) if you haven't come up with anything. Otherwise I'm closing.
Related Issues (20)
- Bug report: Rate limiting is not implemented HOT 1
- Bug report: Twitch chat at some point stopped being downloaded (RECHAT) HOT 8
- Limit video download resolution to Full HD HOT 1
- ERROR: Unable to extract uploader id HOT 5
- Proposal: Identify core/essential metadata and add upload safeties for missing MD HOT 4
- Bug report: Channels having YouTube shorts cause Tubeup to fail HOT 4
- Proposal: What to do about yt-dlps new nightly branch? HOT 4
- deleted HOT 1
- Bug report/feature request: Continue downloading other videos when one fails with a permanent error HOT 6
- Bug report: extremely slow downloads from youtube HOT 3
- Bug report: [native] nsig extraction failed HOT 4
- Possible NSIG fixes HOT 8
- Upgrade yt-dlp ASAP to at least 2023.07.06 HOT 5
- "Creator" field for Douyin needs update HOT 9
- Update internetarchive to 3.4.0/3.5.0? HOT 4
- Uploaded YT video thumbnails in .webp are not used for IA item tiles HOT 7
- PEP 668 compatibility
- Add new release for 2023-08-10. HOT 2
- Bug report: Video impossible to upload when best quality stream is unavailable on the server-side HOT 11
- Bug report: Unable to archive Youtube video after premiere HOT 1