Comments (19)
@DuckHP
I'm not sure entirely how it could be implemented, but maybe, after the deletion date, tubeup could be modified at a later date to pick at a collective dataset of video annotations belonging to the individual videos, during uploads?
This is best performed by the Internet Archive during deriving of the item, not TubeUp attempting to pick through an item's file collection, retrieving the file, and re-uploading with the rest of the files that make up an item. As long as the annotation collection files are properly named and the Internet Archive has those files, the problem of matching annotation files from a collective bundle and future ripped video files can be pushed off into the future.
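A minimal sketch of the name-based matching described above. It assumes, hypothetically, that each annotation file in a bundle is named `<video_id>.annotations.xml`; that naming scheme and the `index_annotations` helper are illustrative only, not something TubeUp or the Internet Archive defines.

```python
import re
from pathlib import PurePosixPath

# Hypothetical naming scheme: "<11-character video ID>.annotations.xml".
ANNOTATION_NAME = re.compile(r"^(?P<id>[A-Za-z0-9_-]{11})\.annotations\.xml$")


def index_annotations(filenames):
    """Map video ID -> annotation filename for names that fit the scheme."""
    index = {}
    for name in filenames:
        # Only the final path component is checked against the pattern.
        match = ANNOTATION_NAME.match(PurePosixPath(name).name)
        if match:
            index[match.group("id")] = name
    return index


# "abcDEF12345" is a placeholder ID, not a real video.
bundle = ["bundle/abcDEF12345.annotations.xml", "bundle/README.txt"]
index = index_annotations(bundle)
```

With an index like this, a video ripped later only needs its ID looked up to find the matching annotation file, which is why consistent naming matters more than doing the matching now.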
from tubeup.
I would estimate there are about 10-15 billion videos on YouTube.
Tom Scott did a great one-take video on YouTube video IDs. It shows why they're hard to obtain, unlike Vimeo (or Fetlife profile IDs). They probably got a lot, but not all. Calling for annotations gets some metadata, a tiny amount. Cards can be implemented later, or maybe they already are.
from tubeup.
I honestly doubt they got all annotations on the site. It's just too large, and the ID system too complicated. Maybe if you had a ton of Warrior bots chugging away at giant blocks of them for months on end - and I mean thousands of machines doing nothing but looking up video IDs - you'd get a lot of it. I just don't see, with how things are laid out on that site, how you'd get even most of them.
from tubeup.
In response to call for PRs, propose merging #81 prior to shipping this.
from tubeup.
I have already WARCed a few annotation XMLs. I'm not entirely sure how it could be implemented, but maybe, after the deletion date, tubeup could be modified at a later date to pick at a collective dataset of video annotations belonging to the individual videos, during uploads?
(What I've uploaded so far)
https://archive.org/details/data-YouTube-Annotations-yt_anot_urls_nodupcheck.txt-2018-12-02-a354fb31
https://archive.org/details/data-YouTube-Annotations-ola_norsk_yt_anot_urls.txt-2018-12-02-439edf01
https://archive.org/details/data-anotids-yt_anot_links_continue00.txt-2018-12-11-5657e75f
(I have more underway locally, and probably/hopefully some others from ArchiveTeam have as well)
from tubeup.
They're deleting the annotations on the 19th. There's no reason to keep the flag around since it's YouTube-specific and offers no functionality anywhere else youtube-dl supports.
from tubeup.
I also wanna add that even before I touched Tubeup 3 years ago - because it's like a lifeform that evolved - annotations were collected. So essentially, as far as I can tell, annotations were collected for its entire existence through iterations. So anyone who used it got any possible annotations - notwithstanding weird upload bugs like the ones we have that aren't the S3 issue.
from tubeup.
This is best performed by the Internet Archive during deriving of the item, not TubeUp attempting to pick through an item's file collection, retrieving the file, and re-uploading with the rest of the files that make up an item.
Perhaps that is best. Either way, as far as I know, IA's player does not play back annotations, YET. But hey, that's a future thing, as long as the annotations are there.
from tubeup.
Thinking about it, are you sure that the annotations flag and format are not used by youtube-dl for other services such as niconico or bilibili? Or do those use rich SRT subtitles instead of YouTube's old XML? Someone needs to try it out, but if nothing else uses it, it has probably served its purpose.
from tubeup.
A fair point, Antonizoon. NicoNico is walled off, so testing is difficult. Bilibili works, however. Get a link with annotations and I'll test.
from tubeup.
End cards are now collected as annotations. Even videos without annotations or title cards have metadata collected. Closing because this would not collect valuable metadata.
Merry Christmas.
from tubeup.
@vxbinaca This could be revisited, I think. Annotations that were on YouTube have been archived in the Internet Archive, and the Annotations API is returning an empty response for videos that used to have legacy annotations.
from tubeup.
What's your source that the annotations were archived? The ENTIRE site's annotations were gotten?
from tubeup.
I updated my comment with citations, and am obtaining independent verification.
from tubeup.
Legacy annotations are no longer available from YouTube. There is currently a temporary API to pull them from our archive until everything has been uploaded to IA. Side-by-side: YouTube version vs. archived version.
Once everything has been uploaded to IA, I'm planning on adding /api/v1/annotations/:id to Invidious to better support playback in iv-org/invidious#303. I would expect it to redirect to the IA archive or fall back on YouTube if it wasn't archived.
To my knowledge, end cards are provided as a separate endpoint. End cards are not the same as cards.
Cards are still provided at the same endpoint as legacy annotations (/annotations_invideo?video_id=), so to my knowledge it would still be possible to pull valuable metadata.
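As a rough illustration of the two paths mentioned above, here is a sketch that only builds the request URLs and makes no network calls. The `www.youtube.com` host is an assumption (the comment gives only the path), the Invidious route is described here as planned rather than shipped, and `invidious.example` is a placeholder instance name.

```python
from urllib.parse import quote, urlencode

# Assumption: the endpoint is served from www.youtube.com; only the path
# "/annotations_invideo?video_id=" appears in this thread.
YOUTUBE_BASE = "https://www.youtube.com"


def youtube_cards_url(video_id):
    """URL for the endpoint that served legacy annotations and still serves cards."""
    return f"{YOUTUBE_BASE}/annotations_invideo?{urlencode({'video_id': video_id})}"


def invidious_annotations_url(instance, video_id):
    """URL for the planned /api/v1/annotations/:id route on a given instance."""
    return f"https://{instance}/api/v1/annotations/{quote(video_id, safe='')}"


# "abcDEF12345" and "invidious.example" are placeholders.
cards_url = youtube_cards_url("abcDEF12345")
archive_url = invidious_annotations_url("invidious.example", "abcDEF12345")
```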
It's incredibly unlikely that you'll randomly stumble upon a valid ID (1 in 64^11). We instead pulled videos from the "recommended" bar, videos from any discovered channels, videos from any discovered playlists, and searched already archived annotation data. We archived annotations from around 1.4 billion videos.
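To put the "1 in 64^11" figure in perspective, the arithmetic can be sketched as below. It takes 64^11 as the size of the ID space, as stated above, and borrows the low end of the 10-15 billion video estimate from earlier in this thread; both numbers are rough, so treat the result as an order-of-magnitude illustration only.

```python
ALPHABET_SIZE = 64   # characters available per ID position, per the comment
ID_LENGTH = 11       # characters in a video ID

# Size of the ID space as stated in the comment: 64^11, which equals 2^66.
id_space = ALPHABET_SIZE ** ID_LENGTH

# Assumption: low end of the "10-15 billion videos" estimate from this thread.
estimated_videos = 10_000_000_000

# Chance that one uniformly random ID belongs to an existing video, and the
# average number of random guesses needed per hit under that assumption.
hit_probability = estimated_videos / id_space
expected_guesses_per_hit = id_space / estimated_videos

print(f"ID space: {id_space:.3e}")
print(f"Hit probability per guess: {hit_probability:.3e}")
print(f"Expected guesses per hit: {expected_guesses_per_hit:.3e}")
```

Even under the generous estimate, random guessing needs billions of lookups per hit, which is why crawling recommendations, channels, and playlists is the practical route.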
Hopefully that is helpful, sorry if I wasn't able to respond to everything but please feel free to ask questions or for clarification.
from tubeup.
I need to implement card/end card ingestion then, if it's available in youtube-dl.
Brandon, /r/datahoarder is fine sometimes. Then there are times like this where it's not fine, and it's rank amateurs with big storage (but small other things) telling me what's what.
from tubeup.
Youtube-dl doesn't appear to be interested in annotations support for other sites, and it's not currently breaking things. So I'm going to let it be for now.
from tubeup.
youtube-dl isn't necessarily against annotation support for sites such as niconico, it's just not a priority. End card support would be appreciated (but I have no idea what endpoint the end cards are served over).
from tubeup.