Comments (19)

brandongalbraith avatar brandongalbraith commented on June 15, 2024 1

@DuckHP

I'm not sure entirely how it could be implemented, but maybe, after the deletion date, tubeup could be modified at a later date to pick at a collective dataset of video annotations belonging to the individual videos, during uploads?

This is best performed by the Internet Archive during deriving of the item, not TubeUp attempting to pick through an item's file collection, retrieving the file, and re-uploading with the rest of the files that make up an item. As long as the annotation collection files are properly named and the Internet Archive has those files, the problem of matching annotation files from a collective bundle and future ripped video files can be pushed off into the future.
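A minimal sketch of the name-based matching described above, not anything TubeUp or the Internet Archive actually runs. It assumes annotation files in a bundle are named `<video_id>.annotations.xml`; that naming scheme and the helper names `video_id_from_name`, `index_annotations`, and `match_videos` are hypothetical, chosen only for illustration.

```python
import re
from typing import Dict, Iterable, Optional

# Assumption: each annotation file is named "<video_id>.annotations.xml".
# YouTube video IDs are 11 characters from A-Z, a-z, 0-9, '-' and '_'.
ANNOTATION_NAME = re.compile(r"^([A-Za-z0-9_-]{11})\.annotations\.xml$")


def video_id_from_name(filename: str) -> Optional[str]:
    """Return the video ID encoded in an annotation filename, or None."""
    match = ANNOTATION_NAME.match(filename)
    return match.group(1) if match else None


def index_annotations(filenames: Iterable[str]) -> Dict[str, str]:
    """Map video ID -> annotation filename, skipping files that do not match."""
    index: Dict[str, str] = {}
    for name in filenames:
        video_id = video_id_from_name(name)
        if video_id is not None:
            index[video_id] = name
    return index


def match_videos(video_ids: Iterable[str],
                 index: Dict[str, str]) -> Dict[str, Optional[str]]:
    """For each ripped video ID, look up its annotation file (None if absent)."""
    return {video_id: index.get(video_id) for video_id in video_ids}
```

With consistent names, the lookup is a dictionary access, so the matching step can wait until the video files exist.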

from tubeup.

vxbinaca avatar vxbinaca commented on June 15, 2024 1

I would estimate there are about 10-15 billion videos on YouTube.

Tom Scott did a great oner video on YouTube video IDs. It shows why they're hard to obtain, unlike Vimeo IDs (or Fetlife profile IDs). They probably got a lot, but not all. Calling for annotations gets some metadata, a tiny amount. Cards can be implemented later, or maybe they already are.

vxbinaca avatar vxbinaca commented on June 15, 2024 1

I honestly doubt they got all annotations on the site. It's just too large, and the ID system too complicated. Maybe if you had a ton of Warrior bots chugging away at giant blocks of them for months on end - and I mean thousands of machines doing nothing but looking up video IDs - you'd get a lot of it. I just don't see, with how things are laid out on that site, how you'd get even most of them.

brandongalbraith avatar brandongalbraith commented on June 15, 2024

In response to the call for PRs, I propose merging #81 prior to shipping this.

 avatar commented on June 15, 2024

Having WARCd a few annotations XMLs already; I'm not sure entirely how it could be implemented, but maybe, after the deletion date, tubeup could be modified at a later date to pick at a collective dataset of video annotations belonging to the individual videos, during uploads?

(what I've uploaded so far)
https://archive.org/details/data-YouTube-Annotations-yt_anot_urls_nodupcheck.txt-2018-12-02-a354fb31
https://archive.org/details/data-YouTube-Annotations-ola_norsk_yt_anot_urls.txt-2018-12-02-439edf01
https://archive.org/details/data-anotids-yt_anot_links_continue00.txt-2018-12-11-5657e75f

(I have more underway locally, and probably/hopefully some others from ArchiveTeam have as well)

vxbinaca avatar vxbinaca commented on June 15, 2024

They're deleting the annotations on the 19th. There's no reason to keep the flag around, since it's YouTube-specific and offers no functionality on any other site youtube-dl supports.

vxbinaca avatar vxbinaca commented on June 15, 2024

I also wanna add that even before I touched Tubeup 3 years ago - because it's like a lifeform that evolved - annotations were collected. So essentially, as far as I can tell, annotations were collected for its entire existence, through all its iterations. So anyone who used it got any possible annotations - notwithstanding weird upload bugs like the ones we have that aren't the S3 issue.

 avatar commented on June 15, 2024

This is best performed by the Internet Archive during deriving of the item, not TubeUp attempting to pick through an item's file collection, retrieving the file, and re-uploading with the rest of the files that make up an item.

Perhaps that is best. Either way, as far as I know, IA's player does not play back annotations, YET. But hey, that's a future thing, as long as the annotations are there.

antonizoon avatar antonizoon commented on June 15, 2024

Thinking about it, are you sure that the annotations flag and format are not used by youtube-dl for other services such as niconico or bilibili? Or do those use rich SRT subtitles instead of YouTube's old XML? Someone needs to try it out, but if nothing else uses it, it has probably served its purpose.

vxbinaca avatar vxbinaca commented on June 15, 2024

A fair point, antonizoon. NicoNico is walled off, so testing is difficult. Bilibili works, however. Get a link with annotations and I'll test.

vxbinaca avatar vxbinaca commented on June 15, 2024

End cards are now collected as annotations. Even videos without annotations or title cards have metadata collected. Closing because this would not collect valuable metadata.

Merry Christmas.

brandongalbraith avatar brandongalbraith commented on June 15, 2024

@vxbinaca I think this could be revisited. Annotations that were on YouTube have been archived in the Internet Archive, and the Annotations API is returning an empty response for videos that used to have legacy annotations.

vxbinaca avatar vxbinaca commented on June 15, 2024

What's your source that the annotations were archived? The ENTIRE site's annotations were captured?

brandongalbraith avatar brandongalbraith commented on June 15, 2024

I updated my comment with citations, and am obtaining independent verification.

omarroth avatar omarroth commented on June 15, 2024

Legacy annotations are no longer available from YouTube. There is currently a temporary API to pull them from our archive until everything has been uploaded to IA. Side-by-side: YouTube version vs. archived version.

Once everything has been uploaded to IA, I'm planning on adding /api/v1/annotations/:id to Invidious to better support playback in iv-org/invidious#303. I would expect it to redirect to the IA archive, or fall back on YouTube if the video wasn't archived.

To my knowledge, end cards are provided as a separate endpoint. End cards are not the same as cards.

Cards are still provided at the same endpoint as legacy annotations (/annotations_invideo?video_id=), so to my knowledge it would still be possible to pull valuable metadata.
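A rough sketch of how a client might build a request for that endpoint and check whether a response carries any annotation entries. Only the path `/annotations_invideo` and the `video_id` parameter come from the comment above. The host `www.youtube.com`, the assumption that entries appear as `<annotation>` elements in the returned XML, and the helper names are all illustrative guesses, and the endpoint may no longer respond at all.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urlunsplit

ANNOTATIONS_PATH = "/annotations_invideo"  # path quoted in the thread
ASSUMED_HOST = "www.youtube.com"           # assumption, not stated in the thread


def annotations_url(video_id: str, host: str = ASSUMED_HOST) -> str:
    """Build the request URL for a video's annotation/card data."""
    query = urlencode({"video_id": video_id})
    return urlunsplit(("https", host, ANNOTATIONS_PATH, query, ""))


def count_annotations(xml_text: str) -> int:
    """Count <annotation> elements in a response body.

    Returns 0 for an empty or unparseable body, which is how this sketch
    treats the "empty response" case mentioned earlier in the thread.
    The <annotation> element name is an assumption about the XML layout.
    """
    if not xml_text.strip():
        return 0
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return 0
    return sum(1 for _ in root.iter("annotation"))
```

Keeping URL construction and response parsing as separate pure functions means both can be exercised offline against saved responses.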

It's incredibly unlikely that you'll randomly stumble upon a valid ID (1 in 64^11). We instead pulled videos from the "recommended" bar, videos from any discovered channels, videos from any discovered playlists, and searched already archived annotation data. We archived annotations from around 1.4 billion videos.
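To put rough numbers on that, here is a small back-of-the-envelope calculation. It takes the 64^11 figure and the 1.4 billion count from this comment at face value. It treats every 11-character string as a possible ID and counts only the archived videos as hits, so it is an illustration of scale rather than a precise estimate.

```python
ALPHABET_SIZE = 64                 # characters available per ID position
ID_LENGTH = 11                     # length of a YouTube video ID
ARCHIVED_VIDEOS = 1_400_000_000    # figure quoted in the comment above

# Total number of 11-character strings over a 64-symbol alphabet.
ID_SPACE = ALPHABET_SIZE ** ID_LENGTH


def hit_probability(known_ids: int, space: int = ID_SPACE) -> float:
    """Chance that one uniformly random ID is among `known_ids` valid ones."""
    return known_ids / space


def expected_guesses(known_ids: int, space: int = ID_SPACE) -> float:
    """Average number of random guesses needed to hit one valid ID."""
    return space / known_ids


if __name__ == "__main__":
    print(f"ID space: {ID_SPACE:,}")
    print(f"Hit probability per guess: {hit_probability(ARCHIVED_VIDEOS):.3e}")
    print(f"Expected guesses per hit: {expected_guesses(ARCHIVED_VIDEOS):.3e}")
```

Under these assumptions a random guess lands on an archived video only about once in tens of billions of tries, which is why crawling recommendations, channels, and playlists is the practical route.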

Hopefully that is helpful, sorry if I wasn't able to respond to everything but please feel free to ask questions or for clarification.

brandongalbraith avatar brandongalbraith commented on June 15, 2024

vxbinaca avatar vxbinaca commented on June 15, 2024

I need to implement card/end card ingestion then, if it's available in youtube-dl.

Brandon, /r/datahoarder is fine sometimes. Then there are times like this where it's not fine, and it's rank amateurs with big storage (but small other things) telling me what's what.

vxbinaca avatar vxbinaca commented on June 15, 2024

youtube-dl doesn't appear to be interested in annotations support for other sites, and it's not currently breaking things. So I'm going to let it be for now.

ealgase avatar ealgase commented on June 15, 2024

youtube-dl isn't necessarily against annotation support for sites such as niconico, it's just not a priority. End card support would be appreciated (but I have no idea what endpoint the end cards are served over).
