

rom1504 commented on August 21, 2024

Priority queue:

from video2dataset.

iejMac commented on August 21, 2024

@rom1504
Making a summary/list of what needs to be done pre-release. Going over the code as it's executed and noting my thoughts:

main.py

worker.py

  • make this cleaner. The ifs are ugly and don't get the point across: the video already contains the audio, so you want something like bytes_downloaded += max(streams.get("video", 0), streams.get("audio", 0))
  • the clipping subsampler should take encode_formats as an init param
  • we can make this nicer with something like broadcast_subsampler = clipping_sub if "clips" in whatever else noop_sub, and then just call that with streams and meta
  • idea for getting rid of this and of listing out every subsampler: as we add more, we shouldn't have to add another if statement to the worker loop. Initialize the worker with a list of non-None f"{modality}_subsamplers" and iterate over all of them. The attribute would be named f"{modality}_subsamplers" so that, instead of checking whether "video" is in streams, we can iterate over every modality in streams and retrieve eval(f"self.{modality}_subsamplers")
  • format_type isn't an argument to the writer
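A rough sketch of how the worker.py bullets could fit together. Everything here is hypothetical: NoOpSubsampler, ClippingSubsampler, Worker and count_downloaded_bytes are stand-ins, not the real video2dataset classes. It assumes streams maps modality to a list of byte blobs, that per-modality subsamplers take and return that list, and that the values passed to count_downloaded_bytes are byte counts. getattr is used in place of eval since it does the same attribute lookup without evaluating a string.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed shapes (hypothetical): streams maps modality -> list of encoded byte blobs.
Streams = Dict[str, List[bytes]]
ModalityFn = Callable[[List[bytes]], List[bytes]]


class NoOpSubsampler:
    """Hypothetical: returns streams and meta unchanged."""

    def __call__(self, streams: Streams, meta: dict) -> Tuple[Streams, dict]:
        return streams, meta


class ClippingSubsampler:
    """Hypothetical stand-in; encode_formats is passed once at init, not per call."""

    def __init__(self, encode_formats: Dict[str, str]):
        self.encode_formats = encode_formats

    def __call__(self, streams: Streams, meta: dict) -> Tuple[Streams, dict]:
        # Placeholder for real clipping: emit one copy of each blob per clip span.
        clips = meta["clips"]
        return {m: blobs * len(clips) for m, blobs in streams.items()}, meta


class Worker:
    def __init__(
        self,
        encode_formats: Dict[str, str],
        video_subsamplers: Optional[List[Optional[ModalityFn]]] = None,
        audio_subsamplers: Optional[List[Optional[ModalityFn]]] = None,
    ):
        self.clipping_subsampler = ClippingSubsampler(encode_formats)
        self.noop_subsampler = NoOpSubsampler()
        # Keep only non-None subsamplers, stored under f"{modality}_subsamplers".
        self.video_subsamplers = [s for s in (video_subsamplers or []) if s is not None]
        self.audio_subsamplers = [s for s in (audio_subsamplers or []) if s is not None]

    def process(self, streams: Streams, meta: dict) -> Tuple[Streams, dict]:
        # Pick the broadcast subsampler once instead of branching per modality.
        broadcast = self.clipping_subsampler if "clips" in meta else self.noop_subsampler
        streams, meta = broadcast(streams, meta)
        # Iterate over whichever modalities are present; no per-modality if statements.
        for modality in streams:
            for subsampler in getattr(self, f"{modality}_subsamplers", []):
                streams[modality] = subsampler(streams[modality])
        return streams, meta


def count_downloaded_bytes(stream_sizes: Dict[str, int]) -> int:
    # The video stream already contains the audio, so take the larger size, not the sum.
    return max(stream_sizes.get("video", 0), stream_sizes.get("audio", 0))
```

Adding a new subsampler then only means appending it to the right f"{modality}_subsamplers" list at init time; the processing loop stays the same.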

data_reader.py

data_writer.py

  • think about this: do we need to iterate over encode_formats? Maybe it's enough to iterate over streams and write nothing when a modality isn't present in streams. When does this happen? Do we want to write an empty meta then?
  • writers need testing
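One possible answer to the data_writer.py question, sketched under assumptions: FilesWriterSketch is hypothetical (not the real writer), it records (key, filename, payload) tuples in memory instead of doing I/O, and the "missing_modalities" meta field is an invented name. It iterates over streams rather than encode_formats and still writes the meta, noting which requested modalities were absent.

```python
from typing import Dict, List, Tuple


class FilesWriterSketch:
    """Hypothetical writer that keys off streams, not encode_formats."""

    def __init__(self, encode_formats: Dict[str, str]):
        self.encode_formats = encode_formats
        # (key, filename, payload) records kept in memory in place of real file I/O.
        self.written: List[Tuple[str, str, object]] = []

    def write(self, streams: Dict[str, bytes], key: str, meta: dict) -> None:
        for modality, payload in streams.items():
            if modality not in self.encode_formats:
                continue  # skip modalities that were never requested
            ext = self.encode_formats[modality]
            self.written.append((key, f"{key}.{ext}", payload))
        # Hypothetical field: record requested-but-absent modalities in the meta.
        missing = sorted(set(self.encode_formats) - set(streams))
        self.written.append((key, f"{key}.json", {**meta, "missing_modalities": missing}))
```

Whether an absent modality should produce a meta entry like this, or no sample at all, is exactly the open question above; the sketch just makes one choice explicit.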

subsamplers/clipping_subsampler.py

  • same as with noop_subsampler, encode_formats should be an init param, not a call param
  • need to check that audio clipping works as intended, i.e. the audio clips line up with the video clips and the correct number of clips is identified
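A minimal check for the audio clipping concern, assuming the clip spans come from meta and that video and audio clip durations have already been measured somehow (e.g. with ffprobe, not shown). check_clip_alignment and the 0.1 s default tolerance are hypothetical choices for illustration.

```python
from typing import List, Tuple

Span = Tuple[float, float]


def check_clip_alignment(
    spans: List[Span],
    video_durations: List[float],
    audio_durations: List[float],
    tol: float = 0.1,
) -> List[str]:
    """Return a list of problems; an empty list means the clips line up."""
    problems = []
    # The number of clips must match across the requested spans and both modalities.
    if not (len(spans) == len(video_durations) == len(audio_durations)):
        problems.append(
            f"clip count mismatch: spans={len(spans)} "
            f"video={len(video_durations)} audio={len(audio_durations)}"
        )
        return problems
    for i, ((start, end), v, a) in enumerate(zip(spans, video_durations, audio_durations)):
        expected = end - start
        # Each video clip should match its requested span length.
        if abs(v - expected) > tol:
            problems.append(f"clip {i}: video {v:.2f}s, expected {expected:.2f}s")
        # Each audio clip should match the corresponding video clip.
        if abs(a - v) > tol:
            problems.append(f"clip {i}: audio {a:.2f}s vs video {v:.2f}s")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to see every misaligned clip in a single test run.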

tests/test_downloaders.py

tests/test_audio.py

  • should be renamed to test_reader.py, and we should actually test the reader
  • actually pretty good; it just needs to be adapted a bit, with more parameters to test whether the video is read properly, etc. We should also test that the correct error_messages are returned (or exceptions thrown, if we decide to merge that one PR)
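A table-driven sketch of the error_message tests. fake_reader is a hypothetical stub, and the (streams, meta, error_message) return shape and the "invalid url" message are assumptions for illustration, not the real reader's contract. In the actual test suite the same table could feed pytest.mark.parametrize.

```python
from typing import Dict, List, Optional, Tuple

ReaderResult = Tuple[Optional[Dict[str, bytes]], dict, Optional[str]]


def fake_reader(url: str) -> ReaderResult:
    """Hypothetical stand-in for the reader: returns (streams, meta, error_message)."""
    if not url.startswith("http"):
        return None, {}, "invalid url"
    return {"video": b"\x00" * 10}, {"url": url}, None


# Each case: (input url, expected error_message or None for success).
CASES = [
    ("https://example.com/video.mp4", None),
    ("not-a-url", "invalid url"),
]


def run_reader_cases(reader, cases) -> List[tuple]:
    failures = []
    for url, expected_error in cases:
        streams, meta, error_message = reader(url)
        if error_message != expected_error:
            failures.append((url, expected_error, error_message))
        elif expected_error is None and not streams.get("video"):
            # A successful read should actually contain video bytes.
            failures.append((url, "non-empty video stream", streams))
    return failures
```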

README.md

  • the examples are getting too long and unintuitive; we should move more of them into the examples directory
  • specifically, let's show how to use encode_formats and other params like that
  • also, let's add a tutorial on running this with distributed=spark. I think that's not obvious but very useful.
  • maybe add a citation?

besides the above cleanup, there are 2 more PRs to get merged:

  • #91 - improves subtitle support and fixes a few things, 100% needs to get merged
  • #92 - I think this is worth trying and considering
  • #80 - check if it's done

v1 ideas

While going through the code I had some ideas for v1:

  • if encode_formats has both video and audio, perhaps we should run most of the pipeline on a single mp4 byte stream holding both, instead of separate video and audio streams. That way subsampling such as clipping is done once on the combined stream, and we split it into video and audio at the end instead of the other way around
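A sketch of the "clip together, split at the end" idea as a list of ffmpeg commands. The helper names and output file names are hypothetical; the ffmpeg flags (-ss, -to, -c copy, -an, -vn) are standard. It assumes the audio track is already AAC so it can be stream-copied into .m4a. With -c copy, cut points may snap to keyframes, so boundaries can be imprecise; re-encoding would be needed for frame-accurate clips.

```python
from typing import List, Tuple


def clip_cmd(src: str, start: float, end: float, dst: str) -> List[str]:
    # Cut one clip from the combined audio+video mp4 without re-encoding.
    return ["ffmpeg", "-y", "-i", src, "-ss", str(start), "-to", str(end), "-c", "copy", dst]


def split_cmds(src: str, video_dst: str, audio_dst: str) -> List[List[str]]:
    # Split a clipped mp4 into video-only (-an drops audio) and audio-only (-vn drops video).
    return [
        ["ffmpeg", "-y", "-i", src, "-an", "-c", "copy", video_dst],
        ["ffmpeg", "-y", "-i", src, "-vn", "-c", "copy", audio_dst],
    ]


def plan(src: str, spans: List[Tuple[float, float]], stem: str) -> List[List[str]]:
    """Clip the combined stream once per span, then split each clip by modality."""
    cmds = []
    for i, (start, end) in enumerate(spans):
        clip = f"{stem}_{i}.mp4"
        cmds.append(clip_cmd(src, start, end, clip))
        cmds.extend(split_cmds(clip, f"{stem}_{i}_video.mp4", f"{stem}_{i}_audio.m4a"))
    return cmds
```

Each command could then be run with subprocess.run(cmd, check=True). Because both modalities come out of the same clip, they share the same cut points by construction.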


iejMac commented on August 21, 2024

Delaying; we first need at least one successful use case of data from this repo, whether SVD, VideoCLIP, or something else.


rom1504 commented on August 21, 2024

I would call this done

codebase are using this


iejMac commented on August 21, 2024

yeah sure; if that's the case, it would be good to update the pip package

