Comments (6)

sagadre commented on July 24, 2024

Hi @zwsjink! For the main image-based filtering baseline in the paper, we used L/14 features, so only the L/14 features are needed to replicate it. B/32 features are used for other baselines (e.g., the CLIP score filtering baselines). Hope that answers the question, but feel free to let me know if not!

from datacomp.

sagadre commented on July 24, 2024

Hi @zwsjink I agree with your conclusion! And great point about the small differences between B/32 and L/14 features for filtering and clustering. We were indeed surprised that using a stronger CLIP backbone for clustering/filtering did not lead to large gains in downstream performance.

This brings up interesting questions about what makes a good dataset filtering model, which, at least from this comparison, seems slightly different from what makes a good zero-shot model.

sagadre commented on July 24, 2024

While the difference between CLIP score filtering and image-based filtering is within a percentage point (pp), we found it interesting that “stacking” these filtering methods produces a substantial gain (approx. 3pp). There is definitely more here to understand (i.e., when one should stack filtering methods).

We experimented with IN1k and IN21k for image-based filtering just because these are common datasets and seemed like reasonable baselines. Investigating additional datasets for filtering is also an interesting direction!
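
For readers following along, the "stacking" described above can be sketched as intersecting two keep-masks. This is a toy illustration and not the DataComp code. It assumes stacking means keeping only the samples that pass both filters, and it assumes image-based filtering keeps samples whose nearest cluster centroid is also the nearest centroid of at least one reference (e.g., ImageNet) image. All function names, the 2-D embeddings, and the `keep_fraction` value are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def clip_score_mask(image_embs, text_embs, keep_fraction):
    """Keep roughly the top `keep_fraction` of samples by image-text similarity.

    Ties at the threshold are all kept, so slightly more samples may pass.
    """
    scores = [cosine(i, t) for i, t in zip(image_embs, text_embs)]
    n_keep = max(1, int(len(scores) * keep_fraction))
    threshold = sorted(scores, reverse=True)[n_keep - 1]
    return [s >= threshold for s in scores]


def nearest(emb, centroids):
    """Index of the centroid most similar to `emb`."""
    return max(range(len(centroids)), key=lambda k: cosine(emb, centroids[k]))


def image_based_mask(image_embs, centroids, reference_embs):
    """Keep samples whose nearest centroid is nearest to some reference image."""
    target_clusters = {nearest(r, centroids) for r in reference_embs}
    return [nearest(e, centroids) in target_clusters for e in image_embs]


def stack(mask_a, mask_b):
    """Intersection of two filters: a sample survives only if both keep it."""
    return [a and b for a, b in zip(mask_a, mask_b)]


# Toy 2-D "embeddings"; real CLIP features are much higher-dimensional.
image_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
text_embs = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
centroids = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for k-means centers
reference_embs = [[0.8, 0.2]]          # stand-in for ImageNet image features

clip_mask = clip_score_mask(image_embs, text_embs, keep_fraction=0.5)
image_mask = image_based_mask(image_embs, centroids, reference_embs)
stacked = stack(clip_mask, image_mask)
print(clip_mask, image_mask, stacked)
```

In this toy pool each filter alone keeps two samples, but only the first sample passes both, which shows how the two criteria can select different subsets.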

zwsjink commented on July 24, 2024

Thanks for the clarification. Yes, I see that the CLIP-score-based filtering step supports both L/14 and B/32. I assume you tried image-based filtering on both L/14 and B/32 embeddings and found that L/14 outperforms B/32?

zwsjink commented on July 24, 2024

Just want to add my observation here: going through the data in Tables 21-24 of the paper, there is not much benefit in moving from B/32 to L/14 for CLIP score thresholding, at least at the small/medium/large scales. That's why I'm wondering whether it is worth using L/14 for both CLIP score thresholding and image-based filtering. B/32 is probably enough for me to get acceptable accuracy, and it is friendlier in terms of compute and storage.

zwsjink commented on July 24, 2024

@sagadre I also noticed that, compared with CLIP score thresholding, image-based filtering does not bring much benefit.
[image]
Is there a reason you used ImageNet for the relevance filter here?
