In image-based filtering, only imagenet L14 is supported, making extracting B32 embeddings useless

mlfoundations commented on July 24, 2024

In image-based filtering, only imagenet L14 is supported, making extracting B32 embeddings useless

from datacomp.

Comments (6)

sagadre commented on July 24, 2024

Hi @zwsjink! For the main image-based filtering baseline in the paper, we used L/14 features, so to replicate this baseline, only the L/14 features are needed. B/32 features are used for other baselines (e.g., clip score filtering baselines). Hope that answers the question, but feel free to let me know if not!

sagadre commented on July 24, 2024

Hi @zwsjink I agree with your conclusion! And great point about the small differences between B/32 and L/14 features for filtering and clustering. We were indeed surprised that using a stronger CLIP backbone for clustering/filtering did not lead to large gains in downstream performance.

This brings up interesting questions about what makes a good dataset filtering model, which, at least from this comparison, seems slightly different from what makes a good zero-shot model.

sagadre commented on July 24, 2024

While the difference between CLIP score filtering and image-based filtering is within a percentage point (pp), we found it interesting that “stacking” these filtering methods produces a substantial gain (approx. 3pp). There is definitely more here to understand (e.g., when one should stack filtering methods).
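
A minimal sketch of what “stacking” can look like in practice, assuming it means keeping only the samples that pass both filters (a set intersection over sample uids). The function name `stack_filters` and the uids are hypothetical and used only for illustration; they are not taken from the datacomp codebase.

```python
def stack_filters(clip_score_uids, image_based_uids):
    """Keep only the samples that pass both filters (set intersection)."""
    return set(clip_score_uids) & set(image_based_uids)


# Hypothetical uids, for illustration only.
clip_score_keep = {"a", "b", "c", "d"}   # survivors of CLIP score filtering
image_based_keep = {"c", "d", "e"}       # survivors of image-based filtering

stacked = stack_filters(clip_score_keep, image_based_keep)
print(sorted(stacked))
```

The stacked set can never be larger than the smaller of the two inputs, so stacking trades pool size for (hopefully) higher quality.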

We experimented with IN1k and IN21k for image-based filtering just because these are common datasets and seemed like reasonable baselines. Investigating additional datasets for filtering is also an interesting direction!
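
A rough sketch of image-based filtering with a swappable reference dataset, assuming the pool has already been clustered and that a sample is kept when its cluster is the nearest cluster to at least one reference image embedding. All names (`image_based_filter`, `nearest_centroid`) and the toy 2-D vectors are hypothetical; a real pipeline would use a vectorized library over high-dimensional CLIP embeddings rather than pure Python.

```python
def dot(u, v):
    """Dot product; equals cosine similarity when both vectors are unit-norm."""
    return sum(a * b for a, b in zip(u, v))


def nearest_centroid(vec, centroids):
    """Index of the centroid most similar to `vec`."""
    return max(range(len(centroids)), key=lambda i: dot(vec, centroids[i]))


def image_based_filter(sample_clusters, centroids, reference_embeddings):
    """Indices of samples whose cluster is nearest to some reference embedding."""
    kept_clusters = {nearest_centroid(r, centroids) for r in reference_embeddings}
    return [i for i, c in enumerate(sample_clusters) if c in kept_clusters]


# Toy data (assumed, for illustration): three cluster centroids in 2-D.
centroids = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
# Cluster assignment of each pool sample.
sample_clusters = [0, 1, 1, 2, 0]
# Embeddings of the reference dataset (IN1k, IN21k, or any other choice).
reference_embeddings = [(0.6, 0.8), (-0.8, 0.6)]

kept = image_based_filter(sample_clusters, centroids, reference_embeddings)
print(kept)
```

Under this framing, trying a different reference dataset only changes `reference_embeddings`; the clustering of the pool can be reused.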

zwsjink commented on July 24, 2024

> Hi @zwsjink! For the main image-based filtering baseline in the paper, we used L/14 features, so to replicate this baseline, only the L/14 features are needed. B/32 features are used for other baselines (e.g., clip score filtering baselines). Hope that answers the question, but feel free to let me know if not!

Thanks for the clarification. Yeah, I do see that in the CLIP-score-based filtering step, both L/14 and B/32 are supported. I suppose you guys have tried image-based filtering on both L/14 and B/32 embeddings and found that L/14 outperforms B/32?

zwsjink commented on July 24, 2024

Just want to add my observation here: going through the data in Tables 21-24 of the paper, there is not much benefit in moving from B/32 to L/14 for CLIP score thresholding, at least at the small/medium/large scales. That's why I'm wondering whether it is worth using L/14 for both CLIP score thresholding and image-based filtering. B/32 is probably enough for me to get acceptable accuracy, and it is friendlier on compute and storage.
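
One cheap way to probe the B/32 versus L/14 question on a given pool is to apply the same top-fraction CLIP score threshold with each backbone and measure how much the kept sets overlap. The sketch below assumes per-sample scores are already available as uid-to-score dictionaries; the helper names and the scores are made up for illustration. Set overlap is only a proxy and does not replace training and evaluating on each filtered subset.

```python
def keep_top_fraction(scores, fraction):
    """Return the uids with the highest scores, keeping `fraction` of the pool."""
    n_keep = int(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])


def jaccard(a, b):
    """Jaccard overlap between two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up CLIP scores per uid, for illustration only.
b32_scores = {"u1": 0.31, "u2": 0.12, "u3": 0.27, "u4": 0.05, "u5": 0.22, "u6": 0.18}
l14_scores = {"u1": 0.29, "u2": 0.10, "u3": 0.21, "u4": 0.08, "u5": 0.25, "u6": 0.23}

b32_keep = keep_top_fraction(b32_scores, 0.5)
l14_keep = keep_top_fraction(l14_scores, 0.5)
overlap = jaccard(b32_keep, l14_keep)
print(sorted(b32_keep), sorted(l14_keep), overlap)
```

A high overlap would suggest the two backbones select nearly the same subset, in which case the cheaper B/32 embeddings may be sufficient for that pool.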

zwsjink commented on July 24, 2024

@sagadre I also noticed that, compared with CLIP score thresholding, image-based filtering does not bring much benefit.

Is there a reason you guys used ImageNet for the relevance filter here?
