Comments (6)
Can you share more details about your case? The original CLIP only accepts 3-channel RGB input...
from alphaclip.
Exactly, so I just disregard the alpha channel and leave the background black, which is not working well.
import numpy as np
import torch
from PIL import Image

image_array = np.array(image.image)
image_array = image_array[:, :, :3]  # Disregard the alpha channel, keep RGB only
image_pil = Image.fromarray(image_array)
image_pil = image_pil.resize((224, 224), Image.LANCZOS)
images_to_sprite_data.append(image_pil)
image_tensor = preprocess_image(image_pil)
with torch.no_grad():
    image_embedding = model.encode_image(image_tensor)
embeddings.append(image_embedding.cpu().numpy()[0])
Could I use AlphaCLIP to encode images with an alpha channel and have its attention only on the area of interest?
This is an example from my dataset.
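Rather than discarding the alpha channel, one option is to keep it as a region mask alongside the RGB data. The following is a minimal sketch, assuming each image is an H x W x 4 `uint8` RGBA array; `split_rgba` and its `threshold` parameter are hypothetical names used for illustration and are not part of AlphaCLIP.

```python
import numpy as np

def split_rgba(rgba: np.ndarray, threshold: int = 0):
    """Split an H x W x 4 uint8 array into an RGB image and a binary mask.

    The mask is 1 where alpha > threshold (the area of interest), else 0.
    """
    if rgba.ndim != 3 or rgba.shape[2] != 4:
        raise ValueError("expected an H x W x 4 RGBA array")
    rgb = rgba[:, :, :3].copy()                        # colour channels only
    mask = (rgba[:, :, 3] > threshold).astype(np.uint8)  # 1 = foreground
    return rgb, mask

# Tiny synthetic example: a 4x4 image whose centre 2x2 block is opaque.
demo = np.zeros((4, 4, 4), dtype=np.uint8)
demo[1:3, 1:3] = [200, 100, 50, 255]
demo_rgb, demo_mask = split_rgba(demo)
```

The RGB part can go through the usual image preprocessing, while the mask marks the region the encoder should focus on.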
We cannot guarantee 100% attention focus, but you can give it a try with our model, or run a quick test with our demo!
I've just created an embedding by following the README.
image_features = model.visual(image, alpha)
I believe the attention is now guided by the mask?
I will need to create masks for my dataset and will see whether the results are what I'm looking for.
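For a dataset of binary masks, the `alpha` argument to `model.visual(image, alpha)` could be prepared roughly as below. This is only a sketch under stated assumptions: the mask is already at the model's input resolution (224 x 224 here), and it is normalized with mean 0.5 and standard deviation 0.26, values that should be checked against the README's mask transform. The helper name `mask_to_alpha` is hypothetical.

```python
import numpy as np

def mask_to_alpha(mask: np.ndarray, mean: float = 0.5, std: float = 0.26) -> np.ndarray:
    """Turn a binary H x W mask (0/1) into a normalized 1 x 1 x H x W float array.

    mean/std are assumed values; verify them against the AlphaCLIP README.
    """
    if mask.ndim != 2:
        raise ValueError("expected a 2-D mask")
    alpha = (mask.astype(np.float32) - mean) / std   # normalize 0/1 values
    return alpha[np.newaxis, np.newaxis, :, :]        # add batch and channel dims

# Example: focus on a centred 100 x 100 square in a 224 x 224 image.
example_mask = np.zeros((224, 224), dtype=np.uint8)
example_mask[62:162, 62:162] = 1
example_alpha = mask_to_alpha(example_mask)
```

With PyTorch, the result could then be wrapped with `torch.from_numpy(example_alpha)`, moved to the same device and dtype as the image tensor, and passed as `alpha`.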
Yes, if you follow the README, you can add region focus with a mask, which the original CLIP is not capable of. You can try this on your own task to see whether it fits your requirements.
Hey, I just tried the model on my use case and it gave me very interesting results. Thank you!