
segcapnet's Introduction

Abstract

This project is an image search system that combines image segmentation with caption generation. Users upload an image, the system generates a caption for it, and relevant products are suggested based on caption similarity.

Installation

To install the project, follow these steps:

  1. Clone the repository:

  2. Navigate to the project directory:

  3. Install dependencies:

pip install -r requirements.txt

  4. Run the backend.py server:

python backend.py

  5. Run the frontend interface:

gradio frontend.py

After starting the frontend, upload an image of the product you want to search for and click the search button. The model generates a caption for the image. Copy the caption, paste it into the search bar, and click the search button again. The system then searches for similar products and displays the results.

Tech Stack

  • Python
  • Gradio
  • Segmentation Models
  • Salesforce BLIP
  • TensorFlow
  • Hugging Face Transformers

Optimizations

The image search system was developed in four stages:

  1. Image Segmentation: The system begins with the implementation of an image segmentation model using the U2-Net architecture. This model is trained on a dataset containing images and their corresponding masks, allowing it to segment the images effectively.

  2. Caption Generation: Following successful segmentation, the system utilizes the Salesforce BLIP Transformer for caption generation. The BLIP model is fine-tuned on a fashion product dataset consisting of segmented images paired with captions. This process leverages the learned representations from both image and text modalities to generate descriptive captions for segmented images.

  3. Integration with Gradio: Gradio, a Python library for building web interfaces around machine learning models, provides the user interface, where users upload images for segmentation and caption generation. The backend processes each uploaded image, segments it with the trained U2-Net model, and generates a caption with the fine-tuned BLIP model. Inference runs in real time, so users get prompt feedback after submitting an image.

  4. Product Search Mechanism: Once segmentation and captioning run in real time, the generated captions drive the search for similar products. Each caption is tokenized to extract tokens that represent product attributes. The tokens are converted into feature representations, and a similarity search retrieves products with similar features.
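
The product search step in item 4 can be illustrated with a minimal sketch that uses only the Python standard library. It assumes a bag-of-words representation compared by cosine similarity. The source does not say which tokenizer, feature representation, or similarity measure the project uses, so `tokenize`, `vectorize`, `cosine_similarity`, `search`, the stop-word list, and the sample catalog are all hypothetical names and data chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the project's actual token filtering is not documented.
STOP_WORDS = {"a", "an", "the", "of", "with", "and", "in", "on", "for"}


def tokenize(caption):
    """Lowercase a caption and keep alphanumeric tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", caption.lower()) if t not in STOP_WORDS]


def vectorize(caption):
    """Turn a caption into a sparse term-frequency vector (token -> count)."""
    return Counter(tokenize(caption))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty caption matches nothing
    return dot / (norm_a * norm_b)


def search(query_caption, catalog, top_k=3):
    """Rank catalog entries (product id -> caption) by similarity to the query caption."""
    query = vectorize(query_caption)
    scored = [
        (product_id, cosine_similarity(query, vectorize(caption)))
        for product_id, caption in catalog.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Made-up catalog for illustration only; not taken from the project's dataset.
CATALOG = {
    "tee-01": "a red cotton t-shirt with short sleeves",
    "jeans-07": "blue denim jeans with a slim fit",
    "tee-02": "a blue cotton t-shirt with long sleeves",
}

if __name__ == "__main__":
    for product_id, score in search("red t-shirt with short sleeves", CATALOG):
        print(f"{product_id}: {score:.2f}")
```

In this sketch the query caption would be the text produced by the captioning model, and the catalog captions would be generated once, ahead of time, for every product image. Swapping the term-frequency vectors for TF-IDF weights or learned text embeddings changes only `vectorize`; the ranking logic stays the same.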

File Structure

.
├── README.md
├── backend.py
├── bg.py
├── cmd
│   ├── cli.py
│   └── server.py
├── final.ipynb
├── flagged
├── frontend.py
├── github.py
├── models
│   ├── u2aa
│   ├── u2ab
│   ├── u2ac
│   ├── u2ad
│   ├── u2haa
│   ├── u2hab
│   ├── u2hac
│   ├── u2had
│   └── u2netp.pth
├── out.txt
├── requirements.txt
├── u2net
│   ├── data_loader.py
│   ├── detect.py
│   └── u2net.py
└── utilities.py

4 directories, 23 files

Assets

References

Authors

  • Adarsh Anand
  • Aniket Chaudhari
  • Rajat Singh
  • Vivek Bandrele
