Tokenize Anything via Prompting

Ting Pan1,2*,   Lulu Tang2*,   Xinlong Wang,   Shiguang Shan1

1ICT-CAS,   2BAAI
* Equal Contribution, Project Lead

[Paper] [🤗 Demo]

We present Tokenize Anything via Prompting, a unified and promptable model capable of simultaneously segmenting, recognizing, and captioning arbitrary regions, with flexible visual prompts (point, box and sketch). The model is trained with exhaustive segmentation masks sourced from SA-1B, coupled with semantic priors from a pre-trained EVA-CLIP with 5 billion parameters.

Installation

Preliminaries

  • torch

  • flash-attn >= 2.3.3 (install the pre-built wheel distribution from URL)

  • gradio-image-prompter (for GradioApp; install from URL)
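
Before installing the package, it can help to confirm that the prerequisites above are present. The following is a minimal sketch using only the Python standard library. The helper names `parse_version` and `check_requirement` are hypothetical, and the distribution names passed to them are assumed to match the names used on your package index.

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.3.3' or '2.5.0.post1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(digits))
    return tuple(parts)


def check_requirement(dist_name, minimum=None):
    """Return (ok, detail) for an installed distribution (hypothetical helper)."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False, f"{dist_name}: not installed"
    if minimum is not None and parse_version(installed) < parse_version(minimum):
        return False, f"{dist_name}: {installed} is older than {minimum}"
    return True, f"{dist_name}: {installed}"


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them if your index differs.
    requirements = [
        ("torch", None),
        ("flash-attn", "2.3.3"),
        ("gradio-image-prompter", None),
    ]
    for name, minimum in requirements:
        ok, detail = check_requirement(name, minimum)
        print("OK     " if ok else "PROBLEM", detail)
```

The version comparison is deliberately simple and ignores pre-release and local version suffixes; a stricter check would need a dedicated version parser.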

Installing Package

Clone this repository to local disk and install:

cd tokenize-anything && pip install .

You can also install from the remote repository:

pip install git+ssh://git@github.com/baaivision/tokenize-anything.git

Quick Start

Development

The TAP models can be used for diverse vision and language tasks.

We adopt a modular design that decouples all components and predictors.

As a best practice, implement your custom predictor and asynchronous pipeline as follows:

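from tokenize_anything import model_registry

# Names in angle brackets are placeholders to replace for your own setup.
with <distributed_actor>:
    # Build the model from the registry, then hand it to your predictor.
    model = model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
    results = <custom_predictor>(model, *args, **kwargs)

# `server` is not defined in this template; it stands for the object that gathers results.
server.collect_results()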

See builtin examples (web-demo and evaluations) provided in scripts for more details.
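
The template above leaves the actor, the predictor, and the result collection unspecified. As a rough illustration of the decoupled design, here is a minimal sketch that runs without the package or a checkpoint. `DummyModel`, `fake_registry`, `RegionPredictor`, and `run_pipeline` are all hypothetical stand-ins, and a thread pool from the standard library is used in place of a distributed actor; this is not the project's own pipeline.

```python
from concurrent.futures import ThreadPoolExecutor


class DummyModel:
    """Hypothetical stand-in for a TAP model so the sketch runs without weights."""

    def __init__(self, checkpoint):
        self.checkpoint = checkpoint


# Hypothetical registry that mirrors the call shape model_registry[...](checkpoint=...).
fake_registry = {"dummy": DummyModel}


class RegionPredictor:
    """Hypothetical custom predictor: owns a model and exposes a single call."""

    def __init__(self, model):
        self.model = model

    def __call__(self, image_id, points):
        # A real predictor would run the model here; this one only echoes its inputs.
        return {"image_id": image_id, "num_prompts": len(points)}


def run_pipeline(jobs, model_type="dummy", checkpoint="<path/to/checkpoint>", workers=2):
    """Submit every job to a worker pool and collect results in submission order."""
    model = fake_registry[model_type](checkpoint=checkpoint)
    predictor = RegionPredictor(model)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(predictor, image_id, points) for image_id, points in jobs]
        return [future.result() for future in futures]


if __name__ == "__main__":
    jobs = [("img_a", [(10, 20)]), ("img_b", [(5, 5), (30, 40)])]
    for result in run_pipeline(jobs):
        print(result)
```

Keeping the predictor separate from model construction means the same predictor class can be reused when `fake_registry` is swapped for the real `model_registry` and the thread pool for another executor.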

Inference

See Inference Guide.

See Concept Guide.

Evaluation

See Evaluation Guide for TAP-L.

See Evaluation Guide for TAP-B.

Models

Model weights

Two versions of the model are available with different image encoders.

| Model     | Description     | MD5    | Weights    |
|-----------|-----------------|--------|------------|
| tap_vit_l | ViT-L TAP model | 03f8ec | 🤗 HF link |
| tap_vit_b | ViT-B TAP model | b45cbf | 🤗 HF link |
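
After downloading a checkpoint, its integrity can be checked against the MD5 column. The sketch below assumes the six-character values in the table are the leading hex digits of the file's MD5 digest, which the source does not state explicitly. The function names are hypothetical.

```python
import hashlib


def md5_hexdigest(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_prefix(path, expected_prefix):
    """Return True if the file's MD5 digest starts with the expected prefix."""
    return md5_hexdigest(path).startswith(expected_prefix.lower())
```

For example, `matches_md5_prefix("<path/to/checkpoint>", "03f8ec")` would be the check for the ViT-L weights under that assumption.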

Concept weights

Note: You can generate these weights following the Concept Guide.

| Concept     | Description     | Weights    |
|-------------|-----------------|------------|
| Merged-2560 | Merged concepts | 🤗 HF link |
| LVIS-1203   | LVIS concepts   | 🤗 HF link |
| COCO-80     | COCO concepts   | 🤗 HF link |

Contact

  • We are looking for research interns at BAAI Vision Team. If you are interested in working with us on Vision Foundation Models (e.g., SAM variants), please contact Xinlong Wang ([email protected]).

License

Apache License 2.0

Citation

@article{pan2023tap,
  title={Tokenize Anything via Prompting},
  author={Pan, Ting and Tang, Lulu and Wang, Xinlong and Shan, Shiguang},
  journal={arXiv preprint arXiv:2312.09128},
  year={2023}
}

Acknowledgement

We thank the repositories: SAM, EVA, LLaMA, FlashAttention, Gradio, Detectron2 and CodeWithGPU.
