
foreground-object-search-dataset-fosd's Introduction

Foreground Object Search Dataset FOSD

This is the official repository for the following paper:

Foreground Object Search by Distilling Composite Image Feature [arXiv]

Bo Zhang, Jiacheng Sui, Li Niu
Accepted by ICCV 2023.

Our model has been integrated into our image composition toolbox libcom (https://github.com/bcmi/libcom). You are welcome to visit and try it \(^▽^)/

Requirements

  • See requirements.txt for other dependencies.

Data Preparation

  • Download the Open-Images-v6 training set from Open Images V6 - Download and unzip it. We recommend using FiftyOne to download the Open-Images-v6 dataset. Once the download finishes, the Open-Images-v6 directory should be structured as follows.

    Open-Images-v6
    ├── metadata
    ├── train
    │   ├── data
    │   │   ├── xxx.jpg
    │   │   ├── xxx.jpg
    │   │   ...
    │   │
    │   └── labels
    │       ├── masks
    │       │   ├── 0
    │       │   │   ├── xxx.png
    │       │   │   ├── xxx.png
    │       │   │   ...
    │       │   ├── 1
    │       │   ...
    │       │
    │       ├── segmentations.csv
    │       ...
    
  • Download the S-FOSD annotations, the R-FOSD annotations, and the R-FOSD background images from Baidu disk (code: 3wvf), and save them under the data directory, following the data structure shown below.

  • Generate backgrounds and foregrounds.

    python prepare_data/fetch_data.py --open_images_dir <path/to/open/images>
    

The data directory should be structured as follows:

data
├── metadata
│   ├── classes.csv
│   └── category_embeddings.pkl
├── test
│   ├── bg_set1
│   │   ├── xxx.jpg
│   │   ├── xxx.jpg
│   │   ...
│   │
│   ├── bg_set2
│   │   ├── xxx.jpg
│   │   ├── xxx.jpg
│   │   ...
│   │
│   ├── fg
│   │   ├── xxx.jpg
│   │   ├── xxx.jpg
│   │   ...
│   └── labels
│       ├── masks
│       │   ├── 0
│       │   │   ├── xxx.png
│       │   │   ├── xxx.png
│       │   │   ...
│       │   ├── 1
│       │   ...
│       │
│       ├── test_set1.json
│       ├── test_set2.json
│       └── segmentations.csv
│
└── train
    ├── bg
    │   ├── xxx.jpg
    │   ├── xxx.jpg
    │   ...
    │
    ├── fg
    │   ├── xxx.jpg
    │   ├── xxx.jpg
    │   ...
    │
    └── labels
        ├── masks
        │   ├── 0
        │   │   ├── xxx.png
        │   │   ├── xxx.png
        │   │   ...
        │   ├── 1
        │   ...
        │
        ├── train_sfosd.json
        ├── train_rfosd.json
        ├── category.json
        ├── number_per_category.csv
        └── segmentations.csv
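
Before running evaluation or training, it can help to confirm that the files ended up where the tree above expects them. The following is a minimal sketch and not part of the repository: the helper name `find_missing` is hypothetical, and the list of paths is simply copied from the tree above, so which entries a given script actually needs is an assumption.

```python
from pathlib import Path

# Paths relative to the data directory, copied from the tree above.
# Whether every script needs all of them is an assumption.
EXPECTED_DATA_PATHS = [
    "metadata/classes.csv",
    "metadata/category_embeddings.pkl",
    "test/bg_set1",
    "test/bg_set2",
    "test/fg",
    "test/labels/masks",
    "test/labels/test_set1.json",
    "test/labels/test_set2.json",
    "test/labels/segmentations.csv",
    "train/bg",
    "train/fg",
    "train/labels/masks",
    "train/labels/train_sfosd.json",
    "train/labels/train_rfosd.json",
    "train/labels/category.json",
    "train/labels/number_per_category.csv",
    "train/labels/segmentations.csv",
]


def find_missing(root, expected=EXPECTED_DATA_PATHS):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `find_missing("data")` from the repository root returns an empty list when everything is in place. The same helper can be pointed at the Open-Images-v6 directory by passing a different `expected` list, for example `["metadata", "train/data", "train/labels/masks", "train/labels/segmentations.csv"]`.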

Pretrained Model

We provide one checkpoint (Baidu disk code: 7793) for evaluation on the S-FOSD dataset and another checkpoint (Baidu disk code: 6kme) for evaluation on the R-FOSD dataset. By default, we assume that the pretrained models are downloaded and saved in the directory checkpoints.

Testing

Evaluation on S-FOSD Dataset

python evaluate/evaluate.py --testOnSet1

Evaluation on R-FOSD Dataset

python evaluate/evaluate.py --testOnSet2

The evaluation results will be stored in the directory eval_results.

To save the top 20 results on R-FOSD, add the --saveTop20 flag. By default, they will be stored in the directory top20.

To save the model's prediction scores on R-FOSD, add the --saveScores flag. By default, they will be stored in the directory model_scores.
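
When several evaluation runs need to be scripted, a small wrapper can keep the flag combinations consistent. The sketch below only assembles the command line from the flags documented above. The function names `build_eval_command` and `run_eval` and their parameters are hypothetical, and the sketch assumes the flags are plain on/off switches, as their usage above suggests.

```python
import subprocess
import sys


def build_eval_command(test_set, save_top20=False, save_scores=False):
    """Assemble the evaluate.py command for S-FOSD (set 1) or R-FOSD (set 2)."""
    if test_set not in (1, 2):
        raise ValueError("test_set must be 1 (S-FOSD) or 2 (R-FOSD)")
    cmd = [sys.executable, "evaluate/evaluate.py", f"--testOnSet{test_set}"]
    # The README documents these two flags for R-FOSD; their behavior
    # on S-FOSD is not described, so use them there at your own risk.
    if save_top20:
        cmd.append("--saveTop20")
    if save_scores:
        cmd.append("--saveScores")
    return cmd


def run_eval(test_set, **options):
    """Run the evaluation from the repository root; raise on a non-zero exit."""
    subprocess.run(build_eval_command(test_set, **options), check=True)
```

For example, `run_eval(2, save_top20=True)` is equivalent to running `python evaluate/evaluate.py --testOnSet2 --saveTop20` with the current interpreter.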

Training

Please download the pretrained teacher models from Baidu disk (code: 40a5) and save them in the directory checkpoints/teacher.

To train a new S-FOSD model, you can simply run:

./train/train_sfosd.sh

Similarly, to train a new R-FOSD model, run:

./train/train_rfosd.sh

FOS Score

Our model can be used to evaluate the compatibility between foreground and background in terms of geometry and semantics.

To launch the demo, you can run:

python demo/demo_ui.py

Follow these three steps to obtain a compatibility score for a foreground and a background:

  1. Upload a background image in the left box of the first row.

  2. In the right box of the first row, click the top-left corner and then the bottom-right corner of the bounding box.

  3. Upload a foreground image in the left box of the second row, then click the 'run' button.
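
Step 2 defines the placement box with two clicks. How the demo converts these clicks into a box is not described here; the sketch below shows one plausible way, assuming clicks arrive as (x, y) pixel coordinates with the origin at the top-left of the image. The names `Box` and `bbox_from_clicks` are hypothetical and do not refer to code in the repository.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (top-left and bottom-right)."""
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def width(self):
        return self.x2 - self.x1

    @property
    def height(self):
        return self.y2 - self.y1


def bbox_from_clicks(first, second, image_size):
    """Build a box from two corner clicks, clamped to the image bounds.

    Sorting the coordinates makes the result independent of click order.
    """
    img_w, img_h = image_size
    xs = sorted(min(max(p[0], 0), img_w) for p in (first, second))
    ys = sorted(min(max(p[1], 0), img_h) for p in (first, second))
    box = Box(xs[0], ys[0], xs[1], ys[1])
    if box.width == 0 or box.height == 0:
        raise ValueError("the two clicks must span a non-empty box")
    return box
```

Clamping and sorting are defensive choices for illustration: they tolerate clicks given in the wrong order or slightly outside the image, while a degenerate box is rejected rather than silently scored.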

Other Resources

License

Both the background and foreground images of S-FOSD belong to Open-Images. The background images of R-FOSD were collected from the Internet and are licensed under a Creative Commons Attribution 4.0 License.


foreground-object-search-dataset-fosd's Issues

About comparative models' code?

How can I obtain the CFO, UFO, GALA, and FFR models for comparative experiments? Could you please share your code? Thanks!

error

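[image: 微信图片_20231104151406]
How can I resolve the two errors shown above?
Another error is that segmentations.csv cannot be found under the path, even though I have already placed the file there.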

A strange question

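Sorry to bother you! This question feels a bit silly.
I downloaded the Open-Images-v6 dataset via FiftyOne,
using the following code:
import fiftyone as fo
dataset = fo.zoo.load_zoo_dataset("open-images-v6")
However, the directory structure of the Open-Images-v6 dataset I downloaded differs from the one given in the README.
[image]
As the image above shows, there is no metadata folder directly under Open-Images-v6; instead, metadata sits inside the train folder.
[image]
The directory structure of my Open-Images-v6 dataset looks like this: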
Open-Images-v6
├── train
│   ├── data
│   │   ├── xxx.jpg
│   │   ├── xxx.jpg
│   │   ...
│   │
│   └── labels
│       ├── masks
│       │   ├── 0
│       │   │   ├── xxx.png
│       │   │   ├── xxx.png
│       │   │   ...
│       │   ├── 1
│       │   ...
│       │
│       ├── segmentations.csv
│       ...
└── metadata
    ...

So, does this metadata folder need to be moved?
