ReS dataset of "Repositioning The Subject Within Image"

[preprint][intro][demo:Youtube,Bilibili]

Overview

This repo contains ReS, the dataset proposed in our paper "Repositioning The Subject Within Image".

Subject repositioning aims to relocate a user-specified subject within a single image. Our proposed SEELE effectively addresses the generative sub-tasks within a unified prompt-guided inpainting task, all powered by a single diffusion generative model.

We curated a benchmark dataset called ReS. This dataset includes 100 paired images, featuring a repositioned subject while the other elements remain constant. These images were collected from over 20 indoor and outdoor scenes, showcasing subjects from more than 50 categories. This variety enables effective simulation of real-world open-vocabulary applications.

Download

The ReS dataset is available on Google Drive and Baidu Netdisk.

Structure

Unzip the file to get a folder containing the following files for each scene i:

pi_1.jpg # The first view of the scene i
pi_2.jpg # The second view of the scene i
pi_1_mask.png # The visible mask of the subject in the first view
pi_1_amodal.png # The full mask of the subject in the first view
pi_2_mask.png # The visible mask of the subject in the second view
pi_2_amodal.png # The full mask of the subject in the second view
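The naming scheme above can be sketched as a small path helper. This is an illustration only, not part of the released code: the function name `scene_files` and the dict keys are hypothetical, and it assumes that `i` in the file names is replaced by the plain scene index (for example `p3_1.jpg` for scene 3).

```python
from pathlib import Path


def scene_files(root_dir: str, i: int) -> dict:
    """Return the six expected file paths for scene i.

    Assumption: file names are 'p' + scene index + '_' + view number,
    following the listing in this README.
    """
    root = Path(root_dir)
    files = {}
    for view in (1, 2):
        stem = f"p{i}_{view}"
        files[f"image_{view}"] = root / f"{stem}.jpg"          # photo of this view
        files[f"mask_{view}"] = root / f"{stem}_mask.png"      # visible mask
        files[f"amodal_{view}"] = root / f"{stem}_amodal.png"  # full (amodal) mask
    return files
```

Checking `path.exists()` on each returned value is a quick way to confirm that the archive was unpacked completely.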

The images were taken using two different mobile devices. Some are sized 1702x1276, while others are 4032x3024. Each pair has the same resolution.

The masks for these images were annotated with the help of SAM and have a maximum side length of 1024 pixels.

Loading

We provide an example script Res.py for loading the ReS dataset.

In the script, we define a class ReS that is initialized with:

res = ReS(root_dir, img_size, load_square)

The first parameter, root_dir, is the path to the dataset folder, and img_size is the desired minimum side length. If load_square is set to true, the images are resized to squares.
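To make the effect of img_size and load_square concrete, here is a minimal sketch of how a target resolution could be derived from them. It is an assumption about the behavior described above, not the actual logic in Res.py; the helper name `target_size` and the rounding choice are hypothetical.

```python
def target_size(width: int, height: int, img_size: int, load_square: bool) -> tuple:
    """Return (width, height) after resizing.

    Assumed behavior: the shorter side becomes img_size and the aspect
    ratio is kept, unless load_square is set, in which case both sides
    become img_size.
    """
    if load_square:
        return (img_size, img_size)
    short_side = min(width, height)
    # Scale both sides by the same factor so the shorter one equals img_size.
    new_width = round(width * img_size / short_side)
    new_height = round(height * img_size / short_side)
    return (new_width, new_height)
```

Under these assumptions, a 4032x3024 photo loaded with img_size=512 would come out at 683x512, or at 512x512 with load_square enabled.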

Each image pair defines two tasks, one in each direction: either image can serve as the source, with the other as the target. If the subject in an image is occluded, that image is used only as the source.
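The pairing rule can be sketched as follows. This is a hedged illustration rather than the code in Res.py: the function `list_tasks` and the `occluded` argument are hypothetical, and how occlusion is actually detected (for instance by comparing the visible and amodal masks) is not specified here.

```python
def list_tasks(num_pairs: int, occluded: set) -> list:
    """Enumerate (pair_index, source_view, target_view) tasks.

    `occluded` holds (pair_index, view) tuples for views whose subject is
    partly hidden. Such a view may only act as the source, so any task
    that would use it as the target is skipped.
    """
    tasks = []
    for i in range(1, num_pairs + 1):
        for src, tgt in ((1, 2), (2, 1)):
            if (i, tgt) in occluded:
                continue  # an occluded view is never used as the target
            tasks.append((i, src, tgt))
    return tasks
```

With no occluded views, every pair contributes two tasks; each occluded view removes the one task that targets it.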

The __getitem__ function processes a single task and returns a dict with the following keys:

'image': the source image
'mask': the removal mask of the subject at the source location
'gt': the target image
'amodal': the complete mask of the subject in the target location
'size': resolution of the image
'masked_image': masked image

We assume the outputs are fed into Stable Diffusion (SD). Please adjust the function as needed for your own pipeline.

Intended Uses

The data are intended for research purposes, to advance progress in subject repositioning.

Limitations

Because of the perspective shift, the size and view of the subject change after repositioning. We do not provide annotations for these changes, so using the target image directly for quantitative analysis may not be accurate.

Citation

If you find the dataset useful, please cite our work.

@article{wang2024repositioning,
  title={Repositioning the Subject within Image},
  author={Wang, Yikai and Cao, Chenjie and Dong, Qiaole and Li, Yifan and Fu, Yanwei},
  journal={arXiv preprint arXiv:2401.16861},
  year={2024}
}
