
open-images-dataset's Introduction

Open Images Dataset

Open Images is a dataset of ~9 million URLs to images that have been annotated with labels spanning over 6,000 categories. This page provides download instructions and mirror sites for the Open Images Dataset. Please visit the project page for more details on the dataset.

Download Images

Download Images With Bounding Box Annotations

CVDF hosts the image files that have bounding box annotations in Open Images Dataset V4/V5. These images include the complete subsets for which instance segmentations and visual relations are annotated. The images are split into train (1,743,042), validation (41,620), and test (125,436) sets; the train set is also used in the Open Images Challenge 2018 and 2019. The images are rescaled to have at most 1024 pixels on their longest side, preserving their original aspect ratio. The total size is 561 GB. The images can be downloaded directly into a local directory from the CVDF AWS S3 cloud storage bucket:

s3://open-images-dataset

You can download the images either to a local directory or to your own AWS S3 cloud storage bucket with the following procedure (a worked example follows the list):

  1. Install the AWS CLI (awscli).
  2. Download the images for the train, validation, and test sets:
  • aws s3 --no-sign-request sync s3://open-images-dataset/train [target_dir/train] (513GB)
  • aws s3 --no-sign-request sync s3://open-images-dataset/validation [target_dir/validation] (12GB)
  • aws s3 --no-sign-request sync s3://open-images-dataset/test [target_dir/test] (36GB)
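
For example, a minimal sketch that installs the CLI with pip and pulls the validation split into a local directory (the directory name open-images/validation is only an illustration):

  # Install the AWS CLI; pipx or the official installer work equally well
  pip install awscli

  # Download the validation split (~12 GB) anonymously; no AWS account is required
  mkdir -p open-images/validation
  aws s3 --no-sign-request sync s3://open-images-dataset/validation open-images/validation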

Alternatively, you can download the subsets as separate packed files (the subset train_x contains all images whose IDs start with x):

  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_0.tar.gz [target_dir] (46G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_1.tar.gz [target_dir] (34G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_2.tar.gz [target_dir] (33G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_3.tar.gz [target_dir] (32G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_4.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_5.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_6.tar.gz [target_dir] (32G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_7.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_8.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_9.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_a.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_b.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_c.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_d.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_e.tar.gz [target_dir] (28G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_f.tar.gz [target_dir] (28G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/validation.tar.gz [target_dir] (12G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/test.tar.gz [target_dir] (36G)

target_dir can be either a local directory or an AWS S3 cloud storage bucket.
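
As a concrete sketch for the packed files, the commands below fetch the first training shard and unpack it into a local ./train directory (the directory name is illustrative; --cli-connect-timeout is optional, but it helps avoid the dropped-connection errors reported in the issues below):

  # Copy one packed shard (~46 GB) from the public bucket
  aws s3 --no-sign-request --cli-connect-timeout 6000 cp s3://open-images-dataset/tar/train_0.tar.gz .

  # Unpack it; the archive's internal layout determines the final folder structure
  mkdir -p train
  tar -xzf train_0.tar.gz -C train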

Download the Open Images Challenge 2018/2019 test set

CVDF also hosts the Open Images Challenge 2018/2019 test set, which is disjoint from the Open Images V4/V5 train, validation, and test sets. The same AWS instructions as above apply. Note that because the challenge images did not change between 2018 and 2019, the filenames refer only to the year 2018.

  • aws s3 --no-sign-request sync s3://open-images-dataset/challenge2018 [target_dir/test_challenge_2018] (10GB)

We also provide the challenge 2018/2019 set as a single packed file, which you can download with:

  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/challenge2018.tar.gz [target_dir] (9.7G)

Download Full Dataset With Google Storage Transfer

Prerequisite: Google Cloud Platform account

In this section, we describe how to download all images in the Open Images Dataset to a Google Cloud storage bucket. We recommend using the user interface provided in the Google Cloud storage console for this task.

Google Storage provides a "storage transfer" function that transfers online files into a storage bucket. It can be used to copy the images from their original URLs into a user's storage bucket. CVDF provides TSV files that list all image URLs in the Open Images Dataset for this transfer. Step-by-step instructions are described in Creating and Managing Transfers with the Console. The size of the whole dataset is around 18 TB. Please note that users pay for hosting the dataset on Google Cloud storage after downloading it; the hosting price can be found on the Google Cloud Storage Pricing page.
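
If you prefer the command line to the console, the sketch below shows what a URL-list transfer could look like with the gcloud CLI. It assumes gcloud is installed and authenticated, the Storage Transfer API is enabled on your project, and gs://my-open-images-bucket is a bucket you own (the bucket name is illustrative); repeat the transfer command for each TSV file listed below.

  # Enable the Storage Transfer API (once per project)
  gcloud services enable storagetransfer.googleapis.com

  # Create a transfer job that reads the public URL list and copies the images into your bucket
  gcloud transfer jobs create \
    https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-validation.tsv \
    gs://my-open-images-bucket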

The TSV files for the train set, in 10 partitions:
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train0.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train1.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train2.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train3.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train4.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train5.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train6.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train7.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train8.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train9.tsv

The TSV file for the validation set:
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-validation.tsv

The TSV file for the test set:
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-test.tsv
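
To inspect a URL list before creating a transfer, the files can be fetched directly, for example with curl (one option among many):

  # Fetch the validation URL list and count the image URLs it contains
  # (the first line is the TsvHttpData-1.0 header, hence the subtraction)
  curl -sO https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-validation.tsv
  echo $(( $(wc -l < open-images-dataset-validation.tsv) - 1 )) image URLs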

open-images-dataset's People

Contributors

akashlakhera, jrruijli, shackenberg, sohierdane, tylin


open-images-dataset's Issues

Access denied to download

I tried to download the images following the instructions and got an access-denied error:

AccessDeniedException: 403 #email# does not have storage.objects.list access to open-images-dataset.

Also, during login I get this warning:

WARNING: `gcloud auth login` no longer writes application default credentials.
If you need to use ADC, see:
  gcloud auth application-default --help

300k Classification Set?

Hey, is there any chance we could get the full dataset rescaled to 300K pixels as a download option? The detection data is great to have, and I'd like the classification data too, but downloading all the thumbnails individually is taking forever and many of them are missing. Thanks!

Better to add timeout arguments for aws download

With the command suggested in the README for downloading the tar files, I frequently encountered a "connection closed" error after several minutes of downloading. Following a related issue, I added an extra argument and everything went well:

aws s3 --no-sign-request --cli-connect-timeout 6000 cp s3://open-images-dataset/tar/test.tar.gz .

It would be better to add --cli-connect-timeout 6000 to the README so that other people do not encounter the "connection closed" error.

Offensive Image

I found an offensive image in the dataset. The image ID is c5f085aeca67f4cb; it contains a swastika symbol with Arabic script.
Please let me know once the image has been removed.

download problem

("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
Has anyone solved this problem?

Download full dataset

I want to download the full dataset (18 TB) but I don't have access permission.
I have already submitted the CVDF access request.
I created a Google storage transfer but got a 'permission denied' message. Can anyone tell me how to download the full dataset?
Please help!

Issue with cloud connectivity

"2019/03/11 13:06:26 Failed to copyto: googleapi: Error 403: [email protected] does not have storage.objects.get access to rclone/placehere., forbidden"

I keep getting the above message when trying to send a file from my Pi to my Google Drive.
This is the command I run:
rclone copyto /home/pi/Desktop/upload1 remote:rclone/placehere -P --progress

Found an inappropriate image (Image ID: b89c712d47187a56)

I found an image containing a naked man in the object detection dataset and I'm not sure what to do about it. The OIDv6 website says to report inappropriate images by email at [email protected], but it seems that address no longer exists; my email bounced when I sent it.

I don't know whether anyone here is responsible for maintaining the dataset, but here is some information about the image:

Image ID: b89c712d47187a56
Type: Object Detection
Subset: train
Bounding box label: Television
Size: 722,083 bytes

Error in "gsutil -m rsync -r xx"

Hi,
when I run gsutil -m rsync -r gs://open-images-dataset/validation, there is an error as follows:

Unknown option: m
No command was given.
Choose one of -b, -d, -e, or -r to do something.
Try `/usr/bin/gsutil --help' for more information.

Am I the only one with this problem? It seems to happen every time, and I don't know how to solve it. Please help!

input aws instruction, fatal error The specified bucket does not exist

I entered this at the terminal:
aws s3 --no-sign-request sync s3://open-images-dataset/train ./train
and it shows:
fatal error: An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist

How can I solve this? I want to download all the training images.

Corrupted tar.gz dataset found?

I downloaded the following files and tried to decompress them to .tar using pigz -d, but I get errors: "corrupted -- invalid deflate data (invalid stored block lengths), pigz: abort: internal threads error"

Is there anything wrong with the source data? I downloaded these files several times and got the same error each time.

https://open-images-dataset.s3.amazonaws.com/tar/train_3.tar.gz
https://open-images-dataset.s3.amazonaws.com/tar/train_7.tar.gz
https://open-images-dataset.s3.amazonaws.com/tar/train_e.tar.gz
https://open-images-dataset.s3.amazonaws.com/tar/test.tar.gz

AccessDeniedException: 403

Hi, I tried downloading the dataset but no matter what I do, I keep getting:

AccessDeniedException: 403 **********@gmail.com does not have storage.objects.list access to open-images-dataset.

when running

gsutil -m rsync -r gs://open-images-dataset/validation .

I have tried signing up multiple times at http://www.cvdfoundation.org/datasets/open-images-dataset/signup.html and I keep getting the success message, but I still don't seem to have access to download the data.

You have successfully signed up for downloading Open Images Dataset

Am I doing anything wrong, or is the website that is supposed to grant me access not working? Also, is the access specific to a Google Cloud project?

Download a specific category from Open Images Dataset V6

I am working on a food recognition project and using your dataset to collect data. Unfortunately, I am having trouble downloading the food-related categories such as meat (chicken, beef, pork, ...).

In the Crowdsourced Extension (https://storage.googleapis.com/openimages/web/download.html), I found that raw meat classes (pork meat, beef meat, chicken meat, ...) are included, but no JSON file is provided. I checked the JSON file describing the boxable class hierarchy on the official download page (https://storage.googleapis.com/openimages/web/download.html), and unfortunately it lacks raw meat classes such as pork meat, beef meat, and chicken meat.

Could you please guide me on how to get all the food-related classes that the Crowdsourced Extension mentions? Your support would be greatly appreciated and would help me finish my project on time.

Add 's3:GetBucketLocation' permission to AWS bucket

Can you please add the 's3:GetBucketLocation' permission to the bucket policy of the bounding-box-annotated dataset hosted on AWS S3 (s3://open-images-dataset)?

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicPermissions",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::open-images-dataset"
            ]
        }
    ]
}

Given the current 'public-read' ACL on the bucket, adding this permission does not expose any information that isn't already available. Specifically, 's3:GetBucketLocation' only returns the bucket's 'x-amz-region', which is already available from HEAD Bucket requests and is used under the hood by the AWS CLI.

Granting this permission will allow for the use of Google Cloud Storage Transfer Service to copy this dataset from AWS S3 to Google Cloud Storage (see my response in a separate open issue).

can't find any relationship between tsv files content and ImageIDs

Is there any correspondence between the TSV image URLs and the images in the object detection / instance segmentation datasets?

Reading open-images-dataset-validation.tsv, I can see:

TsvHttpData-1.0
https://c2.staticflickr.com/6/5606/15611395595_f51465687d_o.jpg 2038323 I4V4qq54NBEFDwBqPYCkDA==
https://c6.staticflickr.com/3/2808/10351094034_f3aa58c5d9_o.jpg 1762125 38x6O2LAS75H1vUGVzIilg==
https://c2.staticflickr.com/9/8089/8416776003_9f2636ca56_o.jpg 9059623 4ksF8TuGWGcKul6Z/6pq8g==
https://farm3.staticflickr.com/568/21452126474_ab12789b36_o.jpg 2306438 R+6Cs525mCUT6RovHPWREg==

And challenge-2019-validation-segmentation-labels.csv contains:

ImageID,LabelName,Confidence
d6d443cf4233a5b4,/m/03bt1vf,0
fedfbecc33a7a5fb,/m/02gzp,0
e6fc75abc46fccc8,/m/0cmf2,1
096c39dfb17068cf,/m/04ctx,1
b038d9368f753533,/m/0283dt1,1
45cabd2ad2d70de0,/m/083wq,0
76dcb6e2df733360,/m/039xj_,0
ff7008672523e06a,/m/03bt1vf,1
b96ee006bfc22643,/m/03bt1vf,1

How can I find the image URL for ImageID=d6d443cf4233a5b4?

Thanks in advance

What is the difference in data between V5 and V4?

If I have already downloaded the V4 version (images, annotations), do I need to download:

  1. Images?
  2. Box annotations?
  3. Image annotations?

As far as I know, I need to download the segmentation annotations.
Thank you.

Missing training Images?

Hi,

I got only 1,592,088 training images, one fewer than the 1,592,089 stated in the README. Is something wrong? This number is also more than 1,000 less than the 1,593,853 provided here.

Many Thanks

Clarity on switch from Google Storage to S3

I made this comment on the commit that made the change, but I am posting it as an issue for visibility to the community.

The Google Storage URIs have been changed to Amazon S3 locations.

For those of us who are partway through syncing the training data from Google Storage: will it remain available?

I'm also curious in general what the reason for the switch is. Cheaper to host on S3 or are more people training models in AWS?

Speed so slow

Completed 256.0 KiB/45.9 GiB (3.2 KiB/s) with 1 file(s) remaining

It would take about 100 years to finish at the current speed.

access to open images

How long will it take to get access after submitting the form at the link below? It has been at least 3 hours and I have not been given access. #3

fatal error:(The read operation timed out)

Hi, I tried downloading the train set with AWS S3, but every time after I had downloaded almost 140K images the download was interrupted with the error "fatal error: (The read operation timed out)". Could you please help me?

Transfer from AWS Bucket to Azure Blob

Hi, I'm trying to transfer the image set from the AWS bucket to Azure Blob Storage.

The transfer tool asks for the account's "Access key ID" and "Secret access key". Is it possible to release the keys for the account under which the dataset is stored on AWS?

Many thanks.
