
open-images-dataset's Introduction

Open Images Dataset

Open Images is a dataset of ~9 million URLs to images that have been annotated with labels spanning over 6,000 categories. This page provides download instructions and mirror sites for the Open Images Dataset. Please visit the project page for more details on the dataset.

Download Images

Download Images With Bounding Box Annotations

CVDF hosts the image files that have bounding box annotations in Open Images Dataset V4/V5. These images include the complete subsets for which instance segmentations and visual relations are annotated. The images are split into train (1,743,042), validation (41,620), and test (125,436) sets; the train set is also used in the Open Images Challenge 2018 and 2019. The images are rescaled to have at most 1024 pixels on their longest side, preserving their original aspect ratio. The total size is 561 GB. The images can be downloaded directly into a local directory from the CVDF AWS S3 cloud storage bucket:

s3://open-images-dataset

You can download the images either to a local directory or to your own AWS S3 cloud storage bucket with the following procedure (a worked example follows the list):

  1. Install the AWS CLI (awscli).
  2. Download the images for the train, validation, and test sets:
  • aws s3 --no-sign-request sync s3://open-images-dataset/train [target_dir/train] (513GB)
  • aws s3 --no-sign-request sync s3://open-images-dataset/validation [target_dir/validation] (12GB)
  • aws s3 --no-sign-request sync s3://open-images-dataset/test [target_dir/test] (36GB)
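
For example, a minimal sketch that installs the CLI with pip and pulls the validation split into a local directory (the directory name open-images/validation is only an illustration):

  # Install the AWS CLI; pipx or the official installer work equally well
  pip install awscli

  # Download the validation split (~12 GB) anonymously; no AWS account is required
  mkdir -p open-images/validation
  aws s3 --no-sign-request sync s3://open-images-dataset/validation open-images/validation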

Alternatively, you can download the subsets as separate packed files (the subset train_x contains all images whose IDs start with x):

  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_0.tar.gz [target_dir] (46G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_1.tar.gz [target_dir] (34G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_2.tar.gz [target_dir] (33G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_3.tar.gz [target_dir] (32G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_4.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_5.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_6.tar.gz [target_dir] (32G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_7.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_8.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_9.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_a.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_b.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_c.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_d.tar.gz [target_dir] (31G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_e.tar.gz [target_dir] (28G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_f.tar.gz [target_dir] (28G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/validation.tar.gz [target_dir] (12G)
  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/test.tar.gz [target_dir] (36G)

target_dir can be either a local directory or an AWS S3 cloud storage bucket.
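
As a concrete sketch for the packed files, the commands below fetch the first training shard and unpack it into a local ./train directory (the directory name is illustrative; --cli-connect-timeout is optional, but it helps avoid the dropped-connection errors reported in the issues below):

  # Copy one packed shard (~46 GB) from the public bucket
  aws s3 --no-sign-request --cli-connect-timeout 6000 cp s3://open-images-dataset/tar/train_0.tar.gz .

  # Unpack it; the archive's internal layout determines the final folder structure
  mkdir -p train
  tar -xzf train_0.tar.gz -C train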

Download the Open Images Challenge 2018/2019 test set

CVDF also hosts the Open Images Challenge 2018/2019 test set, which is disjoint from the Open Images V4/V5 train, validation, and test sets. The same AWS instructions as above apply. Note that because the challenge images did not change between 2018 and 2019, the filenames refer only to the year 2018.

  • aws s3 --no-sign-request sync s3://open-images-dataset/challenge2018 [target_dir/test_challenge_2018] (10GB)

We also provide the challenge 2018/2019 set as a single packed file, which you can download with:

  • aws s3 --no-sign-request cp s3://open-images-dataset/tar/challenge2018.tar.gz [target_dir] (9.7G)

Download Full Dataset With Google Storage Transfer

Prerequisite: Google Cloud Platform account

In this section, we describe how to download all images in the Open Images Dataset to a Google Cloud storage bucket. We recommend using the user interface provided in the Google Cloud storage console for this task.

Google Storage provides a "storage transfer" function that transfers online files into a storage bucket. It can be used to copy the images from their original URLs into a user's storage bucket. CVDF provides TSV files that list all image URLs in the Open Images Dataset for this transfer. Step-by-step instructions are described in Creating and Managing Transfers with the Console. The size of the whole dataset is around 18 TB. Please note that users pay for hosting the dataset on Google Cloud storage after downloading it; the hosting price can be found on the Google Cloud Storage Pricing page.
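
If you prefer the command line to the console, the sketch below shows what a URL-list transfer could look like with the gcloud CLI. It assumes gcloud is installed and authenticated, the Storage Transfer API is enabled on your project, and gs://my-open-images-bucket is a bucket you own (the bucket name is illustrative); repeat the transfer command for each TSV file listed below.

  # Enable the Storage Transfer API (once per project)
  gcloud services enable storagetransfer.googleapis.com

  # Create a transfer job that reads the public URL list and copies the images into your bucket
  gcloud transfer jobs create \
    https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-validation.tsv \
    gs://my-open-images-bucket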

The TSV files for the train set, in 10 partitions:
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train0.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train1.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train2.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train3.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train4.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train5.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train6.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train7.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train8.tsv
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-train9.tsv

The TSV file for the validation set:
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-validation.tsv

The TSV file for the test set:
https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-test.tsv
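
To inspect a URL list before creating a transfer, the files can be fetched directly, for example with curl (one option among many):

  # Fetch the validation URL list and count the image URLs it contains
  # (the first line is the TsvHttpData-1.0 header, hence the subtraction)
  curl -sO https://storage.googleapis.com/cvdf-datasets/oid/open-images-dataset-validation.tsv
  echo $(( $(wc -l < open-images-dataset-validation.tsv) - 1 )) image URLs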

open-images-dataset's People

Contributors

akashlakhera, jrruijli, shackenberg, sohierdane, tylin


open-images-dataset's Issues

Access denied to download

I tried to download the images following the instructions and got an access-denied error:

AccessDeniedException: 403 #email# does not have storage.objects.list access to open-images-dataset.

Also, during login I get this warning:

WARNING: `gcloud auth login` no longer writes application default credentials.
If you need to use ADC, see:
  gcloud auth application-default --help

300k Classification Set?

Hey, is there any chance we could get the full dataset rescaled to 300K pixels as a download option? The detection data is great to have, and I'd like the classification data too, but downloading all the thumbnails individually is taking forever and many of them are missing. Thanks!

Better to add timeout arguments for aws download

With the command suggested in the README for downloading the tar files, I frequently encountered a "connection closed" error after several minutes of downloading. Following a related issue, I added an extra argument and everything went well:

aws s3 --no-sign-request --cli-connect-timeout 6000 cp s3://open-images-dataset/tar/test.tar.gz .

It would be better to add --cli-connect-timeout 6000 to the README so that other people do not encounter the "connection closed" error.

Offensive Image

I found an offensive image in the dataset. The image ID is c5f085aeca67f4cb; it contains a swastika symbol with Arabic script.
Please let me know once the image has been removed.

download problem

("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
Has anyone solved this problem?

Download full dataset

I want to download the full dataset (18 TB) but I don't have access permission.
I have already submitted the CVDF access request.
I created a Google storage transfer but got a 'permission denied' message. Can anyone tell me how to download the full dataset?
Please help!

Issue with cloud connectivity

"2019/03/11 13:06:26 Failed to copyto: googleapi: Error 403: [email protected] does not have storage.objects.get access to rclone/placehere., forbidden"

I keep getting the above message when trying to send a file from my Pi to my Google Drive.
This is the command I run:
rclone copyto /home/pi/Desktop/upload1 remote:rclone/placehere -P --progress

Found an inappropriate image (Image ID: b89c712d47187a56)

I found an image containing a naked man in the object detection dataset and I'm not sure what to do about it. The OIDv6 website says to report inappropriate images by email at [email protected], but it seems that address no longer exists; my email bounced when I sent it.

I don't know whether anyone here is responsible for maintaining the dataset, but here is some information about the image:

Image ID: b89c712d47187a56
Type: Object Detection
Subset: train
Bounding box label: Television
Size: 722,083 bytes

Error in "gsutil -m rsync -r xx"

Hi,
when I run gsutil -m rsync -r gs://open-images-dataset/validation, there is an error as follows:

Unknown option: m
No command was given.
Choose one of -b, -d, -e, or -r to do something.
Try `/usr/bin/gsutil --help' for more information.

Am I the only one with this problem? It seems to happen every time, and I don't know how to solve it. Please help!

input aws instruction, fatal error The specified bucket does not exist

I entered this at the terminal:
aws s3 --no-sign-request sync s3://open-images-dataset/train ./train
and it shows:
fatal error: An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist

How can I solve this? I want to download all the training images.

Corrupted tar.gz dataset found?

I downloaded the following files and tried to decompress them to .tar using pigz -d, but I get errors: "corrupted -- invalid deflate data (invalid stored block lengths), pigz: abort: internal threads error"

Is there anything wrong with the source data? I downloaded these files several times and got the same error each time.

https://open-images-dataset.s3.amazonaws.com/tar/train_3.tar.gz
https://open-images-dataset.s3.amazonaws.com/tar/train_7.tar.gz
https://open-images-dataset.s3.amazonaws.com/tar/train_e.tar.gz
https://open-images-dataset.s3.amazonaws.com/tar/test.tar.gz

AccessDeniedException: 403

Hi, I tried downloading the dataset but no matter what I do, I keep getting:

AccessDeniedException: 403 **********@gmail.com does not have storage.objects.list access to open-images-dataset.

when running

gsutil -m rsync -r gs://open-images-dataset/validation .

I have tried signing up multiple times at http://www.cvdfoundation.org/datasets/open-images-dataset/signup.html and I keep getting the success message, but I still don't seem to have access to download the data.

You have successfully signed up for downloading Open Images Dataset

Am I doing anything wrong, or is the website that is supposed to grant me access not working? Also, is the access specific to a Google Cloud project?

Download a specific category from Open Images Dataset V6

I am working on a food recognition project and using your dataset to collect data. Unfortunately, I am having trouble downloading the food-related categories such as meat (chicken, beef, pork, ...).

In the Crowdsourced Extension (https://storage.googleapis.com/openimages/web/download.html), I found that raw meat classes (pork meat, beef meat, chicken meat, ...) are included, but no JSON file is provided. I checked the JSON file describing the boxable class hierarchy on the official download page (https://storage.googleapis.com/openimages/web/download.html), and unfortunately it lacks raw meat classes such as pork meat, beef meat, and chicken meat.

Could you please guide me on how to get all the food-related classes that the Crowdsourced Extension mentions? Your support would be greatly appreciated and would help me finish my project on time.

Add 's3:GetBucketLocation' permission to AWS bucket

Can you please add the 's3:GetBucketLocation' permission to the bucket policy of the bounding-box-annotated dataset hosted on AWS S3 (s3://open-images-dataset)?

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicPermissions",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::open-images-dataset"
            ]
        }
    ]
}

Given the current 'public-read' ACL on the bucket, adding this permission does not expose any information that isn't already available. Specifically, 's3:GetBucketLocation' only returns the bucket's 'x-amz-region', which is already available from HEAD Bucket requests and is used under the hood by the AWS CLI.

Granting this permission will allow for the use of Google Cloud Storage Transfer Service to copy this dataset from AWS S3 to Google Cloud Storage (see my response in a separate open issue).

can't find any relationship between tsv files content and ImageIDs

Is there any correspondence between the TSV image URLs and the images in the object detection / instance segmentation datasets?

Reading open-images-dataset-validation.tsv, I can see:

TsvHttpData-1.0
https://c2.staticflickr.com/6/5606/15611395595_f51465687d_o.jpg 2038323 I4V4qq54NBEFDwBqPYCkDA==
https://c6.staticflickr.com/3/2808/10351094034_f3aa58c5d9_o.jpg 1762125 38x6O2LAS75H1vUGVzIilg==
https://c2.staticflickr.com/9/8089/8416776003_9f2636ca56_o.jpg 9059623 4ksF8TuGWGcKul6Z/6pq8g==
https://farm3.staticflickr.com/568/21452126474_ab12789b36_o.jpg 2306438 R+6Cs525mCUT6RovHPWREg==

And challenge-2019-validation-segmentation-labels.csv contains:

ImageID,LabelName,Confidence
d6d443cf4233a5b4,/m/03bt1vf,0
fedfbecc33a7a5fb,/m/02gzp,0
e6fc75abc46fccc8,/m/0cmf2,1
096c39dfb17068cf,/m/04ctx,1
b038d9368f753533,/m/0283dt1,1
45cabd2ad2d70de0,/m/083wq,0
76dcb6e2df733360,/m/039xj_,0
ff7008672523e06a,/m/03bt1vf,1
b96ee006bfc22643,/m/03bt1vf,1

How can I find the image URL for ImageID=d6d443cf4233a5b4?

Thanks in advance

What is the difference in data between V5 and V4?

If I have already downloaded the V4 version (images, annotations), do I need to download:

  1. Images?
  2. Box annotations?
  3. Image annotations?

As far as I know, I need to download the segmentation annotations.
Thank you.

Missing training Images?

Hi,

I got only 1,592,088 training images, one fewer than the 1,592,089 stated in the README. Is something wrong? This number is also more than 1,000 less than the 1,593,853 provided here.

Many Thanks

Clarity on switch from Google Storage to S3

I made this comment on the commit that made the change, but I am posting it as an issue for visibility to the community.

The Google Storage URIs have been changed to Amazon S3 locations.

For those of us who are partway through syncing the training data from Google Storage: will it remain available?

I'm also curious in general what the reason for the switch is. Cheaper to host on S3 or are more people training models in AWS?

Speed so slow

Completed 256.0 KiB/45.9 GiB (3.2 KiB/s) with 1 file(s) remaining

It would take about 100 years to finish at the current speed.

access to open images

How long will it take to get access after submitting the form at the link below? It has been at least 3 hours and I have not been given access. #3

fatal error:(The read operation timed out)

Hi, I tried downloading the train set with AWS S3, but every time after I had downloaded almost 140K images the download was interrupted with the error "fatal error: (The read operation timed out)". Could you please help me?

Transfer from AWS Bucket to Azure Blob

Hi, I'm trying to transfer the image set from the AWS bucket to Azure Blob Storage.

The transfer tool asks for the account's "Access key ID" and "Secret access key". Is it possible to release the keys for the account under which the dataset is stored on AWS?

Many thanks.
