
inat_comp's Introduction


iNaturalist Competition

Please open an issue if you have questions or problems with the dataset.

2017 Competition

The 2017 competition, sponsored by Google, is part of the FGVC^4 workshop at CVPR.

Kaggle

We are using Kaggle to host the leaderboard. Check out the competition page here.

Dates

Data Released: April 5, 2017
Submission Server Open: June 1, 2017
Submission Deadline: July 7, 2017
Winners Announced: July 21, 2017

Details

There are a total of 5,089 categories in the dataset, with 579,184 training images and 95,986 validation images. For the training set, the distribution of images per category follows the observation frequency of that category by the iNaturalist community. Therefore, there is a non-uniform distribution of images per category. Example images, along with their unique GBIF ID numbers (where available), can be viewed here.

Super Category    Category Count    Train Images    Val Images
Plantae           2,101             158,407         38,206
Insecta           1,021             100,479         18,076
Aves              964               214,295         21,226
Reptilia          289               35,201          5,680
Mammalia          186               29,333          3,490
Fungi             121               5,826           1,780
Amphibia          115               15,318          2,385
Mollusca          93                7,536           1,841
Animalia          77                5,228           1,362
Arachnida         56                4,873           1,086
Actinopterygii    53                1,982           637
Chromista         9                 398             144
Protozoa          4                 308             73
Total             5,089             579,184         95,986

Evaluation

We follow a similar metric to the classification tasks of the ILSVRC. For each image $i$, an algorithm will produce 5 labels $l_{i,j}$, $j = 1, \ldots, 5$. We allow 5 labels because some categories are disambiguated with additional data provided by the observer, such as latitude, longitude and date. It might also be the case that multiple categories occur in an image (e.g. a photo of a bee on a flower). For this competition each image has one ground truth label $g_i$, and the error for that image is:

$$e_i = \min_{j} d(l_{i,j}, g_i)$$

Where

$$d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{otherwise} \end{cases}$$

The overall error score for an algorithm is the average error over all $N$ test images:

$$\text{score} = \frac{1}{N} \sum_{i=1}^{N} e_i$$
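The metric in this section can be sketched in a few lines of Python. This is a minimal illustration and not the official scoring code: the function names `image_error` and `overall_error`, the dictionary inputs, and the choice to count images with no prediction as errors are all assumptions of this sketch.

```python
from typing import Mapping, Sequence


def image_error(predicted: Sequence[int], truth: int) -> int:
    """Return 0 if the ground truth label is among the first 5 predictions, else 1."""
    return 0 if truth in predicted[:5] else 1


def overall_error(predictions: Mapping[int, Sequence[int]],
                  ground_truth: Mapping[int, int]) -> float:
    """Average the per-image error over every image in ground_truth.

    Assumption of this sketch: an image without a prediction counts as an error.
    """
    if not ground_truth:
        raise ValueError("ground_truth is empty")
    total = sum(image_error(predictions.get(image_id, ()), label)
                for image_id, label in ground_truth.items())
    return total / len(ground_truth)


# Made-up image ids and labels, for illustration only.
preds = {1: [0, 78, 23, 3, 42], 2: [83, 13, 42, 0, 21], 3: [5, 6, 7, 8, 9]}
truth = {1: 23, 2: 99, 3: 5}
print(overall_error(preds, truth))  # one miss (image 2) out of three images
```

Because the per-image error is a minimum over the 5 labels, the order of the labels within a row does not affect the score.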

Guidelines

Participants are restricted to training their algorithms on the iNaturalist 2017 train and validation sets. Pretrained models may be used to construct the algorithms (e.g. ImageNet pretrained models) as long as participants do not actively collect additional data for the target categories of the iNaturalist 2017 competition. Please specify any and all external data used for training when uploading results.

The general rule is that we want participants to use only the provided training and validation images to train a model to classify the test images. We do not want participants crawling the web in search of additional data for the target categories. Participants should be in the mindset that this is the only data available for those categories.

Participants are allowed to collect additional annotations (e.g. bounding boxes) on the provided training and validation sets. Teams should specify that they collected additional annotations when submitting results.

Annotation Format

We closely follow the annotation format of the COCO dataset. The annotations are stored in the JSON format and are organized as follows:

{
  "info" : info,
  "images" : [image],
  "categories" : [category],
  "annotations" : [annotation],
  "licenses" : [license]
}

info{
  "year" : int,
  "version" : str,
  "description" : str,
  "contributor" : str,
  "url" : str,
  "date_created" : datetime,
}

image{
  "id" : int,
  "width" : int,
  "height" : int,
  "file_name" : str,
  "license" : int,
  "rights_holder" : str
}

category{
  "id" : int,
  "name" : str,
  "supercategory" : str,
}

annotation{
  "id" : int,
  "image_id" : int,
  "category_id" : int
}

license{
  "id" : int,
  "name" : str,
  "url" : str
}
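As a rough sketch of how a file in this format might be read, the Python below maps each image id to its category id and counts annotations per supercategory. The helper names (`load_annotations`, `labels_by_image`, `images_per_supercategory`) and the tiny `sample` dictionary are hypothetical and are not part of the dataset tooling. Only the field names come from the format described above.

```python
import json
from collections import Counter


def load_annotations(path):
    """Parse an annotation file in the JSON layout described above."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def labels_by_image(data):
    """Map each image id to its category id."""
    return {ann["image_id"]: ann["category_id"] for ann in data["annotations"]}


def images_per_supercategory(data):
    """Count annotations per supercategory."""
    supercategory = {c["id"]: c["supercategory"] for c in data["categories"]}
    return Counter(supercategory[ann["category_id"]] for ann in data["annotations"])


# Made-up sample in the same layout, so the sketch runs without the dataset.
sample = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "categories": [
        {"id": 0, "name": "species_a", "supercategory": "Aves"},
        {"id": 1, "name": "species_b", "supercategory": "Plantae"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 0},
        {"id": 11, "image_id": 2, "category_id": 1},
    ],
}

print(labels_by_image(sample))
print(images_per_supercategory(sample))
```

With the real files, `load_annotations` would be called on the path of a downloaded annotation file and its result passed to the other two helpers.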

Submission Format

Submissions to the Kaggle competition are CSV files with the following format:

id,predicted
12345,0 78 23 3 42
67890,83 13 42 0 21

The id column corresponds to the test image id. The predicted column corresponds to 5 category ids, separated by spaces. You should have one row for each test image.
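A file in this format could be produced along the following lines. This is a sketch under stated assumptions: `write_submission` is a hypothetical helper, predictions are assumed to be held in a dictionary from test image id to a list of 5 category ids, and sorting rows by id is a choice made here, not a stated requirement.

```python
import csv
import io


def write_submission(predictions, out_file):
    """Write `id,predicted` rows, with predicted as 5 space-separated category ids."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["id", "predicted"])
    for image_id, labels in sorted(predictions.items()):
        if len(labels) != 5:
            raise ValueError(f"image {image_id}: expected 5 labels, got {len(labels)}")
        writer.writerow([image_id, " ".join(str(label) for label in labels)])


# Write to an in-memory buffer here; pass an open file object to write to disk.
buffer = io.StringIO()
write_submission({12345: [0, 78, 23, 3, 42], 67890: [83, 13, 42, 0, 21]}, buffer)
print(buffer.getvalue())
```

When writing to disk, opening the file with `newline=""` is the usual recommendation for Python's `csv` module.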

Terms of Use

By downloading this dataset you agree to the following terms:

  1. You will abide by the iNaturalist Terms of Service.
  2. You will use the data only for non-commercial research and educational purposes.
  3. You will NOT distribute the above images.
  4. The California Institute of Technology makes no representations or warranties regarding the data, including but not limited to warranties of non-infringement or fitness for a particular purpose.
  5. You accept full responsibility for your use of the data and shall defend and indemnify the California Institute of Technology, including its employees, officers and agents, against any and all claims arising from your use of the data, including but not limited to your use of any copies of copyrighted images that you may create from the data.

Data

Download the dataset files here:

Contributors

gvanhorn38, macaodha
