
mini-imagenet-tools's Introduction


Tools for mini-ImageNet Dataset


This repo provides Python source code for creating the mini-ImageNet dataset from ImageNet, along with utilities for generating batches during training. It is related to our work on few-shot learning: Meta-Transfer Learning.


About mini-ImageNet

The mini-ImageNet dataset was proposed by Vinyals et al. for few-shot learning evaluation. It remains challenging because it uses ImageNet images, but it requires fewer resources and less infrastructure than the full ImageNet dataset. In total, there are 100 classes with 600 samples of 84×84 color images per class. These 100 classes are divided into 64, 16, and 20 classes for sampling tasks for meta-training, meta-validation, and meta-test, respectively.

Please note that the split files in the csv_files folder were created by Ravi and Larochelle (GitHub link). Vinyals et al. didn't include their split files for mini-ImageNet when they first released their paper, so Ravi and Larochelle created their own splits. Additional split files are provided here.
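As a quick sanity check of the numbers above, the following sketch counts the classes and images in one split file. It assumes each CSV starts with a `filename,label` header row, which this README does not state; `count_split` is a hypothetical helper and not part of this package.

```python
import csv
import io
from collections import Counter


def count_split(csv_text):
    """Return (number of classes, number of images) for one split file.

    Assumes a header row with `filename` and `label` columns
    (an assumption; adjust if your CSV files differ).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    per_class = Counter(row["label"] for row in reader)
    return len(per_class), sum(per_class.values())


# Tiny in-memory stand-in for the contents of a real split file.
sample = "filename,label\na.jpg,n01\nb.jpg,n01\nc.jpg,n02\n"
num_classes, num_images = count_split(sample)
print(num_classes, num_images)

# Totals implied by the description above: 64 + 16 + 20 classes, 600 images each.
assert 64 + 16 + 20 == 100
assert 100 * 600 == 60000
```

To check a real file, read its text first (for example with `open(path).read()`) and pass it to `count_split`. If the description above holds, the three splits should report 64, 16, and 20 classes.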

Requirements

  • Python 2.7 or 3.x
  • numpy
  • tqdm
  • opencv-python
  • Pillow

Installation

Install via PyPI:

pip install miniimagenettools

Install via GitHub:

git clone https://github.com/yaoyao-liu/mini-imagenet-tools.git

Usage

First, download the image source files from the ImageNet website. If you already have them, you may use them directly. Some people report that the ImageNet website is not working. Here is an alternative download link. Please read the ImageNet terms carefully before downloading.

Filename: ILSVRC2012_img_train.tar
Size: 138 GB
MD5: 1d675b47d978889d74fa0da5fadfb00e
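Since the archive is large, it may be worth verifying the download against the MD5 listed above before extracting it. A minimal sketch using only the standard library; `md5_of_file` is a hypothetical helper, and the path in the usage comment is a placeholder:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_MD5 = "1d675b47d978889d74fa0da5fadfb00e"  # value listed above

# Hypothetical usage; replace the path with your own download location.
# if md5_of_file("ILSVRC2012_img_train.tar") != EXPECTED_MD5:
#     raise SystemExit("Checksum mismatch: the download may be corrupted.")
```

Reading in chunks keeps memory use constant, which matters for a file of this size.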

Then clone the repo:

git clone https://github.com/yaoyao-liu/mini-imagenet-tools.git
cd mini-imagenet-tools

To generate the mini-ImageNet dataset from the tar file:

python mini_imagenet_generator.py --tar_dir [your_path_of_the_ILSVRC2012_img_train.tar]

To generate the mini-ImageNet dataset from an untarred folder:

python mini_imagenet_generator.py --imagenet_dir [your_path_of_imagenet_folder]

If you want to resize the images to a specific resolution:

python mini_imagenet_generator.py --tar_dir [your_path_of_the_ILSVRC2012_img_train.tar] --image_resize 100

P.S. By default, the images are resized to 84 × 84.

If you don't want to resize the images, you may set --image_resize 0.

To use the MiniImageNetDataLoader class:

from miniimagenettools.mini_imagenet_dataloader import MiniImageNetDataLoader

dataloader = MiniImageNetDataLoader(shot_num=5, way_num=5, episode_test_sample_num=15)

dataloader.generate_data_list(phase='train')
dataloader.generate_data_list(phase='val')
dataloader.generate_data_list(phase='test')

dataloader.load_list(phase='all')

total_train_step = 100  # illustrative value (an assumption, not from the original)
for idx in range(total_train_step):
    episode_train_img, episode_train_label, episode_test_img, episode_test_label = \
        dataloader.get_batch(phase='train', idx=idx)
    ...
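To make the roles of way_num, shot_num, and episode_test_sample_num concrete, here is a minimal, library-independent sketch of how one N-way K-shot episode can be sampled. It is not the implementation used by MiniImageNetDataLoader; `sample_episode` and the toy data are hypothetical and only illustrate the sizes involved.

```python
import random


def sample_episode(class_to_images, way_num=5, shot_num=5,
                   episode_test_sample_num=15, rng=None):
    """Sample one episode: (train, test) lists of (image, label) pairs."""
    rng = rng or random.Random()
    # Pick way_num classes, then relabel them 0..way_num-1 for this episode.
    classes = rng.sample(sorted(class_to_images), way_num)
    train, test = [], []
    for label, cls in enumerate(classes):
        # Draw disjoint training (shot) and test (query) images per class.
        picked = rng.sample(class_to_images[cls],
                            shot_num + episode_test_sample_num)
        train += [(img, label) for img in picked[:shot_num]]
        test += [(img, label) for img in picked[shot_num:]]
    return train, test


# Toy data: 10 classes with 600 placeholder image names each.
toy = {f"class{c}": [f"class{c}_{i}.jpg" for i in range(600)]
       for c in range(10)}
train, test = sample_episode(toy, rng=random.Random(0))
print(len(train), len(test))  # 25 training and 75 test samples
```

With the settings used above (5-way, 5-shot, 15 test samples per class), each episode holds 5 × 5 = 25 training images and 5 × 15 = 75 test images.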

Performance

Check the SOTA results for mini-ImageNet on this page.

Download Processed Images

Download jpg files (Thanks for the contribution by @vainaijr)

Download tar files

Acknowledgement

Model-Agnostic Meta-Learning

Optimization as a Model for Few-Shot Learning

Meta-Learning for Semi-Supervised Few-Shot Classification

@ChristopherDaw

mini-imagenet-tools's People

Contributors

dsmic, vainaixr, yaoyao-liu


mini-imagenet-tools's Issues

Are images sorted in the classes?

I use the files from the "Download Processed Images" (the tar file).

It seems to me that MiniImageNetDataLoader returns the images in sorted order, and that the first images for each label are easier to classify than later ones. Is that the case?

If I set shuffle=True at the end of the middle line below (there are three occurrences of such lines)

  random.shuffle(sampled_character_folders)
  labels_and_images = self.get_images(sampled_character_folders, range(self.way_num), nb_samples=self.num_samples_per_class, shuffle=False)
  labels = [li[0] for li in labels_and_images]

training on the data becomes much more difficult.

I got suspicious because the default is shuffle=True in

def get_images(self, paths, labels, nb_samples=None, shuffle=True):

but it is overridden with shuffle=False in the call.

Reference Ravi and Larochelle in About section

The About mini-ImageNet section cites Vinyals et al. for proposing mini-ImageNet. This is true, but when they published their paper, they did not list the classes that comprise mini-ImageNet. The list only appeared over a year later, in Appendix B of a new submission on arXiv. The Ravi and Larochelle paper you cite in your Acknowledgements section created its own split, since Vinyals et al. did not provide one.

The README.md in this repo cites the Ravi and Larochelle paper, but never mentions that the train, val, and test csv files come from that paper rather than from the Vinyals paper, which has a very different split of classes. It would be helpful to state the origin of the train, val, and test splits in the About mini-ImageNet section.

mini_imagenet_dataloader.py: 189 should be self.num_samples_per_class*self.way_num

OK, I have another question or probably a bug :)

in mini_imagenet_dataloader.py:

num_samples_per_class: should be (episode_test_sample_num + shot_num) * way_num

Line 189 has
one_episode_sample_num = self.num_samples_per_class*self.shot_num

which I either do not understand or which is wrong :) By default, way_num and shot_num are both 5, so most users will not run into any problems here.
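The arithmetic behind this report can be checked with a short sketch. It assumes that num_samples_per_class is the number of images drawn per class, i.e. shot_num + episode_test_sample_num, which is an assumption and not confirmed here; `episode_sizes` is a hypothetical helper.

```python
def episode_sizes(way_num, shot_num, episode_test_sample_num):
    """Compare the expression quoted from line 189 with the proposed one."""
    # Assumption: images drawn per class = training shots + test samples.
    num_samples_per_class = shot_num + episode_test_sample_num
    as_written = num_samples_per_class * shot_num   # expression quoted from line 189
    as_proposed = num_samples_per_class * way_num   # expression from the issue title
    return as_written, as_proposed


print(episode_sizes(5, 5, 15))  # (100, 100): identical when way_num == shot_num
print(episode_sizes(5, 1, 15))  # (16, 80): they differ for 5-way 1-shot
```

Under that assumption, the two expressions agree only when way_num equals shot_num, which would explain why the default 5-way 5-shot setting hides the difference.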

Missing images when compared to the provided csv files

I downloaded the 2012 ImageNet dataset from the official source you provided, but some images are missing when compared to your .csv files.

For example, the entry n0209960100000035.jpg, n02099601 from your test.csv is missing from the downloaded dataset. I also tried downloading the task 3 data, and it is not available there either.

Can you let me know what I am missing here? @yaoyao-liu

What is "episode num"?

In the MiniImageNetDataLoader class, there is a variable called episode_test_sample_num (used in the initializer) and another one called episode_num (used in the generate_data_list method).
What is the purpose of these variables? It's clear that they are a different concept from an epoch, but I don't understand what they mean.

I'm sorry if this is a stupid question, but I'm new to machine learning and I'm just trying to learn.
Thanks!
