yaoyao-liu / mini-imagenet-tools

Tools for generating mini-ImageNet dataset and processing batches

Home Page: https://mtl.yyliu.net/datasets/

License: MIT License

Python 100.00%

miniimagenet few-shot-learning few-shot mini-imagenet imagenet dataset meta-learning one-shot-learning

mini-imagenet-tools's Introduction

Tools for mini-ImageNet Dataset

This repo provides Python source code for creating the mini-ImageNet dataset from ImageNet, along with utilities for generating batches during training. It accompanies our work on few-shot learning: Meta-Transfer Learning.

About mini-ImageNet

The mini-ImageNet dataset was proposed by Vinyals et al. for few-shot learning evaluation. It remains challenging because it uses ImageNet images, yet it requires fewer resources and less infrastructure than running on the full ImageNet dataset. In total, there are 100 classes with 600 samples of 84×84 color images per class. The 100 classes are divided into 64, 16, and 20 classes for sampling meta-training, meta-validation, and meta-test tasks, respectively.
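The split sizes above imply the following image counts. This is only a quick check of the arithmetic, using the figures quoted in this section:

```python
# Figures quoted in the paragraph above.
SAMPLES_PER_CLASS = 600
CLASSES_PER_SPLIT = {"meta-train": 64, "meta-val": 16, "meta-test": 20}

# Number of images available in each split.
images_per_split = {
    split: n_classes * SAMPLES_PER_CLASS
    for split, n_classes in CLASSES_PER_SPLIT.items()
}
total_classes = sum(CLASSES_PER_SPLIT.values())
total_images = sum(images_per_split.values())

print(images_per_split)   # {'meta-train': 38400, 'meta-val': 9600, 'meta-test': 12000}
print(total_classes, total_images)  # 100 60000
```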

Please note that the split files in the csv_files folder were created by Ravi and Larochelle (GitHub link). Vinyals et al. did not include split files for mini-ImageNet when they first released their paper, so Ravi and Larochelle created their own splits. Additional split files are provided here.

Requirements

Python 2.7 or 3.x
numpy
tqdm
opencv-python
Pillow

Installation

Install via PyPI:

pip install miniimagenettools

Install via GitHub:

git clone https://github.com/yaoyao-liu/mini-imagenet-tools.git

Usage

First, download the image source files from the ImageNet website. If you already have them, you can use them directly. Some people report that the ImageNet website is not working; here is an alternative download link. Please read the ImageNet terms carefully before downloading.

Filename: ILSVRC2012_img_train.tar
Size: 138 GB
MD5: 1d675b47d978889d74fa0da5fadfb00e
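Before running the generator, it may be worth confirming that the download matches the MD5 listed above. The sketch below uses only Python's standard `hashlib`; the helper names `file_md5` and `verify_tar` are illustrative and not part of this repo.

```python
import hashlib

# MD5 value listed above for ILSVRC2012_img_train.tar.
EXPECTED_MD5 = "1d675b47d978889d74fa0da5fadfb00e"


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in 1 MiB chunks
    so that a very large archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tar(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected hex string."""
    return file_md5(path) == expected.lower()
```

For example, `verify_tar("ILSVRC2012_img_train.tar")` should return True for an intact download; hashing a file of this size can take several minutes.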

Then clone the repo:

git clone https://github.com/yaoyao-liu/mini-imagenet-tools.git
cd mini-imagenet-tools

To generate mini-ImageNet dataset from tar file:

python mini_imagenet_generator.py --tar_dir [your_path_of_the_ILSVRC2012_img_train.tar]

To generate mini-ImageNet dataset from untarred folder:

python mini_imagenet_generator.py --imagenet_dir [your_path_of_imagenet_folder]

If you want to resize the images to the specified resolution:

python mini_imagenet_generator.py --tar_dir [your_path_of_the_ILSVRC2012_img_train.tar] --image_resize 100

P.S. By default, the images are resized to 84 × 84.

If you don't want to resize the images, you may set --image_resize 0.

To use the MiniImageNetDataLoader class:

from miniimagenettools.mini_imagenet_dataloader import MiniImageNetDataLoader

dataloader = MiniImageNetDataLoader(shot_num=5, way_num=5, episode_test_sample_num=15)

dataloader.generate_data_list(phase='train')
dataloader.generate_data_list(phase='val')
dataloader.generate_data_list(phase='test')

dataloader.load_list(phase='all')

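# total_train_step is not defined by the library; set it to the number of
# training episodes you want to iterate over (placeholder value below).
total_train_step = 100

for idx in range(total_train_step):
    episode_train_img, episode_train_label, episode_test_img, episode_test_label = \
        dataloader.get_batch(phase='train', idx=idx)
    ...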
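For readers new to episodic training, the sketch below illustrates what one N-way K-shot episode looks like in terms of the constructor arguments used above. It is not the MiniImageNetDataLoader implementation: `sample_episode` and the toy `images_by_class` dictionary are hypothetical, and the sketch assumes that `episode_test_sample_num` counts test samples per class.

```python
import random


def sample_episode(images_by_class, way_num, shot_num,
                   episode_test_sample_num, rng=random):
    """Hypothetical helper: draw one episode and return
    (train, test) lists of (label, image_name) pairs."""
    classes = rng.sample(sorted(images_by_class), way_num)
    episode_train, episode_test = [], []
    # Labels are re-indexed 0..way_num-1 within each episode.
    for label, cls in enumerate(classes):
        picked = rng.sample(images_by_class[cls],
                            shot_num + episode_test_sample_num)
        episode_train += [(label, img) for img in picked[:shot_num]]
        episode_test += [(label, img) for img in picked[shot_num:]]
    return episode_train, episode_test


# Toy stand-in for a dataset: 8 classes with 30 image names each.
images_by_class = {
    f"class{c}": [f"class{c}_{i}.jpg" for i in range(30)] for c in range(8)
}
train, test = sample_episode(images_by_class, way_num=5, shot_num=5,
                             episode_test_sample_num=15)
print(len(train), len(test))  # 25 75
```

With 5 ways, 5 shots, and 15 test samples per class, each episode holds 25 training and 75 test images, and no image appears in both parts.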

Performance

Check the SOTA results for mini-ImageNet on this page.

Download Processed Images

Download jpg files (Thanks for the contribution by @vainaijr)

Download tar files

Acknowledgement

Model-Agnostic Meta-Learning

Optimization as a Model for Few-Shot Learning

Meta-Learning for Semi-Supervised Few-Shot Classification

@ChristopherDaw

mini-imagenet-tools's Issues

Are images sorted in the classes?

I use the files from the "Download Processed Images" (the tar file).

To me it seems that the images become sorted using MiniImageNetDataLoader, and the first images for each label are simpler to classify than later ones?

If I use shuffle = True at the end of the middle line (there are three occurrences of such lines)

  random.shuffle(sampled_character_folders)
  labels_and_images = self.get_images(sampled_character_folders, range(self.way_num), nb_samples=self.num_samples_per_class, shuffle=False)
  labels = [li[0] for li in labels_and_images]

it is much more difficult to train the data.

I got suspicious because the default in

def get_images(self, paths, labels, nb_samples=None, shuffle=True):

is shuffle=True, which is overridden in the call.
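To make the difference concrete, here is a stand-in with the same parameters as the `get_images` method quoted above (minus `self`). The body is an assumption written for illustration: it takes in-memory lists of image names rather than folder paths, and it may differ from the repository's code. It only shows that `shuffle=False` leaves the pairs grouped by label, while `shuffle=True` interleaves them.

```python
import random


def get_images(paths, labels, nb_samples=None, shuffle=True):
    """Illustrative stand-in: each entry of `paths` is a list of image
    names (not a folder on disk). Returns (label, image_name) pairs."""
    pairs = []
    for label, images in zip(labels, paths):
        if nb_samples is not None:
            chosen = random.sample(images, nb_samples)
        else:
            chosen = list(images)
        pairs.extend((label, image) for image in chosen)
    if shuffle:
        random.shuffle(pairs)  # mixes the classes together
    return pairs


folders = [[f"c{c}_{i}.jpg" for i in range(10)] for c in range(3)]
grouped = get_images(folders, range(3), nb_samples=4, shuffle=False)
grouped_labels = [label for label, _ in grouped]
print(grouped_labels)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

In this sketch the images within each class are still drawn at random even when `shuffle=False`; only the order of classes in the returned list is fixed.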

How to create the csv files?

Thanks for sharing. I just want to ask how you determined which images are included in the csv files.

Hello, do you know how to generate the pickle file for miniImagenet?

Reference Ravi and Larochelle in About section

The About mini-ImageNet section cites Vinyals et al. for proposing mini-ImageNet. This is true but when they published their paper, they did not include the classes which comprise mini-ImageNet until over a year later in Appendix B of a new submission on arXiv. The Ravi and Larochelle paper you cite in your Acknowledgements section created their own split since Vinyals et al. did not provide one.

The README.md on this repo cites the Ravi and Larochelle paper, but never mentions that the train, val, and test csv files come from that paper instead of the Vinyals paper which has a very different split of classes. It would be helpful to include the information of the origin of the train, val, and test splits in the About mini-ImageNet section.

Google drive link for preprocessed mini-imagenet is down.

Hi,

Thank you so much for sharing the tools. The Google Drive link in "Download tar files" for mini-ImageNet is down. Do you have any plans to fix that? Thanks in advance.

mini_imagenet_dataloader.py: 189 should be self.num_samples_per_class*self.way_num

OK, I have another question or probably a bug :)

in mini_imagenet_dataloader.py:

num_samples_per_class: should be (episode_test_sample_num + shot_num) * way_num

Line 189 has
one_episode_sample_num = self.num_samples_per_class*self.shot_num

which I either do not understand or which is wrong :) By default, way_num and shot_num are both 5, so most users will not run into any problems here.
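A small check of the arithmetic in question. It assumes, as in standard N-way K-shot episodes, that each class contributes `shot_num + episode_test_sample_num` samples; the function names are hypothetical, and only the two formulas come from the report above.

```python
def per_episode_as_written(shot_num, way_num, episode_test_sample_num):
    """Formula quoted from line 189: num_samples_per_class * shot_num."""
    # Assumption: samples drawn per class in one episode.
    num_samples_per_class = shot_num + episode_test_sample_num
    return num_samples_per_class * shot_num


def per_episode_proposed(shot_num, way_num, episode_test_sample_num):
    """Formula proposed in the issue title: num_samples_per_class * way_num."""
    num_samples_per_class = shot_num + episode_test_sample_num
    return num_samples_per_class * way_num


# Default setting (5-way 5-shot): both formulas agree.
print(per_episode_as_written(5, 5, 15), per_episode_proposed(5, 5, 15))  # 100 100

# 5-way 1-shot: the formulas diverge.
print(per_episode_as_written(1, 5, 15), per_episode_proposed(1, 5, 15))  # 16 80
```

Under that assumption, the two formulas only coincide when `shot_num == way_num`, which is why the default configuration would hide the discrepancy.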

Missing images when compared to the provided csv files

I downloaded the 2012 ImageNet dataset from the official source you provided, but some images are missing when compared to your .csv files.

For example, n0209960100000035.jpg , n02099601 from your test.csv is missing in the downloaded dataset. I also tried downloading the task 3 set of data and it's not available in it either.
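The entry quoted above suggests that filenames in the split files consist of the class ID followed by a zero-padded number. A minimal sketch for splitting such a name, assuming a 9-character class ID prefix; `split_csv_filename` is a hypothetical helper, and how the number maps to a file inside the ImageNet archive is not stated here.

```python
import os


def split_csv_filename(filename, class_id_len=9):
    """Hypothetical helper: split a split-file entry such as
    'n0209960100000035.jpg' into (class_id, numeric_index)."""
    stem, _ext = os.path.splitext(filename.strip())
    class_id, index = stem[:class_id_len], stem[class_id_len:]
    if not index.isdigit():
        raise ValueError(f"unexpected filename layout: {filename!r}")
    return class_id, int(index)


print(split_csv_filename("n0209960100000035.jpg"))  # ('n02099601', 35)
```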

Can you let me know what I am missing here? @yaoyao-liu

What is "episode num"?

In the MiniImageNetDataLoader class, there is a variable called episode_test_sample_num (used in the initializer) and another one (used in the generate_data_list method) called episode_num.
What do these variables do? It's clear that they are a different concept from epoch, but I don't understand what they mean.

I'm sorry if this is a stupid question, but I'm new to machine learning and I'm just trying to learn.
Thanks!

directly share a google drive link?

Is it possible to share a Google Drive link for mini-ImageNet that I can store in my own Google Drive, so that I can access it directly in Google Colab?