tomrunia / tf_featureextraction

Convenient wrapper for TensorFlow feature extraction from pre-trained models using tf.contrib.slim

tf_featureextraction's Introduction

TensorFlow Feature Extractor

This is a convenient wrapper for feature extraction or classification in TensorFlow. Given a well-known model pre-trained on ImageNet, the extractor runs over a list or directory of images. Optionally, the features can be saved to an HDF5 file. It supports all the pre-trained models listed on the official page.

TensorFlow models tested:

Inception v1-v4
ResNet v1 and v2
VGG 16 and 19

Requirements

TensorFlow (tested with version 1.8)
TensorFlow Models
The usual suspects: numpy, scipy.
Optionally, h5py for saving features to an HDF5 file.

Setup

Check out the TensorFlow models repository somewhere on your machine. The path where you check out the repository will be denoted <checkout_dir>/models.

git clone https://github.com/tensorflow/models/

Add the directory <checkout_dir>/models/research/slim to the $PYTHONPATH variable, for example by adding the following line to your .bashrc file:

export PYTHONPATH="<checkout_dir>/models/research/slim:$PYTHONPATH"
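
If editing the shell environment is inconvenient (for instance when running from an IDE or a notebook), the same effect can be approximated from inside Python before the wrapper is imported. This is only a sketch: `add_to_python_path` is a hypothetical helper, not part of this repository, and the path shown is a placeholder for the slim directory of your own checkout.

```python
import os
import sys


def add_to_python_path(directory):
    """Prepend a directory to sys.path unless it is already listed."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Placeholder location; substitute the slim directory of your own checkout.
slim_dir = add_to_python_path("/path/to/models/research/slim")
print(slim_dir in sys.path)  # True
```

The call has to run before any import that needs the slim packages, because Python resolves imports against sys.path at import time.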

Download the model checkpoints from the official page.

Usage

There are two example files, one for classification and one for feature extraction.

Feature Extraction

ResNet-v1-101

python example_feat_extract.py \
--network resnet_v1_101 \
--checkpoint ./checkpoints/resnet_v1_101.ckpt \
--image_path ./images_dir/ \
--out_file ./features.h5 \
--num_classes 1000 \
--layer_names resnet_v1_101/logits

ResNet-v2-101

python example_feat_extract.py \
--network resnet_v2_101 \
--checkpoint ./checkpoints/resnet_v2_101.ckpt \
--image_path ./images_dir/ \
--out_file ./features.h5 \
--layer_names resnet_v2_101/logits \
--preproc_func inception

Inception-v4

python example_feat_extract.py \
--network inception_v4 \
--checkpoint ./checkpoints/inception_v4.ckpt \
--image_path ./images_dir/ \
--out_file ./features.h5 \
--layer_names Logits

Image Classification

python example_classification.py \
--network resnet_v1_101 \
--checkpoint ./checkpoints/resnet_v1_101.ckpt \
--image_path ./images_dir/ \
--num_classes 1000 \
--logits_name resnet_v1_101/logits

Work in Progress

~~Save image file names to HDF5 file~~
~~Support for multi-threaded preprocessing~~

tf_featureextraction's Issues

ValueError: could not broadcast input array

Hey,
I'm trying to run your wrapper on a batch of the Imagenet Dataset. However, I'm getting the following error:

################################################################################
Batch Size: 64
Number of Examples: 857
Number of Batches: 14
Extracting features for layer 'resnet_v2_101/logits' with shape [896, 1, 1, 1001]
################################################################################
Traceback (most recent call last):
  File "example_feat_extract.py", line 152, in <module>
    args.batch_size, args.num_classes)
  File "example_feat_extract.py", line 86, in feature_extraction_queue
    feature_dataset[layer_name][start:end] = outputs[layer_name]
ValueError: could not broadcast input array from shape (3,1,1,1001) into shape (64,1,1,1001)

I ran
python example_feat_extract.py --network resnet_v2_101 --checkpoint ../checkpoints/resnet_v2_101.ckpt --image_path ../images/curr_batch/ --out_file ../features/features.h5 --layer_names resnet_v2_101/logits --preproc_func inception
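
The numbers in the log above are consistent with the padding logic of the example script: 857 images at a batch size of 64 give 14 batches, and the padded buffer holds 14 × 64 = 896 rows. The sketch below reproduces that arithmetic and shows one defensive way to store a batch that comes back shorter than expected. It does not explain why the queue returned only 3 items here; `batch_plan` and `store_batch` are hypothetical helpers, and plain lists stand in for the numpy arrays.

```python
import math


def batch_plan(num_examples, batch_size):
    """Return (num_batches, padded_length) when every batch must be full."""
    num_batches = math.ceil(num_examples / batch_size)
    return num_batches, num_batches * batch_size


def store_batch(buffer, batch_index, batch_size, batch_output):
    """Write one batch into a preallocated buffer, tolerating a short batch."""
    start = batch_index * batch_size
    end = start + len(batch_output)  # actual output length, not batch_size
    buffer[start:end] = batch_output
    return end


num_batches, padded = batch_plan(857, 64)
print(num_batches, padded)  # 14 896

buffer = [0] * padded
end = store_batch(buffer, 13, 64, [1, 2, 3])  # a short final batch
print(end, len(buffer))  # 835 896
```

Slicing by the actual output length avoids the broadcast error, but a short batch still means some images were not processed, so the underlying cause is worth tracking down.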

Using VGG_16/19

How can I extract features using vgg_16/19?

Why are the logits used as the feature?

Thank you for sharing. In your Usage section, the Logits layer of Inception-v4 is extracted as the feature using example_feat_extract.py. I wonder why the logits are the feature being extracted?

Is it possible to use this code to extract features with my own trained model?

I trained a model on my own dataset. Is it possible to use this code to extract features from it?

Feature extraction using VGG

Inception and ResNet both work fine, but when I tried to use vgg_16/19, I got a strange error, "Invalid argument: Assign requires shapes of both tensors to match", and print_network_summary() did not work. Could you give an example for VGG? Thanks!

SystemError: initialization of _pywrap_tensorflow_internal raised unreported exception

RuntimeError: module compiled against API version 0xa but this version of numpy is 0x9
RuntimeError: module compiled against API version 0xa but this version of numpy is 0x9
Traceback (most recent call last):
  File "./example_feat_extract.py", line 24, in <module>
    from feature_extractor.feature_extractor import FeatureExtractor
  File "/home/defy/TF_FeatureExtraction/feature_extractor/feature_extractor.py", line 19, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.4/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
SystemError: initialization of _pywrap_tensorflow_internal raised unreported exception

I updated numpy, but it still does not work.

Thanks.
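
The RuntimeError above says that a compiled module expects a newer numpy C API (0xa) than the numpy that was actually loaded provides (0x9). One thing worth ruling out is that the interpreter picks up a different numpy installation than the one that was upgraded. The sketch below reports which copy gets imported; `describe_module` is a hypothetical helper, and it returns None when the module cannot be imported at all.

```python
import importlib


def describe_module(name):
    """Return (version, file) for an importable module, or None if missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    version = getattr(module, "__version__", "unknown")
    location = getattr(module, "__file__", "unknown")
    return version, location


# Shows the version and the path of the numpy this interpreter really uses.
print(describe_module("numpy"))
```

If the reported path points at a system package directory while the upgrade went elsewhere, the older copy is shadowing the new one.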

Help needed: how to use your code for feature extraction from ResNet-v1-101

Hi, I want to use your code to do feature extraction from ResNet-v1-101. I attempted to run the file "example_feat_extract.py", but I get the following error:
usage: example_feat_extract.py [-h] --network NETWORK_NAME --checkpoint
CHECKPOINT --image_path IMAGE_PATH
[--out_file OUT_FILE] --layer_names LAYER_NAMES
[--preproc_func PREPROC_FUNC]
[--preproc_threads NUM_PREPROC_THREADS]
[--batch_size BATCH_SIZE]
[--num_classes NUM_CLASSES]
example_feat_extract.py: error: argument --network is required

I am puzzled about what to do next. @tomrunia

The code of my adjusted "example_feat_extract.py" is as follows:
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to conditions.
#
# Author: Tom Runia
# Date Created: 2017-08-15

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import numpy as np
import time
from datetime import datetime

from feature_extractor.feature_extractor import FeatureExtractor
import feature_extractor.utils as utils

# newly added five lines
import tensorflow as tf
checkpoints_dir = '/home/jsj/TF_FeatureExtraction/models/checkpoints'
if not tf.gfile.Exists(checkpoints_dir):
    tf.gfile.MakeDirs(checkpoints_dir)

import imp

inception_v2 = imp.load_source('inception_v2', '/home/jsj/anaconda2/lib/python2.7/site-packages/tensorflow/models/research/slim/nets/inception_v2.py')

def feature_extraction_queue(feature_extractor, image_path, layer_names,
                             batch_size, num_classes, num_images=100000):
    '''
    Given a directory containing images, this function extracts features
    for all images. The layers to extract features from are specified
    as a list of strings. First, we seek for all images in the directory,
    sort the list and feed them to the filename queue. Then, batches are
    processed and features are stored in a large object features.

    :param feature_extractor: object, TF feature extractor
    :param image_path: str, path to directory containing images
    :param layer_names: list of str, list of layer names
    :param batch_size: int, batch size
    :param num_classes: int, number of classes for ImageNet (1000 or 1001)
    :param num_images: int, number of images to process (default=100000)
    :return:
    '''

    # Add a list of images to process, note that the list is ordered.
    image_files = utils.find_files(image_path, ("jpg", "png"))
    num_images = min(len(image_files), num_images)
    image_files = image_files[0:num_images]

    num_examples = len(image_files)
    num_batches = int(np.ceil(num_examples/batch_size))

    # Fill-up last batch so it is full (otherwise queue hangs)
    utils.fill_last_batch(image_files, batch_size)

    print("#"*80)
    print("Batch Size: {}".format(batch_size))
    print("Number of Examples: {}".format(num_examples))
    print("Number of Batches: {}".format(num_batches))

    # Add all the images to the filename queue
    feature_extractor.enqueue_image_files(image_files)

    # Initialize containers for storing processed filenames and features
    feature_dataset = {'filenames': []}
    for i, layer_name in enumerate(layer_names):
        layer_shape = feature_extractor.layer_size(layer_name)
        layer_shape[0] = len(image_files)  # replace ? by number of examples
        feature_dataset[layer_name] = np.zeros(layer_shape, np.float32)
        print("Extracting features for layer '{}' with shape {}".format(layer_name, layer_shape))

    print("#"*80)

    # Perform feed-forward through the batches
    for batch_index in range(num_batches):

        t1 = time.time()

        # Feed-forward one batch through the network
        outputs = feature_extractor.feed_forward_batch(layer_names)

        for layer_name in layer_names:
            start = batch_index*batch_size
            end   = start+batch_size
            feature_dataset[layer_name][start:end] = outputs[layer_name]

        # Save the filenames of the images in the batch
        feature_dataset['filenames'].extend(outputs['filenames'])

        t2 = time.time()
        examples_in_queue = outputs['examples_in_queue']
        examples_per_second = batch_size/float(t2-t1)

        print("[{}] Batch {:04d}/{:04d}, Batch Size = {}, Examples in Queue = {}, Examples/Sec = {:.2f}".format(
            datetime.now().strftime("%Y-%m-%d %H:%M"), batch_index+1,
            num_batches, batch_size, examples_in_queue, examples_per_second
        ))

    # If the number of pre-processing threads >1 then the output order is
    # non-deterministic. Therefore, we order the outputs again by filenames so
    # the images and corresponding features are sorted in alphabetical order.
    if feature_extractor.num_preproc_threads > 1:
        utils.sort_feature_dataset(feature_dataset)

    # We cut-off the last part of the final batch since this was filled-up
    feature_dataset['filenames'] = feature_dataset['filenames'][0:num_examples]
    for layer_name in layer_names:
        feature_dataset[layer_name] = feature_dataset[layer_name][0:num_examples]

    return feature_dataset

################################################################################
################################################################################
################################################################################

if __name__ == "__main__":

    parser = argparse.ArgumentParser(description="TensorFlow feature extraction")
    parser.add_argument("--network", dest="network_name", type=str, required=True, help="model name, e.g. 'resnet_v2_101'")
    parser.add_argument("--checkpoint", dest="checkpoint", type=str, required=True, help="path to pre-trained checkpoint file")
    parser.add_argument("--image_path", dest="image_path", type=str, required=True, help="path to directory containing images")
    parser.add_argument("--out_file", dest="out_file", type=str, default="./features.h5", help="path to save features (HDF5 file)")
    parser.add_argument("--layer_names", dest="layer_names", type=str, required=True, help="layer names separated by commas")
    parser.add_argument("--preproc_func", dest="preproc_func", type=str, default=None, help="force the image preprocessing function (None)")
    parser.add_argument("--preproc_threads", dest="num_preproc_threads", type=int, default=2, help="number of preprocessing threads (2)")
    parser.add_argument("--batch_size", dest="batch_size", type=int, default=64, help="batch size (64)")
    parser.add_argument("--num_classes", dest="num_classes", type=int, default=1001, help="number of classes (1001)")
    args = parser.parse_args()

    # resnet_v2_101/logits,resnet_v2_101/pool4 => to list of layer names
    layer_names = args.layer_names.split(",")
    print("layer_names")
    # Initialize the feature extractor
    feature_extractor = FeatureExtractor(
        network_name=args.network_name,
        checkpoint_path=args.checkpoint,
        batch_size=args.batch_size,
        num_classes=args.num_classes,
        preproc_func_name=args.preproc_func,
        preproc_threads=args.num_preproc_threads
    )

    # Print the network summary, use these layer names for feature extraction
    #feature_extractor.print_network_summary()

    # Feature extraction example using a filename queue to feed images
    feature_dataset = feature_extraction_queue(
        feature_extractor, args.image_path, layer_names,
        args.batch_size, args.num_classes)

    # Write features to disk as HDF5 file
    utils.write_hdf5(args.out_file, layer_names, feature_dataset)
    print("Successfully written features to: {}".format(args.out_file))

    # Close the threads and close session.
    feature_extractor.close()
    print("Finished.")
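
The usage message quoted in this post is what argparse prints when a flag declared with required=True is missing, so the script was presumably started without any command-line arguments (for example from an IDE's run button). The sketch below rebuilds only the four required flags from the script to show both outcomes; `build_parser` is a hypothetical helper, and the argument values are the ones from the README example.

```python
import argparse


def build_parser():
    """Recreate the required flags of example_feat_extract.py."""
    parser = argparse.ArgumentParser(description="TensorFlow feature extraction")
    parser.add_argument("--network", dest="network_name", type=str, required=True)
    parser.add_argument("--checkpoint", dest="checkpoint", type=str, required=True)
    parser.add_argument("--image_path", dest="image_path", type=str, required=True)
    parser.add_argument("--layer_names", dest="layer_names", type=str, required=True)
    return parser


parser = build_parser()

# No arguments: argparse prints the usage text and exits with status 2.
try:
    parser.parse_args([])
except SystemExit as exc:
    print("exit status:", exc.code)

# With the required flags supplied, parsing succeeds.
args = parser.parse_args([
    "--network", "resnet_v1_101",
    "--checkpoint", "./checkpoints/resnet_v1_101.ckpt",
    "--image_path", "./images_dir/",
    "--layer_names", "resnet_v1_101/logits",
])
print(args.network_name, args.layer_names.split(","))
```

In practice this means passing the flags on the command line as in the Usage section, or configuring them as script parameters in the IDE.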

Error while running the code the second time

Dear all

I first ran the code successfully and saved the extracted features to an .h5 file. But now, when I try to run it again, I get the following error:

Variable resnet_v1_101/conv1/weights already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

File "c:\users\nainasaid\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1719, in __init__
self._traceback = tf_stack.extract_stack()
File "c:\users\nainasaid\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\ops.py", line 3158, in create_op
op_def=op_def)
File "c:\users\nainasaid\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
return func(*args, **kwargs)

Can someone please tell me how to fix this issue?

Error when running the example_feat_extract.py

Hi,
Maybe I'm missing something, but after following the 3 setup steps and cloning the 'TF_FeatureExtraction' repository itself, I ran:
python example_feat_extract.py --network resnet_v1_50 --checkpoint ../resnet_v1_50.ckpt --image_path ../../studies/ECoG/video_data/frames/ --out_file ../features.h5 --num_classes 1000 --layer_names resnet_v1_50/logits

and got this error:
Traceback (most recent call last):
  File "example_feat_extract.py", line 24, in <module>
    from feature_extractor.feature_extractor import FeatureExtractor
  File "/Users/berryweinstein/tensorflow/TF_FeatureExtraction/feature_extractor/feature_extractor.py", line 20, in <module>
    from nets import nets_factory
ModuleNotFoundError: No module named 'nets'

Can you please advise?
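
The `nets` package lives in the slim directory of the TensorFlow models repository, so this error usually suggests that directory is not on the Python path of the interpreter running the script. A quick check, sketched below, asks Python whether it can locate the packages without importing them; `missing_modules` is a hypothetical helper, and `datasets` is included because feature_extractor/utils.py imports it as well.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both slim packages are visible to this interpreter.
print(missing_modules(["nets", "datasets"]))
```

Note that an unrelated installed package that happens to be named `datasets` would also satisfy this check, so a clean result is necessary but not sufficient.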

Error when debugging "example_feat_extract.py"

Hello, first of all, thanks for sharing your code!
The following error occurs when I debug "example_feat_extract.py":

  File "/home/jsj/TF_FeatureExtraction/example_feat_extract.py", line 25, in <module>
    import feature_extractor.utils as utils
  File "/home/jsj/TF_FeatureExtraction/feature_extractor/utils.py", line 23, in <module>
    from datasets import imagenet
ImportError: No module named datasets

Do I need to create a datasets directory and then download the ImageNet data? Any suggestion will be appreciated! @tomrunia

is resnet_v2_101/logits the same as fc1000?

Hi,
I am wondering whether resnet_v2_101/logits is the same as fc1000 in the architecture below:
http://ethereon.github.io/netscope/#/gist/b21e2aae116dc1ac7b50

How to use checkpoints of MobileNet, including ckpt.meta, ckpt.index, and ckpt.data

The checkpoints of pre-trained MobileNet are as follows:

mobilenet_v1_1.0_224.ckpt.data-00000-of-00001
mobilenet_v1_1.0_224.ckpt.index
mobilenet_v1_1.0_224.ckpt.meta

rather than one single checkpoint file like:

resnet_v1_101.ckpt

I took a look at the code in feature_extractor.py. It looks like passing a directory does not work.

# Find the checkpoint file
checkpoint_path = self._checkpoint_path
if tf.gfile.IsDirectory(self._checkpoint_path):
    checkpoint_path = tf.train.latest_checkpoint(self._checkpoint_path)

Could you please give some advice?
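
The three files listed above are the parts of one checkpoint in TensorFlow's newer on-disk format. TensorFlow's restore functions take the common prefix (here mobilenet_v1_1.0_224.ckpt) rather than any single file, and tf.train.latest_checkpoint only resolves a directory that also contains a `checkpoint` state file, which a plain download may not include. Assuming the wrapper passes --checkpoint straight to the restore call (not verified here), passing the prefix may be enough. The sketch below derives such prefixes from a directory listing; `find_checkpoint_prefixes` is a hypothetical helper, and the demonstration uses empty placeholder files.

```python
import os
import tempfile


def find_checkpoint_prefixes(directory):
    """Return the checkpoint prefix for every '<prefix>.index' file found."""
    suffix = ".index"
    prefixes = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(suffix):
            prefixes.append(os.path.join(directory, name[:-len(suffix)]))
    return prefixes


# Empty placeholder files named like the MobileNet download.
with tempfile.TemporaryDirectory() as tmp:
    for ending in (".data-00000-of-00001", ".index", ".meta"):
        open(os.path.join(tmp, "mobilenet_v1_1.0_224.ckpt" + ending), "w").close()
    prefixes = find_checkpoint_prefixes(tmp)
    print([os.path.basename(p) for p in prefixes])  # ['mobilenet_v1_1.0_224.ckpt']
```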

100000 images limitation

Hello,

I wanted to know why there is a limit of 100000 images: https://github.com/tomrunia/TF_FeatureExtraction/blob/master/feature_extractor/feature_extractor.py#L72

Thx
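
The linked line in feature_extractor.py is not reproduced on this page, so the reason for the limit there cannot be read off from the text. The example script quoted earlier does use the same number as a default cap (num_images=100000, applied with min(...)). Assuming that cap is the relevant one, the sketch below shows what it does and one way to walk through a larger collection in chunks; `cap_images` and `chunked` are hypothetical helpers, and the file names are made up.

```python
def cap_images(image_files, num_images=100000):
    """Mirror the truncation applied in feature_extraction_queue."""
    return image_files[:min(len(image_files), num_images)]


def chunked(image_files, chunk_size=100000):
    """Yield successive slices of at most chunk_size files."""
    for start in range(0, len(image_files), chunk_size):
        yield image_files[start:start + chunk_size]


files = ["img_{:06d}.jpg".format(i) for i in range(250000)]
print(len(cap_images(files)))            # 100000
print([len(c) for c in chunked(files)])  # [100000, 100000, 50000]
```

Whether each chunk can be fed to the same extractor instance, or needs a fresh one, depends on the wrapper's queue handling and would need to be checked against its source.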