
ilgnet's Introduction

ILGnet

This is an open-source project for the aesthetic evaluation of images, built on the Caffe deep learning framework and completed by the Victory team of Besti.

In this paper we investigate the image aesthetics classification problem, i.e., automatically classifying an image as having low or high aesthetic quality, which is a challenging problem beyond image recognition. Deep convolutional neural network (DCNN) methods have recently shown promising results for image aesthetics assessment. The recently proposed Inception module achieves very high performance in object classification, but it has not yet been considered for image aesthetics assessment. We propose a novel DCNN structure, codenamed ILGNet, for image aesthetics classification; it introduces the Inception module and connects intermediate Local layers to the Global layer for the output. We start from GoogLeNet, an image classification CNN pre-trained on the ImageNet dataset, and fine-tune our connected local and global layer on the large-scale aesthetics assessment dataset AVA [1]. The experimental results show that the proposed ILGNet outperforms state-of-the-art results for image aesthetics assessment on the AVA benchmark.

The AVA dataset

For a fair comparison, we adopted the same strategy as previous work to construct two sub-datasets of AVA.

[1] Naila Murray, Luca Marchesotti, Florent Perronnin. AVA: A Large-Scale Database for Aesthetic Visual Analysis. Computer Vision and Pattern Recognition (CVPR), 2012.

• AVA1: We chose a score of 5 as the boundary dividing the dataset into a high-quality class and a low-quality class. This yields 74,673 low-quality images and 180,856 high-quality images. The training and test sets contain 235,529 and 20,000 images, respectively.

• AVA2: To increase the gap between images of high and low aesthetic quality, we first sort all images by their mean scores. We then pick the top 10% of images as good and the bottom 10% as bad, which selects 51,106 images from the AVA dataset. These images are randomly and evenly divided into a training set and a test set, each containing 25,553 images.
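
The two constructions can be sketched in a few lines of Python. This is only an illustration, not the partition code shipped in './src': the function names, the treatment of δ as a margin around the boundary, and the rule that a mean score exactly on the boundary counts as high quality are all assumptions.

```python
import random


def split_ava1(scores, boundary=5.0, delta=0.0):
    """AVA1-style labels from {image_id: mean_score}: 1 = high, 0 = low.

    Assumptions: a score of at least boundary + delta is high quality, a score
    below boundary - delta is low quality, and anything in between is dropped.
    """
    labels = {}
    for image_id, score in scores.items():
        if score >= boundary + delta:
            labels[image_id] = 1
        elif score < boundary - delta:
            labels[image_id] = 0
    return labels


def split_ava2(scores, fraction=0.10, seed=0):
    """AVA2-style split: keep the top and bottom `fraction` by mean score,
    then divide the kept images randomly and evenly into train and test."""
    ranked = sorted(scores, key=scores.get)  # ascending by mean score
    k = int(len(ranked) * fraction)
    bad = [(image_id, 0) for image_id in ranked[:k]]
    good = [(image_id, 1) for image_id in ranked[len(ranked) - k:]]
    labeled = bad + good
    random.Random(seed).shuffle(labeled)  # seeded so the split is repeatable
    half = len(labeled) // 2
    return labeled[:half], labeled[half:]
```

With δ=0 every image receives a label, which matches the AVA1 counts above (74,673 + 180,856 = 235,529 + 20,000). A fixed seed is used so that a random partition can be reproduced.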

How to test

Please use the Caffe test tools to measure accuracy.

Accuracy on the random partition in './data'

The accuracy we achieve on the AVA1 dataset is 81.68% with δ=0, rising to 82.66% with Inception V4.

The accuracy we achieve on the AVA2 dataset is 85.50%, rising to 85.53% with Inception V4.

We achieve state-of-the-art aesthetic classification accuracy.

The random partition programs are in './src'.

The Trained Models

The size of the trained model is above 500MB.

You can download them from the BaiduYun cloud disk or Google Drive:

BaiduYun Links:

ILGnet-AVA1.caffemodel

ILGnet-AVA2.caffemodel

Google Drive Links:

ILGnet-AVA1.caffemodel

ILGnet-AVA2.caffemodel

Note: The earlier deploy.prototxt was wrong. We have uploaded the corrected file. Thanks for the suggestion.

Our paper

Xin Jin, Jingying Chi, Siwei Peng, Yulu Tian, Chaochen Ye, and Xiaodong Li. Deep Image Aesthetics Classification using Inception Modules and Fine-tuning Connected Layer. The 8th International Conference on Wireless Communications and Signal Processing (WCSP), Yangzhou, China, 13-15 October 2016. Links: pdf (5.94MB), oral presentation (19.1MB), arXiv (1610.02256), [Project]





If you find our model/method/dataset useful, please cite our work:


@inproceedings{DBLP:conf/wcsp/JinCPTYL16,
  author    = {Xin Jin and Jingying Chi and Siwei Peng and Yulu Tian and Chaochen Ye and Xiaodong Li},
  title     = {Deep image aesthetics classification using inception modules and fine-tuning connected layer},
  booktitle = {8th International Conference on Wireless Communications {\&} Signal Processing, {WCSP} 2016, Yangzhou, China, October 13-15, 2016},
  pages     = {1--6},
  year      = {2016},
  crossref  = {DBLP:conf/wcsp/2016},
  url       = {http://dx.doi.org/10.1109/WCSP.2016.7752571},
  doi       = {10.1109/WCSP.2016.7752571},
  timestamp = {Fri, 16 Dec 2016 12:48:17 +0100},
  biburl    = {http://dblp.uni-trier.de/rec/bib/conf/wcsp/JinCPTYL16},
  bibsource = {dblp computer science bibliography, http://dblp.org}
}


Latest edit

Jan 15, 2017

ilgnet's People

Contributors

bestivictory, gaoxing0031, jinxinbh


ilgnet's Issues

How to do the prediction via pretrained model?

This is my script to load the pretrained model and make a prediction. However, it cannot tell apart images that are obviously pretty or ugly, so I suspect my input is wrong.
There are several points I am not sure about:

  1. Is the input RGB or BGR?
  2. The input scale should be 0-255 rather than 0-1, right?
  3. The AVA1_mean file is (3, 256, 256); should I crop it to (227, 227, 3) and subtract that from each image?

If possible, could someone post a script showing how to correctly read an image, load the model, and make a prediction? It would be much appreciated.
```python
import caffe
import numpy as np
from PIL import Image


def preprocess_image(fp, ava1mean):
    """Load an image, resize it to 227x227, and subtract the cropped mean."""
    im = Image.open(fp).convert("RGB")
    im = im.resize((227, 227))
    im = np.asarray(im).astype(np.float32)  # (227, 227, 3), values in 0-255
    if im.ndim != 3:
        raise ValueError("expected an H x W x 3 image, got shape %s" % (im.shape,))
    # im = im[:, :, ::-1]  # open question: should we convert RGB -> BGR?
    im -= ava1mean
    return im  # (227, 227, 3)


ava1mean = np.load("../ILGnet/mean/AVA1_mean.npy")  # (3, 256, 256)
# Reorder to H x W x C and center-crop 256 -> 227.
ava1mean = ava1mean.transpose(1, 2, 0)[14:241, 14:241, :]  # (227, 227, 3)

inputs = [preprocess_image("../ugly.jpg", ava1mean)]
classifier = caffe.Classifier("deploy2.prototxt", "ILGnet-AVA1.caffemodel",
                              image_dims=[227, 227])

print(classifier.predict(inputs, True))
```
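
For reference, the layout that Caffe models conventionally expect (BGR channel order, values in 0-255, channels first, mean subtracted) can be written with NumPy alone. This is a sketch under assumptions, not a confirmed answer for ILGnet: the helper name `to_caffe_input` is hypothetical, and the mean file is assumed to be stored in BGR order. Another script in these issues passes `channel_swap=(2,1,0)` and `raw_scale=255` to `caffe.Classifier`, which is consistent with this convention.

```python
import numpy as np


def to_caffe_input(rgb, mean_chw):
    """Turn an H x W x 3 RGB image (0-255) into a 3 x H x W BGR float32 array
    with the center-cropped mean subtracted.

    Assumption: mean_chw is a 3 x Hm x Wm mean image in BGR channel order.
    """
    rgb = np.asarray(rgb, dtype=np.float32)
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    h, w = rgb.shape[:2]
    mean_chw = np.asarray(mean_chw, dtype=np.float32)
    top = (mean_chw.shape[1] - h) // 2   # 14 for a 256 mean and a 227 image
    left = (mean_chw.shape[2] - w) // 2
    mean = mean_chw[:, top:top + h, left:left + w]
    bgr_chw = rgb[:, :, ::-1].transpose(2, 0, 1)  # RGB -> BGR, HWC -> CHW
    return bgr_chw - mean
```

The center-crop offsets reproduce the `[14:241]` slice used in the script above. Whether the crop or a per-channel mean was used during training is not stated here, so treat the result as a starting point for experiments.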

Random output value for same image

Running test.py on the same image multiple times gives random output values. The variation is large, ranging from 0.1 to 0.8 or 0.9. I'm using Caffe 1.0.0.

Unknown bottom blob 'label' (layer 'loss1/loss', bottom index 1)

As a beginner, I tried running the following code:

```python
import sys

import numpy as np
import matplotlib.pyplot as plt

caffe_root = '/opt/caffe/'
sys.path.insert(0, caffe_root + 'python')  # make pycaffe importable
import caffe

MODEL_FILE = caffe_root + 'ILGnet/deploy.prototxt'
PRETRAINED = caffe_root + 'ILGnet/ILGnet-AVA2.caffemodel'
IMAGE_FILE = caffe_root + 'examples/images/cat.jpg'
mean_file = caffe_root + 'ILGnet/AVA2_mean.npy'

caffe.set_mode_cpu()
net = caffe.Classifier(MODEL_FILE, PRETRAINED,
                       mean=np.load(mean_file).mean(1).mean(1),  # per-channel mean
                       channel_swap=(2, 1, 0),  # RGB -> BGR
                       raw_scale=255,           # [0, 1] -> [0, 255]
                       image_dims=(227, 227))

input_image = caffe.io.load_image(IMAGE_FILE)
plt.imshow(input_image)
prediction = net.predict([input_image])
plt.plot(prediction[0])
plt.show()
print('predicted class: %d' % prediction[0].argmax())
```

I don't know what is wrong and would appreciate any help.

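Question about the exact training parameters for AVA1

Hello,
Your train.prototxt is the one used for AVA2. Is the train.prototxt used for AVA1 training exactly the same?
When I train with AVA1_solver.prototxt plus train.prototxt, loss = 87.3365 appears very quickly. Even after lowering the learning rate and training for 100,000 iterations with batch size 48, the accuracy is still only about 75%.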
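
A note on the number itself: 87.3365 is the largest value Caffe's softmax loss can report for a sample. The layer clamps the predicted probability of the true class at FLT_MIN before taking the logarithm, so this value appears when that probability has collapsed to zero, which usually indicates that training has diverged. The arithmetic can be checked with a short sketch; the function name is hypothetical and the code only reproduces the clamp, not the layer.

```python
import math

FLT_MIN = 1.17549435e-38  # smallest positive normalized float32 (C's FLT_MIN)


def clamped_softmax_loss(prob_true_class):
    """Cross-entropy for one sample, with the probability clamped at FLT_MIN."""
    return -math.log(max(prob_true_class, FLT_MIN))


print(round(clamped_softmax_loss(0.0), 4))  # 87.3365
print(round(clamped_softmax_loss(0.5), 4))  # 0.6931, i.e. chance level for two classes
```

For a two-class problem such as this one, a healthy loss starts near 0.6931 and decreases from there.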

Why do the test set sizes in this repository and in the paper differ?

In the paper, the test set contains 19,930 images, but in this repository it contains 20,000. The readme.md also says that the test set in this repository is a random partition, so the test accuracy differs: 81.68% in this repository versus 79.25% in your paper. Could you please provide the image IDs of the test set used in your paper? Thank you very much.

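Tested 100,000 images and the results seem far from ideal; here is a screenshot of the top-scoring images:

I crawled 100,000 high-resolution images, including exquisite, mediocre, and poor ones, and tested them on a server with the latest ILGnet script using ILGnet-AVA2.caffemodel. These are the highest-scoring images:

Very ordinary, mediocre images rank at the top, which is far from the official examples. In fact, these 100,000 images include many beautiful, atmospheric ones, and many master-level photographs also receive only average aesthetic scores. Is there perhaps still something not quite right in the official code?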

How to deploy? : )

I am running into some issues trying to deploy the code. When I try to deploy the code, the temp_wl and loss1/classifier_wl layers are initialized randomly, so the output is random and doesn't work. As you suggested, I removed the following code:

```
weight_filler {
  type: "xavier"
}
bias_filler {
  type: "constant"
  value: 0.2
}
```

But then, all the weights and biases were 0.

Could you advise me on how to properly deploy your pre-trained model? Do I need to modify deploy.prototxt? Currently, I am using deploy.prototxt and ILGnet-AVA1.caffemodel. Your test.py did not seem to work for me.

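Which pretrained model should I use when fine-tuning on my own dataset?

Hello, thank you for the paper and the code; I have learned a lot. I am new to Caffe, so there are a few things I don't quite understand and would like to ask:

  1. About data input: should I first generate LMDB files from train.txt/val.txt with something like create_imagenet.sh? Can Caffe take images directly as input?
  2. If I want to train on my own dataset, should the pretrained model be the ILGnet-AVA1.caffemodel you provide, or a caffemodel pretrained only on ImageNet? (If the latter, where can I download it?)

Looking forward to your reply. Best wishes!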
