
chainercv's Introduction


ChainerCV: a Library for Deep Learning in Computer Vision

ChainerCV is a collection of tools to train and run neural networks for computer vision tasks using Chainer.

You can find the documentation here.

Supported tasks:

Guiding Principles

ChainerCV is developed under the following three guiding principles.

  • Ease of Use -- Implementations of computer vision networks with a cohesive and simple interface.
  • Reproducibility -- Training scripts that can serve as reference implementations.
  • Compositionality -- Tools such as data loaders and evaluation scripts that share a common API.

Installation

$ pip install -U numpy
$ pip install chainercv

Instructions for installing with Anaconda are here (recommended).

Requirements

  • Chainer and its dependencies
  • Pillow
  • Cython (Build requirements)

For additional features

ChainerCV is tested under Python 2.7.12 and 3.6.0.

  • The master branch is designed to work on Chainer v6 (the stable version) and v7 (the development version).
  • The following branches are kept for the previous version of Chainer. Note that these branches are unmaintained.
    • 0.4.11 (for Chainer v1). It can be installed by pip install chainercv==0.4.11.
    • 0.7 (for Chainer v2). It can be installed by pip install chainercv==0.7.
    • 0.8 (for Chainer v3). It can be installed by pip install chainercv==0.8.
    • 0.10 (for Chainer v4). It can be installed by pip install chainercv==0.10.
    • 0.12 (for Chainer v5). It can be installed by pip install chainercv==0.12.
    • 0.13 (for Chainer v6). It can be installed by pip install chainercv==0.13.

Data Conventions

  • Image
    • The order of color channel is RGB.
    • Shape is CHW (i.e. (channel, height, width)).
    • The range of values is [0, 255].
    • Size is represented by row-column order (i.e. (height, width)).
  • Bounding Boxes
    • Shape is (R, 4).
    • Coordinates are ordered as (y_min, x_min, y_max, x_max). The order is the opposite of OpenCV.
  • Semantic Segmentation Image
    • Shape is (height, width).
    • The value is class id, which is in range [0, n_class - 1].
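A minimal sketch illustrating these conventions ('sample.jpg' is a placeholder path; chainercv.utils.read_image is assumed to follow them, so check the documentation of your version):

import numpy as np
from chainercv.utils import read_image

# read_image returns a float32, RGB, CHW array with values in [0, 255].
img = read_image('sample.jpg')
assert img.ndim == 3 and img.shape[0] == 3

# Bounding boxes: R boxes, each (y_min, x_min, y_max, x_max).
bbox = np.array([[10., 20., 110., 220.]], dtype=np.float32)  # shape (R, 4)

# Semantic segmentation label: an (H, W) array of class ids in [0, n_class - 1].
label = np.zeros(img.shape[1:], dtype=np.int32)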

Sample Visualization

These are the outputs of the detection models supported by ChainerCV.

Citation

If ChainerCV helps your research, please cite the paper for ACM Multimedia Open Source Software Competition. Here is a BibTeX entry:

@inproceedings{ChainerCV2017,
    author = {Niitani, Yusuke and Ogawa, Toru and Saito, Shunta and Saito, Masaki},
    title = {ChainerCV: a Library for Deep Learning in Computer Vision},
    booktitle = {ACM Multimedia},
    year = {2017},
}

The preprint can be found in arXiv: https://arxiv.org/abs/1708.08169

chainercv's People

Contributors

23pointsnorth, akitotakeki, beam2d, cafeal, crcrpar, disktnk, fukatani, g-votte, gwtnb, hakuyume, higumachan, iwiwi, keisukefukuda, kkebo, knorth55, ktns, mannykayy, mitmul, okdshin, peisuke, rcalland, rezoo, sergeant-wizard, shinh, soskek, ta7uw, take-cheeze, tkerola, yuyu2172, zori


chainercv's Issues

where is `SemanticSegmentationEvaluator`

Hi

I tried training SegNet on chainercv==0.5.1 and hit the error below:

Traceback (most recent call last):
  File "train.py", line 21, in <module>
    from chainercv.extensions import SemanticSegmentationEvalutor
ImportError: cannot import name SemanticSegmentationEvaluator

my env:

  • python 2.7.12
  • chainer==2.0.0
  • chainercv==0.5.1

faster rcnn how to deal with different number of ROI in each image?

I noticed that the __call__ method of the FasterRCNNTrainChain class (in faster_rcnn_train_chain.py) needs the parameter bboxes:
bboxes (~chainer.Variable): A batch of bounding boxes.
Its shape is :math:(N, R, 4),
where R is the number of bounding boxes per image.

Does this mean Faster R-CNN can only deal with a batch of images that all have the same number of foreground RoIs? And what about predict, where there is no way to know in advance how many regions will appear in an image?

  1. How does the Chainer Faster R-CNN deal with a different number of regions appearing in each image?
  2. How is this handled in the test phase? (A sketch of the predict interface follows below.)
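For reference, a minimal sketch of how the detection links handle a varying number of boxes at test time, assuming ChainerCV's FasterRCNNVGG16 and the pretrained_model='voc07' weight name: predict takes a list of images and returns per-image arrays whose first dimension can differ.

import numpy as np
from chainercv.links import FasterRCNNVGG16

# Two dummy images of different sizes; CHW, RGB, float32, values in [0, 255].
img1 = np.random.uniform(0, 255, (3, 480, 640)).astype(np.float32)
img2 = np.random.uniform(0, 255, (3, 300, 400)).astype(np.float32)

model = FasterRCNNVGG16(pretrained_model='voc07')
bboxes, labels, scores = model.predict([img1, img2])
# bboxes is a length-2 list; bboxes[i] has shape (R_i, 4) and R_i can differ per image.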

Convert dimension order from WH -> HW

size=WH --> size=HW (#234)

  • transforms/center_crop.py
  • transforms/flip.py
  • transforms/random_crop.py
  • transforms/resize.py
  • transforms/resize_contain.py
  • transforms/scale.py
  • transforms/ten_crop.py
  • links/model/ssd
  • links/model/faster_rcnn
  • utils/testing/generate_random_bbox

xy -> yx for bbox (#246)

Datasets
  • datasets/voc/voc_detection_dataset.py
evaluations
  • evaluations/eval_detection_voc.py
extensions
  • extensions/detection_vis_report.py
  • extensions/detection_voc_evaluator.py
links
  • links/models/faster_rcnn
  • links/models/ssd
transforms
  • transforms/flip_bbox.py
  • transforms/resize_bbox.py
  • transforms/translate_bbox.py
utils
  • utils/bbox/bbox_iou.py
  • utils/bbox/non_maximum_suppression.py
  • utils/testing/generate_random_bbox.py
visualization
  • visualizations/vis_bbox.py

xy->yx for Keypoints (#235)

  • transforms/keypoint/resize_keypoint.py
  • transforms/keypoint/flip_keypoint.py
  • transforms/keypoint/translate_keypoint.py
  • datasets/cub/cub_keypoint_dataset.py
  • evaluations/eval_pck.py

Add license descriptions to dataset classes

Licenses differ from dataset to dataset, and some are more restrictive than the MIT license.

For each dataset, a description of the license needs to be added to its docstring.

Inconsistency in (row, col) and (col, row) conventions

There is an inconsistency in the conventions used for the argument and return-value order of transforms.
The conventions in question are the (row, col) and (col, row) orders. For example, when a transform takes arguments flip_x, flip_y in this order, it follows the (col, row) convention.

The issue is that some functions follow one convention and some the other, whereas image shapes always follow the (row, col) convention.

Currently, the following code is related to this issue.

  • flip related (x_flip, y_flip)
  • expand related (x_offset, y_offset)
  • crop related (x_slice, y_slice)
  • shapes used as argument for transforms such as resize. (shape=(H, W))

Representation for batch of bounding boxes

There are three possible representations for a batch of bounding boxes.

  1. An array of shape (B, R, 4).
  2. A coordinate array and a batch index array, whose shapes are (R', 4) and (R',) respectively. This representation is used for chainer.functions.roi_pooling_2d.
  3. A list of (R, 4) arrays.

The convention for selecting among these representations has not been discussed extensively yet, and I would like to discuss it here.

First of all, I would like to summarize examples found in the code.

  • For a function that takes a list of images as input and returns bounding boxes (List[img] -> BBox), a list of (R, 4) arrays is used. This is found in predict of the detection links.
  • For a function that takes a batch of image arrays as input and returns bounding boxes (BCHW -> BBox), a batch of bounding box arrays (i.e. an array of shape (B, R, 4)) is used. This is found in SSD.__call__.

Here are rules that I am thinking of.

  • When the number of bounding boxes per image is fixed, use (B, R, 4).
  • When the number of bounding boxes per image varies, use (R', 4) and (R',).
  • When it is the output of a function that takes a list of images as input, return a list of (R, 4) arrays.

For the second case, I chose not to use a list of (R, 4) arrays for efficiency reasons.
A list of (R, 4) arrays can easily be copied into (R', 4) and (R',) internally, so the overhead this creates is small. However, the overhead can become non-negligible when the bounding boxes come together with other data types.

For example, consider a function that takes a batch of images and a list of bounding boxes as input and returns a cropped image of the same shape per bounding box (i.e. inputs: (B, C, H, W) and a list of (R, 4); output: a length-B list of (R, C, H', W') arrays). A list of batched images is hard to work with efficiently when we want to do batched image operations such as batched matrix multiplications. Thus, it is better to represent them as one contiguous array with a separate index array, as in the sketch below.
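A minimal NumPy sketch (not ChainerCV code) of building that flat representation from a list of per-image arrays:

import numpy as np

bbox_list = [np.zeros((3, 4), dtype=np.float32),
             np.zeros((5, 4), dtype=np.float32)]

# Flat representation: all coordinates stacked into (R', 4), plus an (R',)
# array saying which image each box belongs to.
coords = np.concatenate(bbox_list, axis=0)
batch_indices = np.concatenate(
    [np.full(len(b), i, dtype=np.int32) for i, b in enumerate(bbox_list)])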

EDIT:
Variable-length bounding boxes show up in the Region Proposal Network of Faster R-CNN.
It outputs NMSed bounding boxes as RoI proposals, and these bounding boxes can in principle be of variable length.

https://github.com/yuyu2172/chainercv/blob/faster-rcnn-test/chainercv/links/model/faster_rcnn/region_proposal_network.py#L117

https://github.com/yuyu2172/chainercv/blob/faster-rcnn-test/chainercv/links/model/faster_rcnn/utils/proposal_creator.py

Interface of VisReport is confusing

Input and output of predict_func are confusing

Current interface

img, bbox = inputs
pred_bbox, = predict_func((img[None], bbox[None]))

The input of predict_func is the output of dataset[i] with a batch axis added.
In most cases, predict_func should not take the ground truth bounding box.
Therefore, the input to predict_func should be just img[None], which is much simpler than a tuple of an image and a bounding box.

The output of predict_func is a length-one tuple containing a bounding box.
Since it does not make sense for the output to be a tuple, returning an array instead of a tuple is simpler.

Batch axis of pred_bbox

Currently, all outputs of models must have a batch axis. However, some users may want to output data without a batch axis. Therefore, it makes sense to accept an output without a batch axis.

Redesign of Wrapper to make it more flexible

Problem 1: Reduce complexity

Wrapper code is complex. This is because each dataset wrapper is a subclass of a dataset.

--> Solution:
This problem can be circumvented by making wrappers functions rather than datasets.
That way, wrappers are applied to the original dataset without creating a new dataset object; instead, they change the behavior of __getitem__.

It would be more appropriate to rename the concept Wrapper to Transformer.
This name mirrors similar functionality in Caffe. https://github.com/BVLC/caffe/blob/master/python/caffe/io.py

Problem 2: users should be able to compose transformers easily.

Ideally, transformers should just be building blocks for users to build a complex transformation pipeline.
Currently, users can only compose transformers as below.

    wrappers = [lambda d: SubtractWrapper(d),
                lambda d: PadWrapper(
                    d, max_size=(512, 512), preprocess_idx=[0, 1],
                    bg_values={0: 0, 1: -1})]
    for wrapper in wrappers:
        train_data = wrapper(train_data)

There are distinct steps that all inputs have to pass through in order to create pipelines.
This makes the design of transformers demanding, as all wrappers need to support arbitrary inputs of arbitrary length.

I think that transformers should support different ways to construct the pipelines.
One way to do this is by composing functions with a unifying function.

TransformerComposite(dataset, [transforms])

This API can serve as a default simple way to compose transformers (a sketch is given below).
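A hedged sketch of what such a TransformerComposite could look like (a hypothetical class, not existing ChainerCV code):

class TransformerComposite(object):

    def __init__(self, dataset, transforms):
        self._dataset = dataset
        self._transforms = transforms

    def __getitem__(self, index):
        # Apply the transforms in order to the output of the wrapped dataset.
        in_data = self._dataset[index]
        for transform in self._transforms:
            in_data = transform(in_data)
        return in_data

    def __len__(self):
        return len(self._dataset)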

Another way to do this is by allowing users to build the pipeline with a plain Python function.

def transform_get_example(_dataset, i):
    img, label_img = _dataset.get_example(i)  # this is the original get_example

    img = img - 122.5
    img = PadTransformer(img, options)
    label_img = PadTransformer(label_img, options)
    return img, label_img

dataset = extend(dataset, transform_get_example)

Inconsistency in arguments of transforms/image

Many image transform functions take some kind of argument to decide whether to return intermediate variables or not (e.g. return_params in random_expand).
These variables are necessary when handling multiple data types. For example, bounding boxes need to be transformed according to the transformation done to images.

The issue is that the parameters used to control this behavior are inconsistent among transforms.
I think that the current API is confusing for users.

Examples of (function, parameter) pairs.

  • random_expand takes return_params
  • random_crop takes return_slices.
  • random_flip takes return_flips.

ImportError: cannot import name voc_detection_label_names

What are the causes of the following two errors?

Execution result (1):
xxxxx@tegra-ubuntu:~/work/chainer-faster-rcnn/chainercv/examples/faster_rcnn/$ python demo.py ***.jpg --gpu 0

Traceback (most recent call last):
  File "demo.py", line 6, in <module>
    from chainercv.datasets import voc_detection_label_names
ImportError: cannot import name voc_detection_label_names

Execution result (2):
xxxxx@tegra-ubuntu:~/work/chainer-faster-rcnn/chainercv/examples/faster_rcnn/$ python train.py --gpu 0

Traceback (most recent call last):
  File "train.py", line 18, in <module>
    from chainercv.datasets import voc_detection_label_names
ImportError: cannot import name voc_detection_label_names

Execution environment:

  • H/W: Jetson TX1
    • Memory: 4GB
    • HDD: 64GB (27GB used, 49%)
    • CPU: ARMv8 Processor rev 1 (v8l) x4
    • processor: aarch64
  • S/W:
    • OS: Ubuntu 16.04 LTS
    • chainer: 1.23.0
    • ChainerCV: 0.4.5
    • OpenCV: 3.1.0
    • python: 2.7.12 (64bit)
    • LANG: en_US.UTF-8
    • Cython: 0.25.2
    • matplotlib: 1.5.1
    • pillow: 3.1.2
    • Numpy: 1.12.1
    • CUDA: 8
    • cuDNN: 5

~/.bashrc
export PYTHONPATH=~/work/chainer-faster-rcnn/chainercv/:$PYTHONPATH

Lower accuracy for trained models in chainer V2

Not sure if I should post this here or in the main Chainer repo.

But I noticed that when I import weights for a model in Chainer v2, the performance of the exact same model with the exact same weights is lower than when imported in Chainer 1.24.0.

DenseNet-FC performance on camvid, evaluated using the eval_camvid script.

Chainer V1.24.0:

                Sky : 0.9197
           Building : 0.7507
               Pole : 0.3796
               Road : 0.9468
           Pavement : 0.7883
               Tree : 0.7080
         SignSymbol : 0.5379
              Fence : 0.2905
                Car : 0.8405
         Pedestrian : 0.4611
          Bicyclist : 0.3412

==================================
mean IoU : 0.6331
Class average accuracy : 0.7767
Global average accuracy : 0.8974

Chainer V2:

                Sky : 0.9168
           Building : 0.7088
               Pole : 0.3516
               Road : 0.9430
           Pavement : 0.7343
               Tree : 0.6241
         SignSymbol : 0.4656
              Fence : 0.3263
                Car : 0.8374
         Pedestrian : 0.3786
          Bicyclist : 0.3317

==================================
mean IoU : 0.6017
Class average accuracy : 0.7083
Global average accuracy : 0.8819

Is there a way to preserve the performance of models trained in earlier versions of chainer?

ChainerCV release plan

These plans are deprecated. See the actual release plan below.

After updating to version 0.6 from 0.5, ChainerCV plans to develop under two branches: one for a stable version and the other for a development version. The stable and development versions of ChainerCV support stable and development versions of Chainer respectively. The development version has alpha, beta or RC at the end.
A major version update for Chainer is released every 12 weeks, and ChainerCV's release cycle is based on that cycle. When Chainer's major version is released, the development branch of ChainerCV becomes the stable branch. The stable branch is maintained until the next major version is released.

The table below shows the planned release timeline.

            Chainer v2   Chainer v3   Chainer v4   Chainer dev branch version
0 weeks     v0.6.0       v0.7.0a1     -            v3.0.0a1
12 weeks    v0.6.x       v0.7.0       v0.8.0a1     v4.0.0a1
24 weeks    -            v0.7.y       v0.8.0       v5.0.0a1

Note that we have no exact plan on the number of updates to be made within each cycle.

Return values of `transforms/image/random_*`

Transforms with non-deterministic operations (e.g. random_flip) can return intermediate variables (e.g. which direction to flip).

random_crop and random_flip return those intermediate values in different formats.

https://github.com/pfnet/chainercv/blob/master/chainercv/transforms/image/random_flip.py

https://github.com/pfnet/chainercv/blob/master/chainercv/transforms/image/random_rotate.py

I propose returning the values as a flat tuple instead of a tuple of tuples or a tuple containing a dictionary, because this is probably the simplest form.

More concretely, for random_flip the return values would be img, flip_h, flip_v instead of img, {'h': flip_h, 'v': flip_v} (a sketch is given below).
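A hedged sketch of the proposed convention using a hypothetical random_flip (not the actual ChainerCV signature):

import random

import numpy as np


def random_flip(img, return_param=False):
    # Hypothetical transform following the proposed flat-tuple convention.
    flip_h = random.choice([True, False])
    flip_v = random.choice([True, False])
    if flip_h:
        img = img[:, :, ::-1]
    if flip_v:
        img = img[:, ::-1, :]
    if return_param:
        return img, flip_h, flip_v  # flat tuple, not a nested tuple or a dict
    return img


img, flip_h, flip_v = random_flip(np.zeros((3, 32, 32)), return_param=True)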

mystery code about IoU calculate ?

In chainercv/chainercv/utils/bbox/bbox_iou.py,

the code below is very hard to understand. Could you explain how exactly it works?

    if bbox_a.shape[1] != 4 or bbox_b.shape[1] != 4:
        raise IndexError
    xp = cuda.get_array_module(bbox_a)

    # bbox_a[:, None, :2] has shape (N, 1, 2) and bbox_b[:, :2] has shape (K, 2),
    # so broadcasting pairs every box in bbox_a with every box in bbox_b.
    # top left of each pairwise intersection
    tl = xp.maximum(bbox_a[:, None, :2], bbox_b[:, :2])
    # bottom right of each pairwise intersection
    br = xp.minimum(bbox_a[:, None, 2:], bbox_b[:, 2:])

    # (tl < br).all(axis=2) zeroes out pairs of boxes that do not overlap.
    area_i = xp.prod(br - tl, axis=2) * (tl < br).all(axis=2)
    area_a = xp.prod(bbox_a[:, 2:] - bbox_a[:, :2], axis=1)
    area_b = xp.prod(bbox_b[:, 2:] - bbox_b[:, :2], axis=1)
    # IoU = intersection / union, computed for all N x K pairs at once.
    return area_i / (area_a[:, None] + area_b - area_i)

Notify user about total dataset size and download location

Currently, e.g. when downloading VOC (~4GB) for examples/faster_rcnn/, there is no notice of how big the total download is or where the file will be stored on disk ($HOME/.chainer/dataset/pfnet/chainercv/hoge). It might be more user-friendly to include such information to help users avoid filling up their disks.
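A generic sketch (not ChainerCV's actual downloader) of how such a notice could be produced from the Content-Length header before any data is fetched:

import os
import urllib.request


def notify_download(url, root=os.path.expanduser('~/.chainer/dataset')):
    # Report the total size (when the server provides Content-Length) and the
    # destination directory before downloading anything.
    req = urllib.request.Request(url, method='HEAD')
    with urllib.request.urlopen(req) as resp:
        size = resp.headers.get('Content-Length')
    size_str = '{:.1f} MB'.format(int(size) / 1e6) if size else 'unknown size'
    print('Downloading {} ({}) to {}'.format(url, size_str, root))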

pickle error.. using chainercv with chainermn

Hi,

I am trying to use chainercv with chainermn.

I used chainercv with some of my new projects, and when I attempt to distribute training using chainermn, I receive the following error from the scatter_dataset method. All I am doing is applying a random_flip transform to the training data. I get the error for all my projects that use chainercv and have replicated it using the chainermn mnist example file.

[screenshot of the error]

I'm not sure as to where to raise this issue so I have raised it in both the chainercv and chainermn repos.

Change vis_* to take CHW images as input

Currently, visualization functions such as vis_bbox take an image in HWC format and uint8 dtype.
This is inconsistent with the interface of transforms.

The vis_* functions should be modified to take images that are

  • CHW
  • dtype == np.float32

This change will involve not only vis_*, but also the extensions in chainercv/extensions.

Segnet train.py no longer reports pixel_accuracy after v2 update

I updated to chainer v2 and the latest chainercv (0.5.1).

If you train the segnet model in the examples folder
and attempt to log/print the pixel_accuracy, it prints nothing.

[screenshot]

Additionally, the accuracy is no longer logged.

[screenshot]

For context, this is what the logger looked like before the chainer V2 update.

[screenshot]

And this is what the log looked like for an epoch where the evaluator was called.

[screenshot]

Redesign of transformers.extend class

Currently, transforms.extend changes the behavior of dataset.get_example, assuming that the dataset has a method get_example.
This is problematic because many datasets, including Chainer's default datasets (e.g. MNIST, CIFAR), do not support get_example.
Instead, transforms.extend should change the behavior of __getitem__.

To do this, the current approach of monkey patching a function does not suffice.
This is because the approach fails to modify x[i] even though x.__getitem__(i) is modified.
This happens due to the specification of the [] operator, which is documented here (https://docs.python.org/2/reference/datamodel.html#special-method-names).

In short, when x[i] is evaluated, type(x).__getitem__(x, i) is called, not x.__getitem__(i).

Options

Option 1: a decorator-like function that subclasses the dataset and patches __getitem__

import chainercv


def extend(dataset_class, transform):
    class TransformedClass(dataset_class):
        def __getitem__(self, index):
            in_data = dataset_class.__getitem__(self, index)
            return transform(in_data)
    return TransformedClass

def f(in_data):
    pass

NewClass = extend(chainercv.datasets.CUBKeypointsDataset, f)
dataset = NewClass()

Option 2: make a new class TransformedDataset

class TransformedDataset(object):

    def __init__(self, dataset, transform):
        self._dataset = dataset
        self.transform = transform

    def __getitem__(self, index):
        in_data = self._dataset[index]
        return self.transform(in_data)

    def __len__(self):
        return len(self._dataset)


def transform(in_data):
     pass

dataset, _ = chainer.datasets.get_mnist()
dataset = TransformedDataset(dataset, transform)

Option 3: force users to inherit a class every time they use a transform

I think that extend is a ubiquitous function that needs to be supported by a framework.

Add segmentation models

Faster R-CNN example - how to reproduce mAP score as reported in chainerCV repo?

I trained the default model on GPU and here are the results from evaluation on the VOC 2007 test (without using the 'difficult' images):

{'target/ap/aeroplane': 0.6965201021034354,
'target/ap/bicycle': 0.73484302574704907,
'target/ap/bird': 0.65840939900185358,
'target/ap/boat': 0.53384321573359594,
'target/ap/bottle': 0.48043438867464661,
'target/ap/bus': 0.7523566332916436,
'target/ap/car': 0.80289041784875204,
'target/ap/cat': 0.80800336623509772,
'target/ap/chair': 0.42661626786978352,
'target/ap/cow': 0.73043503450680392,
'target/ap/diningtable': 0.62443045057362068,
'target/ap/dog': 0.74578854807666306,
'target/ap/horse': 0.76480598347576867,
'target/ap/motorbike': 0.71962801633794005,
'target/ap/person': 0.75379224633107234,
'target/ap/pottedplant': 0.38667937510871725,
'target/ap/sheep': 0.63111339471850414,
'target/ap/sofa': 0.56472048453848944,
'target/ap/train': 0.74868343311092533,
'target/ap/tvmonitor': 0.66027285599762775,
'target/map': 0.66121333196409948}

Question: how do I get the missing points from 66.1 mAP to the 70.5 mAP reported at https://github.com/chainer/chainercv/tree/master/examples/faster_rcnn#performance?

Format of bbox

Currently, we use an ndarray whose shape is (5,) as the format of bbox.
https://github.com/pfnet/chainercv/blob/master/chainercv/transforms/bbox/flip_bbox.py#L13

However, it has two cons.

  1. We have to use a magic index 4 when we want to access the label_id.
  2. label_id is always an integer, but coordinates sometimes take float values.

A structured array seems like a good solution to me.

bbox_dtype = [('rect', 'f8', 4), ('label', 'u4')]
bbox = np.array(((1.2, 3.4, 4.5, 5.6), 3), dtype=bbox_dtype)

bbox['rect']  # array([ 1.2,  3.4,  4.5,  5.6])
bbox['label']  # array(3, dtype=uint32)

Name of variables for the ground truths of semantic segmentation

Currently, the name of the ground truth of semantic segmentation is label.
This name conflicts with other usages of label.
For instance, object detection uses label whose shape is (N,).

The ground truth for semantic segmentation should have a different name.
segm is one option.
By the way, the ground truth of semantic segmentation would be defined as an array of shape (1, H, W) whose values are in the range [-1, L-1] with type int32, where L is the number of classes.

windows OS train faster-RCNN error!

When I train Faster R-CNN on Windows, it downloads the Caffe VGG16 model and converts it into an .npz
file, but then it reports the error below. (I haven't trained Faster R-CNN on Linux yet; maybe it will report the same error. I will test it on Linux soon.)

Traceback (most recent call last):
  File "examples/faster_rcnn/train.py", line 128, in <module>
    main()
  File "examples/faster_rcnn/train.py", line 124, in main
    trainer.run()
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\chainer-2.0.0-py3.5.egg\chainer\training\trainer.py", line 296, in run
    update()
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\chainer-2.0.0-py3.5.egg\chainer\training\updater.py", line 177, in update
    self.update_core()
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\chainer-2.0.0-py3.5.egg\chainer\training\updater.py", line 181, in update_core
    batch = self._iterators['main'].next()
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\chainer-2.0.0-py3.5.egg\chainer\iterators\multiprocess_iterator.py", line 73, in __next__
    self._init()  # start workers
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\chainer-2.0.0-py3.5.egg\chainer\iterators\multiprocess_iterator.py", line 142, in _init
    self._init_process()
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\chainer-2.0.0-py3.5.egg\chainer\iterators\multiprocess_iterator.py", line 168, in _init_process
    worker.start()
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\multiprocessing\context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\multiprocessing\reduction.py", line 59, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'main.<locals>.transform'
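For reference, a minimal sketch of a workaround, assuming the error comes from the transform being defined inside main(): spawn-based multiprocessing on Windows needs to pickle the transform, so moving it to module level avoids the error. The keyword names (x_random, return_param, x_flip) are assumptions and may differ between ChainerCV versions.

from chainer.datasets import TransformDataset
from chainercv import transforms


def flip_transform(in_data):
    # Module-level function: picklable, unlike main.<locals>.transform.
    img, bbox, label = in_data
    img, param = transforms.random_flip(img, x_random=True, return_param=True)
    bbox = transforms.flip_bbox(bbox, img.shape[1:], x_flip=param['x_flip'])
    return img, bbox, label


# train_data = TransformDataset(train_data, flip_transform)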

in detection_voc_evaluator.py, why is observation created as an empty dict and returned?

In detection_voc_evaluator.py: I couldn't find more evaluator examples by searching, so I just looked at your Faster R-CNN's detection_voc_evaluator.py example. Do you know how to write my own evaluator? Why does the code below create an empty dict observation and return it? (A sketch of what report_scope does is given after the code.)

        report = {'map': result['map']}

        if self.label_names is not None:
            for l, label_name in enumerate(self.label_names):
                try:
                    report['ap/{:s}'.format(label_name)] = result['ap'][l]
                except IndexError:
                    report['ap/{:s}'.format(label_name)] = np.nan

        observation = {}
        with reporter.report_scope(observation):
            reporter.report(report, target)
        return observation
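A minimal standalone Chainer sketch (not ChainerCV code) of why the dict is not actually empty when returned: the report scope redirects reporter.report calls into observation inside the with block.

import chainer
from chainer import reporter

target = chainer.Link()
rep = reporter.Reporter()
rep.add_observer('target', target)

observation = {}
with rep.scope(observation):        # report_scope uses the current reporter in the same way
    reporter.report({'map': 0.5}, target)

print(observation)  # {'target/map': 0.5}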

Add tests for all functions

Some functions do not have tests.

  • tasks/detection/vis_bbox.py #136
  • utils/test.py #141
  • tasks/semantic_segmentation/eval_semantic_segmentation.py #132
  • utils/extension.py
  • utils/image.py #124
  • utils/download.py
  • dataset classes

in chainercv/evaluations/eval_detection_voc.py, how to fetch real gt_label?

I want to modify your code to meet my needs.
In chainercv/evaluations/eval_detection_voc.py, in calc_detection_voc_prec_rec:

pred_bboxes = iter(pred_bboxes)
pred_labels = iter(pred_labels)
pred_scores = iter(pred_scores)
gt_bboxes = iter(gt_bboxes)
gt_labels = iter(gt_labels)
if gt_difficults is None:
    gt_difficults = itertools.repeat(None)
else:
    gt_difficults = iter(gt_difficults)

n_pos = defaultdict(int)
score = defaultdict(list)
match = defaultdict(list)

for pred_bbox, pred_label, pred_score, gt_bbox, gt_label, gt_difficult in \
        six.moves.zip(
            pred_bboxes, pred_labels, pred_scores,
            gt_bboxes, gt_labels, gt_difficults):
...

I used the PyCharm debugger to inspect what object is really inside gt_labels. It is an iterator or generator; I tried everything but failed to fetch the real values inside it.
Why is this just an iterator, and how can I get the real gt_labels out of it? (A small sketch follows below.)
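A plain Python sketch of pulling values out of such an iterator (note that consuming it in a debugger also consumes it for the evaluation loop):

import numpy as np

# gt_labels here is a stand-in: an iterable of per-image (R,) label arrays,
# like the argument passed to calc_detection_voc_prec_rec.
gt_labels = iter([np.array([1, 5]), np.array([0])])

first = next(gt_labels)   # pull the first per-image label array out of the iterator
rest = list(gt_labels)    # materialize whatever remains into a list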

Add classification examples

Although there are already classification models and examples in main Chainer, it still makes sense to add them to ChainerCV.
Here are the reasons why:

  1. Chainer's classification models support BGR inputs. They can be reimplemented in ChainerCV while making them consistent with the other links here.
  2. The example does not reproduce the expected performance.
  3. There is no clear explanation of how to prepare ImageNet.
  4. The examples in main Chainer cannot use all the features that ChainerCV supports, such as transforms.

Having said that, there are at a minimum three things to do.

  • Add links. Possibly, they can be located at chainercv.links.model.classification.
  • Prepare examples/imagenet, which contains scripts to train and evaluate various classification models.
  • We can use chainer.datasets.LabeledImageDataset for the ImageNet task. However, the current implementation does not force grayscale images to RGB. This has to be fixed by sending a PR to main Chainer. We can also write our own image_dataset inside ChainerCV (a sketch of the conversion is given below).
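A minimal sketch of the grayscale-to-RGB conversion mentioned in the last point (plain NumPy, not an existing ChainerCV helper):

import numpy as np


def force_rgb(img):
    # Convert a grayscale (1, H, W) image to RGB (3, H, W) by repeating the
    # single channel; RGB images pass through unchanged.
    if img.shape[0] == 1:
        img = np.broadcast_to(img, (3,) + img.shape[1:]).copy()
    return img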

Keyword consistency

  • img and imgs
    • img: (C, H, W)
    • imgs: (B, C, H, W) or list/tuple of (C, H, W)
  • bb, bbox and bboxes
    • bb: (4,)
    • bbox: (R, 4)
    • bboxes: (B, R, 4) or list/tuple of (R, 4).
  • lb, label and labels
    • lb: ()
    • label: (R,)
    • labels: (B, R) or list/tuple of (R,).
  • n_noun and n_nouns -> n_noun
    For the number of something, we use n_noun. For example, n_class and n_bbox (not n_classes and n_bboxes).
  • label and class -> mainly label
    We use label for label/class information. The usage of class (cls) is under discussion #130.
  • conf and score -> score
    We use score for an array of confidence scores.
  • pred_noun and gt_noun
    In functions which require both prediction and ground truth, we mark them by adding prefixes, pred_ and gt_. For example, pred_bboxes and gt_bboxes.
  • threshold and thresh -> thresh
    For threshold values, we use thresh. If we use more than one thresholds, we distinguish them by prefixes. For example, nms_thresh and score_thresh.
  • IoU and Jaccard index -> IoU
    In documents, we use Intersection over Union (IoU).
  • split and subset #131
    The name of parameter which determines the subset to use. This is under discussion.
  • For a name of a function or a class, image is used. As a name of a variable inside a function, img is used. For example, FooImage and func_image(img).
  • rois
    rois is an (R', 4) array which consists of bounding boxes for multiple images. Assuming that there are B images each containing R_i bounding boxes, R' = \sum R_i. rois comes together with an (R',) array called batch_indices, which contains the batch indices of the images to which the bounding boxes correspond.
  • In extensions, models are named as target.

Faster-RCNN fails to train in CPU mode

python examples/faster_rcnn/train.py outputs nans.

chainercv-faster-rcnn/chainercv/chainercv/links/model/faster_rcnn/faster_rcnn_train_chain.py:154: RuntimeWarning: invalid value encountered in less
flag = (abs_diff.data < (1. / sigma2)).astype(np.float32)
miniconda/envs/ch02/lib/python3.5/site-packages/chainer/functions/activation/relu.py:48: RuntimeWarning: invalid value encountered in greater
return utils.force_array(gy[0] * (y > 0)),
chainercv-faster-rcnn/chainercv/chainercv/links/model/faster_rcnn/utils/proposal_creator.py:126: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((hs >= min_size) & (ws >= min_size))[0]
iteration epoch elapsed_time lr main/loss main/roi_loc_loss main/roi_cls_loss main/rpn_loc_loss main/rpn_cls_loss validation/main/map.......] 0.20%
20 0 306.354 0.001 nan nan nan nan nan

As reported offline by @yuyu2172, the CPU implementation of ROIPooling2D returns -np.inf values when ROIs are very small.

The GPU mode is confirmed to train successfully.

Function name of pad

Currently, transforms/image/pad.py has a different interface from NumPy's pad.

The name of the function should be changed in order to avoid confusion.
The function pads an image so that it matches a given shape.
This is the same behavior as the CSS background-size: contain property, so the name could be resize_contain, for example.

Bug report, in faster_rcnn.py "self.xp.clip" line

https://github.com/chainer/chainercv/blob/master/chainercv/links/model/faster_rcnn/faster_rcnn.py#L305
this line defines:
cls_bbox = cls_bbox.reshape(-1, self.n_class * 4)
and then:

cls_bbox[:, slice(0, 4, 2)] = self.xp.clip(
                cls_bbox[:, slice(1, 4, 2)], 0, H / scale)
cls_bbox[:, slice(1, 4, 2)] = self.xp.clip(
                cls_bbox[:, slice(1, 4, 2)], 0, W / scale)

the slice may cause a bug!
After testing np.clip, I realized this may be a bug.
For example:

>>> a = np.arange(64).reshape(8,8)
>>> a
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])
>>> a[:, slice(0,4,2)]
array([[ 0,  2],
       [ 8, 10],
       [16, 18],
       [24, 26],
       [32, 34],
       [40, 42],
       [48, 50],
       [56, 58]])

So I think that after cls_bbox = cls_bbox.reshape(-1, self.n_class * 4), the columns with index >= 4 are neglected by the clip's slice. (A sketch follows below.)
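A hedged sketch (not necessarily the repository's actual fix) of clipping every class's coordinates with a step slice over the whole row, which is what the report suggests is missing:

import numpy as np

n_class = 3
cls_bbox = np.arange(2 * n_class * 4, dtype=np.float32).reshape(2, n_class * 4)

# slice(0, 4, 2) only touches columns 0 and 2 (the first class), whereas the
# step slices below cover the corresponding coordinates of every class.
cls_bbox[:, 0::2] = np.clip(cls_bbox[:, 0::2], 0, 10.0)  # all y coordinates
cls_bbox[:, 1::2] = np.clip(cls_bbox[:, 1::2], 0, 20.0)  # all x coordinates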

Directory structure of links

Since there are losses that can be reused between models, it is better to make a directory for losses.
This is from the discussion I had with @mitmul and @rezoo .

I will post my idea of how links should be organized.
I put functions (not links) necessary for Faster RCNN under links.model.faster_rcnn.utils. I would like a comment on this design.

.
└── links
    ├── loss
    │   ├── faster_rcnn_loss.py
    │   ├── semantic_segmentation_loss.py
    │   └── ssd_loss.py
    └── model
        ├── faster_rcnn
        │   ├── faster_rcnn.py
        │   ├── faster_rcnn_vgg.py
        │   └── utils
        │       ├── anchor_target_creator.py
        │       ├── bbox_regression_target.py
        │       ├── generate_anchor.py
        │       ├── proposal_creator.py
        │       └── proposal_target_creator.py
        ├── segnet
        │   └── segnet.py
        └── ssd
            ├── ssd.py
            └── ssd_vgg.py
