
chainercv's Introduction


ChainerCV: a Library for Deep Learning in Computer Vision

ChainerCV is a collection of tools to train and run neural networks for computer vision tasks using Chainer.

You can find the documentation here.

Supported tasks:

Guiding Principles

ChainerCV is developed under the following three guiding principles.

  • Ease of Use -- Implementations of computer vision networks with a cohesive and simple interface.
  • Reproducibility -- Training scripts that can serve as reference implementations.
  • Compositionality -- Tools such as data loaders and evaluation scripts that share a common API.

Installation

$ pip install -U numpy
$ pip install chainercv

Instructions for installing with Anaconda are here (recommended).

Requirements

  • Chainer and its dependencies
  • Pillow
  • Cython (Build requirements)

For additional features

ChainerCV is tested under Python 2.7.12 and 3.6.0.

  • The master branch is designed to work on Chainer v6 (the stable version) and v7 (the development version).
  • The following branches are kept for the previous version of Chainer. Note that these branches are unmaintained.
    • 0.4.11 (for Chainer v1). It can be installed by pip install chainercv==0.4.11.
    • 0.7 (for Chainer v2). It can be installed by pip install chainercv==0.7.
    • 0.8 (for Chainer v3). It can be installed by pip install chainercv==0.8.
    • 0.10 (for Chainer v4). It can be installed by pip install chainercv==0.10.
    • 0.12 (for Chainer v5). It can be installed by pip install chainercv==0.12.
    • 0.13 (for Chainer v6). It can be installed by pip install chainercv==0.13.

Data Conventions

  • Image
    • The order of color channel is RGB.
    • Shape is CHW (i.e. (channel, height, width)).
    • The range of values is [0, 255].
    • Size is represented by row-column order (i.e. (height, width)).
  • Bounding Boxes
    • Shape is (R, 4).
    • Coordinates are ordered as (y_min, x_min, y_max, x_max). The order is the opposite of OpenCV.
  • Semantic Segmentation Image
    • Shape is (height, width).
    • The value is class id, which is in range [0, n_class - 1].
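A minimal sketch illustrating these conventions ('sample.jpg' is a placeholder path; chainercv.utils.read_image is assumed to follow them, so check the documentation of your version):

import numpy as np
from chainercv.utils import read_image

# read_image returns a float32, RGB, CHW array with values in [0, 255].
img = read_image('sample.jpg')
assert img.ndim == 3 and img.shape[0] == 3

# Bounding boxes: R boxes, each (y_min, x_min, y_max, x_max).
bbox = np.array([[10., 20., 110., 220.]], dtype=np.float32)  # shape (R, 4)

# Semantic segmentation label: an (H, W) array of class ids in [0, n_class - 1].
label = np.zeros(img.shape[1:], dtype=np.int32)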

Sample Visualization

These are the outputs of the detection models supported by ChainerCV.

Citation

If ChainerCV helps your research, please cite the paper for ACM Multimedia Open Source Software Competition. Here is a BibTeX entry:

@inproceedings{ChainerCV2017,
    author = {Niitani, Yusuke and Ogawa, Toru and Saito, Shunta and Saito, Masaki},
    title = {ChainerCV: a Library for Deep Learning in Computer Vision},
    booktitle = {ACM Multimedia},
    year = {2017},
}

The preprint can be found in arXiv: https://arxiv.org/abs/1708.08169

chainercv's People

Contributors

23pointsnorth, akitotakeki, beam2d, cafeal, crcrpar, disktnk, fukatani, g-votte, gwtnb, hakuyume, higumachan, iwiwi, keisukefukuda, kkebo, knorth55, ktns, mannykayy, mitmul, okdshin, peisuke, rcalland, rezoo, sergeant-wizard, shinh, soskek, ta7uw, take-cheeze, tkerola, yuyu2172, zori


chainercv's Issues

where is `SemanticSegmentationEvaluator`

Hi

I tried training SegNet on chainercv==0.5.1 and hit the error below:

Traceback (most recent call last):
  File "train.py", line 21, in <module>
    from chainercv.extensions import SemanticSegmentationEvalutor
ImportError: cannot import name SemanticSegmentationEvaluator

my env:

  • python 2.7.12
  • chainer==2.0.0
  • chainercv==0.5.1

faster rcnn how to deal with different number of ROI in each image?

I noticed that the __call__ method of the FasterRCNNTrainChain class (in faster_rcnn_train_chain.py) needs the parameter bboxes:
bboxes (~chainer.Variable): A batch of bounding boxes.
Its shape is :math:(N, R, 4),
where R is the number of bounding boxes per image.

Does this mean Faster R-CNN can only deal with a batch of images that all have the same number of foreground RoIs? And what about predict, where there is no way to know in advance how many regions will appear in an image?

  1. How does the Chainer Faster R-CNN deal with a different number of regions appearing in each image?
  2. How is this handled in the test phase? (A sketch of the predict interface follows below.)
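For reference, a minimal sketch of how the detection links handle a varying number of boxes at test time, assuming ChainerCV's FasterRCNNVGG16 and the pretrained_model='voc07' weight name: predict takes a list of images and returns per-image arrays whose first dimension can differ.

import numpy as np
from chainercv.links import FasterRCNNVGG16

# Two dummy images of different sizes; CHW, RGB, float32, values in [0, 255].
img1 = np.random.uniform(0, 255, (3, 480, 640)).astype(np.float32)
img2 = np.random.uniform(0, 255, (3, 300, 400)).astype(np.float32)

model = FasterRCNNVGG16(pretrained_model='voc07')
bboxes, labels, scores = model.predict([img1, img2])
# bboxes is a length-2 list; bboxes[i] has shape (R_i, 4) and R_i can differ per image.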

Convert dimension order from WH -> HW

size=WH --> size=HW (#234)

  • transforms/center_crop.py
  • transforms/flip.py
  • transforms/random_crop.py
  • transforms/resize.py
  • transforms/resize_contain.py
  • transforms/scale.py
  • transforms/ten_crop.py
  • links/model/ssd
  • links/model/faster_rcnn
  • utils/testing/generate_random_bbox

xy -> yx for bbox (#246)

Datasets
  • datasets/voc/voc_detection_dataset.py
evaluations
  • evaluations/eval_detection_voc.py
extensions
  • extensions/detection_vis_report.py
  • extensions/detection_voc_evaluator.py
links
  • links/models/faster_rcnn
  • links/models/ssd
transforms
  • transforms/flip_bbox.py
  • transforms/resize_bbox.py
  • transforms/translate_bbox.py
utils
  • utils/bbox/bbox_iou.py
  • utils/bbox/non_maximum_suppression.py
  • utils/testing/generate_random_bbox.py
visualization
  • visualizations/vis_bbox.py

xy->yx for Keypoints (#235)

  • transforms/keypoint/resize_keypoint.py
  • transforms/keypoint/flip_keypoint.py
  • transforms/keypoint/translate_keypoint.py
  • datasets/cub/cub_keypoint_dataset.py
  • evaluations/eval_pck.py

Add license descriptions to dataset classes

Licenses differ from dataset to dataset, and some are more restrictive than the MIT license.

For each dataset, a description of the license needs to be added to its docstring.

Inconsistency in (row, col) and (col, row) conventions

There is an inconsistency in the conventions used for the argument and return-value order of transforms.
The conventions in question are the (row, col) and (col, row) orders. For example, when a transform takes arguments flip_x, flip_y in this order, it follows the (col, row) convention.

The issue is that some functions follow one convention and some the other, whereas image shapes always follow the (row, col) convention.

Currently, the following code is related to this issue.

  • flip related (x_flip, y_flip)
  • expand related (x_offset, y_offset)
  • crop related (x_slice, y_slice)
  • shapes used as argument for transforms such as resize. (shape=(H, W))

Representation for batch of bounding boxes

There are three possible representations for a batch of bounding boxes.

  1. An array of shape (B, R, 4).
  2. A coordinate array and a batch index array, whose shapes are (R', 4) and (R',) respectively. This representation is used for chainer.functions.roi_pooling_2d.
  3. A list of (R, 4) arrays.

The convention for selecting among these representations has not been discussed extensively yet, and I would like to discuss it here.

First of all, I would like to summarize examples found in the code.

  • For a function that takes a list of images as input and returns bounding boxes (List[img] -> BBox), a list of (R, 4) arrays is used. This is found in predict of the detection links.
  • For a function that takes a batch of image arrays as input and returns bounding boxes (BCHW -> BBox), a batch of bounding box arrays (i.e. an array of shape (B, R, 4)) is used. This is found in SSD.__call__.

Here are rules that I am thinking of.

  • When the number of bounding boxes per image is fixed, use (B, R, 4).
  • When the number of bounding boxes per image varies, use (R', 4) and (R',).
  • When it is the output of a function that takes a list of images as input, return a list of (R, 4) arrays.

For the second case, I chose not to use a list of (R, 4) arrays for efficiency reasons.
A list of (R, 4) arrays can easily be copied into (R', 4) and (R',) internally, so the overhead this creates is small. However, the overhead can become non-negligible when the bounding boxes come together with other data types.

For example, consider a function that takes a batch of images and a list of bounding boxes as input and returns a cropped image of the same shape per bounding box (i.e. inputs: (B, C, H, W) and a list of (R, 4); output: a length-B list of (R, C, H', W') arrays). A list of batched images is hard to work with efficiently when we want to do batched image operations such as batched matrix multiplications. Thus, it is better to represent them as one contiguous array with a separate index array, as in the sketch below.
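A minimal NumPy sketch (not ChainerCV code) of building that flat representation from a list of per-image arrays:

import numpy as np

bbox_list = [np.zeros((3, 4), dtype=np.float32),
             np.zeros((5, 4), dtype=np.float32)]

# Flat representation: all coordinates stacked into (R', 4), plus an (R',)
# array saying which image each box belongs to.
coords = np.concatenate(bbox_list, axis=0)
batch_indices = np.concatenate(
    [np.full(len(b), i, dtype=np.int32) for i, b in enumerate(bbox_list)])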

EDIT:
Variable-length bounding boxes show up in the Region Proposal Network of Faster R-CNN.
It outputs NMSed bounding boxes as RoI proposals, and these bounding boxes can in principle be of variable length.

https://github.com/yuyu2172/chainercv/blob/faster-rcnn-test/chainercv/links/model/faster_rcnn/region_proposal_network.py#L117

https://github.com/yuyu2172/chainercv/blob/faster-rcnn-test/chainercv/links/model/faster_rcnn/utils/proposal_creator.py

Interface of VisReport is confusing

Input and output of predict_func are confusing

Current interface

img, bbox = inputs
pred_bbox, = predict_func((img[None], bbox[None]))

The input of predict_func is the output of dataset[i] with a batch axis added.
In most cases, predict_func should not take the ground truth bounding box.
Therefore, the input to predict_func should be just img[None], which is much simpler than a tuple of an image and a bounding box.

The output of predict_func is a length-one tuple containing a bounding box.
Since it does not make sense for the output to be a tuple, returning an array instead of a tuple is simpler.

Batch axis of pred_bbox

Currently, all outputs of models must have a batch axis. However, some users may want to output data without a batch axis. Therefore, it makes sense to accept an output without a batch axis.

Redesign of Wrapper to make it more flexible

Problem 1: Reduce complexity

Wrapper code is complex. This is because each dataset wrapper is a subclass of a dataset.

--> Solution:
This problem can be circumvented by making wrappers functions rather than datasets.
That way, wrappers are applied to the original dataset without creating a new dataset object; instead, they change the behavior of __getitem__.

It would be more appropriate to rename the concept Wrapper to Transformer.
This name mirrors similar functionality in Caffe. https://github.com/BVLC/caffe/blob/master/python/caffe/io.py

Problem 2: users should be able to compose transformers easily.

Ideally, transformers should just be building blocks for users to build a complex transformation pipeline.
Currently, users can only compose transformers as below.

    wrappers = [lambda d: SubtractWrapper(d),
                lambda d: PadWrapper(
                    d, max_size=(512, 512), preprocess_idx=[0, 1],
                    bg_values={0: 0, 1: -1})]
    for wrapper in wrappers:
        train_data = wrapper(train_data)

There are distinct steps that all inputs have to pass through in order to create pipelines.
This makes the design of transformers demanding, as all wrappers need to support arbitrary inputs of arbitrary length.

I think that transformers should support different ways to construct the pipelines.
One way to do this is by composing functions with a unifying function.

TransformerComposite(dataset, [transforms])

This API can serve as a default simple way to compose transformers (a sketch is given below).
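A hedged sketch of what such a TransformerComposite could look like (a hypothetical class, not existing ChainerCV code):

class TransformerComposite(object):

    def __init__(self, dataset, transforms):
        self._dataset = dataset
        self._transforms = transforms

    def __getitem__(self, index):
        # Apply the transforms in order to the output of the wrapped dataset.
        in_data = self._dataset[index]
        for transform in self._transforms:
            in_data = transform(in_data)
        return in_data

    def __len__(self):
        return len(self._dataset)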

Another way to do this is by allowing users to build the pipeline with a plain Python function.

def transform_get_example(_dataset, i):
    img, label_img = _dataset.get_example(i)  # this is the original get_example

    img = img - 122.5
    img = PadTransformer(img, options)
    label_img = PadTransformer(label_img, options)
    return img, label_img

dataset = extend(dataset, transform_get_example)

Inconsistency in arguments of transforms/image

Many image transform functions take some kind of argument to decide whether to return intermediate variables or not (e.g. return_params in random_expand).
These variables are necessary when handling multiple data types. For example, bounding boxes need to be transformed according to the transformation done to images.

The issue is that the parameters used to control this behavior are inconsistent among transforms.
I think that the current API is confusing for users.

Examples of (function, parameter) pairs.

  • random_expand takes return_params
  • random_crop takes return_slices.
  • random_flip takes return_flips.

ImportError: cannot import name voc_detection_label_names

What are the causes of the following two errors?

Execution result (1):
xxxxx@tegra-ubuntu:~/work/chainer-faster-rcnn/chainercv/examples/faster_rcnn/$ python demo.py ***.jpg --gpu 0

Traceback (most recent call last):
  File "demo.py", line 6, in <module>
    from chainercv.datasets import voc_detection_label_names
ImportError: cannot import name voc_detection_label_names

Execution result (2):
xxxxx@tegra-ubuntu:~/work/chainer-faster-rcnn/chainercv/examples/faster_rcnn/$ python train.py --gpu 0

Traceback (most recent call last):
  File "train.py", line 18, in <module>
    from chainercv.datasets import voc_detection_label_names
ImportError: cannot import name voc_detection_label_names

Execution environment:

  • H/W: Jetson TX1
    • Memory: 4GB
    • HDD: 64GB (27GB used, 49%)
    • CPU: ARMv8 Processor rev 1 (v8l) x4
    • processor: aarch64
  • S/W:
    • OS: Ubuntu 16.04 LTS
    • chainer: 1.23.0
    • ChainerCV: 0.4.5
    • OpenCV: 3.1.0
    • python: 2.7.12 (64bit)
    • LANG: en_US.UTF-8
    • Cython: 0.25.2
    • matplotlib: 1.5.1
    • pillow: 3.1.2
    • Numpy: 1.12.1
    • CUDA: 8
    • cuDNN: 5

~/.bashrc
export PYTHONPATH=~/work/chainer-faster-rcnn/chainercv/:$PYTHONPATH

Lower accuracy for trained models in chainer V2

Not sure if I should post this here or in the main Chainer repo.

But I noticed that when I import weights for a model in Chainer v2, the performance of the exact same model with the exact same weights is lower than when imported in Chainer 1.24.0.

DenseNet-FC performance on camvid, evaluated using the eval_camvid script.

Chainer V1.24.0:

                Sky : 0.9197
           Building : 0.7507
               Pole : 0.3796
               Road : 0.9468
           Pavement : 0.7883
               Tree : 0.7080
         SignSymbol : 0.5379
              Fence : 0.2905
                Car : 0.8405
         Pedestrian : 0.4611
          Bicyclist : 0.3412

==================================
mean IoU : 0.6331
Class average accuracy : 0.7767
Global average accuracy : 0.8974

Chainer V2:

                Sky : 0.9168
           Building : 0.7088
               Pole : 0.3516
               Road : 0.9430
           Pavement : 0.7343
               Tree : 0.6241
         SignSymbol : 0.4656
              Fence : 0.3263
                Car : 0.8374
         Pedestrian : 0.3786
          Bicyclist : 0.3317

==================================
mean IoU : 0.6017
Class average accuracy : 0.7083
Global average accuracy : 0.8819

Is there a way to preserve the performance of models trained in earlier versions of chainer?

ChainerCV release plan

These plans are deprecated. See the actual release plan below.

After updating to version 0.6 from 0.5, ChainerCV plans to develop under two branches: one for a stable version and the other for a development version. The stable and development versions of ChainerCV support stable and development versions of Chainer respectively. The development version has alpha, beta or RC at the end.
A major version update for Chainer is released every 12 weeks, and ChainerCV's release cycle is based on that cycle. When Chainer's major version is released, the development branch of ChainerCV becomes the stable branch. The stable branch is maintained until the next major version is released.

The table below shows the planned release timeline.

            Chainer v2   Chainer v3   Chainer v4   Chainer dev branch version
0 weeks     v0.6.0       v0.7.0a1     -            v3.0.0a1
12 weeks    v0.6.x       v0.7.0       v0.8.0a1     v4.0.0a1
24 weeks    -            v0.7.y       v0.8.0       v5.0.0a1

Note that we have no exact plan on the number of updates to be made within each cycle.

Return values of `transforms/image/random_*`

Transforms with non-deterministic operations (e.g. random_flip) can return intermediate variables (e.g. which direction to flip).

random_crop and random_flip return those intermediate values in different formats.

https://github.com/pfnet/chainercv/blob/master/chainercv/transforms/image/random_flip.py

https://github.com/pfnet/chainercv/blob/master/chainercv/transforms/image/random_rotate.py

I propose returning the values as a flat tuple instead of a tuple of tuples or a tuple containing a dictionary, because this is probably the simplest form.

More concretely, for random_flip the return values would be img, flip_h, flip_v instead of img, {'h': flip_h, 'v': flip_v} (a sketch is given below).
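A hedged sketch of the proposed convention using a hypothetical random_flip (not the actual ChainerCV signature):

import random

import numpy as np


def random_flip(img, return_param=False):
    # Hypothetical transform following the proposed flat-tuple convention.
    flip_h = random.choice([True, False])
    flip_v = random.choice([True, False])
    if flip_h:
        img = img[:, :, ::-1]
    if flip_v:
        img = img[:, ::-1, :]
    if return_param:
        return img, flip_h, flip_v  # flat tuple, not a nested tuple or a dict
    return img


img, flip_h, flip_v = random_flip(np.zeros((3, 32, 32)), return_param=True)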

mystery code about IoU calculate ?

In chainercv/chainercv/utils/bbox/bbox_iou.py,

the code below is very hard to understand. Could you explain how exactly it works?

    if bbox_a.shape[1] != 4 or bbox_b.shape[1] != 4:
        raise IndexError
    xp = cuda.get_array_module(bbox_a)

    # bbox_a[:, None, :2] has shape (N, 1, 2) and bbox_b[:, :2] has shape (K, 2),
    # so broadcasting pairs every box in bbox_a with every box in bbox_b.
    # top left of each pairwise intersection
    tl = xp.maximum(bbox_a[:, None, :2], bbox_b[:, :2])
    # bottom right of each pairwise intersection
    br = xp.minimum(bbox_a[:, None, 2:], bbox_b[:, 2:])

    # (tl < br).all(axis=2) zeroes out pairs of boxes that do not overlap.
    area_i = xp.prod(br - tl, axis=2) * (tl < br).all(axis=2)
    area_a = xp.prod(bbox_a[:, 2:] - bbox_a[:, :2], axis=1)
    area_b = xp.prod(bbox_b[:, 2:] - bbox_b[:, :2], axis=1)
    # IoU = intersection / union, computed for all N x K pairs at once.
    return area_i / (area_a[:, None] + area_b - area_i)

Notify user about total dataset size and download location

Currently, e.g. when downloading VOC (~4GB) for examples/faster_rcnn/, there is no notice of how big the total download is or where the file will be stored on disk ($HOME/.chainer/dataset/pfnet/chainercv/hoge). It might be more user-friendly to include such information to help users avoid filling up their disks.
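A generic sketch (not ChainerCV's actual downloader) of how such a notice could be produced from the Content-Length header before any data is fetched:

import os
import urllib.request


def notify_download(url, root=os.path.expanduser('~/.chainer/dataset')):
    # Report the total size (when the server provides Content-Length) and the
    # destination directory before downloading anything.
    req = urllib.request.Request(url, method='HEAD')
    with urllib.request.urlopen(req) as resp:
        size = resp.headers.get('Content-Length')
    size_str = '{:.1f} MB'.format(int(size) / 1e6) if size else 'unknown size'
    print('Downloading {} ({}) to {}'.format(url, size_str, root))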

pickle error.. using chainercv with chainermn

Hi,

I am trying to use chainercv with chainermn.

I used chainercv with some of my new projects, and when I attempt to distribute training using chainermn, I receive the following error from the scatter_dataset method. All I am doing is applying a random_flip transform to the training data. I get the error for all my projects that use chainercv and have replicated it using the chainermn mnist example file.

[screenshot of the error]

I'm not sure as to where to raise this issue so I have raised it in both the chainercv and chainermn repos.

Change vis_* to take CHW images as input

Currently, visualization functions such as vis_bbox take an image in HWC format and uint8 dtype.
This is inconsistent with the interface of transforms.

The vis_* functions should be modified to take images that are

  • CHW
  • dtype == np.float32

This change will involve not only vis_*, but also the extensions in chainercv/extensions.

Segnet train.py no longer reports pixel_accuracy after v2 update

I updated to chainer v2 and the latest chainercv (0.5.1).

If you train the segnet model in the examples folder
and attempt to log/print the pixel_accuracy, it prints nothing.

[screenshot]

Additionally, the accuracy is no longer logged.

[screenshot]

For context, this is what the logger looked like before the chainer V2 update.

[screenshot]

And this is what the log looked like for an epoch where the evaluator was called.

[screenshot]

Redesign of transformers.extend class

Currently, transforms.extend changes the behavior of dataset.get_example, assuming that the dataset has a method get_example.
This is problematic because many datasets, including Chainer's default datasets (e.g. MNIST, CIFAR), do not support get_example.
Instead, transforms.extend should change the behavior of __getitem__.

To do this, the current approach of monkey patching a function does not suffice.
This is because the approach fails to modify x[i] even though x.__getitem__(i) is modified.
This happens due to the specification of the [] operator, which is documented here (https://docs.python.org/2/reference/datamodel.html#special-method-names).

In short, when x[i] is evaluated, type(x).__getitem__(x, i) is called, not x.__getitem__(i).

Options

Option 1: a decorator-like function that subclasses the dataset and patches __getitem__

import chainercv


def extend(dataset_class, transform):
    class TransformedClass(dataset_class):
        def __getitem__(self, index):
            in_data = dataset_class.__getitem__(self, index)
            return transform(in_data)
    return TransformedClass

def f(in_data):
    pass

NewClass = extend(chainercv.datasets.CUBKeypointsDataset, f)
dataset = NewClass()

Option 2: make a new class TransformedDataset

class TransformedDataset(object):

    def __init__(self, dataset, transform):
        self._dataset = dataset
        self.transform = transform

    def __getitem__(self, index):
        in_data = self._dataset[index]
        return self.transform(in_data)

    def __len__(self):
        return len(self._dataset)


def transform(in_data):
     pass

dataset, _ = chainer.datasets.get_mnist()
dataset = TransformedDataset(dataset, transform)

Option 3: force users to inherit a class every time they use a transform

I think that extend is a ubiquitous function that needs to be supported by a framework.

Add segmentation models

Faster R-CNN example - how to reproduce mAP score as reported in chainerCV repo?

I trained the default model on GPU and here are the results from evaluation on the VOC 2007 test (without using the 'difficult' images):

{'target/ap/aeroplane': 0.6965201021034354,
'target/ap/bicycle': 0.73484302574704907,
'target/ap/bird': 0.65840939900185358,
'target/ap/boat': 0.53384321573359594,
'target/ap/bottle': 0.48043438867464661,
'target/ap/bus': 0.7523566332916436,
'target/ap/car': 0.80289041784875204,
'target/ap/cat': 0.80800336623509772,
'target/ap/chair': 0.42661626786978352,
'target/ap/cow': 0.73043503450680392,
'target/ap/diningtable': 0.62443045057362068,
'target/ap/dog': 0.74578854807666306,
'target/ap/horse': 0.76480598347576867,
'target/ap/motorbike': 0.71962801633794005,
'target/ap/person': 0.75379224633107234,
'target/ap/pottedplant': 0.38667937510871725,
'target/ap/sheep': 0.63111339471850414,
'target/ap/sofa': 0.56472048453848944,
'target/ap/train': 0.74868343311092533,
'target/ap/tvmonitor': 0.66027285599762775,
'target/map': 0.66121333196409948}

Question: how do I get the missing points from 66.1 mAP to the 70.5 mAP reported at https://github.com/chainer/chainercv/tree/master/examples/faster_rcnn#performance?

Format of bbox

Currently, we use an ndarray whose shape is (5,) as the format of bbox.
https://github.com/pfnet/chainercv/blob/master/chainercv/transforms/bbox/flip_bbox.py#L13

However, it has two cons.

  1. We have to use a magic index 4 when we want to access the label_id.
  2. label_id is always an integer, but coordinates sometimes take float values.

A structured array seems like a good solution to me.

bbox_dtype = [('rect', 'f8', 4), ('label', 'u4')]
bbox = np.array(((1.2, 3.4, 4.5, 5.6), 3), dtype=bbox_dtype)

bbox['rect']  # array([ 1.2,  3.4,  4.5,  5.6])
bbox['label']  # array(3, dtype=uint32)

Name of variables for the ground truths of semantic segmentation

Currently, the name of the ground truth of semantic segmentation is label.
This name conflicts with other usages of label.
For instance, object detection uses label whose shape is (N,).

The ground truth for semantic segmentation should have a different name.
segm is one option.
By the way, the ground truth of semantic segmentation would be defined as an array of shape (1, H, W) whose values are in the range [-1, L-1] with type int32, where L is the number of classes.

windows OS train faster-RCNN error!

When I train Faster R-CNN on Windows, it downloads the Caffe VGG16 model and converts it into an .npz
file, but then it reports the error below. (I haven't trained Faster R-CNN on Linux yet; maybe it will report the same error. I will test it on Linux soon.)

Traceback (most recent call last):
  File "examples/faster_rcnn/train.py", line 128, in <module>
    main()
  File "examples/faster_rcnn/train.py", line 124, in main
    trainer.run()
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\chainer-2.0.0-py3.5.egg\chainer\training\trainer.py", line 296, in run
    update()
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\chainer-2.0.0-py3.5.egg\chainer\training\updater.py", line 177, in update
    self.update_core()
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\chainer-2.0.0-py3.5.egg\chainer\training\updater.py", line 181, in update_core
    batch = self._iterators['main'].next()
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\chainer-2.0.0-py3.5.egg\chainer\iterators\multiprocess_iterator.py", line 73, in __next__
    self._init()  # start workers
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\chainer-2.0.0-py3.5.egg\chainer\iterators\multiprocess_iterator.py", line 142, in _init
    self._init_process()
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\chainer-2.0.0-py3.5.egg\chainer\iterators\multiprocess_iterator.py", line 168, in _init_process
    worker.start()
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\multiprocessing\context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Program Files\Anaconda3\envs\tensorflow\lib\multiprocessing\reduction.py", line 59, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'main.<locals>.transform'
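For reference, a minimal sketch of a workaround, assuming the error comes from the transform being defined inside main(): spawn-based multiprocessing on Windows needs to pickle the transform, so moving it to module level avoids the error. The keyword names (x_random, return_param, x_flip) are assumptions and may differ between ChainerCV versions.

from chainer.datasets import TransformDataset
from chainercv import transforms


def flip_transform(in_data):
    # Module-level function: picklable, unlike main.<locals>.transform.
    img, bbox, label = in_data
    img, param = transforms.random_flip(img, x_random=True, return_param=True)
    bbox = transforms.flip_bbox(bbox, img.shape[1:], x_flip=param['x_flip'])
    return img, bbox, label


# train_data = TransformDataset(train_data, flip_transform)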

in detection_voc_evaluator.py, why is observation created as an empty dict and returned?

In detection_voc_evaluator.py: I couldn't find more evaluator examples by searching, so I just looked at your Faster R-CNN's detection_voc_evaluator.py example. Do you know how to write my own evaluator? Why does the code below create an empty dict observation and return it? (A sketch of what report_scope does is given after the code.)

        report = {'map': result['map']}

        if self.label_names is not None:
            for l, label_name in enumerate(self.label_names):
                try:
                    report['ap/{:s}'.format(label_name)] = result['ap'][l]
                except IndexError:
                    report['ap/{:s}'.format(label_name)] = np.nan

        observation = {}
        with reporter.report_scope(observation):
            reporter.report(report, target)
        return observation
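A minimal standalone Chainer sketch (not ChainerCV code) of why the dict is not actually empty when returned: the report scope redirects reporter.report calls into observation inside the with block.

import chainer
from chainer import reporter

target = chainer.Link()
rep = reporter.Reporter()
rep.add_observer('target', target)

observation = {}
with rep.scope(observation):        # report_scope uses the current reporter in the same way
    reporter.report({'map': 0.5}, target)

print(observation)  # {'target/map': 0.5}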

Add tests for all functions

Some functions do not have tests.

  • tasks/detection/vis_bbox.py #136
  • utils/test.py #141
  • tasks/semantic_segmentation/eval_semantic_segmentation.py #132
  • utils/extension.py
  • utils/image.py #124
  • utils/download.py
  • dataset classes

in chainercv/evaluations/eval_detection_voc.py, how to fetch real gt_label?

I want to modify your code to meet my needs.
In chainercv/evaluations/eval_detection_voc.py, in calc_detection_voc_prec_rec:

pred_bboxes = iter(pred_bboxes)
pred_labels = iter(pred_labels)
pred_scores = iter(pred_scores)
gt_bboxes = iter(gt_bboxes)
gt_labels = iter(gt_labels)
if gt_difficults is None:
    gt_difficults = itertools.repeat(None)
else:
    gt_difficults = iter(gt_difficults)

n_pos = defaultdict(int)
score = defaultdict(list)
match = defaultdict(list)

for pred_bbox, pred_label, pred_score, gt_bbox, gt_label, gt_difficult in \
        six.moves.zip(
            pred_bboxes, pred_labels, pred_scores,
            gt_bboxes, gt_labels, gt_difficults):
...

I used the PyCharm debugger to inspect what object is really inside gt_labels. It is an iterator or generator; I tried everything but failed to fetch the real values inside it.
Why is this just an iterator, and how can I get the real gt_labels out of it? (A small sketch follows below.)
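A plain Python sketch of pulling values out of such an iterator (note that consuming it in a debugger also consumes it for the evaluation loop):

import numpy as np

# gt_labels here is a stand-in: an iterable of per-image (R,) label arrays,
# like the argument passed to calc_detection_voc_prec_rec.
gt_labels = iter([np.array([1, 5]), np.array([0])])

first = next(gt_labels)   # pull the first per-image label array out of the iterator
rest = list(gt_labels)    # materialize whatever remains into a list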

Add classification examples

Although there are already classification models and examples in main Chainer, it still makes sense to add them to ChainerCV.
Here are the reasons why:

  1. Chainer's classification models support BGR inputs. They can be reimplemented in ChainerCV while making them consistent with the other links here.
  2. The example does not reproduce the expected performance.
  3. There is no clear explanation of how to prepare ImageNet.
  4. The examples in main Chainer cannot use all the features that ChainerCV supports, such as transforms.

Having said that, there are at a minimum three things to do.

  • Add links. Possibly, they can be located at chainercv.links.model.classification.
  • Prepare examples/imagenet, which contains scripts to train and evaluate various classification models.
  • We can use chainer.datasets.LabeledImageDataset for the ImageNet task. However, the current implementation does not force grayscale images to RGB. This has to be fixed by sending a PR to main Chainer. We can also write our own image_dataset inside ChainerCV (a sketch of the conversion is given below).
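A minimal sketch of the grayscale-to-RGB conversion mentioned in the last point (plain NumPy, not an existing ChainerCV helper):

import numpy as np


def force_rgb(img):
    # Convert a grayscale (1, H, W) image to RGB (3, H, W) by repeating the
    # single channel; RGB images pass through unchanged.
    if img.shape[0] == 1:
        img = np.broadcast_to(img, (3,) + img.shape[1:]).copy()
    return img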

Keyword consistency

  • img and imgs
    • img: (C, H, W)
    • imgs: (B, C, H, W) or list/tuple of (C, H, W)
  • bb, bbox and bboxes
    • bb: (4,)
    • bbox: (R, 4)
    • bboxes: (B, R, 4) or list/tuple of (R, 4).
  • lb, label and labels
    • lb: ()
    • label: (R,)
    • labels: (B, R) or list/tuple of (R,).
  • n_noun and n_nouns -> n_noun
    For the number of something, we use n_noun. For example, n_class and n_bbox (not n_classes and n_bboxes).
  • label and class -> mainly label
    We use label for label/class information. The usage of class (cls) is under discussion #130.
  • conf and score -> score
    We use score for an array of confidence scores.
  • pred_noun and gt_noun
    In functions which require both prediction and ground truth, we mark them by adding prefixes, pred_ and gt_. For example, pred_bboxes and gt_bboxes.
  • threshold and thresh -> thresh
    For threshold values, we use thresh. If we use more than one thresholds, we distinguish them by prefixes. For example, nms_thresh and score_thresh.
  • IoU and Jaccard index -> IoU
    In documents, we use Intersection over Union (IoU).
  • split and subset #131
    The name of parameter which determines the subset to use. This is under discussion.
  • For a name of a function or a class, image is used. As a name of a variable inside a function, img is used. For example, FooImage and func_image(img).
  • rois
    rois is an (R', 4) array which consists of bounding boxes for multiple images. Assuming that there are B images each containing R_i bounding boxes, R' = \sum R_i. rois comes together with an (R',) array called batch_indices, which contains the batch indices of the images to which the bounding boxes correspond.
  • In extensions, models are named as target.

Faster-RCNN fails to train in CPU mode

python examples/faster_rcnn/train.py outputs nans.

chainercv-faster-rcnn/chainercv/chainercv/links/model/faster_rcnn/faster_rcnn_train_chain.py:154: RuntimeWarning: invalid value encountered in less
flag = (abs_diff.data < (1. / sigma2)).astype(np.float32)
miniconda/envs/ch02/lib/python3.5/site-packages/chainer/functions/activation/relu.py:48: RuntimeWarning: invalid value encountered in greater
return utils.force_array(gy[0] * (y > 0)),
chainercv-faster-rcnn/chainercv/chainercv/links/model/faster_rcnn/utils/proposal_creator.py:126: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((hs >= min_size) & (ws >= min_size))[0]
iteration epoch elapsed_time lr main/loss main/roi_loc_loss main/roi_cls_loss main/rpn_loc_loss main/rpn_cls_loss validation/main/map.......] 0.20%
20 0 306.354 0.001 nan nan nan nan nan

As reported offline by @yuyu2172, the CPU implementation of ROIPooling2D returns -np.inf values when ROIs are very small.

The GPU mode is confirmed to train successfully.

Function name of pad

Currently, transforms/image/pad.py has a different interface from NumPy's pad.

The name of the function should be changed in order to avoid confusion.
The function pads an image so that it matches a given shape.
This is the same behavior as the CSS background-size: contain property, so the name could be resize_contain, for example.

Bug report, in faster_rcnn.py "self.xp.clip" line

https://github.com/chainer/chainercv/blob/master/chainercv/links/model/faster_rcnn/faster_rcnn.py#L305
this line defines:
cls_bbox = cls_bbox.reshape(-1, self.n_class * 4)
and then:

cls_bbox[:, slice(0, 4, 2)] = self.xp.clip(
                cls_bbox[:, slice(1, 4, 2)], 0, H / scale)
cls_bbox[:, slice(1, 4, 2)] = self.xp.clip(
                cls_bbox[:, slice(1, 4, 2)], 0, W / scale)

the slice may cause a bug!
After testing np.clip, I realized this may be a bug.
For example:

>>> a = np.arange(64).reshape(8,8)
>>> a
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])
>>> a[:, slice(0,4,2)]
array([[ 0,  2],
       [ 8, 10],
       [16, 18],
       [24, 26],
       [32, 34],
       [40, 42],
       [48, 50],
       [56, 58]])

So I think that after cls_bbox = cls_bbox.reshape(-1, self.n_class * 4), the columns with index >= 4 are neglected by the clip's slice. (A sketch follows below.)
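A hedged sketch (not necessarily the repository's actual fix) of clipping every class's coordinates with a step slice over the whole row, which is what the report suggests is missing:

import numpy as np

n_class = 3
cls_bbox = np.arange(2 * n_class * 4, dtype=np.float32).reshape(2, n_class * 4)

# slice(0, 4, 2) only touches columns 0 and 2 (the first class), whereas the
# step slices below cover the corresponding coordinates of every class.
cls_bbox[:, 0::2] = np.clip(cls_bbox[:, 0::2], 0, 10.0)  # all y coordinates
cls_bbox[:, 1::2] = np.clip(cls_bbox[:, 1::2], 0, 20.0)  # all x coordinates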

Directory structure of links

Since there are losses that can be reused between models, it is better to make a directory for losses.
This is from the discussion I had with @mitmul and @rezoo .

I will post my idea of how links should be organized.
I put functions (not links) necessary for Faster RCNN under links.model.faster_rcnn.utils. I would like a comment on this design.

.
└── links
    ├── loss
    │   ├── faster_rcnn_loss.py
    │   ├── semantic_segmentation_loss.py
    │   └── ssd_loss.py
    └── model
        ├── faster_rcnn
        │   ├── faster_rcnn.py
        │   ├── faster_rcnn_vgg.py
        │   └── utils
        │       ├── anchor_target_creator.py
        │       ├── bbox_regression_target.py
        │       ├── generate_anchor.py
        │       ├── proposal_creator.py
        │       └── proposal_target_creator.py
        ├── segnet
        │   └── segnet.py
        └── ssd
            ├── ssd.py
            └── ssd_vgg.py
