kaggle-dsb2018's Issues

Summary

With color normalization, more nuclei get detected, but there are also many misclassified regions, which could be cleaned up by post-processing. Having trained up to this point, this data and this model don't seem to match well; the loss stays stubbornly high.
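Here, color normalization means matching each image's color statistics to a fixed reference image. Below is a minimal Reinhard-style sketch in LAB space; the helper name and the reference-image argument are assumptions for illustration, not the repo's actual code.

import cv2
import numpy as np

def reinhard_normalize(image_bgr, reference_bgr):
    # shift each LAB channel of the image to the reference image's mean/std (Reinhard-style color transfer)
    src = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(reference_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1)) + 1e-6
    ref_mean, ref_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    out = (src - src_mean) / src_std * ref_std + ref_mean
    return cv2.cvtColor(np.clip(out, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)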

Submission results

start evaluation here! **
predicting: 0/3 (0 %) 0.01 min  save 17b9
predicting: 1/3 (33 %) 0.46 min  save 0f1f
predicting: 2/3 (67 %) 0.51 min  save 472b
predicting: 3/3 (100 %) 0.55 min  save 259b
initial_checkpoint = /home/hx/kaggle/bowl/results/3-17/checkpoint/00009500_model.pth
test_num = 4

41 41
46 5
76 30
81 5

neurcle: best so far

('scale_0.8', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.8, 'scale_y': 0.8}),
('scale_0.3', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.3, 'scale_y': 0.3}),
('scale_0.6', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.6, 'scale_y': 0.6}),
('scale_0.5', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.5, 'scale_y': 0.5}),

My modified NMS

The effect is that an extra condition keeps some additional boxes.
The tilt angle was chosen as 15 degrees.

import numpy as np

# default NMS in python for checking, extended with an angle-based condition
# that rescues some boxes which plain IoU suppression would drop
def py_nms(dets, thresh):
    # dets: (N, 5) array of [x0, y0, x1, y1, score]
    x0 = dets[:, 0]
    y0 = dets[:, 1]
    x1 = dets[:, 2]
    y1 = dets[:, 3]
    scores = dets[:, 4]
    areas = (x1 - x0 + 1) * (y1 - y0 + 1)
    # box centers and their distance from the image origin
    anchor_x0_begin = (x0 + x1) / 2
    anchor_y0_begin = (y0 + y1) / 2
    anchor_distance_begin = np.sqrt(np.square(anchor_x0_begin) + np.square(anchor_y0_begin))
    # my code
    order = scores.argsort()[::-1]
    cos_threshold = 0.94
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the current box with every remaining box
        xx0 = np.maximum(x0[i], x0[order[1:]])
        yy0 = np.maximum(y0[i], y0[order[1:]])
        xx1 = np.minimum(x1[i], x1[order[1:]])
        yy1 = np.minimum(y1[i], y1[order[1:]])
        x0x0, x1x1 = x0[i], x1[i]
        y0y0, y1y1 = y0[i], y1[i]
        anchor_x0 = anchor_x0_begin[order[1:]]
        anchor_y0 = anchor_y0_begin[order[1:]]
        anchor_distance = anchor_distance_begin[order[1:]]
        w = np.maximum(0.0, xx1 - xx0 + 1)
        h = np.maximum(0.0, yy1 - yy0 + 1)
        intersect = w * h
        overlap = intersect / (areas[i] + areas[order[1:]] - intersect)
        # my select: cosine between each remaining box's center offset and the current
        # box's diagonal; the sign of the offset decides which diagonal is used
        anchor_box_dot_right = (anchor_x0 - anchor_x0_begin[i]) * (x1x1 - x0x0) + (anchor_y0 - anchor_y0_begin[i]) * (y1y1 - y0y0)
        anchor_box_dot_left = (anchor_x0 - anchor_x0_begin[i]) * (x1x1 - x0x0) * (-1) + (anchor_y0 - anchor_y0_begin[i]) * (y1y1 - y0y0)
        box_index_norm = np.sqrt((x1x1 - x0x0) ** 2 + (y1y1 - y0y0) ** 2)
        anchor_norm = np.sqrt(np.square(anchor_x0 - anchor_x0_begin[i]) + np.square(anchor_y0 - anchor_y0_begin[i]))
        anchor_x0_cos = anchor_box_dot_left / (box_index_norm * anchor_norm)
        anchor_judge = (anchor_x0 - anchor_x0_begin[i]) * (anchor_y0 - anchor_y0_begin[i])
        anchor_judge_ids = np.where(anchor_judge >= 0)[0]
        if len(anchor_judge_ids) > 0:
            anchor_x0_cos[anchor_judge_ids] = (anchor_box_dot_right / (box_index_norm * anchor_norm))[anchor_judge >= 0]
        anchor_x0_cos = abs(anchor_x0_cos)
        # extra keep condition: centers nearly aligned with the diagonal and not too far away
        myinds = np.where((anchor_x0_cos >= cos_threshold) & (anchor_distance * 6 >= box_index_norm))[0]

        # standard NMS keep condition, then union with the extra condition
        inds = np.where(overlap <= thresh)[0]
        inds_total = np.union1d(myinds, inds)
        order = order[inds_total + 1]
    return keep
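A quick sanity check with made-up boxes (the numbers are illustrative, not from the repo): plain IoU NMS would drop the second box (IoU ≈ 0.83 with the first), but the extra angle condition keeps it because its center lies along the first box's diagonal.

dets = np.array([
    [10, 10, 50, 50, 0.95],
    [12, 12, 52, 52, 0.90],    # heavy overlap, center on the first box's diagonal
    [80, 80, 120, 120, 0.85],
], dtype=np.float32)
keep = py_nms(dets, thresh=0.5)   # keeps all three: [0, 1, 2]; standard NMS would return [0, 2]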

3rd Trial (3-16) LB: 0.467

Black-and-white images

train_dataset.split = train_ids_gray_500
valid_dataset.split = valid_ids_gray_43

Train Loss

/root/xiaoxuan/kaggle/results/3-16/checkpoint/00013200_model.pth
loss 0.334 | rpn_cls 0.01 | rpn_reg 0.01 | rcnn_cls 0.19 | rcnn_reg 0.01 | mask_cls 0.12

Valid Loss

valid_ids_gray_43
mask_average_precision = 0.69596
[email protected] = 0.87672


H&E-stained images

train_dataset.split = train_color_87
valid_dataset.split = valid_color_20

Train Loss

/root/xiaoxuan/kaggle/results/3-18/checkpoint/00006000_model.pth
loss 0.751 | rpn_cls 0.03 | rpn_reg 0.03 | rcnn_cls 0.50 | rcnn_reg 0.01 | mask_cls 0.18

Valid Loss

valid_color_20

mask_average_precision = 0.46521
[email protected] = 0.76517


Prediction

  1. test_black_white_53: predicted with 00013200_model.pth
  2. test_purple_8: inverted + converted to grayscale (see the sketch after this list), then predicted with 00013200_model.pth
  3. test_HE_4: predicted with 00006000_model.pth; detailed parameters below
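A minimal sketch of that invert + grayscale preprocessing (a straightforward OpenCV version; the repo's actual implementation may differ):

import cv2

def invert_and_gray(image_bgr):
    # invert so dark nuclei on a bright background become bright nuclei on a dark
    # background, then go to grayscale and back to 3 channels to keep the input shape
    inverted = 255 - image_bgr
    gray = cv2.cvtColor(inverted, cv2.COLOR_BGR2GRAY)
    return cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)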

LB: 0.467

Recording the parameters for 0.046

    self.rpn_train_bg_thresh_high = 0.5
    self.rpn_train_fg_thresh_low  = 0.5

    self.rpn_train_nms_pre_score_threshold = 0.7
    self.rpn_train_nms_overlap_threshold   = 0.8  # higher for more proposals for mask training
    self.rpn_train_nms_min_size = 5

    self.rpn_test_nms_pre_score_threshold = 0.5#0.8
    self.rpn_test_nms_overlap_threshold   = 0.5
    self.rpn_test_nms_min_size = 5

    # rcnn ------------------------------------------------------------------
    self.rcnn_crop_size         = 14
    self.rcnn_train_batch_size  = 64  # per image
    self.rcnn_train_fg_fraction = 0.5
    self.rcnn_train_fg_thresh_low  = 0.5
    self.rcnn_train_bg_thresh_high = 0.5
    self.rcnn_train_bg_thresh_low  = 0.0

    self.rcnn_train_nms_pre_score_threshold = 0.03#0.05
    self.rcnn_train_nms_overlap_threshold   = 0.8  # high for more proposals for mask
    self.rcnn_train_nms_min_size = 5

    self.rcnn_test_nms_pre_score_threshold = 0.15#0.3
    self.rcnn_test_nms_overlap_threshold   = 0.1
    self.rcnn_test_nms_min_size = 5

    # mask ------------------------------------------------------------------
    self.mask_crop_size            = 14
    self.mask_train_batch_size     = 64  # per image
    self.mask_size                 = 28  # per image
    self.mask_train_min_size       = 5
    self.mask_train_fg_thresh_low  = self.rpn_train_fg_thresh_low

    self.mask_test_nms_pre_score_threshold = 0.1#0.4
    self.mask_test_nms_overlap_threshold = 0.05
    self.mask_test_mask_threshold  = 0.1

Observed by eye, this only introduces a tiny bit of noise.

Split information

train_all_664: training set

  1. train_black_white_500/valid_black_white_41: black-background, white-nuclei train/valid sets
  2. train_color_87/valid_color_10: H&E-stained train/valid sets
  3. train_gray_12/valid_gray_4: white-background, black-nuclei train/valid sets (different morphology from 1); this type never appears in the test set
  4. train_extra_270: extra H&E-stained images

test_all_65: test set

  1. test_black_white_53: black-background, white-nuclei test set
  2. test_color_4: H&E-stained test set
  3. test_purple_8: purple-on-yellow test set; this type never appears in the training set

Recording the current best result on the 8 images, 0.037, colornorm: 00051500

Parameters
    self.rcnn_train_nms_pre_score_threshold = 0.03#0.05
    self.rcnn_train_nms_overlap_threshold   = 0.8  # high for more proposals for mask
    self.rcnn_train_nms_min_size = 5

    self.rcnn_test_nms_pre_score_threshold = 0.2#0.3
    self.rcnn_test_nms_overlap_threshold   = 0.5
    self.rcnn_test_nms_min_size = 5

    # mask ------------------------------------------------------------------
    self.mask_crop_size            = 14
    self.mask_train_batch_size     = 64  # per image
    self.mask_size                 = 28  # per image
    self.mask_train_min_size       = 5
    self.mask_train_fg_thresh_low  = self.rpn_train_fg_thresh_low

    self.mask_test_nms_pre_score_threshold = 0.2#0.4
    self.mask_test_nms_overlap_threshold = 0.1
    self.mask_test_mask_threshold  = 0.3

Mainly changed the parameters that have my comments after them, i.e. the various test score thresholds.
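As a rough illustration of what those test score thresholds do (an assumption about the usual pattern, not the repo's exact code), the pre-score threshold simply drops low-confidence proposals before the overlap-based NMS stage:

import numpy as np

def filter_by_score(dets, pre_score_threshold):
    # dets: (N, 5) array of [x0, y0, x1, y1, score]; keep only confident proposals
    return dets[dets[:, 4] >= pre_score_threshold]

# e.g. rcnn_test_nms_pre_score_threshold = 0.2 keeps only boxes scoring >= 0.2,
# and rcnn_test_nms_overlap_threshold = 0.5 is then applied by NMS on the survivors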

ghost1

self.test_augments = [
('normal', do_test_augment_identity, undo_test_augment_identity, {}),
#('flip_transpose_1', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 1, }),
#('flip_transpose_2', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 2, }),
('flip_transpose_3', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 3, }),
#('flip_transpose_4', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 4, }),
#('flip_transpose_5', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 5, }),
('flip_transpose_6', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 6, }),
#('flip_transpose_7', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 7, }),
#('scale_0.002', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.002, 'scale_y': 0.002}),
#('scale_0.1', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.1, 'scale_y': 0.1}),
('scale_0.8', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.5, 'scale_y': 0.5}),
#('scale_0.4', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.4, 'scale_y': 0.4}),
('scale_0.6', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.6, 'scale_y': 0.6}),
#('scale_0.05', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.05, 'scale_y': 0.05}),
#('scale_1.8', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 1.8, 'scale_y': 1.8}),
]

Use Xavier initialization for each layer of the backbone

Reference

        # reference snippet (init is torch.nn.init); conv weights get Kaiming normal here
        for key in self.state_dict():
            if key.split('.')[-1] == 'weight':
                if 'conv' in key:
                    init.kaiming_normal(self.state_dict()[key], mode='fan_out')
                if 'bn' in key:
                    self.state_dict()[key][...] = 1
            elif key.split('.')[-1] == 'bias':
                self.state_dict()[key][...] = 0
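The title says Xavier while the reference above uses Kaiming for the conv weights. A minimal Xavier variant of the same loop, as a sketch (assuming torch.nn.init and that it runs inside the backbone's constructor like the reference):

from torch.nn import init

for key in self.state_dict():
    if key.split('.')[-1] == 'weight':
        if 'conv' in key:
            init.xavier_normal_(self.state_dict()[key])  # Xavier/Glorot normal for conv weights
        if 'bn' in key:
            self.state_dict()[key][...] = 1              # BatchNorm scale starts at 1
    elif key.split('.')[-1] == 'bias':
        self.state_dict()[key][...] = 0                  # all biases start at 0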

Background change

import cv2
import numpy as np

# image is assumed to be a BGR uint8 image
HLS_image = cv2.cvtColor(image, cv2.COLOR_BGR2HLS).astype(np.float64)
yuzhi = 50                               # brightness offset added to the L channel
l_part = HLS_image[:, :, 1] + yuzhi
l_part[l_part > 255] = 255               # clip to the valid 8-bit range
HLS_image[:, :, 1] = l_part
HLS_image = cv2.convertScaleAbs(HLS_image)
result = cv2.cvtColor(HLS_image, cv2.COLOR_HLS2BGR)
This doesn't separate foreground from background; it just brightens everything.

It feels like the noise increased and the results got worse. Parameters:

    self.rpn_test_nms_pre_score_threshold = 0.4#0.8
    self.rpn_test_nms_overlap_threshold   = 0.2
    self.rpn_test_nms_min_size = 5

    # rcnn ------------------------------------------------------------------
    self.rcnn_crop_size         = 14
    self.rcnn_train_batch_size  = 64  # per image
    self.rcnn_train_fg_fraction = 0.5
    self.rcnn_train_fg_thresh_low  = 0.5
    self.rcnn_train_bg_thresh_high = 0.5
    self.rcnn_train_bg_thresh_low  = 0.0

    self.rcnn_train_nms_pre_score_threshold = 0.03#0.05
    self.rcnn_train_nms_overlap_threshold   = 0.8  # high for more proposals for mask
    self.rcnn_train_nms_min_size = 5

    self.rcnn_test_nms_pre_score_threshold = 0.15#0.3
    self.rcnn_test_nms_overlap_threshold   = 0.1
    self.rcnn_test_nms_min_size = 5

    # mask ------------------------------------------------------------------
    self.mask_crop_size            = 14
    self.mask_train_batch_size     = 64  # per image
    self.mask_size                 = 28  # per image
    self.mask_train_min_size       = 5
    self.mask_train_fg_thresh_low  = self.rpn_train_fg_thresh_low

    self.mask_test_nms_pre_score_threshold = 0.1#0.4
    self.mask_test_nms_overlap_threshold = 0.04
    self.mask_test_mask_threshold  = 0.1

1st Trial (3-13) - 0.437

Model

ResNet50-fpn-maskrcnn

Data augmentation


Training on black-and-white images

  • train_dataset.split = train_ids_gray_500
  • valid_dataset.split = valid_ids_gray_43

Valid

model: 00004600_model.pth

  • (iter: 4600, lr=0.001)
  • loss: 0.377
  • rpn_cls: 0.02
  • rpn_reg: 0.02
  • rcnn_cls: 0.21
  • rcnn_reg: 0.01
  • mask_cls: 0.13

mask_average_precision = 0.68065
[email protected] = 0.86018


Fine-tune on color images

  • train_dataset.split = train_ids_color_103
  • valid_dataset.split = valid_ids_color_20

Fine-tuned 00004600_model.pth on train_ids_color_103 for 500 iterations to get 00005100_model.pth.
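A minimal sketch of that fine-tuning step, assuming the .pth files hold a plain state_dict and that a model object `net` and the training loop already exist (names are illustrative, not the repo's exact code):

import torch

# load the grayscale-trained checkpoint, then keep training on the color split
net.load_state_dict(torch.load('00004600_model.pth', map_location='cpu'))
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
# ... run roughly 500 more iterations over train_ids_color_103 ...
torch.save(net.state_dict(), '00005100_model.pth')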

Valid

loss 0.879 | rpn_cls 0.05 | rpn_reg 0.05 | rcnn_cls 0.51 | rcnn_reg 0.02 | mask_cls 0.25

mask_average_precision = 0.30041
[email protected] = 0.58047

Predict

  • 00004600_model.pth预测test_ids_gray_53
  • 00005100_model.pth预测test_ids_color_12

LB=0.437

result of colornorm 25500

mask_average_precision = 0.42386
[email protected] = 0.73387

0.0010 25.5 k 1758.6 0.2 m | 0.770 0.03 0.03 0.51 0.01 0.18 | 0.486 0.03 0.04 0.20 0.02 0.21 | 0.607 0.02 0.03 0.34 0.02 0.19 | 8 hr 54 min
Train loss and box loss dropped significantly.
Valid loss barely changed.

neural

self.test_augments = [
('normal', do_test_augment_identity, undo_test_augment_identity, {}),
('flip_transpose_1', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 1, }),
('flip_transpose_2', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 2, }),
('flip_transpose_3', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 3, }),
('flip_transpose_4', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 4, }),
('flip_transpose_5', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 5, }),
('flip_transpose_6', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 6, }),
('flip_transpose_7', do_test_augment_flip_transpose, undo_test_augment_flip_transpose, {'type': 7, }),
#('scale_0.002', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.002, 'scale_y': 0.002}),
#('scale_0.1', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.1, 'scale_y': 0.1}),
('scale_0.8', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.8, 'scale_y': 0.8}),
#('scale_0.4', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.4, 'scale_y': 0.4}),
('scale_0.6', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.6, 'scale_y': 0.6}),
('scale_0.5', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 0.5, 'scale_y': 0.5}),
#('scale_1.8', do_test_augment_scale, undo_test_augment_scale, {'scale_x': 1.8, 'scale_y': 1.8}),
]

Results with the external data left unprocessed

22,500 iterations; still training.
All the results on the color images are the best so far (early on it was at 0.01 on the color images, but by eye the later checkpoints segment better; the difference should be due to misclassified regions).
Train loss is dropping clearly; for validation I'm still using the original valid split.
By eye there are few misclassifications: what gets segmented as a nucleus is basically correct, though for some large nuclei only a small nucleus gets segmented out.
The misclassified case is that image we all know.
Next: first, keep training this; second, consider swapping the reference image used for the color normalization. The current takeaway is that the model works quite well when its training distribution matches the test distribution. Respect!
..........something seems to have gone wrong during submission...........

2nd Trial (3-15) - 0.447

Model

SE-ResNeXt50-FPN

Data augmentation same as the first trial

Training

train_dataset.split = train_ids_mix_603
valid_dataset.split = valid_ids_mix_63

iter: 4600, lr = 0.01 (up to iter 3000), then 0.001 (up to iter 5000)
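Read as a step schedule, a small sketch (illustrative only, not the repo's scheduler code):

def learning_rate(iteration):
    # 0.01 until iteration 3000, then 0.001 up to iteration 5000
    return 0.01 if iteration < 3000 else 0.001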

Validation

loss 0.530 | rpn_cls 0.03 | rpn_reg 0.02 | rcnn_cls 0.32 | rcnn_reg 0.01 | mask_cls 0.16

mask_average_precision = 0.56613
[email protected] = 0.76631

Submit

LB = 0.447
