This work is my undergraduate graduation project. It studies few-shot object detection, and reproduces and modifies the LSTD framework.
- source-domain dataset: PASCAL VOC 2007 & 2012
- target-domain dataset: a custom dataset; about 15 fully annotated samples per category are usually enough
- torch 1.4.0
- torchvision 0.5.0
- opencv-python 4.1.2.30
- Pillow 7.0.0
- cuda 10.1
- prepare your target-domain dataset
- specify your configuration in config.py, including the target-domain path, target_num_classes and target_classes
LSTD transfers knowledge from the source domain to the target domain, so the model must first be trained on the source-domain dataset.
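As a rough illustration of the configuration step, the fragment below shows what the target-domain entries might look like. Only `target_num_classes` and `target_classes` are names taken from this README; `target_path`, the helper `check_target_config`, and all values are hypothetical placeholders, and the real config.py may be organised differently.

```python
# Hypothetical values; adapt the names to what config.py actually defines.
target_path = "data/target"              # hypothetical key name and placeholder path
target_classes = ("cat", "dog", "bird")  # placeholder category names

# Assumption: the count equals the number of listed classes. If the code
# follows the ssd.pytorch convention of counting background, add 1.
target_num_classes = len(target_classes)


def check_target_config(num_classes, classes):
    """Return True when the class count and the class list are consistent."""
    has_duplicates = len(set(classes)) != len(classes)
    return num_classes == len(classes) and not has_duplicates


config_ok = check_target_config(target_num_classes, target_classes)
```

A quick check like this catches the common mistake of adding a category name without updating the class count.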
python train.py
where batch_size=16 is recommended
python train_target.py
- specify your image path
- specify your weight path
- run python demo.py
Download from Baidu Netdisk (extraction code: op7i)
- generative mask background suppression

  The thick feature cube is reduced along the channel dimension with simple statistics, giving a thin feature map that stacks the minimum, maximum, average and variance matrices. A convolutional autoencoder then generates a mask from this stack, and the mask is used as the background suppression regularization.
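The generative mask idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this repository: the function `channel_statistics`, the class `MaskAutoencoder`, the layer sizes and the 512 x 38 x 38 input shape are all hypothetical, and the way the mask enters the loss is not shown.

```python
import torch
import torch.nn as nn


def channel_statistics(features):
    """Collapse (N, C, H, W) features into a (N, 4, H, W) stack holding the
    per-location min, max, mean and variance over the channel axis."""
    f_min = features.min(dim=1, keepdim=True)[0]
    f_max = features.max(dim=1, keepdim=True)[0]
    f_mean = features.mean(dim=1, keepdim=True)
    f_var = features.var(dim=1, unbiased=False, keepdim=True)
    return torch.cat([f_min, f_max, f_mean, f_var], dim=1)


class MaskAutoencoder(nn.Module):
    """Tiny convolutional autoencoder that maps the 4-channel statistics
    to a single-channel mask in [0, 1]. Layer sizes are assumptions."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 1, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, stats):
        return self.decoder(self.encoder(stats))


features = torch.rand(2, 512, 38, 38)   # placeholder feature cube
stats = channel_statistics(features)    # thin (2, 4, 38, 38) statistics stack
mask = MaskAutoencoder()(stats)         # (2, 1, 38, 38) mask
```

Reducing 512 channels to 4 statistics keeps the mask generator small, which matters when only a handful of target-domain samples are available.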
- hot start classification training mechanism

  First, fine-tune only the RPN on the target-domain dataset while the ROI layers and classification (cls) layers stay frozen. Once the training process meets certain conditions, unfreeze them and train the whole framework.
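The two-phase schedule described above might look like the sketch below. The switching condition is not specified in this README, so the rule used here (RPN loss below a threshold for several consecutive epochs) is purely an assumption, as are the names `HotStartSchedule` and `set_trainable`. Plain objects stand in for the model parameters so the logic can be read without the full network.

```python
from types import SimpleNamespace


class HotStartSchedule:
    """Decides when to switch from RPN-only training to full training."""

    def __init__(self, loss_threshold=0.5, patience=3):
        self.loss_threshold = loss_threshold  # assumed switching threshold
        self.patience = patience              # consecutive epochs required
        self._good_epochs = 0
        self.full_training = False

    def update(self, rpn_loss):
        """Record one epoch's RPN loss; return True once full training is on."""
        if self.full_training:
            return True
        if rpn_loss < self.loss_threshold:
            self._good_epochs += 1
        else:
            self._good_epochs = 0
        if self._good_epochs >= self.patience:
            self.full_training = True
        return self.full_training


def set_trainable(parameters, trainable):
    """Freeze or unfreeze parameters; works on any object with a
    requires_grad attribute, such as torch.nn.Parameter."""
    for p in parameters:
        p.requires_grad = trainable


# Stand-ins for the ROI and cls layer parameters (hypothetical).
roi_params = [SimpleNamespace(requires_grad=True) for _ in range(2)]
cls_params = [SimpleNamespace(requires_grad=True) for _ in range(2)]

schedule = HotStartSchedule(loss_threshold=0.5, patience=2)
set_trainable(roi_params + cls_params, False)  # phase 1: only the RPN trains

phases = []
for rpn_loss in [0.9, 0.4, 0.6, 0.4, 0.3, 0.8]:  # made-up per-epoch losses
    if schedule.update(rpn_loss) and not roi_params[0].requires_grad:
        set_trainable(roi_params + cls_params, True)  # phase 2: train everything
    phases.append(roi_params[0].requires_grad)
```

With a real model, the same `set_trainable` call can be given the `parameters()` iterator of the modules to freeze.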
On the custom dataset, the mAP of the modified LSTD is 0.4 higher than that of the original LSTD.
Some results on test images are available in the result folder.
In addition, a League of Legends (LOL) gameplay video was tested with the default parameters: bilibili video link
- [ssd.pytorch](https://github.com/amdegroot/ssd.pytorch)
- [LSTD](https://arxiv.org/abs/1803.01529)