
visiondk's Introduction

VisionDK: ToolBox Of Image Classification & Face Recognition

Tutorials

Install ☘️
# It is recommended to create a separate virtual environment
conda create -n vision python=3.10 
conda activate vision

# torch==2.0.1 (lower versions also work) -> https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio cpuonly -c pytorch # cpu-version
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia  # cuda-version

pip install -r requirements.txt

# Without Arial.ttf, inference may be slow due to network IO.
mkdir -p ~/.config/DuKe
cp misc/Arial.ttf ~/.config/DuKe
Training 🌟
# one machine, one GPU
python main.py --cfgs configs/task/pet.yaml

# one machine, multiple GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4 main.py --cfgs configs/classification/pet.yaml
# optional flags, appended to the command above:
#   --sync_bn    (slows down training)
#   --resume     (continue training from a checkpoint)
#   --load_from  (fine-tune from existing weights)
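
When a script is launched with torchrun, each worker process receives the environment variables RANK, LOCAL_RANK, and WORLD_SIZE. How main.py reads them is not shown in this README, so the helper below is only a hypothetical sketch of the usual pattern: fall back to single-process defaults when the variables are absent. The name distributed_context is an assumption made for illustration.

```python
import os


def distributed_context(env=None):
    """Read the process layout that torchrun exports to each worker.

    Hypothetical helper, not part of this repo. When the variables are
    missing (plain `python main.py`), single-process defaults are used.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),              # global process index
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # index on this machine
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
    }


ctx = distributed_context()
print("distributed" if ctx["world_size"] > 1 else "single process", ctx)
```

With `--nproc_per_node 4` on one machine, the four workers would see LOCAL_RANK values 0 through 3 and WORLD_SIZE set to 4.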

What's New

  • [Apr. 2024] Face Recognition Task (FRT) is now supported 🚀! ResNet, EfficientNet, and Swin Transformer are available as backbones; ArcFace, CircleLoss, MagFace, and MV Softmax can be used as heads for training. Note: parts of the implementation follow JD-FaceX.
  • [Jun. 2023] Image Classification Task (ICT) has launched 🚀! It supports many powerful strategies, such as progressive learning, online enhancement, a beautiful training interface, and exponential moving average. Models from torchvision are fully integrated.
  • [May. 2023] The first version of Vision is released.
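
As an illustration of the exponential moving average strategy mentioned above, here is a minimal sketch in plain Python. It is not this repo's implementation: the function name ema_update, the dict-of-floats "model", and the decay value are assumptions chosen to keep the example self-contained.

```python
def ema_update(shadow, params, decay=0.999):
    """Blend the current parameters into the running average, in place.

    shadow and params are dicts mapping a parameter name to a float.
    """
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow


# Toy "model" with two scalar weights.
params = {"w": 1.0, "b": 0.0}
shadow = dict(params)  # the average starts as a copy of the weights

for step in range(3):
    params["w"] += 1.0  # pretend an optimizer step changed the weight
    ema_update(shadow, params, decay=0.9)

print(shadow)
```

The averaged copy lags behind the raw weights, which is the point: evaluating with the smoothed weights tends to be less sensitive to the noise of individual optimizer steps.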

Supported Tasks

  1. Face Recognition Task (FRT)
  2. Image Classification Task (ICT)

Implemented Methods & Papers

| Method | Paper |
|---|---|
| SAM | Sharpness-Aware Minimization for Efficiently Improving Generalization |
| Progressive Learning | EfficientNetV2: Smaller Models and Faster Training |
| OHEM | Training Region-based Object Detectors with Online Hard Example Mining |
| Focal Loss | Focal Loss for Dense Object Detection |
| Cosine Annealing | SGDR: Stochastic Gradient Descent with Warm Restarts |
| Label Smoothing | Rethinking the Inception Architecture for Computer Vision |
| Mixup | MixUp: Beyond Empirical Risk Minimization |
| CutOut | Improved Regularization of Convolutional Neural Networks with Cutout |
| Attention Pool | Augmenting Convolutional networks with attention-based aggregation |
| GradCAM | Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization |
| ArcFace | ArcFace: Additive Angular Margin Loss for Deep Face Recognition |
| CircleLoss | Circle Loss: A Unified Perspective of Pair Similarity Optimization |
| MagFace | MagFace: A Universal Representation for Face Recognition and Quality Assessment |
| MV Softmax | Mis-classified Vector Guided Softmax Loss for Face Recognition |
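
To make one of the listed methods concrete, the sketch below computes a cosine annealing learning-rate schedule for a single cycle (no warm restarts). It is a generic illustration of the formula, not the scheduler used by this repo; the function name and the lr_max, lr_min, and total_steps values are assumptions.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate at `step` for one cosine cycle of `total_steps` steps.

    Starts at lr_max (step 0) and decays smoothly to lr_min (last step).
    """
    step = min(max(step, 0), total_steps)  # clamp to the valid range
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cosine


# Print the schedule at a few points of a 100-step run.
for s in (0, 25, 50, 75, 100):
    print(s, round(cosine_annealing_lr(s, 100, lr_max=0.1, lr_min=0.001), 5))
```

In PyTorch, the built-in `torch.optim.lr_scheduler.CosineAnnealingLR` provides the same shape of schedule for a real optimizer.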

Model & Paper

| Model | Paper | Name in configs (e.g. torchvision-mobilenet_v2) |
|---|---|---|
| MobileNetv2 | MobileNetV2: Inverted Residuals and Linear Bottlenecks | mobilenet_v2 |
| MobileNetv3 | Searching for MobileNetV3 | mobilenet_v3_small, mobilenet_v3_large |
| ShuffleNetv2 | ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design | shufflenet_v2_x0_5, shufflenet_v2_x1_0, shufflenet_v2_x1_5, shufflenet_v2_x2_0 |
| ResNet | Deep Residual Learning for Image Recognition | resnet18, resnet34, resnet50, resnet101, resnet152 |
| ResNeXt | Aggregated Residual Transformations for Deep Neural Networks | resnext50_32x4d, resnext101_32x8d, resnext101_64x4d |
| ConvNext | A ConvNet for the 2020s | convnext_tiny, convnext_small, convnext_base, convnext_large |
| EfficientNet | EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | efficientnet_b{0..7} |
| EfficientNetv2 | EfficientNetV2: Smaller Models and Faster Training | efficientnet_v2_s, efficientnet_v2_m, efficientnet_v2_l |
| Swin Transformer | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | swin_t, swin_s, swin_b |
| Swin Transformerv2 | Swin Transformer V2: Scaling Up Capacity and Resolution | swin_v2_t, swin_v2_s, swin_v2_b |
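
The config names above follow the example torchvision-mobilenet_v2. Assuming the part before the first hyphen names the model source and the rest is the torchvision model name (an inference from that single example, not something the README states), a hypothetical parser could look like the sketch below. It also expands the efficientnet_b{0..7} shorthand into its eight concrete names.

```python
def parse_model_name(config_name):
    """Split '<source>-<model>' into (source, model).

    Hypothetical helper: the naming convention is inferred from the
    README example 'torchvision-mobilenet_v2'.
    """
    source, sep, model = config_name.partition("-")
    if not sep or not source or not model:
        raise ValueError(f"expected '<source>-<model>', got {config_name!r}")
    return source, model


def efficientnet_variants():
    """Expand the shorthand efficientnet_b{0..7} into concrete names."""
    return [f"efficientnet_b{i}" for i in range(8)]


source, model = parse_model_name("torchvision-mobilenet_v2")
print(source, model)
print(efficientnet_variants())

# With torchvision >= 0.14 installed, the model part could then be built via
# torchvision.models.get_model(model, weights=None).
```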

Tools

  1. Split the dataset into a training set and a validation set
python tools/data_prepare.py --postfix <jpg or png> --root <input your data realpath> --frac <train segment ratio, eg: 0.9 0.6 0.3 0.9 0.9>
  2. Visualize data augmentation
cd visiondk
python -m tools.test_augment
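
For readers who want to see what a train/validation split by ratio involves, here is a minimal sketch in plain Python. It is not the code in tools/data_prepare.py: the function name, the seed parameter, and the use of a single ratio (the tool's --frac accepts several values) are simplifying assumptions.

```python
import random


def split_train_val(files, frac, seed=0):
    """Shuffle `files` reproducibly and split them by the train ratio `frac`."""
    files = sorted(files)              # fixed starting order, independent of input order
    random.Random(seed).shuffle(files) # seeded shuffle, so the split is repeatable
    n_train = round(len(files) * frac)
    return files[:n_train], files[n_train:]


# Ten hypothetical image names for one class.
images = [f"img_{i}.jpg" for i in range(10)]
train, val = split_train_val(images, frac=0.9)
print(len(train), "training files,", len(val), "validation files")
```

Sorting before the seeded shuffle means the same files always land in the same split, regardless of the order in which the filesystem lists them.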

Contact Me

  1. If you enjoy reproducing papers and algorithms, pull requests are welcome.
  2. If anything about the repo is unclear, please open an issue.
