eatformer's Introduction


Official PyTorch implementation of "EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm", which improves on our previous work, "Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model" (NeurIPS'21, code available).

Abstract: Motivated by biological evolution, this paper explains the rationality of the Vision Transformer by analogy with the proven, practical Evolutionary Algorithm (EA) and derives that both have a consistent mathematical formulation. Then, inspired by effective EA variants, we propose a novel pyramid EATFormer backbone that contains only the proposed EA-based Transformer (EAT) block, which consists of three residual parts, i.e., Multi-Scale Region Aggregation (MSRA), Global and Local Interaction (GLI), and Feed-Forward Network (FFN) modules, to model multi-scale, interactive, and individual information, respectively. Moreover, we design a Task-Related Head (TRH) docked with the transformer backbone to complete the final information fusion more flexibly, and we improve a Modulated Deformable MSA (MD-MSA) to dynamically model irregular locations. Extensive quantitative and qualitative experiments on image classification and downstream tasks, together with explanatory experiments, demonstrate the effectiveness and superiority of our approach over State-Of-The-Art (SOTA) methods. E.g., our Mobile (1.8M), Tiny (6.1M), Small (24.3M), and Base (49.0M) models achieve 69.4, 78.4, 83.1, and 83.9 Top-1 when trained only on ImageNet-1K with a naive training recipe; EATFormer-Tiny/Small/Base with Mask R-CNN obtain 45.4/47.4/49.0 box AP and 41.4/42.9/44.2 mask AP on COCO detection, surpassing the contemporary MPViT-T, Swin-T, and Swin-S by 0.6/1.4/0.5 box AP and 0.4/1.3/0.9 mask AP, respectively, with fewer FLOPs; our EATFormer-Small/Base achieve 47.3/49.3 mIoU on ADE20K with UperNet, exceeding Swin-T/S by 2.8/1.7.
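Until the official code is available, the three-part residual layout of the EAT block described in the abstract can be illustrated with a minimal PyTorch sketch. Only the ordering MSRA, GLI, FFN with a residual connection around each part comes from the abstract. Everything inside the parts is an assumption made for illustration: the class names, the dilated depthwise convolutions in `MSRA`, the half-and-half channel split between attention and a local convolution in `GLI`, the normalization layers, and all hyperparameters are hypothetical and may differ from the authors' implementation.

```python
import torch
from torch import nn


class MSRA(nn.Module):
    """Hypothetical multi-scale aggregation: parallel dilated depthwise convs, summed."""

    def __init__(self, dim, dilations=(1, 2, 3)):
        super().__init__()
        self.norm = nn.BatchNorm2d(dim)
        # padding == dilation keeps the spatial size unchanged for a 3x3 kernel
        self.branches = nn.ModuleList(
            [nn.Conv2d(dim, dim, 3, padding=d, dilation=d, groups=dim) for d in dilations]
        )
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        y = self.norm(x)
        y = sum(branch(y) for branch in self.branches)
        return self.proj(y)


class GLI(nn.Module):
    """Hypothetical global/local interaction: self-attention on one channel group,
    a depthwise conv on the other, then concatenation and a 1x1 projection."""

    def __init__(self, dim, num_heads=2):
        super().__init__()
        self.norm = nn.BatchNorm2d(dim)
        self.dim_global = dim // 2              # assumed 50/50 split
        self.dim_local = dim - self.dim_global
        self.attn = nn.MultiheadAttention(self.dim_global, num_heads, batch_first=True)
        self.local = nn.Conv2d(self.dim_local, self.dim_local, 3, padding=1,
                               groups=self.dim_local)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        xg, xl = torch.split(self.norm(x), [self.dim_global, self.dim_local], dim=1)
        tokens = xg.flatten(2).transpose(1, 2)  # (B, H*W, C_global)
        tokens, _ = self.attn(tokens, tokens, tokens, need_weights=False)
        xg = tokens.transpose(1, 2).reshape(b, self.dim_global, h, w)
        return self.proj(torch.cat([xg, self.local(xl)], dim=1))


class FFN(nn.Module):
    """Position-wise feed-forward network built from 1x1 convolutions."""

    def __init__(self, dim, ratio=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm2d(dim),
            nn.Conv2d(dim, dim * ratio, 1),
            nn.GELU(),
            nn.Conv2d(dim * ratio, dim, 1),
        )

    def forward(self, x):
        return self.net(x)


class EATBlock(nn.Module):
    """Three residual parts in sequence: MSRA, then GLI, then FFN."""

    def __init__(self, dim, num_heads=2):
        super().__init__()
        self.msra = MSRA(dim)
        self.gli = GLI(dim, num_heads)
        self.ffn = FFN(dim)

    def forward(self, x):
        x = x + self.msra(x)  # multi-scale information
        x = x + self.gli(x)   # interactive (global + local) information
        x = x + self.ffn(x)   # individual (per-position) information
        return x


block = EATBlock(dim=32)
features = torch.randn(2, 32, 8, 8)  # (batch, channels, height, width)
out = block(features)
```

Because every part is residual and shape-preserving, `out` has the same shape as `features`, so blocks of this kind can be stacked within one stage of a pyramid backbone.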

Main results

Image Classification for ImageNet-1K:

| Model | Params. (M) | FLOPs (G) | Throughput (V100 GPU) | Throughput (Xeon 8255C @ 2.50GHz CPU) | Image Size | Top-1 (%) |
| --- | --- | --- | --- | --- | --- | --- |
| EATFormer-Mobile | 1.8 | 0.36 | 3926 | 456.3 | 224 x 224 | 69.4 |
| EATFormer-Lite | 3.5 | 0.91 | 2168 | 246.3 | 224 x 224 | 75.4 |
| EATFormer-Tiny | 6.1 | 1.41 | 1549 | 167.5 | 224 x 224 | 78.4 |
| EATFormer-Mini | 11.1 | 2.29 | 1055 | 122.1 | 224 x 224 | 80.9 |
| EATFormer-Small | 24.3 | 4.32 | 615 | 73.3 | 224 x 224 | 83.1 |
| EATFormer-Medium | 39.9 | 7.05 | 425 | 53.4 | 224 x 224 | 83.6 |
| EATFormer-Base | 49.0 | 8.94 | 329 | 43.7 | 224 x 224 | 83.9 |
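The rows above can be compared directly to see what each step up in model size costs and returns. The following is a small sketch in plain Python that uses only the FLOPs and Top-1 figures copied from the table; the names `MODELS` and `marginal_gains` are illustrative and not part of the EATFormer code base.

```python
# (variant, params in M, FLOPs in G, Top-1 in %) copied from the ImageNet-1K table
MODELS = [
    ("Mobile", 1.8, 0.36, 69.4),
    ("Lite", 3.5, 0.91, 75.4),
    ("Tiny", 6.1, 1.41, 78.4),
    ("Mini", 11.1, 2.29, 80.9),
    ("Small", 24.3, 4.32, 83.1),
    ("Medium", 39.9, 7.05, 83.6),
    ("Base", 49.0, 8.94, 83.9),
]


def marginal_gains(models):
    """Return (smaller, larger, extra GFLOPs, extra Top-1) for neighbouring variants."""
    gains = []
    for (name0, _, flops0, top0), (name1, _, flops1, top1) in zip(models, models[1:]):
        gains.append((name0, name1, round(flops1 - flops0, 2), round(top1 - top0, 1)))
    return gains


gains = marginal_gains(MODELS)
for smaller, larger, extra_flops, extra_top1 in gains:
    print(f"{smaller:>6} -> {larger:<6}  +{extra_flops:.2f} GFLOPs  +{extra_top1:.1f} Top-1")
```

Under these numbers, the accuracy gained per additional GFLOP is much larger at the small end of the family (Mobile to Lite) than at the large end (Medium to Base).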

Object Detection and Instance Segmentation Based on Mask R-CNN for COCO2017:

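| Backbone | Box mAP (1x) | Mask mAP (1x) | Box mAP (MS+3x) | Mask mAP (MS+3x) | Params. | FLOPs |
| --- | --- | --- | --- | --- | --- | --- |
| EATFormer-Tiny | 42.3 | 39.0 | 45.4 | 41.4 | 25M | 198G |
| EATFormer-Small | 46.1 | 41.9 | 47.4 | 42.9 | 44M | 258G |
| EATFormer-Base | 47.2 | 42.8 | 49.0 | 44.2 | 68M | 349G |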

Semantic Segmentation Based on Upernet for ADE20k:

| Backbone | mIoU | Params. | FLOPs |
| --- | --- | --- | --- |
| EATFormer-Tiny | 44.5 | 34M | 870G |
| EATFormer-Small | 47.3 | 53M | 934G |
| EATFormer-Base | 49.3 | 79M | 1030G |

Get Started

EATFormer models and training/testing code will be released soon ...
