
tirumalnaidu / opencl-hls-cnn-accelerator


OpenCL HLS based CNN Accelerator on Intel DE10 Nano FPGA.

Home Page: https://docs.google.com/presentation/d/16nk5-PTvlzjRUD0g9FnyuJoT0aaJMx6uuWkDMwTDdCg/edit?usp=sharing

License: GNU General Public License v3.0

Jupyter Notebook 4.38% C 94.80% Python 0.22% C++ 0.59% Makefile 0.01%
fpga-accelerator opencl de10-nano darknet-image-classification altera-opencl-sdk fpga intel-fpga-opencl neural-network-accelerator cnn-acceleration

opencl-hls-cnn-accelerator's Introduction

About

We designed a neural network accelerator for the Darknet Reference Model, targeting image classification on the ImageNet dataset. The Darknet Reference Model is 2.9 times faster than AlexNet and attains the same top-1 and top-5 performance as AlexNet with 1/10th the parameters.

Board

Requirements

Files

  • pytorch_model - The CNN is based on the Darknet framework, so we reimplemented the model in PyTorch to check the results and collect the model parameters.
  • pyopencl_model - To simulate and verify the kernels we wrote in OpenCL, we used the PyOpenCL package. It matched the accuracy of the PyTorch model and ran at about 20x the speed of the PyTorch model.
  • model - This folder contains the pre-trained parameters of the Darknet reference model, with each layer in a separate txt file.

CNN Architecture

Layer  Type  Filters  Kernel Size  Stride  Pad  Input Size      Output Size
1      conv  16       3 x 3        1       1    256 x 256 x 3   256 x 256 x 16
2      max   -        2 x 2        2       0    256 x 256 x 16  128 x 128 x 16
3      conv  32       3 x 3        1       1    128 x 128 x 16  128 x 128 x 32
4      max   -        2 x 2        2       0    128 x 128 x 32  64 x 64 x 32
5      conv  64       3 x 3        1       1    64 x 64 x 32    64 x 64 x 64
6      max   -        2 x 2        2       0    64 x 64 x 64    32 x 32 x 64
7      conv  128      3 x 3        1       1    32 x 32 x 64    32 x 32 x 128
8      max   -        2 x 2        2       0    32 x 32 x 128   16 x 16 x 128
9      conv  256      3 x 3        1       1    16 x 16 x 128   16 x 16 x 256
10     max   -        2 x 2        2       0    16 x 16 x 256   8 x 8 x 256
11     conv  512      3 x 3        1       1    8 x 8 x 256     8 x 8 x 512
12     max   -        2 x 2        2       0    8 x 8 x 512     4 x 4 x 512
13     conv  1024     3 x 3        1       1    4 x 4 x 512     4 x 4 x 1024
14     avg   -        4 x 4        1       0    4 x 4 x 1024    1 x 1 x 1024
15     conv  1000     1 x 1        1       0    1 x 1 x 1024    1 x 1 x 1000
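
The input and output sizes in the architecture table above follow the usual window arithmetic, out = (in + 2 * pad - kernel) / stride + 1. The sketch below is not taken from the repository; it only re-derives the table's shapes from its kernel, stride, and pad columns. The names `LAYERS`, `out_size`, and `trace_shapes` are hypothetical.

```python
# Layer list transcribed from the architecture table:
# (type, filters, kernel, stride, pad). Filters is None for pooling layers.
LAYERS = [
    ("conv", 16, 3, 1, 1),
    ("max", None, 2, 2, 0),
    ("conv", 32, 3, 1, 1),
    ("max", None, 2, 2, 0),
    ("conv", 64, 3, 1, 1),
    ("max", None, 2, 2, 0),
    ("conv", 128, 3, 1, 1),
    ("max", None, 2, 2, 0),
    ("conv", 256, 3, 1, 1),
    ("max", None, 2, 2, 0),
    ("conv", 512, 3, 1, 1),
    ("max", None, 2, 2, 0),
    ("conv", 1024, 3, 1, 1),
    ("avg", None, 4, 1, 0),
    ("conv", 1000, 1, 1, 0),
]


def out_size(size, kernel, stride, pad):
    """Spatial output size of one convolution or pooling window pass."""
    return (size + 2 * pad - kernel) // stride + 1


def trace_shapes(height, width, channels, layers=LAYERS):
    """Return the (height, width, channels) shape after every layer."""
    shapes = []
    for kind, filters, kernel, stride, pad in layers:
        height = out_size(height, kernel, stride, pad)
        width = out_size(width, kernel, stride, pad)
        if kind == "conv":
            # Only convolutions change the channel count.
            channels = filters
        shapes.append((height, width, channels))
    return shapes


shapes = trace_shapes(256, 256, 3)
for index, shape in enumerate(shapes, start=1):
    print(index, shape)
```

Each printed shape should match the Output Size column of the corresponding row, ending at 1 x 1 x 1000 for the classifier layer.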

Results

Conv 0  time: 35.898 ms                                                         
Conv 2  time: 79.748 ms                                                         
Conv 4  time: 79.439 ms                                                         
Conv 6  time: 79.442 ms                                                         
Conv 8  time: 79.418 ms                                                         
Conv 10 time: 79.411 ms                                                         
Conv 12 time: 79.404 ms                                                         
Conv 14 time: 17.319 ms                                                         
Total Convolution time: 530.079 ms

Batchnorm 0   time: 143.092 ms                                                  
Batchnorm 2   time: 73.007 ms                                                   
Batchnorm 4   time: 21.486 ms                                                   
Batchnorm 6   time: 5.504 ms                                                    
Batchnorm 8   time: 2.479 ms                                                    
Batchnorm 10  time: 1.259 ms                                                    
Batchnorm 12  time: 0.641 ms                                                    
Batchnorm 14  time: 0.052 ms                                                    
Total Batchnorm time: 247.520 ms   

Maxpool 1  time: 78.848 ms                                                      
Maxpool 3  time: 31.823 ms                                                      
Maxpool 5  time: 8.991 ms                                                       
Maxpool 7  time: 2.890 ms                                                       
Maxpool 9  time: 1.486 ms                                                       
Maxpool 11  time: 0.719 ms                                                      
Maxpool 13  time: 0.286 ms                                                      
Total Pooling time: 125.042 ms                                                  
                                                                                
Total Time: 902.642 ms                                                            
                                                                                
Label   : Egyptian cat                                                          
Accuracy: 35.796 % 
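
To see where the time goes, the per-stage totals from the log above can be turned into percentages and an approximate frame rate. This is a small arithmetic sketch, not part of the repository; the variable names are hypothetical, and the frame rate assumes one image per run with no overlap between stages.

```python
# Per-stage totals copied from the log above, in milliseconds.
stage_ms = {
    "convolution": 530.079,
    "batchnorm": 247.520,
    "pooling": 125.042,
}

# Summing the rounded stage totals gives 902.641 ms, which differs from the
# logged 902.642 ms only in the last digit (presumably rounding in the log).
total_ms = sum(stage_ms.values())

# Share of the total runtime spent in each stage, in percent.
share = {name: 100.0 * ms / total_ms for name, ms in stage_ms.items()}

# Images per second if every image takes the full total time (assumption).
fps = 1000.0 / total_ms

for name, percent in share.items():
    print(f"{name:12s} {percent:5.1f} %")
print(f"approx. throughput: {fps:.2f} images/s")
```

Convolution accounts for more than half of the measured time, so it is the stage where the optimizations listed under Planned Improvements would likely matter most.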

Resource Usage

Kernel      ALUTs        FFs          RAMs       DSPs
conv        27822        28705        144        58
batch norm  9949         12211        93         10
pool        8247         10211        36         24
conv1x1     12184        15087        102        17
Total       61872 (56%)  73190 (33%)  405 (79%)  109 (97%)
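
The Total row is not simply the sum of the four kernel rows, except for DSPs. The sketch below computes the per-kernel sum and the remainder for each resource. The source does not say what the remainder is; one plausible reading, which is an assumption here, is logic outside the four kernels such as the board interface. All names in the code are hypothetical.

```python
COLUMNS = ("ALUTs", "FFs", "RAMs", "DSPs")

# Per-kernel usage copied from the resource table above.
KERNELS = {
    "conv": (27822, 28705, 144, 58),
    "batch norm": (9949, 12211, 93, 10),
    "pool": (8247, 10211, 36, 24),
    "conv1x1": (12184, 15087, 102, 17),
}

# Total row as reported in the table (percentages omitted).
REPORTED_TOTAL = (61872, 73190, 405, 109)

# Column-wise sum over the four kernels.
kernel_sum = tuple(sum(column) for column in zip(*KERNELS.values()))

# Usage in the reported total that the four kernel rows do not account for.
remainder = tuple(t - s for t, s in zip(REPORTED_TOTAL, kernel_sum))

for name, summed, extra in zip(COLUMNS, kernel_sum, remainder):
    print(f"{name:6s} kernels={summed:6d} unaccounted={extra:5d}")
```

With DSP usage at 97% and RAM usage at 79%, those two resources look like the tightest constraints on adding more parallel compute units to this design.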

Planned Improvements

We can further improve the throughput of the accelerator by converting the model to fixed point (8-bit or 16-bit) and by pipelining the accelerator with the Intel channels and pipes extension.
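
As a rough illustration of the fixed-point conversion mentioned above, the sketch below encodes values in a signed Q-format and multiplies them with integer arithmetic only. It is not the authors' planned implementation: the 16-bit width, the 8 fractional bits, and the function names `to_fixed`, `to_float`, and `fixed_mul` are all assumptions chosen for the example.

```python
def to_fixed(value, frac_bits, total_bits=16):
    """Encode a float as a signed fixed-point integer, saturating on overflow."""
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = round(value * (1 << frac_bits))  # round to nearest step
    return max(lowest, min(highest, code))


def to_float(code, frac_bits):
    """Decode a fixed-point integer back to a float."""
    return code / (1 << frac_bits)


def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point codes that share the same format.

    The raw product carries 2 * frac_bits fractional bits, so it is
    shifted right to return to the original format.
    """
    return (a * b) >> frac_bits


FRAC_BITS = 8  # assumed Q8.8 format: 8 integer bits, 8 fractional bits

weight = to_fixed(0.5, FRAC_BITS)        # made-up weight for illustration
activation = to_fixed(-1.25, FRAC_BITS)  # made-up activation
product = fixed_mul(weight, activation, FRAC_BITS)

print(weight, activation, product, to_float(product, FRAC_BITS))
```

Values that are not multiples of 2^-8 pick up a rounding error of at most half a step, which is why a converted model would need its accuracy re-checked against the floating-point version.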

License

MIT License

opencl-hls-cnn-accelerator's People

Contributors

dependabot[bot], jithendray, suryacharanp, tirumalnaidu

opencl-hls-cnn-accelerator's Issues

How to run model in board?

Hi all,
I'm a newbie and just researched YOLO in FPGA for few weeks. I'm trying to implement tiny YOLO v2 in De1-SoC board and I found your repo but I don't see any guide to run your model in board. Could you give me some guide and what do i need in board to run your model?
Thank you!
