Git Product home page Git Product logo

strassens-matmul-cuda's Introduction

Strassens-MatMul-CUDA

Title: Parallelizing Strassen's Matrix-Multiplication Algorithm with CUDA

Project Overview: This project aims to implement and parallelize Strassen's Matrix-Multiplication Algorithm using CUDA, a high-performance parallel computing platform and application programming interface (API). The algorithm efficiently computes the product matrix C from two matrices A and B, each of size n x n. The standard algorithm for matrix multiplication has a time complexity of O(n^3), while Strassen's algorithm reduces it to approximately O(n^2.8) operations.

Standard algorithm

for i in range(1, n + 1):
    for j in range(1, n + 1):
        for k in range(1, n + 1):
            𝐢𝑖𝑗 += π΄π‘–π‘˜ * π΅π‘˜π‘—

Strassen's Algorithm: Strassen's algorithm divides matrices A, B, and C into equal-sized blocks, leading to a recursive approach for matrix multiplication. The product matrix C is computed using seven matrix multiplications and eighteen matrix additions, significantly reducing the number of multiplicative operations compared to the standard algorithm.

Strassen's Algorithm

Project Implementation: This project will implement Strassen's Matrix-Multiplication Algorithm using CUDA, leveraging the parallel processing power of GPUs. CUDA will be used to optimize the algorithm's performance by parallelizing the matrix multiplication and addition operations. The project will also consider terminating the recursion at a specific level (k') to handle smaller matrix sizes using the standard matrix multiplication approach.

The primary goals of this project are to:

  • Implement Strassen's algorithm in CUDA for parallel matrix multiplication.
  • Optimize the algorithm for large matrix sizes.
  • Explore and measure the performance improvements achieved through parallelization.
  • Provide a clear and well-documented codebase for others to understand and use.

This project aims to harness the power of GPUs to enhance the efficiency of matrix multiplication and may serve as a valuable tool for various scientific and computational applications.

Compilation and Execution

To compile the code, follow these steps:

make

./MatMul.exe <k> <k'>

The size of the matrix (n x n) is determined by n = 2k, and the size of the terminal matrix (s x s) is determined by s = n /2k'

Speedup vs k

The figure above illustrates the speedup achieved by the CUDA-based Strassen's algorithm in comparison to the CPU-based algorithm for different values of k with k'= 1. It is evident that the speedup remains below 1 for matrix sizes less than 32x32, indicating that the CPU-based algorithm outperforms the GPU-based algorithm in such scenarios. This is primarily because the overhead cost of data transfer becomes significant compared to the actual processing time for smaller matrices. In contrast, the CPU-based algorithm does not require data transfer since all data is already in the CPU memory. However, as the matrix size surpasses this threshold, the difference between the CPU and GPU implementations becomes noticeable for parallel matrix multiplication. The speedup increases as larger matrices can leverage the parallel hardware capabilities of the GPU more effectively. For instance, the speedup started at 4.928336x for k=6 and increased all the way to 5364.372x for k=12.

strassens-matmul-cuda's People

Contributors

susavlsh10 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.