Git Product home page Git Product logo

eigen_value's Introduction

Warning

I've stopped maintaining this project !

eigen_value

Parallel Eigen value(s)/ vector(s) Computation on Accelerator

Background

This is an implementation of maximum eigen value( with respective vector ) computation using Similarity Transform method for positive matrices.

Read paper for better understanding.

I keep both sequential & parallel implementation.

Requirements

  • You should have Intel oneAPI basekit installed, which you can find here.
  • Or you may want to build it from source, check here.
  • I'm working on a GNU/ Linux machine, with make, clang-format installed.
$ lsb_release -d
Description:    Ubuntu 20.04.3 LTS
  • My dpcpp compiler version is
$ dpcpp --version
Intel(R) oneAPI DPC++/C++ Compiler 2021.4.0 (2021.4.0.20210924)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2021.4.0/linux/bin

Benchmark

  • Assuming you've all requirements installed on your machine, compile program
make # or make aot_{cpu,gpu}
  • Run produced binary
./run # or ./a.out

Clean up generated object files using make clean

After editing source, you can reformat those using make format

Benchmark Results

I ran parallel implementation of similarity transform algorithm on multiple hardwares, with hilbert matrix of various dimensions, while setting maximum iteration count to 1000 and dynamic work group size.

Except following ones, I keep detailed benchmark results for all kernels involved in similarity transform method here.

On CPU

$ make aot_cpu && ./a.out

running on Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz

Parallel Algorithm using Similarity Transform for finding max eigen value (with vector)

128  x  128			         7 ms			     9 round(s)
256  x  256			         1 ms			    10 round(s)
512  x  512			         3 ms			    12 round(s)
1024 x 1024			         5 ms			    13 round(s)
2048 x 2048			        18 ms			    14 round(s)
4096 x 4096			       101 ms			    15 round(s)
8192 x 8192			       510 ms			    17 round(s)
$ make aot_cpu && ./a.out

running on Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz

Parallel Algorithm using Similarity Transform for finding max eigen value (with vector)

128  x  128			        22 ms			     9 round(s)
256  x  256			         6 ms			    10 round(s)
512  x  512			         7 ms			    12 round(s)
1024 x 1024			         8 ms			    13 round(s)
2048 x 2048			        19 ms			    14 round(s)
4096 x 4096			        40 ms			    15 round(s)
8192 x 8192			       126 ms			    17 round(s)
$ make aot_cpu && ./a.out

running on Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz

Parallel Algorithm using Similarity Transform for finding max eigen value (with vector)

128  x  128			        11 ms			     9 round(s)
256  x  256			         6 ms			    10 round(s)
512  x  512			        16 ms			    12 round(s)
1024 x 1024			        51 ms			    13 round(s)
2048 x 2048			       200 ms			    14 round(s)
4096 x 4096			       847 ms			    15 round(s)
8192 x 8192			      3759 ms			    17 round(s)
$ make aot_cpu && ./a.out

running on Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz

Parallel Algorithm using Similarity Transform for finding max eigen value (with vector)

128  x  128			        18 ms			     9 round(s)
256  x  256			         4 ms			    10 round(s)
512  x  512			         6 ms			    12 round(s)
1024 x 1024			         8 ms			    13 round(s)
2048 x 2048			        27 ms			    14 round(s)
4096 x 4096			        82 ms			    15 round(s)
8192 x 8192			       339 ms			    17 round(s)

On GPU

$ make aot_gpu && ./a.out

running on Intel(R) Iris(R) Xe MAX Graphics [0x4905]

Parallel Algorithm using Similarity Transform for finding max eigen value (with vector)

128  x  128			        10 ms			     9 round(s)
256  x  256			         8 ms			    10 round(s)
512  x  512			        14 ms			    12 round(s)
1024 x 1024			        37 ms			    13 round(s)
2048 x 2048			       138 ms			    14 round(s)
4096 x 4096			       564 ms			    15 round(s)
8192 x 8192			      2509 ms			    17 round(s)
$ make && ./run # JIT compiled kernels, check time column of first row of below table [ way more than second row, because kernel being JIT-ed ]

running on Intel(R) UHD Graphics P630 [0x3e96]

Parallel Algorithm using Similarity Transform for finding max eigen value (with vector)

128  x  128			       122 ms			     9 round(s)
256  x  256			         9 ms			    10 round(s)
512  x  512			        28 ms			    12 round(s)
1024 x 1024			       104 ms			    13 round(s)
2048 x 2048			       446 ms			    14 round(s)
4096 x 4096			      1858 ms			    15 round(s)
8192 x 8192			      8259 ms			    17 round(s)

Test

A set of minimal test cases are present, which you may want to run using

make test

Sequential (reference) Implementation

Run sequential implementation using 馃憞, while ensuring you've numpy installed. It also compares computed maximum eigen value with numpy computed result.

python3 -m pip install -U numpy # you may need it
python3 main.py
Sequential Similarity Transform, for finding maximum eigen value ( with vector )

32   x   32               1.27 ms                      5 round(s)
64   x   64               2.44 ms                      5 round(s)
128  x  128               9.67 ms                      5 round(s)
256  x  256              20.58 ms                      4 round(s)
512  x  512              74.24 ms                      4 round(s)
1024 x 1024             335.77 ms                      4 round(s)

Python Wrapper

I provide you with one build recipe which can be used for compiling Parallel Similarity Transform's implementation into dynamically linked shared object.

# AOT compiled for x86_64 CPU; check Makefile
make lib

# now check 
file wrapper/libsimilarity_transform.so

Once shared object is ready, you can now interact with C++ implementation of Maximum EigenValue Finder function from Python wrapper function.

pushd wrapper/python
python3
import similarity_transform as st

# must be positive square matrix of `float32`
#
# this is input data matrix
m = st.np.random.random((16, 16)).astype('f')

ev = st.EigenValue()
, v, ts, itr = ev.similarity_transform(m)

# 位 = maximum eigen value
# v = eigen vector
# ts = execution time ( in ms )
# itr = iterations required before convergence

You may want to take a look at test case written using Python wrapper.

There's also one script for running tests on randomly generated positive square matrices.

python3 test.py
popd

eigen_value's People

Contributors

itzmeanjan avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    馃枛 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 馃搳馃搱馃帀

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google 鉂わ笍 Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.