
openmp-101's Introduction

OpenMP-101

Optimization Notice

*(Intel optimization notice images: opt-img-01, opt-img-zh)*

0. Fast Guide: OMP in Caffe

0.1 What is OpenMP?

  • an easy, portable, and scalable way to parallelize applications for many cores, using a multi-threaded, shared-memory model (like pthreads)
  • a standard API
  • omp pragmas are supported by the major C/C++ and Fortran compilers (gcc, icc, etc.)

There are a lot of good tutorials online; see the tutorial sections below.

0.2 OpenMP programming model

*(figure: the OpenMP fork-join programming model)*

0.3 Example

naive implementation

#define N (100)

int main(int argc, char *argv[])
{
    int idx;
    float a[N], b[N], c[N];

    for (idx = 0; idx < N; ++idx)
    {
        a[idx] = b[idx] = 1.0;
    }

    for (idx = 0; idx < N; ++idx)
    {
        c[idx] = a[idx] + b[idx];
    }
    return 0;
}

omp implementation

#include <omp.h>
#define N (100)

int main(int argc, char *argv[])
{
    int idx;
    float a[N], b[N], c[N];

    #pragma omp parallel for
    for (idx = 0; idx < N; ++idx)
    {
        a[idx] = b[idx] = 1.0;
    }

    #pragma omp parallel for
    for (idx = 0; idx < N; ++idx)
    {
        c[idx] = a[idx] + b[idx];
    }
    return 0;
}
A fuller version that also prints which thread computed each element. Two pitfalls are fixed here: `omp_get_num_threads()` returns 1 when called outside a parallel region, and `tid` must be private to each thread to avoid a data race.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#define N (100)

int main(int argc, char *argv[])
{
    int tid, idx;
    float a[N], b[N], c[N];

    /* omp_get_num_threads() returns 1 outside a parallel region;
       use omp_get_max_threads() here instead. */
    printf("Max threads = %d\n", omp_get_max_threads());

    #pragma omp parallel for
    for (idx = 0; idx < N; ++idx)
    {
        a[idx] = b[idx] = 1.0;
    }

    /* tid is made private so threads don't race on it */
    #pragma omp parallel for private(tid)
    for (idx = 0; idx < N; ++idx)
    {
        c[idx] = a[idx] + b[idx];
        tid = omp_get_thread_num();
        printf("Thread %d: c[%d]=%f\n", tid, idx, c[idx]);
    }
    return 0;
}

0.4 Compiling, linking, etc.

You need to add the flag `-fopenmp` (for icc, `-qopenmp`; `-openmp` is the older, deprecated spelling):

# compile using gcc
gcc -fopenmp omp_vecadd.c -o vecadd

# compile using icc
icc -qopenmp omp_vecadd.c -o vecadd

Control the number of threads by setting an environment variable on the command line:

export OMP_NUM_THREADS=8

0.5 Exercise

  1. Implement:
     • vector dot-product: c = <x, y>
     • matrix-matrix multiply
     • 2D matrix convolution
  2. Add OpenMP support to the ReLU and max-pooling layers

note

Synchronization and critical sections:

  • have each thread accumulate into a private variable, then combine the results in a critical section; this also avoids false sharing of a shared accumulator array
  • BUT don't put critical sections inside tight loops, since doing so serializes the work

0.6 Tips to Improve Performance for Popular Deep Learning Frameworks on CPUs

*(figure: improving performance for deep learning frameworks on CPUs)*

Tutorial1: Introduction to OpenMP

Tim Mattson's (Intel) Introduction to OpenMP video tutorial is now available.

Outline:

Unit 1: Getting started with OpenMP

  • Module 1: Introduction to parallel programming
  • Module 2: The boring bits: Using an OpenMP compiler (hello world)
  • Discussion 1: Hello world and how threads work

Unit 2: The core features of OpenMP

  • Module 3: Creating Threads (the Pi program)
  • Discussion 2: The simple Pi program and why it sucks
  • Module 4: Synchronization (Pi program revisited)
  • Discussion 3: Synchronization overhead and eliminating false sharing
  • Module 5: Parallel Loops (making the Pi program simple)
  • Discussion 4: Pi program wrap-up

Unit 3: Working with OpenMP

  • Module 6: Synchronize single masters and stuff
  • Module 7: Data environment
  • Discussion 5: Debugging OpenMP programs
  • Module 8: Skills practice … linked lists and OpenMP
  • Discussion 6: Different ways to traverse linked lists

Unit 4: a few advanced OpenMP topics

  • Module 9: Tasks (linked lists the easy way)
  • Discussion 7: Understanding Tasks
  • Module 10: The scary stuff … Memory model, atomics, and flush (pairwise synch).
  • Discussion 8: The pitfalls of pairwise synchronization
  • Module 11: Threadprivate Data and how to support libraries (Pi again)
  • Discussion 9: Random number generators

Unit 5: Recapitulation

Thanks go to the University Program Office at Intel for making this tutorial available.

Tutorial2: OpenMP

Author: Blaise Barney, Lawrence Livermore National Laboratory

Tutorial3: OpenMP tutorial | Goulas Programming Soup

https://goulassoup.wordpress.com/2011/10/28/openmp-tutorial/

openmp-101's People

Contributors

ysh329, zchrissirhcz


openmp-101's Issues

Survey Question on the For loop in source file pi/my_pi.c line 63

Hello Sir/Madam,
We are from a research group at Iowa State University, USA. We want to conduct a survey of GitHub developers on the methods they used to parallelize their code. For the survey, we would like to ask three questions:

  1. Have you ever tried to add a pragma for that 'for' loop?

  2. How much confidence do you have in the correctness of this implementation? You can choose from 1-5, with 1 as the lowest confidence score and 5 as the highest.

  3. (Optional) Did you actually run the code (compile it and pass input/get output)? Yes/No

  • If yes, can you provide information on the input and expected output of this program (the input that caused the program to run through this for loop)?

The for loop is at line 63 of the file https://github.com/ysh329/OpenMP-101/blob/master/pi/my_pi.c
Here is a part of the code:

for (i = start_step_num; i < finish_step_num; ++i)
{
    x = (i + 0.5) * step;
    sum = sum + (4.0 / (1.0 + (x * x)));
}

Sincerely thanks

omp parallel for raises "segmentation fault (core dumped)" when the number of iterations in the for loop exceeds a certain amount

I have been writing some code and have tried to parallelize it using #pragma omp parallel for. Everything works fine up to about 100000 iterations, but when I increase the number of iterations to about 200000, it compiles without errors yet immediately shows "segmentation fault (core dumped)" at runtime.
I have been told that it might be a local/global variable issue with the iteration variables, but I think that is unlikely because it runs just fine with fewer iterations.
I have also been told that it might be a stack issue, which I know nothing about.
