A simple and easy-to-use multithreading tool

What is OpenMP

Easy multithreading programming for C++.

It is a simple compiler extension for C, C++, and Fortran that allows you to add parallelism to existing source code without significantly rewriting it.

Example

An example that initializes an array in parallel:

#include <vector>

int main()
{
    // The vector is constructed with 1000 elements so that arr[i] is valid;
    // reserve() alone would not create the elements.
    std::vector<int> arr(1000);
    #pragma omp parallel for
    for(int i = 0; i < 1000; i++)
    {
        arr[i] = i * 2;
    }
    return 0;
}

You can compile it like this:

g++ tmp.cpp -fopenmp

If you remove the #pragma lines, the result is still a valid C++ program that runs and does the expected thing.
If the compiler encounters a #pragma that it does not support, it will ignore it.
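
As a small sketch of this portability, code can test the standard _OPENMP macro (defined by OpenMP-aware compilers) before calling runtime functions such as omp_get_max_threads; the same file still compiles and runs serially without -fopenmp.

#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

int main()
{
#ifdef _OPENMP
    // Compiled with -fopenmp: report how many threads a parallel region would use.
    std::printf("OpenMP enabled, up to %d threads\n", omp_get_max_threads());
#else
    // Compiled without -fopenmp: the pragmas are ignored and this branch runs.
    std::printf("OpenMP not enabled, running serially\n");
#endif
    return 0;
}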

The syntax

the parallel construct

It creates a team of N threads (where N is the number of CPU cores by default), each of which executes the following statement or block. After the statement, the threads join back into one.

#pragma omp parallel
{
    // code inside this region runs in parallel
    printf("Hello\n");
}

Loop construct: for

The for construct splits the for-loop so that each thread in the current team handles a different portion of the loop.

#pragma omp for
for(int n=0;n<10;++n)
{
    printf(" %d",n);
}

Note: #pragma omp for only delegates portions of the loop to the different threads in the current team. A team is the group of threads executing the program. At program start, the team consists of only the main thread.
To create a new team of threads, you need to specify the parallel keyword:

#pragma omp parallel
{
    #pragma omp for
    for(int n=0;n<10;n++) printf(" %d",n);
}

or use

#pragma omp parallel for

You can explicitly specify the number of threads to be created in the team using the num_threads clause:

#pragma omp parallel for num_threads(3)
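
For illustration, here is a minimal, complete sketch of the combined directive with num_threads; it uses omp_get_thread_num from omp.h to show which thread handled each iteration.

#include <cstdio>
#include <omp.h>

int main()
{
    // A team of 3 threads shares the 12 iterations of the loop.
    #pragma omp parallel for num_threads(3)
    for(int n = 0; n < 12; n++)
    {
        std::printf("iteration %d handled by thread %d\n", n, omp_get_thread_num());
    }
    return 0;
}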

scheduling

The scheduling algorithm for the for-loop can be explicitly controlled.

The default is the static schedule, where the iterations are divided among the threads before the loop starts:

#pragma omp for schedule(static)

With the dynamic schedule, each thread asks the OpenMP runtime library for an iteration number, handles it, and then asks for the next one.
A chunk size can also be specified to reduce the number of calls to the runtime library:

#pragma omp parallel for schedule(dynamic,3)
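
As a sketch, dynamic scheduling pays off when iterations take uneven amounts of time; the process function below is hypothetical and just stands in for a variable-cost task.

#include <cstdio>

// Hypothetical stand-in for work whose cost grows with n.
static void process(int n)
{
    volatile long sink = 0;
    for(long i = 0; i < n * 100000L; ++i) sink += i;
}

int main()
{
    // Threads grab 3 iterations at a time, so fast threads pick up extra work.
    #pragma omp parallel for schedule(dynamic,3)
    for(int n = 0; n < 100; n++)
    {
        process(n);
    }
    std::printf("done\n");
    return 0;
}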

the ordered clause

It is possible to force certain events within the loop to happen in a predicted order, using the ordered clause:

#pragma omp parallel for ordered schedule(dynamic)
for(int n = 0; n < 100; n++)
{
    files[n].compress();
    #pragma omp ordered
    send(files[n]);
}

the collapse clause

When you have perfectly nested loops, you can use the collapse clause to combine them into one larger iteration space:

#pragma omp parallel for collapse(2)
for(int i = 0;i < 10;i++)
{
    for(int j = 0;j < 10;j++)
    {
        doSth();
    }
}

sections

Sometimes it is handy to indicate that "this and this can run in parallel". The sections construct is just for that:

#pragma omp parallel sections
{
    #pragma omp section
    {
        work1();
    }
    #pragma omp section
    {
        work2();
        work3();
    }
    #pragma omp section
    {
        work4();
    }
}

This code indicates that any of the tasks work1, work2+work3, and work4 may run in parallel with each other.

Thread-safety

Atomicity

#pragma omp atomic
counter += value;

The atomic keyword in OpenMP specifies that the denoted action happens atomically.

atomic read expression

#pragma omp atomic read
var = x;

atomic write expression

#pragma omp atomic write
x = expr;

atomic update expression

#pragma omp atomic update
x += value; // other supported forms: ++x, --x, x++, x--, x -= expr, ...

atomic capture expression

A capture expression combines the read and update features:

#pragma omp atomic capture
var = x++;
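
A minimal sketch of where capture is useful: atomically claiming the next free slot in a shared array (the names below are made up for illustration).

#include <cstdio>

int main()
{
    int results[100];
    int next_slot = 0;

    #pragma omp parallel for
    for(int n = 0; n < 100; n++)
    {
        int my_slot;
        // Read the old value of next_slot and increment it in one atomic step.
        #pragma omp atomic capture
        my_slot = next_slot++;

        results[my_slot] = n * n;
    }
    std::printf("filled %d slots\n", next_slot);
    return 0;
}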

the critical construct

The critical construct restricts the execution of the associated statement or block to a single thread at a time.

#pragma omp critical
{
    doSth();
}

Note: a critical section can optionally be given a name, e.g. #pragma omp critical(dataupdate). The names are global to the entire program, so two critical sections with the same name exclude each other even if they appear in different functions.
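
A brief sketch with a made-up name, showing two named critical sections that exclude each other:

#include <cstdio>

int main()
{
    int balance = 0;

    #pragma omp parallel num_threads(4)
    {
        // Both blocks use the name "account", so only one thread at a time
        // may be inside either of them.
        #pragma omp critical(account)
        {
            balance += 10;
        }

        #pragma omp critical(account)
        {
            std::printf("balance is now %d\n", balance);
        }
    }
    return 0;
}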

locks

The OpenMP runtime library provides a lock type, omp_lock_t, in omp.h.
The lock type has five manipulator functions (a minimal usage sketch follows the list):

omp_init_lock : initializes the lock
omp_destroy_lock : destroys the lock (the lock must be unset before the call)
omp_set_lock : acquires the lock, waiting until it becomes available if necessary
omp_unset_lock : releases the lock
omp_test_lock : attempts to set the lock without waiting; it returns nonzero if it managed to set the lock, and 0 if another thread already holds it
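
A minimal sketch of the lock API, protecting a shared counter (the counter itself is just an illustrative stand-in):

#include <cstdio>
#include <omp.h>

int main()
{
    omp_lock_t lock;
    omp_init_lock(&lock);

    int counter = 0;

    #pragma omp parallel num_threads(4)
    {
        // Only one thread at a time may hold the lock and touch the counter.
        omp_set_lock(&lock);
        ++counter;
        omp_unset_lock(&lock);
    }

    std::printf("counter = %d\n", counter);
    omp_destroy_lock(&lock);
    return 0;
}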

the flush directive

Even when variables used by threads are supposed to be shared, the compiler may take liberties and optimize them into register copies. This can skew concurrent observations of the variable. The flush directive can be used to forbid this:

/* first thread */
b = 1;
#pragma omp flush(a,b)
if(a == 0)
{
    /* critical section */
}

/* second thread */
a = 1;
#pragma omp flush(a,b)
if(b == 0)
{
    /* critical section */
}

Controlling which data to share between threads

int a, b = 0;
#pragma omp parallel for private(a) shared(b)
for(a=0;a<50;++a)
{
    #pragma omp atomic
    b += a;
}

Here a is private (each thread has its own copy of it) and b is shared (all threads access the same variable).

the difference between private and firstprivate
private does not copy the value of the variable from the surrounding context; the private copy starts out uninitialized (or default-constructed, for C++ objects).

#include <string>
#include <iostream>

int main()
{
    std::string a = "x", b = "y";
    int c = 3;

    #pragma omp parallel private(a,c) shared(b) num_threads(2)
    {
        a += "k";
        c += 7;
        std::cout << "A becomes (" << a << "), b is (" << b << ")\n";
    }
}
// This is roughly equivalent to the pseudo-code below.
// (OpenMP_thread_fork and OpenMP_join are not real functions; they just
// illustrate what the compiler generates.)
OpenMP_thread_fork(2);
{                  // Start new scope
    std::string a; // Note: a brand new local variable, default-constructed.
    int c;         // This too; its value is indeterminate.
    a += "k";
    c += 7;
    std::cout << "A becomes (" << a << "), b is (" << b << ")\n";
}                  // End of scope for the local variables
OpenMP_join();

If you actually need a copy of the original value, use the firstprivate clause instead.

#include <string>
#include <iostream>

int main()
{
    std::string a = "x", b = "y";
    int c = 3;

    #pragma omp parallel firstprivate(a,c) shared(b) num_threads(2)
    {
        a += "k";
        c += 7;
        std::cout << "A becomes (" << a << "), b is (" << b << ")\n";
    }
}

Execution synchronization

the barrier directive and the nowait clause

The barrier directive causes threads encountering the barrier to wait until all the other threads in the same team have encountered the barrier.

#pragma omp parallel
{
    // all threads execute this
    SomeCode();
    #pragma omp barrier
    // all threads execute this, but not before all threads have finished executing SomeCode()
    SomeMoreCode();
}

Note: there is an implicit barrier at the end of each parallel block, and at the end of each sections, for, and single construct, unless the nowait clause is used.

#pragma omp parallel
{
    #pragma omp for
    for(int n=0; n<10; ++n) Work();

    // This line is not reached before the for-loop is completely finished
    SomeMoreCode();
}

// This line is reached only after all threads from
// the previous parallel block are finished.
CodeContinues();

#pragma omp parallel
{
    #pragma omp for nowait
    for(int n=0; n<10; ++n) Work();

    // This line may be reached while some threads are still executing the for-loop.
    SomeMoreCode();
}

// This line is reached only after all threads from
// the previous parallel block are finished.
CodeContinues();

the single and master constructs

The single construct specifies that the given statement/block is executed by only one thread. It is unspecified which thread. Other threads skip the statement/block and wait at an implicit barrier at the end of the construct.

#pragma omp parallel
{
    Work1();
    #pragma omp single
    {
        Work2();
    }
    Work3();
}

The master construct is similar, except that the statement/block is run only by the master thread (thread number 0), and there is no implied barrier; other threads skip the construct without waiting.
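
A small sketch of the master construct; the output is just illustrative:

#include <cstdio>
#include <omp.h>

int main()
{
    #pragma omp parallel num_threads(4)
    {
        // Every thread runs this.
        std::printf("thread %d working\n", omp_get_thread_num());

        // Only the master thread (thread 0) runs this; the others do not wait.
        #pragma omp master
        {
            std::printf("progress report from the master thread\n");
        }
    }
    return 0;
}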

the cancel construct

OpenMP 4.0 added the cancel construct, which requests cancellation of the innermost enclosing region of the given type (here, a for loop), and the cancellation point construct, where threads check whether cancellation has been requested. Note that cancellation typically has to be enabled at run time, e.g. by setting the OMP_CANCELLATION environment variable to true.

#include <cstddef>
#include <cstdio>
#include <omp.h>

static const char* FindAnyNeedle(const char* haystack, std::size_t size, char needle)
{
    const char* result = haystack + size;
    #pragma omp parallel
    {
        unsigned num_iterations = 0;
        #pragma omp for
        for(std::size_t p = 0; p < size; ++p)
        {
            ++num_iterations;
            if(haystack[p] == needle)
            {
                #pragma omp atomic write
                result = haystack + p;
                // Signal cancellation.
                #pragma omp cancel for
            }
            // Check for cancellations signalled by other threads:
            #pragma omp cancellation point for
        }
        // All threads reach here eventually; sooner if the cancellation was signalled.
        std::printf("Thread %d: %u iterations completed\n", omp_get_thread_num(), num_iterations);
    }
    return result;
}

Loop nesting

This code will not do the expected thing, because nested parallelism is disabled by default: the inner parallel for runs inside the team created by the outer one, so each inner region gets a team of just one thread.

#pragma omp parallel for
for(int y=0; y<25; ++y)
{
    #pragma omp parallel for
    for(int x=0; x<80; ++x)
    {
        tick(x,y);
    }
}

One solution is to collapse the two loops into one:

#pragma omp parallel for collapse(2)
for(int y=0; y<25; ++y)
{
    for(int x=0; x<80; ++x)
    {
        tick(x,y);
    }
}
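
Another option, sketched below, is to enable nested parallelism in the runtime; omp_set_max_active_levels (or the older omp_set_nested) lets the inner parallel for create its own team, though this easily oversubscribes the machine, so collapse is usually the simpler fix.

#include <omp.h>

void tick(int x, int y); // defined elsewhere, as in the example above

void draw_screen()
{
    // Allow two nested levels of active parallelism.
    omp_set_max_active_levels(2);

    #pragma omp parallel for num_threads(5)
    for(int y = 0; y < 25; ++y)
    {
        // With nesting enabled, each outer thread forks its own inner team.
        #pragma omp parallel for num_threads(4)
        for(int x = 0; x < 80; ++x)
        {
            tick(x, y);
        }
    }
}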

Read more

http://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
https://en.wikipedia.org/wiki/OpenMP