OpenMP Tutorial
A simple, easy-to-use multithreading tool
What is OpenMP
Easy multithreaded programming for C++.
It is a simple compiler extension for C/C++/Fortran that lets you add parallelism to existing source code without significantly rewriting it.
Example
An example that initializes an array in parallel:
#include <vector>

int main()
{
    std::vector<int> arr(1000); // construct with size 1000; reserve() alone would leave size() == 0
    #pragma omp parallel for
    for(int i = 0; i < 1000; i++)
    {
        arr[i] = i * 2;
    }
    return 0;
}
You can compile it like this:
g++ tmp.cpp -fopenmp
If you remove the #pragma lines, the result is still a valid C++ program that runs and does the expected thing.
If the compiler encounters a #pragma it does not support, it ignores it.
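If you want to detect at compile time whether OpenMP support is enabled, a minimal sketch using the standard _OPENMP macro:
#include <cstdio>

int main()
{
#ifdef _OPENMP
    // _OPENMP is defined by the compiler when OpenMP support is enabled
    // (e.g. when compiling with -fopenmp).
    std::printf("OpenMP enabled\n");
#else
    std::printf("Compiled without OpenMP; the program runs serially\n");
#endif
    return 0;
}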
The syntax
The parallel construct
It creates a team of N threads (where N is the number of CPU cores by default), which all execute the next statement or block. After the statement, the threads join back into one.
#pragma omp parallel
{
// code inside this region runs in parallel
printf("Hello\n");
}
Loop construct: for
The for construct splits the for-loop so that each thread in the current team handles a different portion of it:
#pragma omp for
for(int n=0;n<10;++n)
{
printf(" %d",n);
}
Note: #pragma omp for only delegates portions of the loop to the different threads in the current team. A team is the group of threads executing the program. At program start, the team consists of only the main thread.
To create a new team of threads, you need to specify the parallel keyword:
#pragma omp parallel
{
#pragma omp for
for(int n=0; n<10; n++) printf(" %d", n);
}
Or use the combined form:
#pragma omp parallel for
You can explicitly specify the number of threads to be created in the team, using the num_threads clause:
#pragma omp parallel for num_threads(3)
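A minimal sketch that verifies the team size, using omp_get_thread_num and omp_get_num_threads from omp.h:
#include <cstdio>
#include <omp.h>

int main()
{
    #pragma omp parallel num_threads(3)
    {
        // Each thread in the team of (up to) 3 prints its id and the team size.
        std::printf("thread %d of %d\n",
                    omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}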
Scheduling
The scheduling algorithm for the for-loop can be explicitly controlled.
The default (in most implementations) is the static schedule, which divides the iterations between the threads up front:
#pragma omp for schedule(static)
In the dynamic schedule, each thread asks the OpenMP runtime library for an iteration number, handles it, then asks for the next one.
A chunk size can also be specified, to lessen the number of calls to the runtime library:
#pragma omp parallel for schedule(dynamic,3)
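Dynamic scheduling pays off when iteration costs vary. A minimal sketch, assuming a hypothetical work(n) helper whose cost grows with n:
#include <cstdio>

// Hypothetical helper whose cost grows with n.
double work(int n)
{
    double sum = 0;
    for(int k = 0; k < n * 100000; ++k) sum += k;
    return sum;
}

int main()
{
    double results[100];
    // With schedule(static), the threads that receive the high-n iterations
    // finish last; schedule(dynamic,3) hands out chunks of 3 iterations
    // to whichever thread becomes idle first.
    #pragma omp parallel for schedule(dynamic,3)
    for(int n = 0; n < 100; ++n)
        results[n] = work(n);
    std::printf("results[99] = %g\n", results[99]);
    return 0;
}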
The ordered clause
It is possible to force certain events within the loop to happen in the predicted (sequential) order, using the ordered clause:
#pragma omp parallel for ordered schedule(dynamic)
for(int n = 0; n < 100; n++)
{
files[n].compress();
#pragma omp ordered
send(files[n]);
}
The collapse clause
When you have nested loops, you can use the collapse clause to apply the threading to all of their iterations at once:
#pragma omp parallel for collapse(2)
for(int i = 0;i < 10;i++)
{
for(int j = 0;j < 10;j++)
{
doSth();
}
}
Sections
Sometimes it is handy to indicate that "this and this can run in parallel". The sections construct is just for that:
#pragma omp parallel sections
{
#pragma omp section
{
work1();
}
#pragma omp section
{
work2();
work3();
}
#pragma omp section
{
work4();
}
}
This code indicates that the tasks work1, work2+work3, and work4 may all run in parallel with each other.
Thread-safety
Atomicity
The atomic directive specifies that the denoted action happens atomically:
#pragma omp atomic
counter += value;
Atomic read expression:
#pragma omp atomic read
var = x;
Atomic write expression:
#pragma omp atomic write
x = expr;
Atomic update expression:
#pragma omp atomic update
x++; x--; ++x; --x;
x += expr; x -= expr; /* and the other compound assignments */
Atomic capture expression
A capture expression combines the read and update features:
#pragma omp atomic capture
var = x++;
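A minimal sketch of what capture is useful for, assuming each thread should claim a unique slot index:
#include <cstdio>

int main()
{
    int next_slot = 0;
    int claimed[8] = {0};
    #pragma omp parallel num_threads(8)
    {
        int slot;
        // Read and increment the counter in one atomic step,
        // so no two threads can claim the same slot.
        #pragma omp atomic capture
        slot = next_slot++;
        claimed[slot] = 1; // safe: each thread owns a distinct index
    }
    std::printf("%d slots claimed\n", next_slot);
    return 0;
}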
The critical construct
The critical construct restricts execution of the associated statement/block to a single thread at a time:
#pragma omp critical
{
doSth();
}
Note: critical sections can be given a name, and the names are global to the entire program: two critical sections with the same name exclude each other even in unrelated parts of the code, and all unnamed critical sections share one common name.
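A minimal sketch of the naming syntax (the helper functions here are illustrative):
void update_queue();   // hypothetical helpers
void update_totals();

void worker()
{
    // Different names, so a thread inside "queue" does not block
    // a thread inside "totals".
    #pragma omp critical(queue)
    {
        update_queue();
    }
    #pragma omp critical(totals)
    {
        update_totals();
    }
}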
Locks
The OpenMP runtime library provides a lock type, omp_lock_t, in omp.h.
The lock type has five manipulator functions:
omp_init_lock: initializes the lock
omp_destroy_lock: destroys the lock; it must be unset before the call
omp_set_lock: acquires the lock, blocking until it becomes available
omp_unset_lock: releases the lock
omp_test_lock: attempts to set the lock without blocking; it returns 0 if the lock is already set by another thread, and 1 if it managed to set the lock
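A minimal sketch of the lock API, guarding a shared counter:
#include <cstdio>
#include <omp.h>

int main()
{
    omp_lock_t lock;
    omp_init_lock(&lock);
    int shared_count = 0;
    #pragma omp parallel num_threads(4)
    {
        omp_set_lock(&lock);    // blocks until the lock is acquired
        ++shared_count;         // only one thread at a time gets here
        omp_unset_lock(&lock);
    }
    omp_destroy_lock(&lock);    // the lock is unset at this point
    std::printf("count = %d\n", shared_count);
    return 0;
}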
The flush directive
Even when variables used by threads are supposed to be shared, the compiler may take liberties and optimize them as register variables. This can skew concurrent observations of the variable. The flush directive can be used to forbid this:
/*first thread*/
b=1;
#pragma omp flush(a,b)
if(a == 0)
{
/* critical section*/
}
/*second thread*/
a = 1;
#pragma omp flush(a,b)
if(b==0)
{
/* critical section*/
}
Controlling which data to share between threads
int a, b = 0;
#pragma omp parallel for private(a) shared(b)
for(a=0;a<50;++a)
{
#pragma omp atomic
b += a;
}
a is private (each thread has its own copy of it) and b is shared (every thread accesses the same variable).
The difference between private and firstprivate
private does not copy the value the variable had in the surrounding context; each thread's copy starts uninitialized (or default-constructed).
#include <string>
#include <iostream>
int main()
{
    std::string a = "x", b = "y";
    int c = 3;
    #pragma omp parallel private(a,c) shared(b) num_threads(2)
    {
        a += "k";
        c += 7; // undefined: the private c starts uninitialized, not at 3
        std::cout << "A becomes (" << a << "), b is (" << b << ")\n";
    }
}
This is roughly equivalent to the following pseudo-code:
OpenMP_thread_fork(2);
{ // Start new scope
std::string a; // Note: It is a new local variable.
int c; // This too.
a += "k";
c += 7;
std::cout << "A becomes (" << a << "), b is (" << b << ")\n";
} // End of scope for the local variables
OpenMP_join();
If you actually need a copy of the original value, use the firstprivate clause instead.
#include <string>
#include <iostream>
int main()
{
std::string a = "x", b = "y";
int c = 3;
#pragma omp parallel firstprivate(a,c) shared(b) num_threads(2)
{
a += "k";
c += 7;
std::cout << "A becomes (" << a << "), b is (" << b << ")\n";
}
}
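Here each thread's copy of a starts as "x" and its copy of c starts as 3, so (with two threads) both threads print: A becomes (xk), b is (y).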
Execution synchronization
The barrier directive and the nowait clause
The barrier directive causes threads encountering it to wait until all the other threads in the same team have reached the barrier.
#pragma omp parallel
{
// all threads execute this
SomeCode();
#pragma omp barrier
// all threads execute this,but not before all threads have finished executing SomeCode()
SomeMoreCode();
}
Note: there is an implicit barrier at the end of each parallel block, and at the end of each sections, for and single statement, unless the nowait clause is used.
#pragma omp parallel
{
#pragma omp for
for(int n=0; n<10; ++n) Work();
// This line is not reached before the for-loop is completely finished
SomeMoreCode();
}
// This line is reached only after all threads from
// the previous parallel block are finished.
CodeContinues();
#pragma omp parallel
{
#pragma omp for nowait
for(int n=0; n<10; ++n) Work();
// This line may be reached while some threads are still executing the for-loop.
SomeMoreCode();
}
// This line is reached only after all threads from
// the previous parallel block are finished.
CodeContinues();
The single and master constructs
The single construct specifies that the given statement/block is executed by only one thread. It is unspecified which thread. Other threads skip the statement/block and wait at an implicit barrier at the end of the construct.
#pragma omp parallel
{
Work1();
#pragma omp single
{
Work2();
}
Work3();
}
The master construct is similar, except that the statement/block is run by the master thread, and there is no implied barrier; other threads skip the construct without waiting.
The cancel construct
The cancel construct (OpenMP 4.0) signals that the threads should abandon the remaining work of the named construct as soon as convenient. Note that cancellation must also be enabled at run time by setting the OMP_CANCELLATION environment variable to true; otherwise the cancel directives do nothing. The example below stops a parallel search as soon as any thread finds the needle.
#include <cstdio>
#include <cstddef>
#include <omp.h>

static const char* FindAnyNeedle(const char* haystack, size_t size, char needle)
{
const char* result = haystack+size;
#pragma omp parallel
{
unsigned num_iterations=0;
#pragma omp for
for(size_t p = 0; p < size; ++p)
{
++num_iterations;
if(haystack[p] == needle)
{
#pragma omp atomic write
result = haystack+p;
// Signal cancellation.
#pragma omp cancel for
}
// Check for cancellations signalled by other threads:
#pragma omp cancellation point for
}
// All threads reach here eventually; sooner if the cancellation was signalled.
printf("Thread %u: %u iterations completed\n", omp_get_thread_num(), num_iterations);
}
return result;
}
Loop nesting
This code will not do the expected thing: nested parallelism is disabled by default, so the inner parallel for is executed by a team of one thread inside each outer thread.
#pragma omp parallel for
for(int y=0; y<25; ++y)
{
#pragma omp parallel for
for(int x=0; x<80; ++x)
{
tick(x,y);
}
}
Solution:
#pragma omp parallel for collapse(2)
for(int y=0; y<25; ++y)
{
for(int x=0; x<80; ++x)
{
tick(x,y);
}
}
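Alternatively, nested parallel regions can be enabled at run time with omp_set_nested(1) or the OMP_NESTED environment variable (newer OpenMP versions prefer omp_set_max_active_levels), but for rectangular loop nests like this one, collapse is usually the simpler fix.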
Read more
http://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
https://en.wikipedia.org/wiki/OpenMP