

ViennaCL: Optimization of the OpenMP backend


Our free open-source linear algebra library ViennaCL has three compute backends: one based on CUDA, one based on OpenCL, and one based on OpenMP. While the CUDA and OpenCL backends already provide high performance, the OpenMP backend does not yet. Although the OpenMP backend was initially introduced as a fallback for CPU-only systems, it is now mature enough to be tuned for high performance. The student will tune the individual linear algebra kernels (vector operations, matrix-vector products, etc.) for best performance.
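To illustrate the kind of kernel meant here, the following is a minimal sketch of an OpenMP-parallel dense matrix-vector product y = A * x in C. It is hypothetical: the function name `mv_rowmajor` and its signature are assumptions made for illustration and are not taken from ViennaCL's actual OpenMP backend.

```c
#include <stddef.h>

/* Hypothetical example kernel (not ViennaCL code): y = A * x for a
   row-major M x N matrix A, where element [i,j] is stored at A[i*N + j].
   schedule(static) hands each thread a contiguous block of rows, so every
   y[i] is written by exactly one thread and no synchronization is needed.
   Compiled without -fopenmp, the pragma is ignored and the loop runs
   serially with identical results. */
void mv_rowmajor(const double *A, const double *x, double *y,
                 size_t M, size_t N)
{
    /* OpenMP versions before 3.0 require a signed loop variable */
    long rows = (long)M;
#pragma omp parallel for schedule(static)
    for (long i = 0; i < rows; ++i) {
        const double *row = A + (size_t)i * N;  /* start of row i */
        double sum = 0.0;
        for (size_t j = 0; j < N; ++j)
            sum += row[j] * x[j];               /* unit-stride access */
        y[i] = sum;
    }
}
```

Tuning a kernel like this usually means experimenting with the loop schedule, the amount of work per thread, and vectorization of the inner loop; for very small matrices, the cost of starting the parallel region can outweigh the gain from using several threads.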

Benefit for the Student

Squeezing the last bit of performance out of recent hardware is a lot of fun. :-) Also, the student will learn a lot about how multi-core CPUs really work and the many tricks needed to get good performance.

Benefit for the Project

Certain algorithms cannot be implemented efficiently on CPUs using OpenCL alone, so an efficient OpenMP backend will enable many high-performance implementations, both within ViennaCL and in code our users build on top of it.


Moderate C or C++ skills are required. Experience in using OpenMP is a plus.

Primary Mentor

Karl Rupp


Write a C or C++ program using OpenMP that computes the dense matrix-matrix product B = A * A for a square double-precision floating-point matrix A in row-major storage (i.e., element [i,j] is located at index i*N + j of the underlying array) as fast as possible. The size N of A should be a command-line parameter of your program. Plot the obtained performance for matrix sizes between 10 and 2000 as a function of the number of threads, and comment on the results.
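As a starting point, here is a minimal sketch of one possible approach in C, not a reference solution. The function names `square_matmul` and `matmul_gflops` are hypothetical. The sketch assumes A and B are separate buffers (B must not alias A) and uses the i-k-j loop order, which keeps the innermost loop at unit stride in row-major storage.

```c
#include <stddef.h>

/* Sketch: B = A * A for a row-major N x N matrix, element [i,j] at i*N + j.
   Loop order i-k-j: the innermost loop walks row k of A and row i of B with
   unit stride, unlike the textbook i-j-k order, which strides down a column
   of A. Rows of B are distributed across threads; each thread writes only
   its own rows, so no synchronization is needed.
   restrict: the caller guarantees that A and B do not overlap. */
void square_matmul(const double *restrict A, double *restrict B, size_t N)
{
    long n = (long)N;  /* signed index for pre-3.0 OpenMP compilers */
#pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        double *Bi = B + (size_t)i * N;        /* row i of B */
        const double *Ai = A + (size_t)i * N;  /* row i of A */
        for (size_t j = 0; j < N; ++j)
            Bi[j] = 0.0;
        for (size_t k = 0; k < N; ++k) {
            const double aik = Ai[k];          /* A[i,k] */
            const double *Ak = A + k * N;      /* row k of A */
            for (size_t j = 0; j < N; ++j)
                Bi[j] += aik * Ak[j];          /* B[i,j] += A[i,k]*A[k,j] */
        }
    }
}

/* A dense N x N product performs N^3 multiplications and N^3 additions,
   i.e. 2*N^3 floating-point operations; dividing by the runtime gives the
   achieved performance in GFLOP/s. */
double matmul_gflops(size_t N, double seconds)
{
    double n = (double)N;
    return 2.0 * n * n * n / seconds / 1e9;
}
```

A driver `main` could read N with `strtoul(argv[1], NULL, 10)`, allocate both matrices with `malloc`, and time the call with `omp_get_wtime()` when compiled with e.g. `gcc -O3 -fopenmp`; the thread count can then be varied through the `OMP_NUM_THREADS` environment variable. For the smallest sizes, threading overhead typically dominates. For large N, this unblocked version tends to become limited by memory bandwidth, which is where cache blocking (tiling) is a common next step.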

Contact us or stop by the institute if you have questions. Submit the code with your application.

2015-viennacl-openmp.txt · Last modified: 2015/03/24 20:40 by viennastar