Our free, open-source linear algebra library ViennaCL has three computing backends: one based on CUDA, one based on OpenCL, and one based on OpenMP. While the CUDA and OpenCL backends provide high performance, the OpenMP backend does not yet. Although the OpenMP backend was initially introduced as a fallback mechanism for CPU-only systems, it is now mature enough to be tuned for high performance. The student will tune the individual linear algebra kernels (vector operations, matrix-vector products, etc.) for best performance.
Squeezing the last bit of performance out of recent hardware is a lot of fun.
Also, the student will learn a lot about how multi-core CPUs really work and the many tricks needed to get good performance.
Certain algorithms cannot be implemented efficiently on CPUs with OpenCL alone, so an efficient OpenMP compute backend will be an enabler for many high-performance implementations, both within ViennaCL and in code our users derive from it.
Moderate C or C++ skills are required. Experience in using OpenMP is a plus.
Karl Rupp
Write a C or C++ program using OpenMP that computes the dense matrix-matrix product B = A * A for a double-precision floating-point square matrix A in row-major storage (i.e. the element [i,j] is located at index i*N + j in the underlying array) as fast as possible. The size of A should be a command line parameter of your program. Plot the obtained performance for matrix sizes between 10 and 2000 as a function of the number of threads and comment on the results.
Contact rupp_AT_iue.tuwien.ac.at or stop by at the institute if you have questions. Submit the code with your application.