2015-viennacl-openmp, last modified 2015/03/24 20:40 by viennastar ("Added puzzle"); previous revision 2015/03/09 13:04 by viennastar
=== Description ===
Our free, open-source linear algebra library [[http://viennacl.sourceforge.net|ViennaCL]] has three computing backends: one based on [[http://www.nvidia.com/object/cuda_home_new.html|CUDA]], one based on [[https://www.khronos.org/opencl/|OpenCL]], and one based on [[http://openmp.org/|OpenMP]]. While the CUDA and OpenCL backends already provide high performance, the OpenMP backend does not yet. Although the OpenMP backend was initially introduced as a fallback for CPU-only systems, it is now mature enough to be tuned for high performance. The student will tune the individual linear algebra kernels (vector operations, matrix-vector products, etc.) for best performance.
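To make the tuning task more concrete, here is a minimal sketch of what two such kernels can look like in plain C with OpenMP: an element-wise vector update x = alpha*y + beta*z and a dense matrix-vector product y = A*x with A in row-major storage. The function names ''vec_axpby'' and ''dense_matvec'' are hypothetical and not part of the ViennaCL API, and the cutoff ''PARALLEL_THRESHOLD'' is an assumed placeholder value. Compile with ''-fopenmp'' (GCC/Clang); without it the pragmas are ignored and the loops run serially.

```c
#include <stddef.h>

/* Hypothetical example kernels; NOT the ViennaCL API.
   Build with -fopenmp; without it the pragmas are ignored and the loops run serially. */

/* Assumed cutoff below which the vector loop stays on one thread; tune by measurement. */
static const size_t PARALLEL_THRESHOLD = 4096;

/* Element-wise vector update: x[i] = alpha * y[i] + beta * z[i]. */
void vec_axpby(size_t n, double alpha, const double *y,
               double beta, const double *z, double *x)
{
  /* if clause: short vectors run on a single thread to avoid start-up overhead;
     static schedule: each thread gets one contiguous chunk of roughly equal length */
  #pragma omp parallel for if(n >= PARALLEL_THRESHOLD) schedule(static)
  for (size_t i = 0; i < n; ++i)
    x[i] = alpha * y[i] + beta * z[i];
}

/* Dense matrix-vector product y = A * x, where A is n x n in row-major
   storage, i.e. element [i,j] is A[i*n + j]. Rows are distributed among
   threads, so each y[i] is written by exactly one thread. */
void dense_matvec(size_t n, const double *A, const double *x, double *y)
{
  #pragma omp parallel for schedule(static)
  for (size_t i = 0; i < n; ++i) {
    const double *row = A + i * n;   /* start of row i */
    double sum = 0.0;
    for (size_t j = 0; j < n; ++j)
      sum += row[j] * x[j];
    y[i] = sum;
  }
}
```

Static scheduling fits both loops because every element or row costs the same amount of work. The ''if'' clause keeps short vectors on a single thread, where opening a parallel region can cost more than the loop itself; the right cutoff has to be measured on the target machine.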
=== Benefit for the Student ===
[...]
Moderate C or C++ skills are required. Experience with OpenMP is a plus.
=== Primary Mentor ===
Karl Rupp
=== Puzzles ===
Write a C or C++ program using OpenMP that computes the dense matrix-matrix product B = A * A for a square double-precision floating-point matrix A in row-major storage (i.e. the element [i,j] is located at i*N + j in the underlying array) as fast as possible. The size N of A should be a command-line parameter of your program. Plot the obtained performance for matrix sizes between 10 and 2000 as a function of the number of threads and comment on the results.

Contact rupp_AT_iue.tuwien.ac.at or stop by the institute if you have questions. Submit the code with your application.
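As a possible starting point for the puzzle, the following sketch shows a naive OpenMP baseline for B = A * A using the row-major layout given in the puzzle statement. It only distributes the rows of B among threads and uses the i-k-j loop order, so that the innermost loop reads A and writes B with unit stride; it is not an optimized solution (cache blocking and explicit vectorization are the usual next steps). The names ''square_matmul'' and ''gflops'' are hypothetical, and the operation count 2N^3 is the conventional one for a dense matrix-matrix product.

```c
#include <stddef.h>

/* Naive OpenMP baseline (hypothetical name) for B = A * A.
   A and B are dense n x n matrices in row-major storage: element [i,j] is at i*n + j.
   B must not overlap A. Build with -fopenmp. */
void square_matmul(size_t n, const double *A, double *B)
{
  /* each thread computes whole rows of B, so no two threads write the same entry */
  #pragma omp parallel for schedule(static)
  for (size_t i = 0; i < n; ++i) {
    double *Bi = B + i * n;                 /* row i of B */
    for (size_t j = 0; j < n; ++j)
      Bi[j] = 0.0;
    /* i-k-j order: B[i,:] += A[i,k] * A[k,:], unit-stride inner loop */
    for (size_t k = 0; k < n; ++k) {
      const double aik = A[i * n + k];
      const double *Ak = A + k * n;         /* row k of A */
      for (size_t j = 0; j < n; ++j)
        Bi[j] += aik * Ak[j];
    }
  }
}

/* Performance in GFLOP/s, using the conventional count of 2*n^3
   floating-point operations for an n x n matrix-matrix product. */
double gflops(size_t n, double seconds)
{
  const double nd = (double)n;
  return 2.0 * nd * nd * nd / (seconds * 1e9);
}
```

A driver program would read N from ''argv'' (for example with ''strtol''), allocate A and B with ''malloc'', time the call with ''omp_get_wtime()'', and print ''gflops(N, seconds)''. The number of threads can be varied between runs through the ''OMP_NUM_THREADS'' environment variable, which gives the data points for the requested plot.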