The solution of the Boltzmann transport equation via the Spherical Harmonics Expansion (SHE) method is in several respects superior to the traditional and well-established stochastic Monte Carlo method. While large memory requirements have long prohibited spatially two-dimensional device simulations, modern computers now provide sufficient memory.
It has been demonstrated that spherical harmonics expansions of order five to seven are necessary to obtain accurate values for macroscopic quantities such as carrier velocities. Unfortunately, the memory requirements of SHE with uniform expansion order grow quadratically with the expansion order L. As a consequence, expansion orders three, five and seven lead to considerably higher memory requirements and computational effort than first-order expansions. To obtain the accuracy of high-order expansions at lower computational cost, a numerical scheme for variable SHE orders has been developed. Savings of about a factor of five are observed.
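The quadratic growth can be illustrated by counting spherical harmonics. The following is a minimal sketch, assuming that every harmonic up to order L contributes one unknown per grid point; the function names are hypothetical, and an actual SHE discretization may store fewer coefficients, for example by exploiting symmetries.

```python
def num_harmonics(max_order: int) -> int:
    """Count the spherical harmonics Y_lm with 0 <= l <= max_order."""
    # Each degree l contributes 2*l + 1 harmonics (m = -l, ..., l),
    # so the total equals (max_order + 1) ** 2.
    return sum(2 * l + 1 for l in range(max_order + 1))


def relative_cost(max_order: int, reference_order: int = 1) -> float:
    """Ratio of harmonic counts relative to a reference expansion order."""
    return num_harmonics(max_order) / num_harmonics(reference_order)


if __name__ == "__main__":
    for order in (1, 3, 5, 7):
        print(f"L = {order}: {num_harmonics(order)} harmonics, "
              f"{relative_cost(order):.0f}x the first-order count")
```

Under this simplified count, a seventh-order expansion carries 64 harmonics compared to 4 for a first-order expansion, which motivates restricting high orders to the regions where they are needed.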
Even with adaptive expansion orders, the resulting system of equations still consists of a very high number of unknowns. The solution of the system therefore requires efficient utilization of the available computing resources. Since iterative methods have to be employed for the solution of the system, efficient preconditioners are required to obtain reasonable convergence rates. In recent publications, an incomplete LU factorization preconditioner with threshold (ILUT) has been used. Even though ILUT is very effective in the sense that it reduces the required number of solver iterations considerably, its main drawback is the serial nature of the algorithm. With the introduction of multi- and many-core architectures in mainstream computers, ILUT becomes increasingly unattractive in terms of efficient use of available computing resources. To overcome this issue, we developed general block-preconditioning schemes for the SHE equations. These allow us to run our simulator using multiple threads on CPUs and to utilize the power of Graphics Processing Units (GPUs) using our library ViennaCL. Performance gains of up to one order of magnitude are obtained when using a modern GPU instead of a single CPU core.
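To indicate why block preconditioners suit parallel hardware, the following minimal sketch applies a generic block-Jacobi preconditioner in plain Python. It is an illustration under stated assumptions, not the block-preconditioning scheme developed in this work: the matrix is stored densely, the block sizes are given by the caller, and the names solve_dense and apply_block_jacobi are hypothetical.

```python
def solve_dense(block, rhs):
    """Solve block * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's data stays untouched.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(block)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def apply_block_jacobi(matrix, block_sizes, residual):
    """Return z = M^{-1} r, where M is the block diagonal of `matrix`."""
    z = []
    start = 0
    for size in block_sizes:
        end = start + size
        # Extract the diagonal block; couplings between blocks are ignored.
        block = [row[start:end] for row in matrix[start:end]]
        # Each block solve is independent of all the others, so the
        # iterations of this loop could be distributed over threads.
        z.extend(solve_dense(block, residual[start:end]))
        start = end
    return z


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.5, 0.0],
         [1.0, 3.0, 0.0, 0.5],
         [0.5, 0.0, 2.0, 1.0],
         [0.0, 0.5, 1.0, 2.0]]
    r = [5.0, 4.0, 3.0, 3.0]
    print(apply_block_jacobi(A, [2, 2], r))
```

In contrast to ILUT, where each elimination step depends on the previous ones, the block solves here share no data, which is the property that makes such schemes attractive for multi-core CPUs and GPUs.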