

4.9 Sorting the Inner Linear Equation System

Matrices arising from the discretization of differential operators are sparse, because only neighboring points are considered. For that reason, only the non-zero elements are stored in order to reduce the memory consumption (see the MCSR format in Appendix E.1). However, during the factorization of the system matrix $ \ensuremath{\mathbf{A}}$ into a lower and an upper triangular matrix $ \ensuremath{\mathbf{A}} =\ensuremath{\mathbf{L}}
\ensuremath{\mathbf{U}}$, additional matrix elements, termed fill-in [193], become non-zero. The profile $ p(\ensuremath{\mathbf{A}})$ is a measure for this fill-in

$\displaystyle p(\ensuremath{\mathbf{A}}) = \sum_{i=1}^{n} m_i\ ,$ (4.22)
$\displaystyle m_i = i - \min_{a_{i,j} \neq 0}(j)\ ,$ (4.23)

and the bandwidth of the matrix is $ \max_i(m_i)$. Thus, the bandwidth of the system matrix is the maximum distance between the diagonal and an off-diagonal entry of the same row. Since the fill-in arising within the profile requires additional memory, a transformation is applied to reduce the bandwidth and the profile. Sorting algorithms therefore reorder the rows and columns of the system matrix in such a way that the elimination performed by a Gaussian-type factorization yields a small number of fill-ins. The term reordering is used synonymously with sorting.
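To make the definitions in (4.22) and (4.23) concrete, the following minimal sketch computes the profile and bandwidth of a sparse pattern. The dict-of-column-lists representation is a simplified stand-in chosen for this illustration, not the MCSR format used by the assembly module:

```python
def profile_and_bandwidth(rows):
    """rows: dict {i: list of column indices j with a_ij != 0}.

    Returns (profile p(A), bandwidth max_i(m_i)) as defined in
    (4.22) and (4.23).
    """
    m = []
    for i, cols in rows.items():
        # m_i = i - min_{a_ij != 0}(j): distance from the diagonal
        # to the leftmost non-zero entry of row i
        m.append(i - min(cols))
    return sum(m), max(m)

# 4x4 example: diagonal entries plus a few off-diagonal non-zeros
A = {0: [0], 1: [0, 1], 2: [2], 3: [1, 3]}
p, bw = profile_and_bandwidth(A)   # p = 0+1+0+2 = 3, bw = 2
```

The row distances $ m_i$ are exactly the quantities summed in (4.22), so the profile and bandwidth fall out of a single pass over the stored non-zero pattern.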

The standard module provided by default obtains the sorting matrix $ \ensuremath{\mathbf{R}}_{\mathrm{s}}$ (similar to $ \ensuremath{\mathbf{P}}$) by a reverse Cuthill-McKee algorithm [193,43]. It contains a single unit entry in each row and is applied in such a way that rows and columns are swapped equally in order to preserve the diagonal dominance. In Figure 4.3 the reordered inner system matrix with a significantly reduced bandwidth is shown. With $ \ensuremath{\mathbf{A}}_i
\ensuremath{\mathbf{x}}_i = \ensuremath{\mathbf{b}}_i$ as the inner system and $ \ensuremath{\ensuremath{\mathbf{R}}_{\mathrm{s}}^{\mathrm{T}}}$ equal to $ \ensuremath{\ensuremath{\mathbf{R}}_{\mathrm{s}}^{-1}}$, sorting can be written as follows:

$\displaystyle ({\ensuremath{\ensuremath{\mathbf{R}}_{\mathrm{s}}^{\mathrm{T}}}} \ensuremath{\mathbf{A}}_i \ensuremath{\mathbf{R}}_{\mathrm{s}}) ({\ensuremath{\ensuremath{\mathbf{R}}_{\mathrm{s}}^{\mathrm{T}}}} \ensuremath{\mathbf{x}}_i) = {\ensuremath{\ensuremath{\mathbf{R}}_{\mathrm{s}}^{\mathrm{T}}}} \ensuremath{\mathbf{b}}_i\ .$ (4.24)
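As an illustration of the reordering step, the sketch below implements a basic reverse Cuthill-McKee traversal and the symmetric application of the resulting permutation. It is a simplified stand-in for the in-house implementation; the adjacency and pattern data structures are assumptions made for this example:

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """adj: dict {node: set of neighbors} of the symmetric non-zero pattern.
    Returns the RCM ordering as a list (new position -> old index)."""
    visited = set()
    order = []
    # handle each connected component, starting from a minimum-degree node
    for start in sorted(adj, key=lambda v: len(adj[v])):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # visit unvisited neighbors by increasing degree (Cuthill-McKee)
            for w in sorted(adj[v] - visited, key=lambda u: len(adj[u])):
                visited.add(w)
                queue.append(w)
    order.reverse()   # the "reverse" step further reduces the profile
    return order

def permute_symmetric(pattern, order):
    """Apply R_s^T A R_s to a set of (row, col) index pairs: rows and
    columns are swapped equally, so diagonal entries stay diagonal."""
    new_of_old = {old: new for new, old in enumerate(order)}
    return {(new_of_old[i], new_of_old[j]) for (i, j) in pattern}
```

For a path graph labeled out of order (edges 0-2, 2-1, 1-3, bandwidth 2), the algorithm relabels the nodes consecutively and the permuted pattern has bandwidth 1, illustrating the reduction visible in Figure 4.3.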

The effort of the sorting algorithms as well as the effort for evaluating the required storage (cf. symbolic phase) is $ O(\ensuremath{n_{\mathrm{non-zero}}})$ each, with $ n_{\mathrm{non-zero}}$ as the number of non-zero entries. By defining an average bandwidth $ b_{\mathrm{avg}}$ as the average row length in the $ \mathbf{L}$ part or column depth in the $ \mathbf{U}$ part of the matrix, one can roughly estimate the space consumption as $ O(n b_{\mathrm{avg}})$ and the time consumption as $ O(n/2\,
b_{\mathrm{avg}}^2)$ [65].
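Taking $ b_{\mathrm{avg}}$ as the mean of the row distances $ m_i$ from (4.23), an assumption made here purely for illustration, the rough estimates can be evaluated as follows:

```python
def factorization_estimates(rows, n):
    """rows: dict {i: list of non-zero column indices of row i}.

    Returns (b_avg, space estimate, time estimate), with b_avg taken
    as the mean of the row distances m_i (an assumed simplification)."""
    b_avg = sum(i - min(cols) for i, cols in rows.items()) / n
    space = n * b_avg            # O(n * b_avg) storage for the factors
    time = n / 2 * b_avg ** 2    # O(n/2 * b_avg^2) operations
    return b_avg, space, time

# the 4x4 example pattern from above
est = factorization_estimates({0: [0], 1: [0, 1], 2: [2], 3: [1, 3]}, 4)
```

Since both estimates grow with $ b_{\mathrm{avg}}$ (quadratically for the time), even a modest bandwidth reduction by reordering pays off twice.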

As an alternative to the in-house implementation of the reordering algorithm, external packages can be employed. For example, the Boost++ [27] libraries (see Section 5.1.5) provide a graph package with the respective algorithms. In [193], the Gibbs-Poole-Stockmeyer algorithm [75] is suggested as an efficient alternative to the Cuthill-McKee-based algorithms. Further alternatives are the minimum degree [72] and nested dissection [71] algorithms.

In [44], a column approximate minimum degree ordering algorithm is presented. Basically, sparse Gaussian elimination with partial pivoting computes the factorization $ \ensuremath{\mathbf{P}}
\ensuremath{\mathbf{A}} \ensuremath{\mathbf{Q}} = \ensuremath{\mathbf{L}} \ensuremath{\mathbf{U}}$. While the row ordering $ \mathbf{P}$ is generated during the factorization, the column ordering $ \mathbf{Q}$ is used to limit the fill-in by considering the non-zero pattern of $ \mathbf{A}$. A conventional minimum degree ordering requires the sparsity structure of $ \ensuremath{\mathbf{A}}^{\mathrm{T}} \ensuremath{\mathbf{A}}$. Since this computation can be expensive, alternative approaches are based on the sparsity structure of $ \ensuremath{\mathbf{A}}$ instead.
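A conventional minimum degree ordering of the columns can be sketched as follows. Two columns are connected whenever some row of $ \mathbf{A}$ has non-zeros in both, and vertices of minimum degree are eliminated in turn. This is the exact, non-approximate variant rather than the algorithm of [44], and the data structures are assumptions made for this illustration:

```python
from itertools import combinations

def column_pattern_graph(nonzeros, n):
    """Adjacency of the column-intersection pattern: columns j and k are
    connected if some row has non-zeros in both (the graph is built
    symbolically, without forming the product matrix numerically)."""
    adj = {j: set() for j in range(n)}
    rows = {}
    for (i, j) in nonzeros:
        rows.setdefault(i, set()).add(j)
    for cols in rows.values():
        for j, k in combinations(sorted(cols), 2):
            adj[j].add(k)
            adj[k].add(j)
    return adj

def minimum_degree(adj):
    """Exact minimum degree ordering: repeatedly eliminate a vertex of
    minimum degree and connect its remaining neighbors into a clique
    (the clique models the fill-in caused by that elimination step)."""
    adj = {v: set(nb) for v, nb in adj.items()}   # work on a copy
    order = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))
        order.append(v)
        nbrs = adj.pop(v)
        for w in nbrs:
            adj[w].discard(v)
            adj[w] |= (nbrs - {w})
    return order
```

The clique update is what makes the exact algorithm expensive; the approximate variant of [44] avoids it by bounding the degrees instead of computing them exactly.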

In [183], an introduction is given to finding unsymmetric permutations which aim to maximize the elements on the diagonal of the matrix. Since matrices with zero diagonal entries cause problems for both direct and iterative solving techniques, the rows and columns are permuted so that only non-zero elements remain on the diagonal. Due to performance considerations and the diagonal strategy applicable during the factorization (see Section 4.4), the assembly module does not provide such a feature. As a consequence, the simulator is obliged to avoid zero diagonal entries, which can be done in all cases of interest.
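For illustration only (the assembly module deliberately omits this step), a row permutation yielding a zero-free diagonal, a so-called maximum transversal, can be found with simple augmenting paths. This sketch maximizes the number of non-zero diagonal entries rather than their magnitudes, which the weighted variants discussed in [183] address; all names and data structures are assumptions made for this example:

```python
def zero_free_diagonal(rows, n):
    """rows: dict {i: iterable of non-zero column indices of row i}.

    Returns a list perm where perm[j] is the old row placed at new
    position j, so that each new row j has a non-zero in column j
    (when a complete matching exists)."""
    match_col = {}   # column j -> row currently matched to it

    def augment(i, seen):
        for j in rows.get(i, ()):
            if j in seen:
                continue
            seen.add(j)
            # column j is free, or its current row can be re-matched
            if j not in match_col or augment(match_col[j], seen):
                match_col[j] = i
                return True
        return False

    for i in range(n):
        augment(i, set())
    return [match_col.get(j) for j in range(n)]
```

For the pattern with rows $ \{1\},\ \{0,1\},\ \{2\}$, rows 0 and 1 are swapped so every diagonal entry becomes non-zero; a matrix whose bipartite row-column graph admits no complete matching is structurally singular and no permutation can help.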



S. Wagner: Small-Signal Device and Circuit Simulation