Next: 4.14 Concluding Remarks Up: 4. The Assembly Module Previous: 4.12 Newton Adjustment

4.13 The Transferred-Transformation Problem

By comparing saved MCSR structures of the same system matrix assembled by the formerly used and the new module, misordering problems have been detected. As a consequence, the new modules were presumed to deliver wrong results and an investigation of the problem was started which finally resulted in an elegant solution for the problem which is directly related to the row transformation.

The row transformation as discussed in Section 4.7.1 is used to supply equations (or rows of the system matrix) with equations having a higher priority. This is necessary, because several segments (with different priorities) could contain the same grid point. Each of the control functions determines a result for this grid point, but only the one with the highest priority should be used.

**Figure 4.6:** Transformations involved if three segments share one physical grid point.
$\includegraphics[width=10.8cm]{figures/transfer2.eps}$

See Figure 4.6 for an illustration of the physical background as it occurs for a MOS structure. The semiconductor segment (segment 1), the gate oxide (segment 2), and the source contact (segment 3) share one physical grid point. Thus, three equations are assembled to obtain the solution for the potential at this point.

Since the contact segment has the highest priority, its solution has to be used by the other two segments. However, during the assembly of the three interface equations, the interface models (which are responsible for setting up the transformation matrix), do not have the full information about all three segments involved. They only take two segments into account, namely the segments the interface is in-between. As a consequence, the following transformation entries can be found in $\ensuremath{\mathbf{T}}_{\mathrm{b}}^{\mathrm{T}}$ :

Equation 3 belongs to the boundary charge of the source contact (segment 3) and has to be completed by contributions from the neighboring segments.
Equation 115 belongs to the potential quantity of the semiconductor segment and is transferred to equation 3 in order to calculate the boundary charge. Its diagonal entry is zero (a substitute equation is used instead) and the single off-diagonal entry in column 3 equals one. Note, that in $\ensuremath{\mathbf{T}}_{\mathrm{b}}^{\mathrm{T}}$ the column index is the target of the transformation. This transformation is represented by the black/solid arrow in Figure 4.6.
Equation 2804 belongs to the potential quantity of the gate oxide segment. Since semiconductor segments have a higher priority than oxide segments, a substitute equation is used: $-\psi_{115} + \psi_{2804} = 0$ . While the diagonal entry of row 2804 of $\ensuremath{\mathbf{T}}_{\mathrm{b}}^{\mathrm{T}}$ is zero, the only off-diagonal entry being non-zero is that of column 115 and equals one. This transformation is represented by the red/dashed arrow in Figure 4.6.

These transformations result in the following equations of the complete linear equation system. Depending whether a correction is activated or not, equation 3 is shown in (4.27) and (4.28), respectively. stand for other contributions equal for both cases and thus not of interest here. The coefficients denote entries wrongly transferred to (4.30) instead of (4.27). With $\psi_{\mathrm{C,3}}$ as the contact potential of segment 3 (source contact), the correct and wrong equation 115 can be seen in (4.29) and (4.30), respectively:

$\displaystyle Q_{3} + C_{\psi/s1} + C_{n/s1} + C_{p/s1} + C_{\psi/s2} + D_1 \, ... ...+ D_2 \, \psi_{2805} + D_3 \, \psi_{2807} = C_{\mathrm{rhs}} + D_{\mathrm{rhs}}$	(4.27)
$\displaystyle Q_{3} + C_{\psi/s1} + C_{n/s1} + C_{p/s1} + C_{\psi/s2} = C_{\mathrm{rhs}}$	(4.28)

$\displaystyle -\psi_{\mathrm{C,3}} + \psi_{115} = 0$	(4.29)
$\displaystyle -\psi_{\mathrm{C,3}} + \psi_{115} + D_1 \, \psi_{2804} + D_2 \, \psi_{2805} + D_3 \, \psi_{2807} = D_{\mathrm{rhs}}$	(4.30)

Table 4.3: Comparison of terminal quantities with and without the correction. Note that the boundary charges are significantly different.

With
Source	$V = 1.0\,$ V	$I =-6.9374202644 \times 10^{-17}\,$ A	$Q =-5.3344762188 \times 10^{-19}\,$ C
Gate	$V = 1.2\,$ V	$I =\phantom{-}0.0000000000 \times 10^{0\phantom{-1}}\,$ A	$Q =\phantom{-}1.1515548905 \times 10^{-15}\,$ C
Drain	$V = 1.2\,$ V	$I =\phantom{-}1.9820343061 \times 10^{-16}\,$ A	$Q =-5.3344761939 \times 10^{-19}\,$ C
Bulk	$V = 0.0\,$ V	$I =-1.2882922796 \times 10^{-16}\,$ A	$Q =-4.0243642506 \times 10^{-20}\,$ C
Without
Source	$V = 1.0\,$ V	$I =-6.9354539375 \times 10^{-17}\,$ A	$Q =-2.6653626903 \times 10^{-16}\,$ C
Gate	$V = 1.2\,$ V	$I =\phantom{-}0.0000000000 \times 10^{0\phantom{-1}}\,$ A	$Q =\phantom{-}1.1515496006 \times 10^{-15}\,$ C
Drain	$V = 1.2\,$ V	$I =\phantom{-}1.9818396270 \times 10^{-16}\,$ A	$Q =-2.6653626903 \times 10^{-16}\,$ C
Bulk	$V = 0.0\,$ V	$I =-1.2882936666 \times 10^{-16}\,$ A	$Q =-4.0243767580 \times 10^{-20}\,$ C

Since the segment equation should also be transferred to the equation of the boundary charge, the red/ dashed transformation is wrong and should be replaced by the green/thick one. As a consequence, the gate oxide is transferring its incomplete equation to the semiconductor segment. Such situations could be prevented if the respective models are provided with the complete information. Since this is not the case for the main simulator MINIMOS-NT and for unstructured meshes an arbitrary number of segments would have to be considered, the assembly module was equipped with an algorithm which corrects the transformation matrix before it is applied. This algorithm has the full information available, but remains deactivated for all simulators which do not require this correction. See Table 4.3 for a comparison of the terminal quantities showing significant differences particularly for the boundary charge.

The reason for the differences between the formerly used and new module can be explained as follows: one important difference between the modules is the assembly sequence. The new assembly module assembles four matrices and compiles them afterwards to the complete linear system (see Section 4.14). This compiling step uses matrix additions and multiplications which operate on the completely assembled structures.

In contrast, the formerly used module calculates all transformations during a symbolic assembly phase in advance and is therefore able to assembly only one system matrix during the actual assembly phase (cf. Newton adjustment). So the matrix additions and multiplications are performed immediately while an entry is added. As analyzed, these transformations do not perform a multiplication in the strict mathematical form, but are already adapted for the requirements described above. This behavior should be demonstrated on a simple mathematical example.

4.13.1 Mathematics

For the demonstration of the matrix transformation, consider a simple $\ensuremath{\mathbf{A}}_{\mathrm{s}}$ matrix multiplied by $\ensuremath{\mathbf{T}}_{\mathrm{b}}$ from the left:

$\displaystyle \ensuremath{\mathbf{A}} = {\ensuremath{\ensuremath{\mathbf{T}}_{\mathrm{b}}}} {\ensuremath{\ensuremath{\mathbf{A}}_{\mathrm{s}}}} \ ,$	(4.31)
$\displaystyle {\ensuremath{\ensuremath{\mathbf{A}}_{\mathrm{s}}}} = \left( \beg... ...}{cccc} 1.0 & & & \\ & 2.0 & & \\ & & 3.0 & \\ & & & 4.0 \end{array} \right)\ .$	(4.32)

By using three different transformation matrices, the actual problem should be made evident. The first transformation matrix and the multiplication result are:

$\displaystyle {\ensuremath{\ensuremath{\mathbf{T}}_{\mathrm{b}}}} = \left( \beg... ...cc} 0.0 & & & 4.0 \\ & 2.0 & & \\ & & 3.0 & \\ & & & 4.0 \end{array} \right)\ .$

(4.33)

The entries of $\ensuremath{\mathbf{T}}_{\mathrm{b}}$ can be interpreted as follows: the column index is interpreted as source, the row index as target of a transformation. The respective equations of $\ensuremath{\mathbf{A}}_{\mathrm{s}}$ are scaled by the value of these positions of $\ensuremath{\mathbf{T}}_{\mathrm{b}}$ . In the example all values in $\ensuremath{\mathbf{T}}_{\mathrm{b}}$ are either one or zero. The entry in row zero and column three is taken as an example: equation three of $\ensuremath{\mathbf{A}}_{\mathrm{s}}$ (the source) is completely (factor one) added to the equation zero of $\mathbf{A}$ . Since in the first column of $\ensuremath{\mathbf{T}}_{\mathrm{b}}$ all entries are zero, the first equation of $\ensuremath{\mathbf{A}}_{\mathrm{s}}$ is never used.

The second transformation matrix and the multiplication result are:

$\displaystyle {\ensuremath{\ensuremath{\mathbf{T}}_{\mathrm{b}}}} = \left( \beg... ...0.0 & & 3.0 & 4.0 \\ & 2.0 & & \\ & & 0.0 & \\ & & & 4.0 \end{array} \right)\ .$

(4.34)

The interpretation is straightforward: Now equation zero is a linear combination of the equations two and three. Row two of $\mathbf{A}$ is zero because there are no non-zero entries in row two of $\ensuremath{\mathbf{T}}_{\mathrm{b}}$ .

The third transformation matrix and the multiplication result are:

$\displaystyle {\ensuremath{\ensuremath{\mathbf{T}}_{\mathrm{b}}}} = \left( \beg... ...0.0 & & 3.0 & \\ & 2.0 & & \\ & & 0.0 & 4.0 \\ & & & 4.0 \end{array} \right)\ .$

(4.35)

Now equation three is transferred to equation two, which in turn is transferred to equation zero. The result of this operation is mathematically correct, but does not meet the requirement of the assembly process. The result should be the same as for the second transformation matrix. If one equation is transferred to an equation which is transferred itself to another target, it must be further transferred to that target. Hence, the problem was called transferred-transformation problem.

4.13.2 The Basic Correction Algorithm

As discussed above, the formerly used module takes such situations into account while calculating all contributions of a matrix entry. However, the new assembly module requires a correction algorithm before the mathematically correct matrix multiplication is processed during the compiling process. For the following reasons, the development of a correction algorithm is a crucial part for the success of the new modules:

Waiving a particular assembly sequence is a very important benefit of the new assembly module. However, this benefit fundamentally depends on the matrix multiplication.
The correction code of the particular matrix can be turned on and off by keeping the original compilation code unmodified. This is advantageous if the (mathematically correct) multiplication algorithm is once replaced by a more optimized version (cf. LAPACK).
A correction code can deliver the required result since the complete information is available.
A correction code can deliver the required result in an efficient way because only few entries are involved and the effort is linear.
A correction code fully fits in the Newton adjustment concept of the new module (see below).
Instead of a correction algorithm, the matrix multiplication itself could be modified in the required manner. This would actually result in two implementations: one mathematically correct and one ``incorrect''. In this case the doubled code in a generally applicable module would have to be justified by a special-purpose requirement.

For these reasons, it was decided to develop a correction algorithm which is presented here. Based on the third example in the last section, the problem is described verbally once again: equation three is transformed to equation two, that is transformed itself to equation zero: thus, a transformation source is also a transformation target. A correction to replace the (three to two to zero) problem by a straightforward (three to zero) transformation is required.

The basic correction algorithm can use all information provided by the transposed MCSR data structure. In $\ensuremath{\mathbf{T}}_{\mathrm{b}}^{\mathrm{T}}$ , the column indices represent the targets of the transformations.

$\displaystyle {\ensuremath{\ensuremath{\mathbf{T}}_{\mathrm{b}}^{\mathrm{T}}}} ... ...0.0 & & & \\ & 1.0 & & \\ 1.0 & & 0.0 & \\ & & 1.0 & 1.0 \end{array} \right)\ .$

(4.36)

Note that the transposed MCSR structure is actually stored in the MCSC format. For simplification, the discussion here still uses the MCSR format. The index and value arrays of the ${\ensuremath{\ensuremath{\mathbf{T}}_{\mathrm{b}}^{\mathrm{T}}}}$ matrix read like this (

stands for unused, see Appendix E.1):

$\begin{displaymath}\begin{array}{rrrrrrrr} \mathrm{pos} & 0 & 1 & 2 & 3 & \phant... ...0 & 1.0 \\ \mathrm{idx} & 5 & 5 & 5 & 6 & 7 & 0 & 2 \end{array}\end{displaymath}$

The algorithm has to focus on the off-diagonal entries in the upper part of the arrays. The column index stands for the target of the transformation. Hence it must be checked if the target row is a source of a transformation which is indicated by off-diagonal entries.

$\displaystyle \mathrm{idx}[\mathrm{idx}[row]] \neq \mathrm{idx}[\mathrm{idx}[row] + 1]\ .$

(4.37)

The correction algorithm loops over all off-diagonal entries, starting at position five in the example. For the first entry the test is negative since

$\displaystyle \mathrm{idx}[\underbrace{\mathrm{idx}[5]}_{0}] = \mathrm{idx}[\underbrace{\mathrm{idx}[5] + 1}_{1}] = 1 \ .$

(4.38)

However, for the second entry the test is positive:

$\displaystyle \overbrace{\mathrm{idx}[\underbrace{\mathrm{idx}[6]}_{2}]}^{5} \neq \overbrace{\mathrm{idx}[\underbrace{\mathrm{idx}[6] + 1}_{3}]}^{6} \ .$

(4.39)

Row two does have off-diagonals, so a transferred transformation is detected. The basic algorithm is able to correct such situations if there is one entry only which is checked by the following condition:

$\displaystyle \mathrm{idx}[\mathrm{idx}[row]] + 1 == \mathrm{idx}[\mathrm{idx}[row] + 1]\ .$

(4.40)

More than one entry (as required in cases where more than three segments share one grid point) would require a restructuring of the complete MCSR array or alternative approaches as discussed in the next section. If there is only one entry, the correction is just a simple assignment:

$\displaystyle \mathrm{idx}[row] = \mathrm{idx}[\mathrm{idx}[\mathrm{idx}[row]]]\ ,$	(4.41)
$\displaystyle \mathrm{idx}[6] = \mathrm{idx}[\underbrace{\mathrm{idx}[\overbrace{\mathrm{idx}[6]}^{2}]}_{5}] = 0\ .$	(4.42)

After the correction is completed, ${\ensuremath{\ensuremath{\mathbf{T}}_{\mathrm{b}}^{\mathrm{T}}}}$ has to be transposed back resulting in the same transformation matrix as used in (4.34):

$\displaystyle {\ensuremath{\ensuremath{\mathbf{T}}_{\mathrm{b}}}} = \left( \beg... ...0.0 & & 1.0 & 1.0 \\ & 1.0 & & \\ & & 0.0 & \\ & & & 1.0 \end{array} \right)\ .$

(4.43)

4.13.3 The Advanced Algorithm

The basic algorithm explained above could not be used for more complicated structures, for example the ones which contain grid points calculated by four control functions.

In addition, duplicate entries in the resulting MCSR structure (as they could occur using the basic algorithm) should be avoided. In contrast to the full format, a sparse format stores only entries which are intended to be non-zero. Besides the actual value, also the row and column number have to be stored. Thus, it is possible to store one matrix entry more than one time, which is a duplicate entry of the same position. Most of the mathematical operations defined for MCSR take those implicitly into account, since they simply process all entries in the structure. However, due to the fact that all multiple entries should be actually summed up to one entry, numerical inaccuracies may occur. So duplicate entries are best avoided at all.

The main objective of the advanced algorithm should provide a generally applicable correction of the transformation matrix while avoiding duplicate entries. A new example should demonstrate the actual problem. In a transposed transformation matrix $\ensuremath{\mathbf{T}}_{\mathrm{b}}^{\mathrm{T}}$ the entries (row:34, col:3) and (47, 34) are set to one. The basic algorithm corrects the latter entry to (47, 3). Since there is only one entry, this correction is successful (case one).

To extend the example, an additional entry in (78, 34) is supposed to be one. That means, that equation 34 is not only transferred to equation 3, but also to equation 78. Note that in $\ensuremath{\mathbf{T}}_{\mathrm{b}}^{\mathrm{T}}$ the column index stands for the target of a transformation. In that case, the basic algorithm fails, because there is not enough space to add new entries (case two). Both cases are graphically represented in Figure 4.7.

**Figure 4.7:** Graphical representation of the multiple transfer problem.
$\includegraphics[width=10.8cm]{figures/trfgraph.eps}$

The first case is shown in Figure 4.7a and Figure 4.7b. In Figure 4.7a, two arrows represent two transformations, the blue/dashed arrow has to be redirected to the position the black/solid arrow points to. In Figure 4.7b, the green/dashed arrow represents the redirection. After the correction the number of arrows stays the same.

The second case is shown in Figure 4.7c and Figure 4.7d. Since the center position is transferred to multiple (here two, generally ) right positions, there are also arrows needed to point from the left position to all right positions. The number of arrows increases by . In Figure 4.7d, the red/dotted arrow has to be created additionally. In the existing MCSR structure, there is no space for this entry. Hence, no thorough correction of the second case could be made.

If equation 47 of $\ensuremath{\mathbf{A}}_{\mathrm{s}}$ contains the non-zero column entries 49, 51, and 55, the following misorderings are the result of the omitted correction:

Equation 3: entries in columns 49, 51, and 55 are missing.
Equation 34: entries in columns 49, 51, and 55 should not be there.
Equation 78: entries in columns 49, 51, and 55 are missing.

For that reason new entries have to be added in order to completely correct the transformation matrix. There are two solutions for this problem:

The first approach requires an enlargement and reordering algorithm for the existing arrays. This algorithm must take the requirement into account, that duplicate entries should be avoided. Thus, the resulting algorithm must count all additional needed entries actually required, allocate a new structure, copy all entries and eventually dismiss the old structure. However, all of these parts are already implemented, which motivates the second approach.
The flexible sparse structure of the transformation matrix provides already many of these required features: it can be used to add new entries easily and it avoids duplicate entries. So the second approach yields an algorithm that reuses and iteratively improves the still existing flexible sparse structure of the transformation matrix.

By pursuing the second approach, an algorithm was developed which benefits from already existing and applied implementations. The system can now take all possible situations into account since it does not limit the number of commonly used grid points (independent of the number of spatial dimensions) any more. Table 4.4 summarizes the initial situation and the correction result for both the transposed and untransposed transformation matrix.

Table 4.4: The transposed and untransposed transformation matrix before and after the correction. The column index stands for the source or the target of the transformation, respectively.

Before:

$\ensuremath{\mathbf{T}}_{\mathrm{b}}$	3	34	47	78
3	1.0	1.0
34			1.0
47
78		1.0		1.0

$\ensuremath{\mathbf{T}}_{\mathrm{b}}^{\mathrm{T}}$	3	34	47	78
3	1.0
34	1.0			1.0
47		1.0
78				1.0

After:

$\ensuremath{\mathbf{T}}_{\mathrm{b}}$	3	34	47	78
3	1.0	1.0	1.0
34
47
78		1.0	1.0	1.0

$\ensuremath{\mathbf{T}}_{\mathrm{b}}^{\mathrm{T}}$	3	34	47	78
3	1.0
34	1.0			1.0
47	1.0			1.0
78				1.0

Regarding the Newton adjustment it is important to note, that the advanced algorithm is fully employable due to the following considerations. After a successful first Newton adjustment level, no flexible sparse structure exists which could be used to add additional entries. However, the already existing $\ensuremath{\mathbf{T}}_{\mathrm{b}}$ matrix does already contain all required entries. This assumption holds since all deterministic algorithms always yield the same result for the same inputs. Therefore, no Newton adjustment errors will occur.

Next: 4.14 Concluding Remarks Up: 4. The Assembly Module Previous: 4.12 Newton Adjustment

S. Wagner: Small-Signal Device and Circuit Simulation