4 Structural Properties

Chapter 4
Structural Properties

The SHE equations furnish a number of interesting structural properties, which are the topic of this chapter. All properties are discussed on the continuous level and result from certain symmetries of the underlying physical processes. Consequently, no additional assumptions about a particular discretization method need to be imposed and the same structural properties hold true for the system of discretized linear equations as outlined in Chap. 5.

In Sec. 4.1 the coupling between the SHE equations is investigated and a scheme for the lossless compression of the system matrix is proposed, which greatly reduces the memory consumption of the SHE method at high expansion orders. The different boundary conditions used for the SHE method are discussed in Sec. 4.4 and extensions to the system matrix compression scheme are proposed in order to handle them. This chapter closes with some numerical results demonstrating the benefits of the proposed compression scheme.

4.1 Sparse Coupling for Spherical Energy Bands

As already noted in Sec. 2.3, the terms $′ ′ [Y l,m ]l,m$ , $′ ′ jll,,mm$ and $′ ′ Γ ll,,mm$ determine the coupling among the projected equations (2.30). If all coupling coefficients were multiples of the Kronecker delta, then the SHE equations were decoupled and could be solved individually. On the contrary, if all coupling coefficients were nonzero, all projected equations were coupled with each other. In analogy to systems of linear equations described by matrices, one is typically interested in a weak coupling, which typically accelerates the solution process, and on the other hand leads to lower memory requirements. A closer inspection of the coupling structure for spherically symmetric dispersion relations is carried out in this section.

For general band structures, the symmetry of the underlying processes leads to the following result: [53]

Theorem 1 (Jungemann et. al.). For a spherical harmonics expansion up to order $L = 2I + 1$ with $I ∈ ℕ$ , there holds for all $i,i′ ∈ {0,...,I}$ , $m ∈ {− i,...,i}$ and $m ′ ∈ {− i′,...,i′}$

j_2i,m^{2i^′,m^′} = j_2i+1,m^{2i^′+1,m^′} = 0, Γ_2i,m^{2i^′,m^′} = Γ_2i+1,m^{2i^′+1,m^′} = 0 .

The essence of this theorem is that all nonzero coupling coefficients possess different parities in the leading indices. This small structural information about the coupling was already used for a preprocessing step for the solution of the discretized equations in [53].

Under the assumption of spherical energy bands, i.e. $ε(k) = ε(|k|)$ , the velocity $v$ , the modulus of the wave vector $|k |$ and the generalized density of states only depend on the energy $ε$ , but not on the angles $θ,φ$ . Consequently, (2.31), (2.32) and (2.31) are rewritten as

l′,m ′ l′,m ′ [Y ]l,m = δl,m∫ , (4.1) l′,m ′ l′,m ′ jl,m = vZ Yl,meεYl′,m′dΩ =: vZa l,m , (4.2) ′ ′ Z ∫ ( ∂Y 1 ∂Y ) Z ′ ′ Γ ll,,mm = ---- --l,m-eθ + -------l,me φ Yl′,m′dΩ =: ----bll,,mm , (4.3) ℏ|k| ∂ θ sin θ ∂ φ ℏ|k|

where $δl′,m′ l,m$ denotes the Kronecker delta and $v = veε$ . A similar decomposition is possible with the full-band modification (3.20).

The coupling between index pairs $(l,m )$ and $′ ′ (l,m )$ is determined by the integral terms $l′,m′ a l,m$ and $l′,m′ bl,m$ only. As shown by the author, it turns out that the coupling is rather weak:

Theorem 2. For spherical energy bands and indices $′ l,l ∈ {0,...,L}$ , $m ∈ { − l,...,l}$ and $m ′ ∈ {− l′,...,l′}$ , there holds:

If $l′,m ′ jl,m$ is nonzero, then $l ∈ {l′ ± 1}$ and $m ∈ {±|m ′|± 1,m ′}$ .
If $Γ l′,m′ l,m$ is nonzero, then $l ∈ {l′ ± 1}$ and $m ∈ {± |m ′|± 1,m′}$ .

A rather lengthy and technical proof is given in [87] and makes use of recurrence relations and orthogonalities of trigonometric functions and associated Legendre functions.

Thm. 2 is very important for large expansion orders $L$ : The total number of unknown expansion coefficients is $(L + 1)2$ , but according to the result of the theorem, each $f l,m$ is directly coupled with at most ten other coefficients. It should also be noted that the weak coupling stated in Thm. 2 has already been observed for less general situations in earlier publications [78, 37].

Next, the structure of the SHE equations (2.30) for spherical energy bands is investigated in depth. Unlike the presentation in [87], the structure is investigated already at the continuous level, which has the advantage of covering a wider class of discretization schemes and less notational clutter. Due to the equivalence of (2.30) and (2.38) for spherical energy bands, the results in this section apply equally to an expansion of the distribution function $f$ and to an expansion of the generalized distribution function $g$ .

A motivation for a first-order expansion is now given, then the scheme is extended to general expansion orders. Ignoring the stabilization using MEDS in a first step, the four SHE equations using Einstein summation convention read

+ vZa_0,0^{l^′,m^′}⋅∇_xf_l^′,m^′ + b_0,0^{l^′,m^′}⋅F + ∑ _ηf_l^′,m^′	= 0 ,
+ vZa_1,−1^{l^′,m^′}⋅∇_xf_l^′,m^′ + b_1,−1^{l^′,m^′}⋅F + ∑ _ηf_l^′,m^′	= 0 ,
+ vZa_1,0^{l^′,m^′}⋅∇_xf_l^′,m^′ + b_1,0^{l^′,m^′}⋅F + ∑ _ηf_l^′,m^′	= 0 ,
+ vZa_1,1^{l^′,m^′}⋅∇_xf_l^′,m^′ + b_1,1^{l^′,m^′}⋅F + ∑ _ηf_l^′,m^′	= 0 .

The terms $l′,m′;out out ˜sl,m sη$ and $l′,m′;in in ˜sl,m sη$ may be operators in order to take shifts of the arguments of the distribution function into account and denote the angular coupling of the scattering operator and reduce to Kronecker deltas when scattering processes are modeled as velocity randomizing, cf. Sec. 2.2.

This continuous linear system of partial differential equations can be written in operator matrix form as

⌊ ( ) ( 1 0 0 0 ) a0,0 a1,− 1 a1,0 a1,1 | | | | 00,,00 01,0,− 1 01,,00 10,,01 | || ∂-|( 0 1 0 0 |) + vZ ∇x ⋅|| a1,0−,01 a11,−,− 11 a 11,−,10 a1,1−,11 || ⌈ ∂t 0 0 1 0 ( a1,0 a1,0 a1,0 a1,0 ) 0 0 0 1 a01,0,1 a11,−,1 1 a11,,01 a11,1,1 ( 0,0 1,−1 1,0 1,1 ) | b00,,00 b01,,0−1 b01,,00 b10,1,0 | ZF-- | b1,−1 b1,−1 b1,−1 b1,− 1 | + ℏ|k | ⋅|( b0,1,00 b11,,−01 b11,,00 b11,1,0 |) b0,0 b1,−1 b1,0 b1,1 ( 1,1 1,1 1,1 1,1 ) ˜s00,,0;0out ˜s10,−,0 1;out ˜s10,,0;0out ˜s10,1,0;out ∑ || ˜s0,0;out ˜s1,− 1;out ˜s1,0;out ˜s1,1;out || + soηut| 10,,−0;1out 11,−,− 11;out 11,,−0;1out 11,−,1;1out | η ( ˜s10,,00;out ˜s11,0,− 1;out ˜s11,,00;out ˜s11,0,1;out ) ˜s1,1 ˜s1,1 ˜s1,1 ˜s1,1 ( ˜s0,0;in s˜1,−1;in ˜s1,0;in ˜s1,1;in ) ⌋ ( ) ∑ | 0,0,00;in 01,,0−1;in 0,1,00;in 01,,01;in | | f0,0 − sin || ˜s1,−1 s˜1,−1 ˜s1,−1 ˜s1,−1 || || × || f1,−1 || = 0 , η η ( ˜s0,1,00;in s˜11,,−01;in ˜s1,1,00;in ˜s11,,1;0in ) ⌉ ( f1,0 ) ˜s0,0;in s˜1,−1;in ˜s1,0;in ˜s1,1;in f1,1 1,1 1,1 1,1 1,1

(4.4)

where the gradient and the time derivative act as an operator on the expansion coefficients, not on the coefficient matrices. The crucial observation now is that all entries in each of the matrices are independent of the spatial variable $x$ as well as of total energy $H$ . Writing $F(x )$ and $x$ in components,

F(x) =

,x =

the considered first-order system can thus be written as

⌊ ⌋ ( ) ∑3 ∑3 ∑ ∑ | f0,0| ⌈ ∂-I + vZ -∂--Ap + ZFp-Bp + soηutSout − sinη Sin⌉ | f1,−1| = 0 , (4.5) ∂t p=1 ∂xp p=1 ℏ|k| η η ( f1,0) f1,1

where the matrices $I$ , $Ap$ , $Bp$ , $Sout$ and $Sin$ are given by the respective matrices in (4.4).

For the case of a general expansion order $L$ , an enumeration of the index pairs $(l,m)$ is required. Let $κ$ denote such a mapping, for instance $2 κ (l,m ) = l + l + m + 1$ . Repeating the calculation for the general case, one obtains that the operator matrix $S$ for the SHE equations can be written as

∑11 S = qi(x,H )Ri . (4.6) i=1

The spatial terms $q1,...,q11$ are given by

q(x,H ) = -∂ , (4.7) 1 ∂t --∂--- qp(x,H ) = v∂xp −1, p = 2,3,4 , (4.8) 1 qp(x,H ) = − Fp−4---- , p = 5,6,7 , (4.9) ℏ|k| q(x,H ) = -1--∑ σ (x, H,H + ℏω )Z Z (H + ℏω ) , (4.10) 8 Y0,0 η η 0,0 0,0 η ∑η q9(x,H ) = -1-- ση(x, H,H − ℏωη)Z0,0Z0,0(H − ℏωη) , (4.11) Y0,0 η 1 ∑ q10(x,H ) = − ---- ση(x,H + ℏωη,H )Z0,0(H + ℏωη)Z0,0Dℏωη , (4.12) Y0,0 η 1 ∑ q11(x,H ) = − ---- ση(x,H − ℏωη,H )Z0,0(H − ℏωη)Z0,0D−ℏωη , (4.13) Y0,0 η

with the energy shift operator $Dy{f}(x,H ) := f (x,H + y)$ . Note that some of the terms $q2,...,q7$ evaluate to zero if one- or two-dimensional simulations are carried out. The angular coupling matrices $R1,...,R11$ are determined by

( ) R1 κ(l,m),κ(l′,m ′) = δl,l′δm,m′ , (4.14) ( ) l′,m′ Rp κ(l,m),κ(l′,m ′) = (al,m )p−1 , p = 2,3,4 , (4.15) (R ) = (bl′,m ′) , p = 5,6,7 , (4.16) ( p)κ(l,m),κ(l′,m ′) l,m p−4 R8 κ(l,m),κ(l′,m ′) = δl,l′δm,m′ , (4.17) ( ) R9 κ(l,m),κ(l′,m ′) = δl,l′δm,m′ , (4.18) (R ) = δ ′δ ′δ′ δ ′ , (4.19) ( 10)κ(l,m),κ(l′,m ′) l,l m,m l,0 m ,0 R11 κ(l,m),κ(l′,m ′) = δl,l′δm,m′δl′,0δm′,0 . (4.20)

Hence, the full continuous system of equations $S$ can be composed out of continuous spatial terms $qp$ and constant angular coupling matrices $Rp$ , $p = 1,...,11$ . It is worthwhile to mention that each of the coefficient pair $(q8,q9)$ can be replaced by a single coefficient, and similarly for $(q10,q11)$ if only elastic scattering is considered. In analogy, each of the pairs $(R ,R ) 8 9$ and $(R ,R ) 10 11$ can be replaced by a single matrix [87].

The decomposition of the continuous case is preserved after spatial discretization. To see this, let $Q1, ...,Q11$ denote the matrices arising from an arbitrary discretization of the terms $q1,...,q11$ . In most cases, a finite volume or a finite element method will be employed, but also less widespread techniques such as wavelet methods could be used. Then, the discrete system matrix $S h$ for the SHE equations can be written as

11 ∑ Sh = Qi ⊗ Ri , (4.21) i=1

where $⊗$ denotes the Kronecker product (cf. A.1 for the definition). After a suitable rearrangement of unknowns, the system matrix could also be written in the equivalent form

1∑1 S˜h = Ri ⊗ Qi . (4.22) i=1

The advantage of a representation using sums of Kronecker products is the considerably smaller memory required for the factors. For a SHE of order $L$ , the matrices $Ri$ are of size $(L + 1)2 × (L + 1)2$ and sparse according to Sec. 4.1. For a spatial discretization using $N$ unknowns, the matrices $Q i$ are of dimension $N × N$ and typically sparse, hence the full system matrix $Sh$ is of dimension $2 2 N (L + 1) × N (L + 1)$ . While the explicit storage of a sparse system $Sh$ thus requires $2 2 O (N (L + 1) ) = O(N L )$ memory, a storage of the Kronecker product factors requires $O (N + L2)$ memory only, which is a considerable difference already for $L = 5$ or $L = 7$ .

In light of the memory requirements for the system matrix it is also worthwhile to point out the importance of Thm. 2: Without the sparsity of coupling coefficients and without the use of a Kronecker representation, $O(N (L + 1)4) = O (N L4)$ memory is required. With the sparsity of coupling coefficients, only $2 O (N L )$ memory is required for a full representation of $Sh$ , which is further reduced to $2 O(N + L )$ when using a Kronecker product representation. Since typical values of $L$ are in the range three to seven, the memory savings due to Thm. 2 combined with (4.21) can be orders of magnitude.

4.2 Coupling for Nonspherical Energy Bands

The matrix compression described in the previous section relies on the factorizations (4.2) and (4.3) of the coupling terms $l′,m′ jl,m$ and $l′,m′ Γl,m$ , such that the factors depend on either energy (and possibly the spatial location) or on the indices $l$ , $m$ , $′ l$ and $′ m$ , but not on both. Moreover, the derivation requires that the density of states does not show an angular dependence. However, in the case of nonspherical energy bands, these three terms depend on energy and on angles.

In order to decouple the radial (energy) contributions from the angular ones for nonspherical energy band models, a spherical projection up to order $′ L$ of the coupling terms can be performed by approximating [53]

L∑′ ∑l′′ ′′ ′′ v (ε,θ,φ ) ≈ ˜vl′′,m′′(ε)Y l,m (θ,φ) , (4.23) l′′=0m ′′=−l′′ L∑′ ∑l′′ -----1-----≈ ˜Γ l′′,m′′(ε)Y l′′,m′′(θ,φ) , (4.24) ℏ|k(ε,θ,φ)| l′′=0m ′′=−l′′ ′ ′′ L∑ ∑l ˜ l′′,m′′ Z (ε,θ,φ ) ≈ Zl′′,m ′′(ε)Y (θ,φ) , (4.25) l′′=0m ′′=−l′′

where the expansion coefficients are given for $ε > 0$ by

_{l^′′,m^′′}(ε)	= ∫ _Ωv(ε,θ,φ)Y ^{l^′′,m^′′}(θ,φ)dΩ ,
_{l^′′,m^′′}(ε)	= ∫ _ΩY ^{l^′′,m^′′}(θ,φ)dΩ ,
_{l^′′,m^′′}(ε)	= ∫ _ΩZ(ε,θ,φ)Y ^{l^′′,m^′′}(θ,φ)dΩ .

For simplicity, the expansion order $L ′$ , which depends on the complexity of the band structure, is taken to be the same for $v$ , $−1 (ℏ|k|)$ and $Z$ . In a slightly different context, values of $′ L = 4$ have been used in [73] and [69] for an expansion of $k$ , and good accuracy has been obtained. On the other hand, a spherically symmetric velocity profile is exactly recovered by an expansion order $L ′ = 1$ .

The expansion order of the generalized density of states $Z$ is implicitly coupled to the expansion order $L$ of the distribution function by the scattering operator (2.23). Thus, even if $Z$ is expanded up to order $′ L > L$ , only expansion terms up to order $L$ can be considered. For this reason $′ L ≤ L$ is assumed in the following.

Substitution of the expansions (4.23) and (4.24) into (4.2) and (4.3) yields

j_l,m^{l^′,m^′}	= _{l^′′,m^′′}(ε)∫ _ΩY ^l,mY ^{l^′,m^′}Y ^{l^′′,m^′′}dΩ =: v_{l^′′,m^′′}(ε)a_l,m^{l^′,m^′;l^′′,m^′′} ,
Γ_l,m^{l^′,m^′}	= _{l^′′,m^′′}(ε)∫ _Ωe_θ + e_φY ^{l^′,m^′}Y ^{l^′′,m^′′}dΩ =: Γ_{l^′′,m^′′}(ε)b_l,m^{l^′,m^′;l^′′,m^′′} ,

so that in both cases a sum of $(L ′ + 1)2$ decoupled terms is obtained. Note that in the case of spherical energy bands, the sum degenerates to a single term. Repeating the steps from the previous section, the continuous system of partial differential equations $S$ can be written similar to (4.6) in the form

′ 2 1+10∑(L+1) Sh = qi(x, H )Ri , (4.26) i=1

while the discrete system matrix $Sh$ can be written in analogy to (4.21) as

1+10(∑L′+1)2 Sh = Qi ⊗ Ri . (4.27) i=1

The entries of the coupling matrices due to $al′,m′′′ ′′ l,m;l ,m$ are directly obtained from the Wigner 3jm-symbols, cf. A.2. The sparsity of the coupling matrices due to $l′,m ′ bl,m;l′′,m′′$ is not clear at present, but a similar structure is expected. Assuming for simplicity dense spherical harmonics coupling matrices, the system matrix can be stored using at most $O(L ′2(L4 + N ))$ memory, where in typical situations $L4 ≪ N$ .

4.3 Stabilization Schemes

The stabilization schemes discussed in Sec. 2.3 lead to different equations for even-order and odd-order projected equations. Since MEDS basically exchanges the summation index and the sign of the coupling term $l′,m′ Γ l,m$ , the representations (4.6) and (4.26) can extended to

∑p e o S = qi(x,H )(R i + R i) , (4.28) i=1

where $p$ denotes the number of matrices. $e R i$ refers to the even-order equations and is obtained from $Ri$ by setting all rows $κ(2l + 1,m )$ to zero. $Roi$ is obtained from $Ri$ by transposition and zeroing all rows $κ(2l,m )$ of the transposed matrix. In addition, signs are swapped in terms affected by MEDS.

For the discretization it is assumed that even-order harmonics can be discretized in a different manner than odd-order harmonics. If all even-order unknowns are enumerated first, the system matrix $Sh$ has a block-structure of the form

( ) p ( ) Sehe Seho ∑ Qeei ⊗ Reei Qeio ⊗ Reio Sh = Sohe Soho = Qoei ⊗ Roei Qoio ⊗ Roio . (4.29) i=1

The even-to-even coupling matrix $ee S h$ and the odd-to-odd coupling matrix $oo Sh$ are square matrices and determined according to Thm. 1 or Thm. 2 only by the projected time derivative $∂[f]l,m∕∂t$ and the projected scattering operator $Ql,m{f }$ . The even-to-odd coupling matrix $Seoh$ is not necessarily square and determined by the free-streaming operator with sparsity pattern given by Thm. 2. The odd-to-even coupling matrix $Soeh$ is rectangular in general and determined by the free-streaming operator and for nonspherical bands also by the scattering operator $Q {f} l,m$ , cf. (2.23).

Since the coupling structure of the scattering operator is explicitly given by (2.23), the structure of $ee Sh$ and $oo Sh$ is as follows:

Theorem 3. The following statements hold true for a system matrix (4.29) obtained from a discretization of the stabilized SHE equations in steady-state with velocity-randomizing scattering processes only:

The matrix $Sooh$ is diagonal.
For spherical energy bands without considering inelastic scattering, $Seeh$ is diagonal.

This structural information is very important for the construction of solution schemes in Sec. 4.5.

4.4 Boundary Conditions

Only the discretization of the resulting system of partial differential equations in the interior of the simulation domain has been considered in an abstract fashion so far. At the boundary, suitable conditions need to be imposed and incorporated into the proposed compression scheme.

At all noncontact boundaries, homogeneous Neumann boundary conditions are imposed [78, 53, 42], which can be directly included in the proposed compression scheme, because no additional boundary terms appear. At the contact boundaries, two different types of boundary conditions are typically imposed. The first type are Dirichlet boundary conditions [78], where the distribution function is set to a Maxwell distribution. Hence, the expansion coefficient $f0,0$ is set according to (2.1), while $f l,m$ is set to zero for $(l,m ) ⁄= (0,0)$ . This it either enforced by replacing the corresponding matrix row with unity in the diagonal and zeros elsewhere as well as setting the appropriate value at the right hand side vector, or by directly absorbing the known values to the load vector. For the proposed compression scheme, the second way is of advantage, because in that case boundary conditions do not alter the matrix structure.

The second type of contact boundary conditions is a Robin-type generation/recombination process [53]

γ_l,m(ε) = −

Z (ε)f (ε) − Z (ε)f eq(ε)
-l,m----l,m-------l,m----------
τ0

where $f eq$ denotes the equilibrium Maxwell distribution, or, similar in structure, a surface generation process of the form [42]

Γ^s = [f^eq(k^′)ϑ(v ⋅n) + f(k^′)ϑ(−v ⋅n)]v ⋅n ,

where $ϑ$ denotes the step function and $n$ is the unit surface normal vector pointing into the device. This type of boundary conditions leads to additional entries in the system matrix due to the surface boundary integrals. Consequently, the scheme needs to be adjusted and written in the form

Sh = Sihnner+ Schontact, (4.30)

where $Sinhner$ contains the discretized equations for all interior points as given by (4.21), (4.27) or (4.29), and $Scontact h$ consists of the discretized contact boundary terms. Since the number of contact boundary points is much smaller than the total number of grid points $N$ , the sparse matrix $contact S h$ can be set up without compression scheme and the additional memory requirements are negligible.

As can be easily verified, $Schontact$ can also be written as a sum of Kronecker products. Consequently, the following discussions are based on the case of Dirichlet boundary conditions, but are also applicable to the more general case including Robin-type boundary conditions, which only add additional summands.

4.5 Solution of the Linear System

The system matrix representation introduced in the previous sections is of use only if the resulting scheme can be solved without recovering the full matrix again. Such a reconstruction is, in principle, necessary if direct solvers such as the Gauss algorithm are used, because the matrix structure is altered in a way that destroys the block structure. In contrast, for many popular iterative solvers from the family of Krylov methods, it is usually sufficient to provide matrix-vector multiplications. Therefore, methods to compute the matrix-vector product $Shx$ for a given vector $x$ are discuss first.

Given a system matrix of the form (4.21) or (4.27), it is well known that a row-by-row reconstruction of the compressed matrix $Sh$ for the matrix-vector product is not efficient. Therefore, the vector $x$ is decomposed into $N$ blocks of size $(L + 1)2$ by

( x ) | 1.| ∑N x = ( ..) = ej ⊗ xj , (4.31) xN j=1

where $ej$ is the $j$ -th column vector of the identity matrix. The matrix-vector product can now be written as

p N [∑ ][∑ ] Sx = Qi ⊗ Ri ej ⊗ xj (4.32) i=1 j=1 ∑p ∑N = (Qiej) ⊗ (Rixj) . (4.33) i=1j=1

Here, the product $Qiej$ is simply the $j$ -th column of $Qi$ .

In the case of spherical energy bands, it can be shown that the matrix-vector multiplication requires slightly less computational effort than the uncompressed case [87]. As discussed in Sec. 4.2, nonspherical bands lead to a larger number of summands, thus leading to a higher computational effort for the matrix-vector multiplications compared to the uncompressed case. Nevertheless, the additional computational effort is increased only moderately, while the memory requirements can be reduced significantly.

Recent publications report the elimination of odd order unknowns in a preprocessing step [53, 42]. Moreover, it has been shown that for a first-order expansion the system matrix after elimination of odd order unknowns is an $M$ -matrix [42]. Furthermore, numerical experiments indicate a considerable improvement in the convergence of iterative solvers. However, for a matrix structure as given by (4.29), a direct elimination of odd order unknowns would destroy the representation of the system matrix $Sh$ as a sum of Kronecker products. Writing the system as

( See Seo )( fe ) ( re ) Shf = hoe hoo o = o (4.34) S h S h f r

with the vector of unknowns $f$ split into $f e$ and $fe$ as unknowns associated with even and odd order harmonics respectively and analogously for the right hand side vector $r$ , the elimination of odd order unknowns is carried out using the Schur complement:

ee eo oo −1 oe e e eo oo −1 o (Sh − Sh (Sh ) Sh )f = r − S h (S h ) r . (4.35)

According to Thm. 3, $oo S h$ is a diagonal matrix, thus the inverse is directly available. The other matrix-vector products are carried out as discussed in the beginning of this section. Note that the matrix $ee eo oo −1 oe S h − Sh (Sh ) S h$ is never set up explicitly.

In contrast to a matrix-vector multiplication with the full system matrix $Sh$ , where the proposed matrix compression scheme requires approximately the same computational effort, a matrix-vector multiplication with the condensed matrix $(Sehe− Seoh (Sooh )− 1Sohe)$ is more expensive than a matrix-vector multiplication with a fully set up condensed matrix. The penalty is quantified in Sec. 4.6.

With a system matrix representation of the form (4.21), the total memory needed for the SHE equations is essentially given by the memory required for the unknowns, which adds another perspective on the selection of the iterative solver. Since $′ ′ Γ ll,,mm ⁄= Γ ll,m′,m′$ , the system matrix $Sh$ is not symmetric. Moreover, numerical experiments indicate that the matrix $Sh$ is indefinite, thus many popular solvers cannot be used. A popular solver for indefinite problems is GMRES [89, 112], which is typically restarted after, say, $s$ steps and denoted by GMRES( $s$ ). Since GMRES has been used in recent publications on SHE simulations [53, 42], it deserves a closer inspection. For a system with $′ N$ unknowns, the memory required by GMRES( $s$ ) during the solution process is $′ O(sN )$ . In typical applications, in which the system matrix is uncompressed, this additional memory approximately equals the amount of memory needed for the storage of the system matrix, hence it is not a major concern. However, with the system matrix representation (4.6) the memory needed for the unknowns is dominant, thus the additional memory for GMRES( $s$ ) directly pushes the overall memory requirements from $O (N L2)$ to $O (sN L2)$ . The number of steps $s$ is typically chosen between $20$ and $30$ as smaller values may lead to smaller convergence rates or the solver may even fail to converge within a reasonable number of iterations. Hence, it is concluded that GMRES( $s$ ) is too expensive for SHE simulations when using the representation (4.6). Instead, iterative solvers with smaller memory consumption such as BiCGStab [102] should be used.

4.6 Results

In this section execution times and memory requirements of the Kronecker product representation (4.6) using a single core of a machine with a Core 2 Quad 9550 CPU are reported. To simplify wording, the representation (4.6) will be referred to as matrix compression scheme. All simulations were carried out for a stationary two-dimensional device on a regular staggered grid with $5 × 50× 50$ nodes in $(x,H )$ -space for various expansion orders. Spherical energy bands were assumed and the $H$ -transform and MEDS were used for stabilization. A fixed potential distribution was applied to the device to obtain comparable results. For self-consistency with the Poisson equation using a Newton scheme, similar results can in principle be obtained by application of the matrix compression scheme to the Jacobian.


$L$	$S$	$∑ Qi ⊗ Ri$	Unknowns

1	3.7 MB	4.7 MB	0.2 MB
3	28.4 MB	4.7 MB	1.4 MB
5	83.1 MB	4.7 MB	3.5 MB
7	168 MB	4.8 MB	6.6 MB
9	263 MB	4.8 MB	10.7 MB
11	470 MB	4.8 MB	15.7 MB
13	709 MB	4.9 MB	21.6 MB

Figure 4.1: Memory requirements for the uncompressed and the compressed system matrix compared to the memory needed for the unknowns for different expansion orders $L$ on a grid in the three-dimensional $(x, H)$ -space with $5× 50 × 50$ nodes.

Figure 4.2: Memory used for the uncompressed and the compressed system matrix for different expansion orders $L$ on a three-dimensional $(x, H)$ -grid with $12500$ nodes.

First, memory requirements for the storage of the system matrix are compared. The total number of entries stored in the matrix were extracted, multiplied by three to account for row and column indices. Eight bytes per entry for double precision are used. In this way, the influence of different sparse matrix storage schemes is eliminated. The results in Fig. 4.1 and Fig. 4.2 clearly demonstrate the asymptotic advantage of (4.6): While no savings are observed at $L = 1$ , memory savings of a factor of $18$ are obtained already at an expansion order of $L = 5$ . At $L = 13$ , this factor grows to $145$ . In particular, the memory requirement for the matrix compression scheme shows only a weak dependence on $L$ and is determined only by the additional memory needed for the coupling matrices $Ri$ in (4.14)-(4.20). With increasing expansion order $L$ , the additional memory requirements for the compressed scheme grow quadratically with $L$ because of the $2 (L + 1)$ spherical harmonics of degree smaller or equal to $L$ . Even at $L = 13$ the additional memory compared to $L = 1$ is less than one megabyte. Consequently, the memory used for the unknowns dominates already at moderate values of $L$ , cf. Fig. 4.1 and Fig. 4.2.


	Full system		Condensed

$L$	$S$	compr.	$Scond$	compr.

1	3.9	7.4	0.2	9.2
3	28.4	19.3	4.0	17.9
5	73.9	53.2	15.7	48.9
7	134.8	98.3	36.5	92.2
9	228.1	160.7	68.2	149.8

Figure 4.3: Comparison of execution times (milliseconds) for matrix-vector multiplication at different expansion orders $L$ for the fully set up system matrix and the proposed compressed matrix scheme. Both the full system of linear equations and the condensed system with odd order unknowns eliminated in a preprocessing step are compared.

Execution times for matrix-vector multiplications are compared in Fig. 4.3 for the case of a full system matrix and the system matrix with odd expansion orders eliminated. For the lowest expansion order $L = 1$ , the matrix compression does not pay off and execution times are by a factor of two larger then for the standard storage scheme. This is due to the additional structural overhead of the compressed scheme at expansion order $L = 1$ , where no compression effect occurs. However, for larger values of $L$ , the matrix compression scheme leads to faster matrix-vector multiplications with the full system of linear equations. An additional performance gain of about ten percent is observed. Comparing execution times for the condensed system, where odd order unknowns have been eliminated in a preprocessing step, the runtime penalty for matrix-vector multiplication is a factor of $15$ at $L = 1$ , but in this case there is no compression effect anyway. At $L = 5$ , the runtime penalty is only a factor of three and drops to slightly above two at $L = 13$ . In conclusion, the matrix compression scheme allows for a trade-off between memory consumption and execution time, where the memory reduction is much larger than the penalty on execution times.


L	GMRES(50)	GMRES(30)	GMRES(10)	BiCGStab	Unknowns

1	10.2 MB	6.2 MB	2.2 MB	1.2 MB	0.2 MB
3	71.4 MB	43.4 MB	15.4 MB	8.4 MB	1.4 MB
5	178.5 MB	108.5 MB	38.5 MB	21.0 MB	3.5 MB
7	336.6 MB	204.7 MB	72.6 MB	39.6 MB	6.6 MB
9	545.7 MB	331.7 MB	117.7 MB	64.2 MB	10.7 MB
11	800.7 MB	486.7 MB	172.7 MB	93.5 MB	15.7 MB
13	1101.6 MB	669.6 MB	237.6 MB	129.6 MB	21.6 MB

Table 4.1: Additional memory requirements of the linear solvers GMRES(s) with different values of $s$ and BiCGStab compared to the memory needed for the unknowns, cf. Fig. 4.4.

Figure 4.4: Additional memory requirements of the linear solvers GMRES(s) with different values of $s$ and BiCGStab as given in Tab. 4.1.

A comparison of the additional memory required by GMRES(50), GMRES(30), GMRES(10) and BiCGStab is shown in Tab. 4.1 and Fig. 4.4. As expected, GMRES leads to higher memory requirements than many other Krylov methods such as BiCGStab. For GMRES( $s$ ), $s + 1$ auxiliary vectors of the same length as the vector of unknowns are used, while BiCGStab uses six auxiliary vectors of that size. It can clearly be seen that the memory required by GMRES(50) is by one order of magnitude larger than the memory needed for the compressed system (i.e. second and third column in Fig. 4.1) and BiCGStab. On the other hand, without system matrix compression, the additional memory needed by GMRES(50) is comparable to the memory needed for the system matrix and is thus less of a concern.

[next] [prev] [prev-tail] [front] [up]

Chapter 4Structural Properties