Dissertation — Hierarchical Grids

Chapter 2 Hierarchical Grids

Numerical treatment of PDEs, e.g., the level-set equation, prescribing the time evolution of the interfaces, typically requires a discretization of the domain, the computational grid. The two approaches to discretization are structured (regular) grids and unstructured (irregular) grids.

Figure 2.1 shows some examples for structured and unstructured grids in two spatial dimensions. The representative for a structured grid is the Cartesian grid (cf. Figure 2.1a), where the grid points are located on a regular lattice and the cells (for a formal definition, see Section 2.1) are squares or cubes depending on the number of spatial dimensions. The connectivity of grid points is implicitly defined via their indexes on the regular lattice. The width along a spatial direction of such a cell is called grid resolution and is identical in all spatial dimensions. A rectilinear grid (cf. Figure 2.1b) allows different distances between grid points, thus the cells are rectangles or rectangular cuboids, in two dimensions and three dimensions, respectively. Rectilinear grids allow for a limited spatial adaptivity, by adapting the distance between grid points locally. The grid resolution is typically stored as a dedicated array of the distances between the grid points along each spatial dimension. A curvilinear grid (cf. Figure 2.1c) is often employed, if the PDE is formulated in curvilinear coordinates, i.e., in spherical or cylindrical coordinates, allowing for the spatial discretization to fit the PDE.

Cartesian grids and rectilinear grids enable a straightforward computation of derivatives using finite difference methods [63, 64]. The main disadvantage of structured grids is that the discretized domain has to be regular, e.g., a rectangle or a cylinder. Nevertheless, for process TCAD simulations on the feature scale, where typically a rectangular domain of the wafer is considered, this is no disadvantage.

In contrast, unstructured grids (cf. Figure 2.1d) are not restricted to regular domains as their cells are typically polytopes, e.g., triangles or tetrahedra. The usage of polytopes allows for the discretization of arbitrarily shaped domains. The drawback is the irregular connectivity of the grid points which has to be stored explicitly. PDEs discretized on unstructured grids are typically solved using the finite element method (FEM) [65] or the finite volume method (FVM) [66], which involve explicitly stored large sparse matrices in the solving procedure.

This thesis does not investigate the level-set method applied on unstructured grids and, therefore, the reader is referred to [67, 68, 69, 70, 71, 72] for details on this matter.

The discussion continues with the discretization of a domain using a Cartesian grid.

2.1 Discretization

Let $R^{3}$ be discretized by a Cartesian grid using the spatial resolution $Δ_{x}$ along the x-axis, $Δ_{y}$ along the y-axis, and $Δ_{z}$ along the z-axis. The nodes $(i, j, k) \in Z^{3}$ (triplets of indices) index all the grid points $(i Δ_{x}, j Δ_{y}, k Δ_{z})$ . This global indexing enables a unique identification of every grid point. The volume surrounding a grid point belonging to the node $(i, j, k)$

$\begin{array}{r} [(i - 0.5) Δ_{x}, (i + 0.5) Δ_{x}] \times [(j - 0.5) Δ_{y}, (j + 0.5) Δ_{y}] \times [(k - 0.5) Δ_{z}, (k + 0.5) Δ_{z}] \end{array}$ is called cell, which is the smallest unit in the computational domain. For a given function $Φ (x, y, z)$ defined on the domain, the function $Φ$ (discretized on the grid) is denoted by $Φ^{i j k} = Φ (i Δ_{x}, j Δ_{y}, k Δ_{z})$ for all grid points. Information associated with a grid point is referred to as data, e.g., the discretized value of a function. Computations typically involve more than a single grid point; consider, for instance, the approximation of a derivative as discussed below. The collection of all grid points necessary for such a computation is called stencil which typically involves the neighboring grid points. The widely-known seven-point stencil in three dimensions for a node $(i, j, k)$ includes the node itself and all direct neighbors, i.e., nodes which indices differs by at most one:

$\begin{matrix} (2.1) & \begin{aligned} (i - 1, j, k) & , (i + 1, j, k), \\ (i, j - 1, k) & , (i, j + 1, k), \\ (i, j, k - 1) & , (i, j, k + 1) . \end{aligned} \end{matrix}$

The computation of derivatives on a computational grid is essential in the context of solving PDEs. For first-order approximations finite difference schemes are particularly convenient, as shown in the following. The first-order accurate forward difference along the x-axis,

$\begin{array}{r} (2.2) & \frac{\partial Φ}{\partial x} = lim_{Δ_{x} \to 0} \frac{Φ (x + Δ_{x}) - Φ (x)}{Δ_{x}} \approx \frac{Φ (x + Δ_{x}) - Φ (x)}{Δ_{x}} = \frac{Φ^{i + 1 j k} - Φ^{i j k}}{Δ_{x}}, \end{array}$ is referred to as $D_{i j k}^{+ x} Φ$ . Similarly, the first-order accurate backward difference along the x-axis,

$\begin{array}{r} (2.3) & \frac{\partial Φ}{\partial x} \approx \frac{Φ^{i j k} - Φ^{i - 1 j k}}{Δ_{x}}, \end{array}$ is referred to as $D_{i j k}^{- x} Φ$ . Consequently, the second-order accurate central difference along the x-axis,

$\begin{array}{r} (2.4) & \frac{\partial Φ}{\partial x} \approx \frac{Φ^{i + 1 j k} - Φ^{i - 1 j k}}{2 Δ_{x}}, \end{array}$ is referred to as $D_{i j k}^{x} Φ$ . The derivatives along other spatial dimensions such as y-axis and z-axis are defined analogously.

Higher order approximations are possible (if $Φ$ is smooth enough) by using higher order finite difference schemes which, however, require a bigger stencil. In the here considered case of the level-set method so-called weighted essentially non-oscillatory (WENO) schemes are often employed [73, 74, 75, 76], if higher accuracies are desired. WENO schemes compute the derivative on several sub-stencils of a large stencil. Subsequently, the derivatives computed on the sub-stencils are combined through a convex combination to minimize spurious oscillations. Spurious oscillations occur, if the derived function lacks smoothness (differentiability) in the stencil considered for the finite difference computation. Level-set functions are not smooth around regions where the level-set function describes corners or edges of interfaces. This is an inherent feature, because corners are exactly those points where the level-set function is not differentiable. Edges and corners are typically present in interfaces used to describe structures considered in process TCAD simulations, thus higher order schemes are avoided, because they require more computational power and do not yield increased accuracy around corners, where it is most needed. Therefore, this thesis considers only first-order schemes.

If a higher resolution is required to better resolve a corner in a structured grid, the entire grid has to be resolved with the desired higher spatial resolution. In case of 3D domains this quickly becomes unfeasible, because of high memory requirements to store all grid points and the wasted computation power in irrelevant regions. As hinted previously, local refinement, where only parts of the domain are resolved with a higher spatial resolution, are a viable solution, to enable high spatial resolution around corners, while keeping the overall computational effort feasible.

The discussion continues with approaches to local refinement of the spatial resolution of a Cartesian grid to enable efficient application of computation power to regions which have the highest potential for an overall increased solution accuracy.

2.2 Refinement

Techniques to locally increase the spatial discretization for structured grids date back to [77] and are referred to as adaptive mesh¹ refinement (AMR). AMR was first used to solve hyperbolic PDEs [77, 78, 79] in the context of compressible flows. In the context of process TCAD simulations, fundamental basics are provided in [57].

In AMR the computational domain is discretized by several layers (levels) of Cartesian grids. These grid layers cover the entire discretized domain, with different spatial resolutions, hence the name hierarchical grid. The coarsest grid is referred to as Level 0, whilst consecutively spatially finer resolving grids have a higher level, e.g., Level 1 and Level 2. In Figure 2.2 three grids on different levels of a schematically depicted hierarchical grid are shown.

The ratio between the spatial resolutions of the levels is called refinement ratio. The refinement ratio is a positive integer greater than one. A visualization of the refinement of a single cell for some small integer refinement ratios is show in Figure 2.3 for the 2D case. For uneven refinement ratios grid points of the coarser grid are directly covered on a higher level. A high refinement ratio reduces the number of levels needed to reach a certain spatial resolution, at the cost of less gradual refinement. In [10] it is suggested to use different refinement ratios on different levels. However, all simulation examples presented in this thesis use a uniform refinement ratio on all levels, which is set to four and is considered an industry standard (cf. Figure 2.3c).

The goal of AMR is to increase the spatial resolution only locally, thus only parts of the grids are used for the computation. The parts of a grid that are used for computations (in process TCAD simulations those parts are regions around interfaces, especially around corners and edges, and material region boundaries) are defined by blocks.

A block is a rectilinear sub-domain of a grid consisting of a contiguous set of cells (cf. Figure 2.4). A block is uniquely identified by specifying the corner (node with the smallest indices in all spatial dimensions) and the size (number of grid points in each spatial dimension). The cells directly neighboring a block are the ghost cells of the block. Ghost cells enable the usage of the same stencil for computations on all grid points of the block (even for grid points on the border, where parts of the stencil in principle extend beyond the block boundaries). Depending on the targeted stencil size more than one direct neighbor may be necessary for the ghost cells. The usage of ghost cells has the cost of higher memory requirements, because, as a consequence, some cells have to be stored multiple times (once in a block and possibly several times in neighboring blocks as a ghost cell). The ghost cells are used to set boundary conditions on the block and exchange data with neighboring blocks.

A hierarchical grid is created in two steps:

• Flagging: The cells on a level are flagged (marked for refinement).
• Clustering: The flagged cells are clustered into blocks.

The flagging selects the cells on a level which have to be covered on the next higher (finer) level of the hierarchical grid. As example, on Level 0 the cells which shall be covered on Level 1 are flagged. In the context of process TCAD simulations the cells are typically selected by their level-set value (closeness to the interface), the local curvature (high curvature identifies corners), and distance to interfaces of other level-set functions (relevant for simulations containing multiple material regions). During a typical process TCAD simulation the regions where high spatial resolution is necessary change over time. This change is due to material regions (interfaces) being deformed, yielding different regions for high spatial resolution. Therefore, the hierarchical grid has to be adapted several times (cf. Section 4.2.7) during a process TCAD simulation.

There are three different approaches for clustering, depending on additional requirements on the size and placement of blocks:

• Cell-based AMR
• Tree-based AMR
• Block-based AMR

The three approaches are portrayed in the following.

¹ The computational grid is often named mesh.

Cell-based AMR

Cell-based AMR allows for individual cells to be refined independently [80, 81]. This enables a perfect refinement efficiency (number of flagged cells divided by number of refined cells), because only those cells which are flagged are refined. However, high memory requirements for the hierarchical structure and specialized stencil computations induce a non-negligible computational overhead. The typical data structure used are quadtrees [82, 83, 84] and octrees [23, 85, 86, 87, 42], in two dimensions and three dimensions, respectively. The refinement ratio is often two, but other refinement ratios are also considered [88, 10]. Issues arising on parallel systems, especially the even distribution of the data across the hardware resources for tree-like data structures is investigated in [89, 59]: A Z-curve (linear neighbor conserving traversal of space) is utilized to spread the data evenly across the available hardware resources.

Tree-based AMR

In contrast to cell-based AMR, tree-based AMR requires that only similar blocks (blocks of the same size) which are aligned to the grid are refined. The size of the blocks has to be bigger than one, else it is equivalent to the cell-based approach. So, if a single cell of a block is flagged all cells of the block are refined, thus the refinement efficiency is lower, but the computational overhead for storing the structure is smaller compared to the cell-based approach. Tree-based AMR is used for example in [10, 90, 91]. Parallelization of tree-based AMR is typically performed on a per block basis (each block is processed by a dedicated thread).

Block-based AMR

Block-based AMR (also known as patch-based AMR) allows differently sized blocks, thus allowing for a higher refinement efficiency. The cost of the higher refinement efficiency is a larger computational overhead for each block, because the size and the connectivity between blocks is not straightforward as in the tree-based approach [92, 93]. The connectivity (the neighboring blocks) is typically stored as a dedicated list for best performance. In this case the clustering is performed using a signature-inflection clustering method based on the approach presented in [94].

Starting from a single block covering the full domain, the block is iteratively split and trimmed to remove most of the not flagged cells. Typically a minimum size requirement is imposed on the blocks, which reduces the total number of blocks (high overhead of small blocks, due to the ghost cells). However, no maximum size for the blocks is set. Level-set simulations using this approach are presented by [95, 96]. Differently sized blocks (possible in this approach) have to be considered when parallelizing computations, because they may cause load-imbalances (computations on large blocks take usually significantly longer than on small ones).

In this thesis and in the presented reference simulation workflow (cf. Section 4.2) the block-based AMR approach is considered for all simulations examples. The reference simulation workflow always employs a single block on Level 0 covering the full computational domain. On the higher levels of the hierarchical grid several blocks are present, depending on the simulation problem at hand.

There are other approaches to AMR considered in level-set simulations, which are portrayed shortly in the following for the sake of completeness.

Other Approaches

The approach to AMR considered in [97] employs, locally around the interface, cells that contain a higher order approximation of the level-set function. The approximation in those cells is based on Gauss-Lobatto quadrature nodes instead of a single grid point per cell. This approach is conceptually similar to a higher spatial discretization locally around the interface.

For level-set simulations, there are many approaches where only cells in a narrow-band (several grid cells wide) around the interface are stored. Those approaches typically consider only a single grid, which usually employs a high spatial resolution. For example, in [98] the usage of a hash table data structure to store the narrow-band is proposed, but the conducted performance comparison showed no advantage to a reference cell-based AMR implementation.

The approach to only store grid points in a narrow-band around the interface is taken to the extreme in [99]. This approach, named sparse field, stores only grid points which are directly next to the interface (any computations typically requires the extension of the band of stored grid points).

2.3 Nesting Criteria

As mentioned previously, this thesis considers a block-based AMR approach. As such, the block placement and nesting criteria are essential and discussed in the following.

Data exchange between blocks of a hierarchical grid is necessary to couple the solution on different blocks. Data is exchanged between blocks on the same level, as well as between blocks on different levels. Data exchange procedures are costly, if an arbitrary relation between blocks has to be handled, because for each data exchange all blocks on all levels have to be considered. Therefore, enforcing placement rules on the blocks, i.e., restricting the possible relations between blocks, allows for optimized data exchange procedures which only require the consideration of a small sub-set of all blocks.

These placement rules are referred to as nesting criteria because they define how blocks are nested on hierarchical grids. The nesting criteria structure hierarchical grids and enable more efficient data exchange between blocks on different levels of a hierarchical grid. Data exchange from a coarse level to the next finer level is usually referred to as interpolation and conversely from a fine level to the next coarser level as restriction. If two blocks on different levels overlap the term parent for the block on the coarser level and the term child for the block on the finer level is used.

The four key nesting criteria considered in this work and representing an industry standard are:

1. Blocks shall not overlap other blocks on same level.
2. Each block has a unique parent block on the next lower level, except for Level 0 where there is no parent block.
3. Blocks shall be aligned to the grid on the next lower level.
4. A block shall not border an area which is not refined on the next coarser level.

The first criterion excludes overlapping blocks which would otherwise cover cells multiple times, resulting in computational overhead. Thus only ghost cells are stored more than once on a level of a hierarchical grid.

The second criterion effects the interaction between blocks on different levels. The main advantage having a unique parent block compared to approaches where the parent block is not unique, is that for data exchange procedures between levels of a hierarchical grid only two blocks have to be considered: The parent and its child block. The drawback is that more blocks are created on higher levels of a hierarchical grid, because the blocks have a maximum size imposed by their parent blocks.

The third criterion is automatically fulfilled in case the refinement is based solely on the flags of the coarser level. The underlying grids with different spatial resolutions are inherently aligned. Thus a cell is always either fully covered by refined cells or not covered at all, allowing for simplified exchange procedures between the levels.

The fourth criterion avoids a harsh border in the spatial resolution: The so-created gradual change in the spatial resolution, enables a stable solution.

In Figure 2.5 an example hierarchical grid in two dimensions with a total of three levels is presented. The employed refinement factor is four. The single block on Level 0 is outlined by a thick black line, whilst the block cells use a thin black outline. On Level 1 the blocks (total of four valid blocks) are colored in shades of green, and on Level 2 the blocks (total of seven valid blocks) are colored in shades of blue. Examples of blocks violating the four required nesting criteria a marked with respect to the violated nesting criterion list number (see list above) in red.

Collection of Terms

Analogously to [101] the terms used in this work to describe hierarchical grids are summarized (ordered alphabetically):

• block: Axis-aligned collection of continuous cells of the same size
• block cells: Cells belonging to a block
• cell: Smallest unit of the computational domain
• ghost cell: Halo of cells surrounding the block cells
• ghost point: Grid point belonging to a ghost cell
• grid: Generic description of the computational domain
• grid line: Axis-aligned line connecting two grid points
• grid point: Location of the center of a cell
• level: Union of blocks that have the same spatial resolution
• node: Index of a cell