6 Summary and Outlook

This work presented an overview on state-of-the-art mesh adaptation methods utilized in the fields of TCAD, together with an investigation and discussion of the utilized approaches and ideas stemming originally from the field of graph theory and graph coloring. A particular focus in the conducted study was the evaluation of these key technologies with respect to their computational aspects. The focus in this work was on shared-memory parallel methods necessary to efficiently utilize the increasing on-node parallelism of modern computer nodes. Thus, the required theoretical background for graphs and graph coloring, meshes and mesh adaptation, and an overview on modern computer architectures and parallelization techniques has been discussed in Chapter 2.

As graphs and in particular graph coloring play a key role in both, parallel computing and mesh adaptation, different solution strategies of the graph coloring problem were presented in Chapter 3. These strategies range from the potentially time-consuming approach of calculating exact solutions to faster but nearly optimal approximative techniques. Since performance is a key aspect in modern TCAD applications, the focus of the conducted evaluation was on widely used approximative serial and well-proven as well as new shared-memory parallelized graph coloring algorithms. The conducted study evaluated available shared-memory graph coloring algorithms regarding their solution quality and performance, showing that the parallel approaches yield speedups up to about 50 for 64 threads. These shared-memory parallel algorithms formed the basis for the subsequently developed parallelization methods relying on graph coloring techniques.

In TCAD applications one potential computational bottleneck is the mesh adaptation step required to obtain high quality meshes for numerical solution approaches typically based on FVMs and FEMs. To tackle this issue, a new coarse-grained shared-memory parallel mesh adaptation framework was developed and presented in Chapter 4. The main idea behind this framework is to decompose an input mesh into a user-defined number of partitions which are binned into independent sets using graph coloring algorithms based on the findings in Chapter 3. Additionally, flexibility with respect to the applied mesh adaptation strategies is provided by enabling the use of different well tested serial mesh adaptation algorithms in a parallel manner. This is achieved by adapting the different partitions of the same independent set simultaneously, as they do not share any mesh edges or facets. Thus, the available on-node parallelism on modern computer architectures can be exploited. However, to prevent over-refinement on the interfaces of the partitions the mesh adaptation on boundary elements is restricted to the partition residing in the lowest color class. To communicate the changes made by a partition a mesh healing step is conducted by all affected partitions. This guarantees a conforming mesh as well as prevents data races or inconsistencies. Two different mesh adaptation strategies are included into the coarse-grained mesher. With the presented coarse-grained mesher maximum speedups of 8.6 using 16 threads were obtained for the memory-bound mesh adaptation problem.

A special type of Process TCAD simulation is the simulation of an etching process. There, the calculation of the flux rates in every single timestep has the greatest share of the overall simulation runtime. Thus, highly efficient flux calculation methods are a key technique in order to improve the performance of etching simulations. To achieve this goal a novel approach for reducing the runtime of the flux calculation based on iteratively partitioning the surface mesh was proposed in Chapter 5. The new method is especially designed for multi-material etching simulations. To reduce the number of integration points on the surface, an initial sparse set of integration points is created. This set can be iteratively refined guided by a user-defined refinement condition in order to adjust the resolution and accuracy of the partitioning approach. To optimally capture critical geometrical features and material interfaces, which potentially yield highly different surface velocities, the refinement condition applied in this study was designed using a threshold for the flux and the normal angle between two points of the sparse set. Additionally, the surface mesh elements forming a material interface are also added to the sparse set. With this novel partitioning approach maximum relative speedups for the flux calculation of 7.0 compared to the conventional approach based on all surface mesh elements were obtained. The influence on the accuracy of the sparse set approach was proven to be reasonably small. Furthermore, a strong scaling evaluation of the flux calculation step highlighted the fact that parallelizing the serial iterative partitioning method is one of the major performance limiting factors

For the coarse-grained mesher presented in Chapter 4 the investigation of different approaches for conducting the healing operations on the interfaces is a promising option for increasing the quality of the resulting mesh. Additionally, different domain decomposition strategies could ultimately result in a better partitioning of the input mesh with respect to load-balancing.

Regarding the iterative partitioning method for multi-material etching simulations proposed in Chapter 5, a potential increase in computational performance is possible by parallelizing the serial parts of the flux calculation process, i.e., the initial creation of the sparse set and its iterative refinement. As the presented method is up to now restricted to direct flux only, the general applicability in Process TCAD could be massively increased by extending the framework to support also reemitted flux. One potential way to achieve this extension is to reuse the already existing sparse set and refine it further where necessary. Another possible solution would be the creation of dedicated sparse sets for each subsequent flux reemission step. Besides these two options, an adaptation of the refinement condition for a proper sparse set creation could be advantageous for the computation of the reemitted flux distributions.