Disseration: Efficient Flux Calculations for Topography Simulation

3 Simulation Platform

The platform introduced in this chapter is used as a development platform for implementing and evaluating novel computationally efficient approaches to calculate the particle transport in three-dimensional etching and deposition simulations. The approaches presented in Chapters 4, 5, and 6 were implemented and validated using this platform.

The performance of a particle transport calculation can be investigated detached from a simulation of a dynamic surface, i.e., using

• a set of surface geometries with corresponding surface models,
• information about the domain and boundary conditions, and
• information about the properties of the particle sources.

However, there is a disadvantage when using such a static setup: The accumulated effect of the particle transport method on the final topography at the end of the simulation cannot be assessed as the surface is not advected.

The alternative is to embed the evaluation directly into a process simulator. This provides access to all simulation results and allows for the evaluation of particle transport methods throughout the course of a full simulation. However, most available process simulators (cf. Section 2.1) are closed-source projects and therefore unsuitable. At the time of writing, ViennaTS [87] is the only available three-dimensional fully open-source process simulator. Due to a lack of functionality in ViennaTS regarding geometry construction, serialization of the level-sets, and surface extraction a standalone evaluation platform is desired.

To provide an overview, Figure 3.1 illustrates the sequence of computational tasks during a process simulation using such a simulation platform: First, the domain and the layers (i.e., the level-set functions which delimit the material regions) are created (left column). Then, the main computational tasks in each time step of the simulation (middle column) are

• the extraction of an explicit representation of the top layer¹, i.e., a triangle mesh including the information of the active material for each triangle,
• the calculation of the surface rates for each point on the top layer, which requires the calculation of the particle transport,
• the calculation of the surface velocities, i.e., the velocity for each point of the top layer,
• the extension of the surface velocities, i.e., preparing a velocity field for all level-set grid points in the narrow band of the top layer, and
• the advection of the top layer according to the surface velocity field including an appropriate update of the other layers (cf. Section 3.1).

Finally, the resulting layers and material regions are extracted and saved (right column).

Figure 3.1: Coarse-grained flow chart of the process simulation identifying the main computational tasks: Surface extraction, surface rate calculation, surface velocity calculation, creating of the surface velocity field, and the advection of the layers. The relation of the computational tasks encapsulated by the interface classes (cf. Section 3.2) is indicated.

To minimize development effort, open-source third-party libraries were employed. Briefly, the simulation platform is based on OpenVDB [95], using its sparse volume data structure; the tools provided by OpenVDB to manipulate the sparse data (e.g., narrow band level-set methods for advecting/normalizing level-sets and extraction of explicit surfaces) are also utilized. The ray-surface intersection queries for the calculation of the particle transport are performed using Embree [89] as external ray tracing library. Other ray tracing libraries (cf. Chapter 5) were used for validation and performance evaluations.

In the following, first, the necessary extensions to match the requirements of a feature-scale process simulation are discussed in detail. Then, an overview of the software architecture is given by focusing on the abstract interface classes which decouple the individual computational tasks. Finally, test cases and benchmarks are presented.

¹ See Section 3.1 for the definition of top layer.

3.1 Requirements for Feature-Scale Process Simulation

The sparse volume data structure provided by OpenVDB [95] is used for surface representation. In [51] a detailed overview of the hierarchical data structure and implemented algorithms is provided and performance benchmarks are presented. OpenVDB suits the requirements for a level-set-based feature-scale process simulator to large extends as it provides [51]

• a configurable sparse data structure for volumetric data with cache efficient sequential access and fast random access (which is used to store the layers),
• a multi-threaded narrow band level-set method using TVD-RK explicit time integration schemes and a Godunov scheme combined with WENO schemes for spatial derivatives (which is used for the advection of the top layer),
• multi-threaded flow-based re-normalization of the narrow band using the same discretization schemes as for the advection (which is used for normalization of the top layer after each advection),
• geometry construction and multi-threaded CSG operations (which are used to update the other layers after the top layer is advected (cf. Section 3.1) and to construct the initial geometries),
• ray marching (i.e., ray casting) using the hierarchical volume data structure directly (which is used as a baseline for volume/implicit ray tracing performance, cf. Section 5.2),
• a fast dual method for the extraction of quad/triangle meshes (which is used for the extraction of a triangle mesh from the top layer, cf. Section 5.2),
• a conversion from quad/triangle meshes to volumes/level-sets (which is used to import explicit geometries), and
• serialization/deserialization of volumes/level-sets using a memory efficient file format (which is used to store intermediate and final results of the layers and regions).

Nevertheless, OpenVDB also lacks some requirements for a feature-scale process simulator, namely

• a confined domain size with prescribed boundary conditions (which is necessary when simulating only a small region of the wafer surface),
• a multi-material level-set advection logic (which is necessary if more than one material region is present in the simulation domain),
• a method for surface velocity extension (which is necessary as the surface velocities in a process simulation are only available for the surface, cf. Section 2.3), and
• a ray-surface intersection test which considers the confined domain and its boundary conditions.

In the following, the necessary extensions are briefly introduced.

Boundary Conditions

In feature-scale process simulations only a small region of the wafer surface is considered. Therefore, it is necessary to define boundary conditions for the vertical and lateral boundaries of the domain. The spatial position of the vertical boundary is adopted after each time step according to the new maximum vertical extents of the top layer. Two types of domain boundaries are commonly used for the lateral boundaries in a Process TCAD simulation: Periodic and reflective boundary conditions. Figure 3.2 illustrates the effect of periodic boundaries $\mathcal {B}_p$ (right) and reflective boundaries $\mathcal {B}_r$ (left) in two dimensions.

Figure 3.2: Two-dimensional illustration of a simulation domain with a reflective boundary condition $\mathcal {B}_r$ on the left side and a periodic boundary condition $\mathcal {B}_p$ on the right side. The vertical domain boundaries are infinite, i.e., are adopted according to the maximum extents of the *top layer* surface $\mathcal {S}$ (blue); the source plane $\mathcal {P}$ (green) is positioned according to the upper vertical domain extent.

$ \mathcal {B}_r $ — Figure 3.2: Two-dimensional illustration of a simulation domain with a reflective boundary condition $\mathcal {B}_r$ on the left side and a periodic boundary condition $\mathcal {B}_p$ on the right side. The vertical domain boundaries are infinite, i.e., are adopted according to the maximum extents of the *top layer* surface $\mathcal {S}$ (blue); the source plane $\mathcal {P}$ (green) is positioned according to the upper vertical domain extent.

Reflective boundary conditions are possible for all geometries; periodic boundary conditions require that the initial geometry on one boundary matches the geometry on the opposing boundary for each applied dimension in which the boundary condition is applied.

If reflective boundary conditions are used, the emission of the particle sources must be rotationally symmetric with respect to the vertical axis; for periodic boundary conditions, the particle sources do not have this constraint.

The boundary conditions are implemented using a straightforward ghost region approach: The lateral boundaries of the simulation domain are extended by a thin layer of grid cells which are initialized according to the applied boundary condition. The surface advection is performed also for the ghost regions. The values in the ghost regions are discarded and re-initialized after each time step of the simulation. Figure 3.3 illustrates the top layer and ghost regions (blue) for a three-dimensional structure before and after a tilted directional etching (i.e., a non-symmetric particle source).

(a) Initial *top layer* and *ghost region*

Figure 3.4a illustrates the effective surrounding geometry for the initial geometry of Figure 3.3a, if periodic and reflective boundary conditions are combined. Figure 3.4b illustrates the same simulation results as Figure 3.3b but with the effective surrounding geometry which is produced by the periodic boundary conditions.

(a) Reflective (x-axis) and periodic (y-axis) boundary conditions

Multi-Material Advection

Typically, different material regions are present during an etching process simulation, e.g., a substrate patterned with a mask. Also for deposition processes at least two materials are present, i.e., the initial material and the material which is deposited. The straightforward approach is to represent each material region with a corresponding level-set function. To simultaneously advect all material regions, each region would then be advected separately, leading to potentially mutual penetration. In this case, the parts of a region which are penetrated by another region would be treated inactive, i.e., not subject to advection. One approach would be to perform a Boolean operations between material regions to dissolve the penetrations; in this case a strategy to decide which material fills the former penetrated volume has to be set up.

Another approach [22][87] is to not advect each material region separately but to construct a total union of all regions and advect this top layer. To be able to advect the top layer with the correct surface speed of the underlying material region, it is necessary to detect the active material for each point on the top layer. This active material for a point $\bm {x}$ on the top layer is obtained by querying the value of all level-sets at $\bm {x}$ : The construction of the top layer implies that the values of the queries are all $\le 0$ . The material of the level-set with the smallest value is considered active; the active material at a point on the surface is therefore

$(3.1) \{begin}{align} \label {eq:activelayer} M(\bm {x}) & = \min \displaylimits _{m \in \{1 .. N_m\}} \phi _m(\bm {x}) \ , \{end}{align}$

where $N_m$ is the number of materials and $\phi _m$ is the level-set function corresponding to material $m$ . Additionally, the level-sets have a fixed order and the lower level-set is chosen as active material, if the values are numerically identical.

However, with this approach it is not straightforward to deposit (i.e., the surface velocity $V_n > 0$ ) multiple materials simultaneously; instead one is restricted to deposit only one “top material” where $V_n > 0$ . If $V_n < 0$ , i.e., material is removed, the top layer is advected according to the underlying active material. In a subsequent Boolean operation between the advected top layer and each material region, the removal of the material is transferred to the level-sets which represent the materials. The Boolean operation to subtract the volume penetrated by level-set $\phi _B$ from level-set $\phi _A$ is

$(3.2) \{begin}{align} \label {eq:booldiff} \phi _A^{\prime }(\bm {x}) & = \min \{\phi _A(\bm {x}) , -\phi _B(\bm {x})\} \ . \{end}{align}$

A significant advantage of this “top layer” approach is that material layers can be represented with sub-grid resolution. This is possible if the level-sets representing the materials are chosen to not map directly to the material regions, but are constructed “additively”. An example of a three materials setup (illustrated in Figure 3.5a) is used for demonstration: A thick bottom region (blue) is covered by a thin layer (green) which is exposed only in a circular region due to a mask (red). The level-set resolution is chosen such that only a single level-set grid point falls within the vertical span of the green layer.

Figure 3.5b shows a cross section of the (narrow band) level-set grid of the domain. The colored lines mark the extracted zero-level-sets of the regions if the level-sets directly represent the material regions. Figure 3.5c illustrates the same cross section but showing the zero-level-sets, if the level-sets are constructed additively using

$(3.3–3.5) \{begin}{align} \label {eq:additiveLayers} \phi _{1} & = \phi _{blue} \ , \\ \phi _{2} & = \phi _{blue} \cup \phi _{green} \ , \\ \phi _{3} & = \phi _{blue} \cup \phi _{green} \cup \phi _{red} \ , \{end}{align}$

where the unions are constructed using

$(3.6) \{begin}{align} \label {eq:csgunion} \phi _{A} \cup \phi _{B} & = \min \{\phi _A(\bm {x}) , \phi _B(\bm {x})\} \ . \{end}{align}$

Figure 3.5d once more illustrates the zero-level-sets from Figure 3.5c and additionally shows the reconstructed material regions obtained from

$(3.7–3.9) \{begin}{align} \label {eq:exractregions} \phi _{blue} & = \phi _{1} \ , \\ \phi _{green} & = \phi _{2} \setminus \phi _{1} \ , \\ \phi _{red} & = (\phi _{3} \setminus \phi _{2}) \setminus \phi _{1} \ . \{end}{align}$

Figure 3.6 chronologically illustrates the development of the material stack using top layer advection and the additive scheme to represent the materials. Starting from Figure 3.6b the green layer is represented with sub-grid resolution, until it is fully etched away in Figure 3.6c. Using a straightforward scheme, the thin region of the green layer cannot be represented from Figure 3.6b onwards. It also becomes apparent that a void is formed between the green and the blue layer already in Figure 3.6a; this is due to the nature of the level-set function which is a scalar field and does not hold directional information.

In a time step where a region of a layer is fully etched and the underlying material becomes active (cf. Figure 3.6c), the surface velocities of the involved materials must be averaged: If only the surface velocity for the green layer is considered for the full time step, the surface advection speed is too slow or too fast, if the surface velocity for the blue material is faster or slower, respectively. This is corrected by adopting the surface velocity in such cases and using a convex combination $V_{avg}$ of the velocities according to the initial distance of the layers weighted with the corresponding surface speeds

$(3.10–3.12) \{begin}{align} \label {eq:velaverage} \Delta t V_{avg} & = \Delta t_{g} V_{g} + \Delta t_{b} V_{b} \ , \\ \Delta t & = \Delta t_{g} + \Delta t_{b} \ , \\ \Delta t_{g} & = \frac {\Delta x_{g\rightarrow b}}{V_{g}} \ , \{end}{align}$

where $\Delta x_{g\rightarrow b}$ is the initial distance between the green ( $g$ ) and the blue ( $b$ ) layer.

Velocity Extension

The velocity extension is implemented in the most straightforward approach [35]: For each level-set cell in the narrow band, the velocity of the closest point on the surface is identified and used. A step along the gradient weighted with the negated level-set value is performed and the search for the nearest velocity value is started at the resulting position (which is close to the surface if the level-set has a signed-distance property). The nearest neighbor search is accelerated by using OpenVDB’s space-partitioning acceleration structure PointIndexGrid. The velocity extension in the ghost regions is handled equally but with a preceding mapping of the query location into the domain according to the boundary conditions.

Ray-Surface Intersection

The ray-surface intersection queries (which are extensively used during the calculation of the particle transport) must consider the confined domain with prescribed boundary conditions:

• If a reflective boundary of the domain is intersected, the direction is reflected and the origin is set to the intersection point.
• If a periodic boundary condition is intersected, the direction is maintained but the origin is translated to the opposing domain boundary.

A maximum number of domain boundary intersections is defined to prevent infinite domain intersections for ray directions which are (nearly) horizontal and do not intersect with the geometry.

3.2 Software Design

The simulation platform is not intended to be an operational process simulator but is developed to support the evaluation of novel approaches for the particle transport. To that end, from a design perspective, the requirements are

• to support multiple implementations of the same computational task,
• to decouple individual computational tasks,
• to provide explicit interfaces between the computational tasks, and
• to not constrain new implementations of computational tasks by choosing a loose coupling between the interfaces.

A loose coupling of the individual computational tasks is achieved by the use of abstract interface classes. The relations between the interfaces are not enforced by the design: This allows to integrate new implementations of computational tasks without design constraints but provides guidance to reduce redundancy once an implementation matures. Figure 3.7 provides an overview of the relation between the interface classes and the names of the implementations of the interfaces used for the test case in Section 3.3.

Figure 3.7: Main interface classes (blue boxes, postfix `IF`) and examples for corresponding implementations.

To support mixed-precision approaches, the floating point data types are templated using C++ template metaprogramming, i.e., different floating points types can be configured for each task. The floating point templatization and some details of the interfaces are not reflected in the following to keep the listings concise.

The outermost level is the SimulationIF class (cf. Figure 3.7). The only interface method is a function to trigger the simulation of a process for $T=\text {\texttt {simtime}}$ . The class shown in Listing 3.1 sketches a multi-material simulation implementing SimulationIF which utilizes the other interface classes as intended by the design. A static function (create) prepares the arguments for the interface constructor and returns a shared pointer object. The run-method first determines the maximum surface velocity of the process and calculates the maximum time step according to the advection scheme. Then, a point cloud of surface velocities is calculated using the calcSurfVelos-method of ProcessIF. The top layer is then advected according to these surface velocities using the advect-method of AdvectIF. Finally, the advection of the top layer is transferred to the other material layers. This sequence is repeated until $\text {\texttt {t}} == \text {\texttt {simtime}}$ .

Listing 3.1: Process Simulation Implementation

class SimulationMultiLayer : public SimulationIF {
private:
  GeometryIF::Ptr mGeometry;
  ProcessIF::Ptr mProcess;
  AdvectIF::Ptr mAdvect;
public:
  using Ptr = shared_ptr;
  static SimulationMultiLayer::Ptr Create([...]) {
      // [set/create members]
      return make_shared([...]);
  }
  SimulationMultiLayer([...]) : mGeometry([...]), [...] {}
  void run(double simtime) {
      double t = 0;
      while (t < simtime) {
          double vmax = mProcess->maxSurfVelo();
          double dtmax =
             mAdvect->CFL() * mGeometry->Domain.dx / abs(vmax);
          double dt = (t + dtmax) < simtime ? dtmax : simtime - t;
          vector points;
          vector velos;
          mProcess->calcSurfVelos(mGeometry, dt, points, velos);
          mAdvect->advect(t, dt, mGeometry->domain(), points, velos,
                        mGeometry->layers().back());
          // [boolean operations with other material layers]
          t = t + dt;
      }
  }
};

The GeometryIF class provides access to the domain (e.g., extents and boundary conditions) and the material layers. It is also used for the creation of the initial geometry (using CSG operations between level-sets) and saving of intermediate and final results.

A process is implemented using the interface shown in Listing 3.2. The first method (maxSurfVelo) calculates the maximum possible surface speed of the process. The second method’s task is to create a set of points on the surface (points) and calculate the corresponding surface velocities (velos); it has read access to the geometry, i.e., the domain and the material layers. The time step dt is provided to allow for the averaging of velocities as discussed in Section 3.1.

Listing 3.2: Process Interface

class ProcessIF {
public:
  using Ptr = shared_ptr;
  virtual double maxSurfVelo() const = 0;
  virtual bool calcSurfVelo(const double dt,
                             const GeometryIF::Ptr geometry,
                             vector &points,
                             vector &velos) = 0;
};

The surface rates for a particle source are interfaced using SurfaceRatesIF (Listing 3.3). An implementation of this class uses an implementation of RayTracingIF to conduct the ray-surface intersection tests. An interface for a particle source (ParticleSourceIF) is used to access the properties of the source or to generate random samples according to the emission properties of the particle source. The maxSurfRate-method returns the maximum possible surface rate. The calcSurfRates-method calculates the surface rates at all triangles (ratesAtTriangles) of the surface (mesh), i.e., the extracted surface of the zero-level-set of the top layer; mesh holds additional information for each triangle, e.g., which material is active or a quality measure (as the extraction algorithms potentially generate low quality triangles).

Listing 3.3: Surface Rates Interface

class SurfaceRatesIF {
public:
  using Ptr = shared_ptr;
  virtual double maxSurfRate() = 0;
  virtual void calcSurfRates(const GeometryIF::Ptr geometry,
                             const Mesh &mesh,
                             vector &ratesAtTriangles) = 0;
};

A source of particles is implemented according to ParticleSourceIF (Listing 3.4). The interface methods aim at two different approaches for the particle transport calculation:

flux(...): provides the emitted flux into direction dir and is used for bottom-up particle transport calculations.
sample(...): generates a random emission direction dir together with a scalar weight (the return value) and is used for top-down particle transport calculations. The same approach as in [22] is chosen to generate the random directions.

Listing 3.4: Particle Source Interface

class ParticleSourceIF {
public:
  using Ptr = shared_ptr;
  // flux emitted into direction
  virtual double flux(const Vec3f &dir) const = 0;
  // create random direction which resembles the distribution
  virtual double sample(Vec3f &dir) const = 0;
};

The ray tracing interface RayTracingIF (Listing 3.5) provides ray-surface intersection queries via a trace-method. The trace-method traces a ray from origin org into direction dir through the domain. If a domain boundary is intersected, the ray’s origin and direction are updated accordingly and tracing proceeds. The method reports the distance of the closest intersection with the surface hitDistance, the normal direction of the surface at the intersection location hitNormal, the ID of the intersected primitive primitiveID, and the number of performed ray casts numCasts (i.e., number of boundary intersections). The return value signals an intersection with the surface (true) or no intersection (false).

Listing 3.5: Ray Tracing Interface

class RayTracingIF {
public:
  using Ptr = shared_ptr;
  // refresh the scene
  virtual void refresh(const Domain &domain,
                             const vector &points,
                             const vector &triangles) = 0;
  // trace ray
  virtual bool trace(const Vec3f &org, const Vec3f &dir,
                     double &hitDistance, Vec3f &hitNormal,
                     int &primitiveID, int &numCasts) const = 0;
};

The refresh-method is used to refresh the scene, i.e., to re-build the acceleration structure according to the new geometry given by a triangle mesh (triangles and points).

An interface intended for the use with implicit ray tracing (i.e., the signed-distance field is used directly for ray casting) is implemented analogously; obviously, it lacks the information of an intersected primitiveID but provides an intersection location.

The advection of the top layer level-set is interfaced via AdvectIF (Listing 3.6). Only one method is defined which advects a level-set inside the domain from time t to t + dt using the surface velocities (velos) provided at the corresponding locations on the surface (points). The advection of the surface includes normalization of the level-set and an update of the narrow band, i.e., the narrow band has to be moved as the zero-level-set evolves.

Listing 3.6: Level-Set Advection Interface

class AdvectIF {
public:
 using Ptr = shared_ptr;
 // advect level-sets according to surface velocities
 virtual void advect(const double t, const double dt,
                     const Domain &domain,
                     const vector &points,
                     const vector &velos,
                     LevelSet::Pt > &levelset) = 0;
};

The extension of the surface velocities into the narrow band is provided by VeloFieldIF shown in Listing 3.7. The refresh-method is intended to prepare (or conduct) the extension into the narrow band. The extended velocities are accessed via the overloaded operator().

Listing 3.7: Velocity Field Interface

class VeloFieldIF {
public:
  using Ptr = shared_ptr;
  virtual void refresh(const Domain &domain,
                    const LevelSet::Ptr levelset,
                    const vector &points,
                    const vector &velos ) = 0;
  virtual double operator()(const Coord &ijk) const = 0;
};

The relation of the interfaces introduced above to the sequence of computational tasks throughout a process simulation is indicated in Figure 3.1. The tasks in the left and right column are handled by an implementation of GeometryIF. The tasks in the center column are handled in the run-method of SimulationIF: First, GeometryIF is used to extract the top layer surface. Then, ProcessIF is used to calculate the surface rates and surface velocities (utilizing ParticleSourceIF, RayTracingIF, and SurfaceRatesIF). It follows the velocity extension (VeloFieldIF) and the advection (AdvectIF).

The test cases in the following sections use computationally inexpensive implementations of ProcessIF, i.e., a calculation of the particle transport is not required for the surface velocities. The runtime of the simulations are therefore dominated by the advection of the level-sets and extraction of the surface representation.

3.3 Test Cases

The multi-material surface advection capabilities of the platform are validated using three test cases. Simple surface velocity models are used which are independent of the particle transport. Therefore the surface advection and maintenance of the multi-material level-sets becomes the main computational load. Different spatial resolutions are used, the final profiles are compared, and the performance is tracked throughout the simulations. The main computational tasks for the following test cases is the advection of the surface including velocity extension and re-normalization.

Enright Test

The Enright test is described in [62] and is a standard test for benchmarking level-set data structures and related numerical methods [51][63]. The test is straightforward to setup: A sphere with radius $0.15$ is placed with its center at $\bm {x}=[0.35,0.35,0.35]$ . The simulation domain is the unit box: $[0,0,0]$ to $[1,1,1]$ (cf. Figure 3.8). The velocity field $\bm {V}(\bm {x},t)$ is a three-dimensional incompressible flow field modulated with a period $T=3$ , i.e., a multiplication with $\cos (t\frac {\pi }{3})$ , which results in

$(3.13) \{begin}{align} \label {eq:enrightfield} \bm {V}(\bm {x},t) = \begin {bmatrix} \cos (t\frac {\pi }{3}) [2 \sin ^2(2\pi x)\sin (2xy)\sin (2\pi z)] \\ \cos (t\frac {\pi }{3}) [-\sin (2\pi x)\sin ^2(\pi y) \sin (2\pi z)] \\ \cos (t\frac {\pi }{3}) [-\sin (2\pi x)\sin (2\pi y)\sin ^2(\pi z)] \end {bmatrix} \ . \{end}{align}$

The sphere is mangled through this velocity field resulting in a very thin structure at $T=1.5$ where the velocity field and thus the deformation as well is reversed (due to the temporal cosine modulation). The analytical result is again the original structure (sphere) for $T=3.0$ .

Figure 3.8: Initial position of the sphere for the *Enright test* in the unit domain (gray) and deformed sphere at $T=1.5$ (yellow).

$ T=1.5 $ — Figure 3.8: Initial position of the sphere for the *Enright test* in the unit domain (gray) and deformed sphere at $T=1.5$ (yellow).

In Appendix A.1, Figure A.1-A.4 illustrate the deformed states of the sphere from $T=0.0$ to $T=3.0$ for various resolutions. For the lowest resolution ( $128^3$ , Figure A.4) the thin regions are not representable from $T= 1.2$ onwards (cf. Section 3.1), leading to a drastic loss in volume at the final time $T=3.0$ . When the resolution in is increased (cf. Figure A.1-Figure A.3), the loss in volume is decreased.

Table 3.1 shows the results using the OpenVDB data structure to store the level-set and performing the surface advection using a TVD-RK3 time integration scheme and WENO schemes for spatial discretization; three iterations (i.e., time steps) of a flow-based normalization are performed after the advection using the same WENO schemes and a TVD-RK1 time integration.

$ 256^3 $ — Table 3.1: Performance characteristics for the *Enright test* for resolutions from $128^3$ to $2048^3$ (min/max) test on WS1 (cf. Section 3.4). Column 1: Memory footprint of the VDB data structure (single-precision floating point representation). Column 2: Time for the advection step (TVD-RK3, WENO) including normalization (3 iterations, flow-based TVD-RK1, WENO) and tracking of the narrow band. Column 3: Advection performance in MAV/s (million active voxels per second). Column 4: Number of active voxels.

Material Dependent Isotropic Etching

The test configuration (isotropic test) consists of four stacked layers of different materials $M$ . The surface velocity model is an isotropic etch rate:

$(3.14) \{begin}{align} \label {eq:velisoetch} V_n(\mathbf {x}) & = f(M(\mathbf {x})) = \begin {cases} 0.07, & \ \text {if} \ M(\mathbf {x}) = M_4 \\ 1, & \ \text {if} \ M(\mathbf {x}) = M_3 \\ 0.05, & \ \text {if} \ M(\mathbf {x}) = M_2 \\ 1, & \ \text {if} \ M(\mathbf {x}) = M_1 \end {cases} \{end}{align}$

The lateral domain dimensions are $6 \times 1$ (cf. Figure 3.9). The boundary conditions are reflective in both lateral directions. The thicknesses of the layers from top to bottom are $0.25$ ( $M_4$ ), $0.25$ ( $M_3$ ), $0.02$ ( $M_2$ ), and $0.98$ ( $M_1$ ), respectively. The top layer ( $M_4$ ) has four cylindrical holes of varying diameters $0.4$ , $0.3$ , $0.2$ , and $0.1$ . The centers are located on one of the longer lateral boundaries of the domain and are equidistantly spaced with a distance of $1.5$ .

Three different level-set resolutions were used: $1/64$ , $1/128$ , and $1/256$ , corresponding to lateral dimensions of $1536\times 256$ , $768\times 128$ , and $384\times 64$ grid cells for the simulation domain. The simulation is run until time $T=1.0$ using a TVD-RK3 time integration scheme with $\alpha _{CFL} = 0.4$ . The simulation results for $T=[0,1]$ are illustrated in Figures A.5-A.7. The thin material layer ( $M_2$ ) has a low etch rate compared to the materials above and below; for the lowest resolution ( $1/64=0.0156$ ) the layer is just a bit thicker than one grid cell. Nevertheless, the multi-material level-set representation described above allows to represent this thin layer with sub-grid thickness (cf. Section 3.1) during the level-set advection.

If the material regions are reconstructed from the multi-material level-sets (using Boolean operations), the thin layer vanishes in regions where no grid cell center is situated between the wrapping level-sets. This effect is visualized in Figure 3.10 where this sub-grid resolution is shown for the thin layer of $M_2$ near the crater in $M_1$ at $T=0.7$ .

Figure 3.10: Material regions at $T=0.75$ for the rightmost hole. (a) The explicit representation of the *top layer* level-set is shown (red triangle mesh). (b) A spot near the crater in material $M_1$ is magnified to illustrate the sub-grid resolution of the thin material later ( $M_2$ ). If no grid cell is between the two wrapping level-sets, the thin layer vanished during a Boolean operation between the wrapping level-sets. This is visible in the horizontal region near the crater. Nevertheless, the layer is still represented through the wrapping level-sets and considered during the surface advection.

$ T=0.75 $ — Figure 3.10: Material regions at $T=0.75$ for the rightmost hole. (a) The explicit representation of the *top layer* level-set is shown (red triangle mesh). (b) A spot near the crater in material $M_1$ is magnified to illustrate the sub-grid resolution of the thin material later ( $M_2$ ). If no grid cell is between the two wrapping level-sets, the thin layer vanished during a Boolean operation between the wrapping level-sets. This is visible in the horizontal region near the crater. Nevertheless, the layer is still represented through the wrapping level-sets and considered during the surface advection.

Figure 3.11 shows the cross section for the hole of diameter $0.2$ at different simulation times for all three resolutions. The low etch rate of $M_2$ slows etching in the downward direction, until the thin layer is first opened at the center line of the hole.

Figure 3.11: Cross section of the *top layer* level-set for resolutions $1/256$ (black), $1/128$ (blue), and $1/64$ (red). For time $T=0.0,0.25,0.5,0.75$ , and $1.0$ , the *top layer* level-set for all three resolutions is shown.

$ 1/256 $ — Figure 3.11: Cross section of the *top layer* level-set for resolutions $1/256$ (black), $1/128$ (blue), and $1/64$ (red). For time $T=0.0,0.25,0.5,0.75$ , and $1.0$ , the *top layer* level-set for all three resolutions is shown.

Simple Bosch Process

The Bosch process is a technique to fabricate high aspect ratio structures which are obtained by alternating a passivation step with an etching step. The passivation layer protects the sidewalls in the following directional etching step. In the etching step, accelerated ions remove the passivation layer primarily on horizontal surface regions, i.e., at the open top surface and the bottom of the structure. Once the passivation layer is removed at the bottom, the substrate is also etched isotropically by a neutral etchant species.

For this test case (Bosch test), etching and deposition are assumed to be isotropic and the accelerated ions are modeled to have perfect vertical trajectories. No re-emissions are considered allowing to simplify modeling the ion flux by a single vertical (upward) visibility check weighted with the projected surface area. This simple model is deliberately chosen as this test case is solely conducted to evaluate the capabilities of the framework.

The involved materials are a mask $M_{mask}$ patterned with the desired structure, a substrate $M_{subs}$ into which the pattern is transferred, and a passivation material $M_{pass}$ . This results in an isotropic surface velocity for the deposition of the passivation layer $M_{pass}$

$(3.15) \{begin}{align} \label {eq:velboschdepo} V_n(\mathbf {x}) & = f(M(\mathbf {x})) = \begin {cases} 0.00025, & \ \text {if} \ M(\mathbf {x}) = M_{pass} \end {cases} \ , \{end}{align}$

and a combined (isotropic and directed) surface velocity for the etching step

$(3.16) \{begin}{align} \label {eq:velboschetch} V_n(\mathbf {x}) & = f(M(\mathbf {x}),\bm {n}(\bm {x})) = \begin {cases} 0.000875 + 0.00275 (\bm {n}\cdot \bm {\omega }_{up}), & \ \text {if} \ M(\mathbf {x}) = M_{pass} \\ 0.00025 + 0.0005 (\bm {n}\cdot \bm {\omega }_{up}), & \ \text {if} \ M(\mathbf {x}) = M_{mask} \\ 0.003 + 0.00295 (\bm {n}\cdot \bm {\omega }_{up}), & \ \text {if} \ M(\mathbf {x}) = M_{subs} \end {cases} \ , \{end}{align}$

where $\bm {n}$ is the surface normal direction and $\bm {\omega }_{up}$ is an upward unit vector. The second summand is only considered, if the upward visibility check (a single upward ray is traced) evaluates to false, i.e., the source is visible. The duration of the passivation step is $T=5$ and the duration of the etching step is $T=12$ . The simulation of 20 cycles is conducted using a domain with lateral dimensions $1 \times 0.5$ and reflective boundary conditions. A mask ( $M_{mask}$ ) of thickness $0.25$ is patterned with a cylindrical hole of radius $0.25$ with its center midway on one of the longer domain boundaries.

Figure 3.12 illustrates the results of the first 5 processing steps.

The results for all 20 cycles are provided in Figures A.8-A.11 in Appendix A.3 for resolutions $1/256$ , $1/128$ , $1/64$ , and $1/32$ . It becomes apparent that the sub-grid resolution of thin layers is essential especially at low resolutions, as the thin passivation layer would not be representable otherwise.

Figure 3.13 compares the final results for four different resolutions. Larger differences in the absolute position of the side wall are noticeable for the lower resolutions.

3.4 Benchmarks

The individual computational tasks of the simulation platform were benchmarked on two hardware configurations:

Workstation 1 (WS1). Intel Core-i7-4790K system, 4 physical cores (8 logical cores), $8\,$ MB of shared L3 cache, $32\,$ GB of main memory.

Workstation 2 (WS2)². Dual CPU Intel Xeon E5-2650v2 system, each CPU has 8 physical cores (16 logical cores), $20\,$ MB of L3 Cache, 64GB of main memory.

Utilizing all available cores (including hyper-threading), the isotropic test is benchmarked on WS1: The runtime of the main computational tasks (resolution $1/256$ ) is shown in Figure 3.14. The majority of runtime is utilized by the surface advection. The second largest runtime is used for the surface representation, which includes the extraction of an explicit surface of the top layer level-set and the detection of the active layer for each triangle of the top layer. The runtime per time step doubles throughout the simulation. This increase is due to the increasing surface area; from $T=0.9$ onwards, the surface area and consequently the runtime decreases.

Figure 3.15 shows the runtime of the main computational tasks throughout the simulation of the Bosch test: Also here, the dominating runtime during the deposition steps is the surface advection. During the etching steps the surface representation and the surface rates increase: The active material region must be determined leading to a larger runtime for the surface representation, and the upward visibility test effects the runtime for the surface rates.

Figure 3.14: Runtime of the main computational tasks for all time steps ( $T=0$ to $T=1.0$ ) of the *isotropic test* illustrated in Figure A.5. The lateral resolution of the domain is $1536\times 256$ . At the top, the corresponding cross sections of the domain are shown for times $T=0.25,0.5,0.75$ , and $1.0$ . The simulation was performed on WS1.

$ T=0 $ — Figure 3.14: Runtime of the main computational tasks for all time steps ( $T=0$ to $T=1.0$ ) of the *isotropic test* illustrated in Figure A.5. The lateral resolution of the domain is $1536\times 256$ . At the top, the corresponding cross sections of the domain are shown for times $T=0.25,0.5,0.75$ , and $1.0$ . The simulation was performed on WS1.

Figure 3.15: Runtime of the main computational tasks for all time steps ( $T=0$ to $T=340$ ) of the *Bosch test* illustrated in Figure A.8. The lateral resolution of the domain is $256\times 128$ . At the top, the corresponding cross sections of the domain are shown after cycle 1 ( $T=17$ ), cycle 5 ( $T=85$ ), cycle 10 ( $T=170$ ), and cycle 20 ( $T=340$ ). The simulation was performed on WS1.

$ T=0 $ — Figure 3.15: Runtime of the main computational tasks for all time steps ( $T=0$ to $T=340$ ) of the *Bosch test* illustrated in Figure A.8. The lateral resolution of the domain is $256\times 128$ . At the top, the corresponding cross sections of the domain are shown after cycle 1 ( $T=17$ ), cycle 5 ( $T=85$ ), cycle 10 ( $T=170$ ), and cycle 20 ( $T=340$ ). The simulation was performed on WS1.

To evaluate the parallel scalability of the surface advection, the first time steps of the Enright test for resolution $2048^3$ (7.1 MAV), the isotropic test for resolution $1536\times 256$ (7.0 MAV), and the Bosch test for resolution $256\times 128$ (0.5 MAV) were benchmarked on a 16-core WS2. Figure 3.16 illustrates the results for up to 32 threads: The speedups are 12, 10, and 7.5 for 16 threads, respectively.

Figure 3.16: Parallel scalability of the surface advection for the *Enright test*, the *Bosch test*, and the material dependent isotropic etching test case. The benchmarks were performed on WS2.

² Represents a single node on the Vienna Scientific Cluster 3 (VSC-3); https://vsc.ac.at

3.5 Summary

The developed simulation platform provides important features to validate new approaches for the particle transport calculation: Multi-material advection, sub-grid resolution of thin layers, periodic and reflective boundary conditions, and an interface for efficient ray-surface intersections using external libraries. These features allow to qualify new approaches for the particle transport using realistic simulation setups. The surface advection shows appropriate speedup on shared-memory systems, although the scalability of the surface advection is not primarily important (as the focus is on the performance of the particle transport), it is advantageously as a realistic distribution of runtime between the individual computational tasks is obtained.

The platform is used to implement and validate the numerical approaches in the following Chapters 4, 5, and 6, respectively.

Domain resolution	Memory [MB]	Advection [s]	Performance [MAV/s]	Active voxels [million]
$128^3$	0.6/1.0	0.01/0.03	2.3/2.8	0.027/0.078
$256^3$	1.6/4.3	0.04/0.13	2.7/3.0	0.11/0.39
$512^3$	5.4/18	0.15/0.56	2.9/3.1	0.44/1.7
$1024^3$	20/76	0.57/2.4	2.9/3.1	1.7/7.2
$2048^3$	80/330	2.4/10	2.9/3.1	7.1/29