Conducting the experiments at controlled bias conditions with exact timing, and measuring the typically small currents with low noise, is a challenge throughout the entire characterization of a technology. Another key challenge, which will be discussed in this chapter, is the evaluation of the resulting data set in order to extract parameters for the defects. These parameters should furthermore serve as the basis for subsequently performed simulations and lifetime estimations and should lead to accurate results. For RTN and TDDS traces, this means analyzing the drain current traces for the step heights and transition times of the individual defects at a specific gate bias, whereas for CV measurements it means finding the defect densities at a specific frequency and gate bias. In the following, the typical measurement signals, their analysis, and their meaning for the defect parameters and distributions are discussed.
RTN and TDDS signals consist of a sequence of sampled values, e.g. of the drain current. The data points may be sampled linearly in time, i.e. in equidistant time intervals, or non-linearly, e.g. with the sampling interval increasing logarithmically. Some of the analysis methods presented in this section work on both types of data, while others require pre-processing or weighting of the data to yield correct results for non-linearly sampled data.
For RTN data, the method of choice depends on the signal-to-noise ratio (SNR) of the recorded data, the parameters to be extracted from the measurements, and the amount of data which has to be processed. In contrast to RTN data sets, TDDS signals from stress-recovery measurements have the advantage of a defined point where the recovery starts—and thus from which the emission time can be measured—and usually contain only a single emission step per defect. This allows for a much simpler data extraction using an edge-detection based method, as will be discussed at the end of this section.
Two of the most accessible methods for RTN and TDDS defect parameter extraction are the histogram method and the slightly more advanced time-lag method. The idea behind these methods is to analyze the occurrence of values in the recorded signals.
For the histogram method, as the name suggests, a histogram is created from the recorded data. Given a sufficiently high SNR, the individual levels in the signal appear as peaks in the histogram. The peaks themselves typically have a Gaussian shape, and their width is determined by the noise of the signal. If the method is applied to RTN signals, as is the most common use case, a total of 2^N peaks will be present for N active defects in the signal. For the signal shown in Figure 5.1a with two defects, this relation is visible in Figure 5.1b. The heights of the peaks are determined by the ratios of the capture and emission times of the defects.
The more advanced time-lag method works by creating a two-dimensional histogram or scatter plot of each pair of consecutive points. This results in a better separation between the peaks: the distance between the peaks is increased by a factor of √2, and points containing intermediate values—recorded because a switch occurred during the sampling period—are located on the off-diagonals. This is illustrated in Figure 5.1d. Four separate clusters are visible on the diagonal, and the most common transitions occur between the two lower and between the two upper clusters. From this it follows that the average time constant of the smaller defect is much shorter than that of the larger defect.
From the positions of the peaks in the histogram and the time-lag plots, the step heights of the defects visible in the signal can be extracted. If the peaks can be clearly separated, a combination of defect states can be assigned to each peak and the corresponding dwelling times can be extracted. If this is not possible, the defect occupancies can still be estimated by measuring the approximate areas of the peaks.
One of the main drawbacks of these methods is their reliance on the absolute values of the signal for obtaining the defect states. This is shown for the orange signal in Figure 5.1a, c, and e. A small drift of 1.5 µV/s was added to the 100 s long signal. Given the noise in the measurement, this is enough to obfuscate the number of defects in both the histogram and the time-lag plot. Another drawback is the relatively high SNR required for these methods to work. This can be mitigated using a recent improvement by Martin-Martinez et al., the weighted time-lag method. Despite their drawbacks, these methods are commonly used for the extraction of parameters from RTN signals and for fast visual inspection of RTN data.
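To make the two methods concrete, the following sketch (all parameter values and function names are hypothetical) simulates a noisy two-level RTN signal, builds the value histogram used by the histogram method, and forms the consecutive-sample pairs that a time-lag plot visualizes:

```python
import random

def simulate_rtn(n=20000, tau_c=50, tau_e=200, step=1.0, noise=0.1, seed=1):
    """Two-state RTN: discrete-time Markov chain with mean dwell times
    tau_c/tau_e (in samples) and Gaussian measurement noise."""
    rng = random.Random(seed)
    level, out = 0, []
    for _ in range(n):
        # switch with probability 1/tau in each sampling interval
        if level == 0 and rng.random() < 1.0 / tau_c:
            level = 1
        elif level == 1 and rng.random() < 1.0 / tau_e:
            level = 0
        out.append(level * step + rng.gauss(0.0, noise))
    return out

sig = simulate_rtn()

# Histogram method: with high SNR, Gaussian peaks appear around the two levels.
nbins = 40
lo, hi = min(sig), max(sig)
hist = [0] * nbins
for x in sig:
    k = min(int((x - lo) / (hi - lo) * nbins), nbins - 1)
    hist[k] += 1

# Time-lag method: pairs (x_i, x_{i+1}); most points cluster on the diagonal,
# off-diagonal points stem from switches during the sampling period.
pairs = list(zip(sig[:-1], sig[1:]))
on_diag = sum(1 for a, b in pairs if abs(a - b) < 0.5)
print(f"fraction of consecutive pairs on the diagonal: {on_diag/len(pairs):.3f}")
```

With low noise, the histogram shows two clearly separated peaks and almost all time-lag pairs fall on the diagonal; the off-diagonal fraction grows with the switching rate of the defect.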
The main goal of RTN and TDDS data analysis is often to obtain the positions and amplitudes of the discrete steps in the signal. Thus, a natural approach is to use an edge detection algorithm to extract the steps from the data. Compared to other methods, the advantage of this approach is that edge detection is less susceptible to drift or low frequency noise present in the measurement data. In the following, two edge detection methods—the BCSUM algorithm and the Canny algorithm—will be outlined and benchmarked against each other.
The bootstrapping and cumulative sum (BCSUM) algorithm [120, 121] is a recursive algorithm which can be used to detect changes in the mean of a signal. It can be used on both uniformly and non-uniformly sampled data with varying noise power. The general idea is to describe the measurement as two distinct constant signals separated by a step of height Δ at time τ. This is described by the mean shift model:

x(t) = μ + Δ·s(t − τ)
Here, μ is the mean of the first signal, Δ the step height, s the step function and τ the change point. The discrete measurement signal is described by a vector x of length N. Assuming that the samples are normally distributed around the mean value in each of the two partial signals, the distributions for the two parts are given by p1(x) and p2(x) with means μ1 and μ2 and variances σ1² and σ2². With this, a log-likelihood ratio can be defined which changes its sign depending on which part of the signal a sample more likely belongs to:

z_i = ln( p2(x_i) / p1(x_i) )
The cumulative log-likelihood for any slice from sample j to sample k of a signal is defined as

S(j,k) = z_j + z_{j+1} + … + z_k
With the assumed Gaussian distributions of the partial signals

p1,2(x) = 1/(√(2π)·σ1,2) · exp(−(x − μ1,2)² / (2σ1,2²)),
the log-likelihood ratio can, for a common variance σ², be written as

z_i = (μ2 − μ1)/σ² · (x_i − (μ1 + μ2)/2),
with expectation values μ1 and μ2 before and after the step, respectively. The change point index is thus found as

τ̂ = arg max_k |S(1,k)|,

with S(1,k) the cumulative log-likelihood up to sample k.
As the underlying statistical distributions of the signals are unknown a priori, bootstrapping—a statistical method based on random sampling with replacement—is used for parameter estimation. For this, a cumulative sum is defined in lieu of the log-likelihood sum:

C_k = (x_1 − x̄) + (x_2 − x̄) + … + (x_k − x̄)
where x̄ is the mean of the signal. The same sum C*_k is computed for signals x* randomly sampled from the original signal. From these, the spread

C*_diff = max_k C*_k − min_k C*_k

is obtained for a number of bootstraps. A sensitivity parameter α is further defined for the algorithm. If the spread C_diff of the original signal is above the (1 − α) quantile of the distribution of C*_diff, this is taken as an indication that a change point occurred in the signal, at the index where |C_k| of the original signal is largest. In this case the algorithm is recursively applied to the two split parts of the signal.
An example is given in Figure 5.2. The BCSUM algorithm is applied to the signal on the left. A step is found, and the signal is split into the parts shown in the center and on the right. For the part shown in the center, no further step is detected and the algorithm stops. For the part shown on the right, on the other hand, another step is found and the algorithm continues by analyzing the two splits of this signal.
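A simplified sketch of the idea—using the classic mean-based cumulative-sum statistic with shuffling as the bootstrap, rather than the full log-likelihood formulation of [120, 121]; all names and parameter values are hypothetical—could look as follows:

```python
import random

def cusum(x):
    """Cumulative sum of deviations from the mean; extreme near a change point."""
    m = sum(x) / len(x)
    c, out = 0.0, []
    for v in x:
        c += v - m
        out.append(c)
    return out

def find_steps(x, offset=0, n_boot=200, alpha=0.01, rng=random.Random(0)):
    """Recursively split the signal where a bootstrap test indicates a mean shift."""
    if len(x) < 4:
        return []
    c = cusum(x)
    spread = max(c) - min(c)
    # bootstrap: shuffling destroys any change point; compare the spreads
    exceed = 0
    for _ in range(n_boot):
        xs = x[:]
        rng.shuffle(xs)
        cs = cusum(xs)
        if max(cs) - min(cs) >= spread:
            exceed += 1
    if exceed / n_boot > alpha:      # spread not significant -> no change point
        return []
    k = max(range(len(c)), key=lambda i: abs(c[i]))  # change point estimate
    return (find_steps(x[:k + 1], offset, n_boot, alpha, rng)
            + [offset + k]
            + find_steps(x[k + 1:], offset + k + 1, n_boot, alpha, rng))

rng = random.Random(42)
sig = ([rng.gauss(0.0, 0.1) for _ in range(100)]
       + [rng.gauss(1.0, 0.1) for _ in range(100)])
detected = find_steps(sig)
print(detected)  # one step close to index 99
```

The sensitivity parameter alpha trades false detections against missed small steps, mirroring the role of the quantile test described above.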
The Canny filter is an algorithm commonly used in computer vision to detect edges in recorded images, published in 1986. Unlike the 2D images analyzed in computer vision, or even the 3D images analyzed in medical applications, RTN data are 1D signals, which considerably simplifies the filter. In his work, Canny derives the optimal filter for step edges based on three criteria: a low probability of error—equivalent to the SNR of the filter—a good localization of the edge position, and the suppression of multiple detections per edge. Based on this, he found that there is a trade-off between the SNR and the localization, with the SNR increasing proportionally to the square root of the filter width and the localization increasing inversely proportionally to it. He further showed that the optimal filter kernel can be closely approximated by the derivative of a Gaussian, which leads to improved performance. The first derivative of a Gaussian is given by
g'(t) = −A · t/σ² · exp(−t² / (2σ²)),

with some normalization constant A. The filter response r is the convolution of the kernel—truncated to a width w and discretized as g'[k]—and the signal x:

r[n] = Σ_k g'[k] · x[n − k]
In the case of a 1D signal, non-maximum suppression, i.e. keeping only the strongest response per edge, can be performed simply by finding the local extrema in the filter response:

m[n] = r[n] if |r[n]| ≥ |r[n−1]| and |r[n]| ≥ |r[n+1]|, and m[n] = 0 otherwise.
Finally, thresholding has to be applied to suppress responses caused by noise. The threshold θ can be selected either from a noise estimate obtained by evaluating the filter response, or manually by the user:

e[n] = m[n] if |m[n]| > θ, and e[n] = 0 otherwise,

where m[n] is the non-maximum-suppressed response.
Positive values in the thresholded response now correspond to positive edges, negative values to negative edges. The heights of the edges can be determined from the original signal or from the peaks in the filter response. An illustration of the method can be found in Figure 5.3.
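A minimal 1D Canny sketch along these lines (the kernel truncation width, σ, and threshold values are arbitrary choices for illustration):

```python
import math

def dog_kernel(sigma=3.0, width=4):
    """Discretized first derivative of a Gaussian, truncated at width*sigma."""
    half = int(width * sigma)
    return [-(t / sigma**2) * math.exp(-t * t / (2 * sigma**2))
            for t in range(-half, half + 1)]

def canny_1d(sig, sigma=3.0, threshold=0.2):
    k = dog_kernel(sigma)
    half = len(k) // 2
    # filter response: convolution of kernel and signal (zero-padded at the ends)
    r = [sum(k[j] * sig[i - (j - half)]
             for j in range(len(k)) if 0 <= i - (j - half) < len(sig))
         for i in range(len(sig))]
    edges = []
    for i in range(1, len(r) - 1):
        # non-maximum suppression: keep local extrema of the response only
        if abs(r[i]) >= abs(r[i - 1]) and abs(r[i]) >= abs(r[i + 1]):
            if abs(r[i]) > threshold:  # thresholding suppresses noise responses
                edges.append((i, 1 if r[i] > 0 else -1))
    return edges

edges = canny_1d([0.0] * 100 + [1.0] * 100 + [0.0] * 100)
print(edges)  # a positive edge near index 100 and a negative one near index 200
```

Widening the kernel (larger σ) improves the SNR of the response at the cost of edge localization, reflecting the trade-off discussed above.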
Once the steps in the signal are obtained, they can be binned by step height to assign them to individual defects. If all steps in a bin stem from one defect, the step height is simply the mean over all contributors. The charge capture time can be obtained by calculating the average time between positive and successive negative steps, and vice versa for the charge emission time. If different defects with similar step heights appear in the measurement data, extracting them is—in most cases1—not possible using this method. If such defects need to be characterized, extraction may be possible using a hidden Markov model (HMM) based approach.
1 Extraction may be possible if the defects show correlated behavior, in which case a state machine may be used to assign the steps to defects, see [BSC5].
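The dwell-time bookkeeping described above can be sketched as follows, assuming the detected steps are available as (time, sign) tuples and following the sign convention used here, where the interval from a positive step to the next negative step is a time-to-capture (all numbers are hypothetical):

```python
def dwell_times(steps):
    """Average capture/emission times from a chronological list of (time, sign) steps."""
    cap, emi = [], []
    for (t0, s0), (t1, s1) in zip(steps, steps[1:]):
        if s0 > 0 and s1 < 0:
            cap.append(t1 - t0)   # positive -> negative: time until capture
        elif s0 < 0 and s1 > 0:
            emi.append(t1 - t0)   # negative -> positive: time until emission
    def _mean(v):
        return sum(v) / len(v) if v else float("nan")
    return _mean(cap), _mean(emi)

# hypothetical step list: (time in s, sign of the step)
steps = [(10, +1), (12, -1), (30, +1), (33, -1), (50, +1)]
print(dwell_times(steps))  # (2.5, 17.5)
```

Note that for non-linearly sampled data, the times in the tuples must be the actual sample times, not the sample indices.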
To give an idea of how the BCSUM and the Canny algorithm perform when confronted with measurement data affected by noise, the algorithms are evaluated on three test cases. The first test case uses original data recorded in a low-noise environment, while the second and third test cases use the same data with increasing amounts of added Gaussian noise. The test cases are shown in Figure 5.4. The top plot always shows the data to be analyzed, and the two plots below show the steps extracted using the Canny and the BCSUM algorithm. The small plots to the right are histograms of the dwelling times of the smaller defect. As can be seen, the accuracy of both methods decreases considerably as the noise level approaches the step height of the defects. For the results published in [BSC5], [BSJ2] (shown in Section 6.1), and [BSJ6] (shown in Section 6.3), the author has used the Canny algorithm to evaluate the RTN data.
As stated earlier, defects are commonly described mathematically using Markov models. In the simple case of a two-state defect and negligible measurement noise, each state in the model can be associated with a unique level of the channel current. The effect of the defect being in the charged state can be determined simply by measuring the height of the observed current steps. Neutral defects are assumed to have no influence on the channel current. At any given gate bias, the capture and emission times, i.e. the charge transition times between the charged and neutral state, can be measured by averaging the dwelling times in the high and low current states. This gives the transition rates between the two states of the Markov model. Thus, the sequence of states of the Markov process is fully observable2 from the measurement and the parameters of the Markov model can be found.
For multi-state defect models, such as the 4-state model, this is generally not true. Since multiple states of the defect have the same effect on the channel current, the states of the defect cannot be known from the measurement—they are hidden. Even for the two-state model, when considering real data with measurement noise, unambiguously associating a point in the measurement with a defect state is not possible.
Assuming the measurement noise to be Gaussian, this scenario can be described using a hidden Markov model with Gaussian observations. Such a model is defined by the following parameters:
• The state vector s of the Markov model, of length N
• The matrix A of transition probabilities
• The vector μ of mean observations for each state, of length N
• The vector σ² of observation variances for each state, of length N
• The sequence of states Q of length T
• The sequence of observations O of length T—i.e. the measured data
Given the measured data and the underlying Markov chain, the hidden Markov model can be trained in order to provide the parameters of the model and the probabilities of measuring a certain observation for each state3. This can be done using the Baum-Welch algorithm, which finds a local maximum-likelihood solution of the problem. The Baum-Welch algorithm is an expectation-maximization algorithm based on the forward-backward algorithm. Apart from defect characterization, this method is used for DNA sequencing and for speech and text recognition. The algorithm iterates between an expectation step, where statistics are collected from the current model parameters and the observations, and a maximization step, where the model parameters are improved using those statistics.
• The transition probability matrix is discretized in time to yield the state transition probability distribution A. For equal time steps this is a matrix with elements

a_ij = P(q_{t+1} = s_j | q_t = s_i),

the probability of moving from state i to state j within one sampling interval.
• From the vectors of the mean and variance values of the Gaussian distribution assigned to each state, and the observations, the observation probability distribution B is calculated. These are the probabilities b_j(o_t) of an observation o_t being caused by each of the model states j. As we assume Gaussian noise on our observations, they are calculated from the respective normal PDFs with means μ_j and variances σ_j².
• The initial state distribution π gives the initial probabilities per state. π is commonly initialized to equal values, meaning that there is no preferred initial state of the model.
Expectation step: The expectation step is performed using the forward-backward algorithm:
• In the forward step, vectors α_t with the probabilities of arriving in state i at time t, given the observations up to t and the current parameters, are obtained:

α_t(i) = P(o_1, …, o_t, q_t = i | λ)

The values are calculated recursively from the start of the sequence:

α_{t+1}(j) = b_j(o_{t+1}) · Σ_i α_t(i) · a_ij
• In the backward step, vectors β_t with the probabilities of observing the rest of the sequence when starting from state i at time t, given again the observations and parameters, are obtained:

β_t(i) = P(o_{t+1}, …, o_T | q_t = i, λ)

The values are calculated recursively from the end of the sequence:

β_t(i) = Σ_j a_ij · b_j(o_{t+1}) · β_{t+1}(j)
• The probability γ_t(i) of being in state i at time t, given the model λ and the observation sequence O, is defined as

γ_t(i) = P(q_t = i | O, λ)

This is calculated from the product of the forward and backward probabilities to and from each state:

γ_t(i) = α_t(i) · β_t(i) / Σ_j α_t(j) · β_t(j)
• The probabilities of transitions between all states at all times are defined as

ξ_t(i,j) = P(q_t = i, q_{t+1} = j | O, λ),

which are calculated from the forward probability to the first state, the backward probability from the second state, and a_ij · b_j(o_{t+1})—the probability of the transition and of the observation after the transition:

ξ_t(i,j) = α_t(i) · a_ij · b_j(o_{t+1}) · β_{t+1}(j) / P(O | λ)
• From this, the new parameters of the model can be calculated using the Baum-Welch re-estimation equations:

π̂_i = γ_1(i),  â_ij = Σ_t ξ_t(i,j) / Σ_t γ_t(i)
For Gaussian observations, the observation probabilities are not re-estimated directly; instead, the means μ_j and variances σ_j² are calculated from the observation sequence weighted by the state posteriors:

μ_j = Σ_t γ_t(j) · o_t / Σ_t γ_t(j)
σ_j² = Σ_t γ_t(j) · (o_t − μ_j)² / Σ_t γ_t(j)

Afterwards, the observation probability distribution is recalculated as in Equation (5.14).
End: The algorithm always converges to a local maximum of the likelihood; the absolute or relative change of the likelihood between iterations is typically used as the convergence criterion. Once the algorithm has converged, the transition rates of the defect can be recovered from the transition probability matrix. The method may also be used to fit multiple defects by combining them into a single Markov chain. In this case, the transition matrix and the observation means and variances need to be calculated in a way that preserves the independence of the defects.
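The full iteration can be sketched compactly for Gaussian observations. The following self-contained toy implementation (all parameter values are hypothetical, and per-step scaling is used to avoid numerical underflow) trains a two-state model on a synthetic RTN-like trace:

```python
import math
import random

def gauss_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def baum_welch(obs, A, mu, var, pi, n_iter=30):
    """Baum-Welch for an HMM with Gaussian observations (scaled recursions)."""
    N, T = len(mu), len(obs)
    for _ in range(n_iter):
        B = [[gauss_pdf(o, mu[j], var[j]) for j in range(N)] for o in obs]
        # forward pass: alpha[t][i] ~ P(o_1..o_t, q_t = i), normalized per step
        alpha, scale = [], []
        cur = [pi[j] * B[0][j] for j in range(N)]
        for t in range(T):
            if t:
                cur = [B[t][j] * sum(alpha[t - 1][i] * A[i][j] for i in range(N))
                       for j in range(N)]
            s = sum(cur)
            scale.append(s)
            alpha.append([c / s for c in cur])
        # backward pass: beta[t][i] ~ P(o_{t+1}..o_T | q_t = i), same scaling
        beta = [[1.0] * N for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = [sum(A[i][j] * B[t + 1][j] * beta[t + 1][j] for j in range(N))
                       / scale[t + 1] for i in range(N)]
        # expectation step: state posteriors gamma and transition posteriors xi
        gamma = []
        for t in range(T):
            g = [alpha[t][i] * beta[t][i] for i in range(N)]
            z = sum(g)
            gamma.append([x / z for x in g])
        xi_sum = [[0.0] * N for _ in range(N)]
        for t in range(T - 1):
            xi = [[alpha[t][i] * A[i][j] * B[t + 1][j] * beta[t + 1][j]
                   for j in range(N)] for i in range(N)]
            z = sum(map(sum, xi))
            for i in range(N):
                for j in range(N):
                    xi_sum[i][j] += xi[i][j] / z
        # maximization step: Baum-Welch re-estimation of A, mu, var and pi
        for i in range(N):
            den = sum(gamma[t][i] for t in range(T - 1))
            for j in range(N):
                A[i][j] = xi_sum[i][j] / den
        for j in range(N):
            w = sum(gamma[t][j] for t in range(T))
            mu[j] = sum(gamma[t][j] * obs[t] for t in range(T)) / w
            var[j] = sum(gamma[t][j] * (obs[t] - mu[j]) ** 2 for t in range(T)) / w
        pi = gamma[0][:]
    return A, mu, var, pi

# synthetic two-state RTN trace: step height 1.0, Gaussian noise, hidden states
rng = random.Random(3)
obs, state = [], 0
for _ in range(2000):
    if rng.random() < (0.05 if state == 0 else 0.02):
        state = 1 - state
    obs.append(state + rng.gauss(0.0, 0.2))

A, mu, var, pi = baum_welch(obs, [[0.9, 0.1], [0.1, 0.9]],
                            [0.2, 0.8], [0.5, 0.5], [0.5, 0.5])
print([round(m, 2) for m in mu])  # means recovered close to the true levels 0 and 1
```

The recovered transition probabilities per sampling interval can then be converted into capture and emission times using the known sampling rate.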
As the computational cost of the calculation increases exponentially with the number of states, a better approach might be to use a factorial HMM (FHMM) to extract the defect parameters, as proposed by Puglisi et al. Implementations of HMMs are readily available in many programming languages, e.g. for Python [128, 129].
Similar to the histogram and time-lag methods, HMMs depend on the absolute values of the drain current. Such approaches are always affected by slow drift or low-frequency noise in the measurement data. To mitigate this, the iterative solver can be extended by a method for baseline correction. Suitable methods include spline smoothing, asymmetric least squares, and local regression. HMMs have been used by the author in [BSJ6] (shown in Section 6.3) to characterize a multi-state defect which produced aRTN, and in [BSJ5] to investigate correlated RTN signals.
2 Observable in the context of HMMs refers to being able to measure the state the model is in; this is different from observability in the context of control theory.
3 This is called the emission probability in the context of hidden Markov models—not to be confused with the charge emission probability of a defect.
Once the charge capture and emission times have been extracted at a number of gate voltages, the vertical position as well as the energy of a defect can be estimated. For this, the gate bias at the intersection point, where τ_c = τ_e, and the slope of the time constants over gate bias need to be calculated from the measurement data.
We start from general rates for capture and emission, which can be given analytically for a two-state model by:

k_c,e = k0_c,e · exp(−ε_c,e / (kB T))

Here, k0_c,e are prefactors and ε_c,e are the energy barriers for capture and emission. Taking the logarithm and the first derivative with respect to the gate voltage yields

∂ ln(k_c,e) / ∂VG = −(1 / kB T) · ∂ε_c,e / ∂VG
For this estimation, the bias dependence of the prefactors is neglected. Subtracting the equations for the capture and emission rates yields

∂ ln(k_c / k_e) / ∂VG = −(1 / kB T) · ∂E_T / ∂VG

with ∂(ε_c − ε_e) replaced by the change in defect level ∂E_T. Assuming inversion and approximating the oxide as free of charge, any change in gate bias should shift the defect level proportionally to the relative depth of the defect in the oxide:

|∂E_T / ∂VG| = q · x_T / t_ox
Now we can express the relative depth of the defect using the steepness of the transition rates or time constants:

x_T / t_ox = (kB T / q) · |∂ ln(τ_c / τ_e) / ∂VG|
Integration of Equation (5.33) further yields the trap energy
Finally, the integration constant can be found by evaluating Equation (5.35) at the intersection voltage of the charge capture and emission time characteristics, i.e. at the gate bias where τ_c = τ_e. This allows calculating the defect energy at zero oxide field:
with the surface potential φ_s and the flatband voltage V_fb. It has to be noted that this estimation has a number of shortcomings:
• It occasionally yields negative depths. In this case the defect might have been interacting with the gate instead of the channel, in which case the distance and the Fermi level should be referenced to the gate.
• It neglects the bias dependence of the prefactors. This usually leads to an overestimation of the distances and energies.
• It assumes a single transition barrier, i.e. it does not work for defects which cannot be modeled using a two-state model.
Nonetheless, in many cases it allows for a quick and reasonable estimation of defect parameters, which may in turn provide a suitable initial guess for more accurate numerical simulations. The author has employed the method in this fashion in [BSJ6] (shown in Section 6.3) and [BSJ2] (shown in Section 6.1).
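Under these simplifications, the extraction reduces to a slope fit of ln(τ_c/τ_e) over gate bias. The following sketch illustrates it; all bias points and time constants are made-up numbers, the function name is hypothetical, and signs are reduced to magnitudes as in the estimate above:

```python
import math

KB_T_OVER_Q = 0.0259  # thermal voltage at room temperature, in V

def trap_depth_and_crossing(vg, tau_c, tau_e):
    """Least-squares slope of ln(tau_c/tau_e) over V_G and its zero crossing."""
    y = [math.log(tc / te) for tc, te in zip(tau_c, tau_e)]
    n = len(vg)
    mv, my = sum(vg) / n, sum(y) / n
    slope = (sum((v - mv) * (yy - my) for v, yy in zip(vg, y))
             / sum((v - mv) ** 2 for v in vg))
    v_cross = mv - my / slope              # gate bias where tau_c = tau_e
    rel_depth = KB_T_OVER_Q * abs(slope)   # x_T / t_ox from the kT/q * slope relation
    return rel_depth, v_cross

# hypothetical measured time constants over gate bias (in V and s)
vg = [0.4, 0.5, 0.6, 0.7]
tau_c = [8.0, 2.0, 0.5, 0.125]   # capture accelerates with bias
tau_e = [0.125, 0.5, 2.0, 8.0]   # emission slows down with bias
rel_depth, v_cross = trap_depth_and_crossing(vg, tau_c, tau_e)
print(rel_depth, v_cross)
```

Such a quick estimate can serve as the initial guess for the more accurate numerical simulations mentioned above.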
As described in Section 4.1.1, RTN caused by a single defect appears as a Lorentzian in the frequency spectrum. By characterizing the individual Lorentzians of a full PSD, the average transition rates of the defects can be obtained. Varying the measurement voltage further allows finding the point where the noise energy of a defect is at its maximum, which means that the occupancy of the defect is at 50%. At this point, the step height of the defect can be calculated, which in turn allows calculating the occupancy at other voltages.
In devices with many active defects, the relation between the noise power calculated from the drain current and the number of defect states close to the Fermi level allows obtaining the defect density over the energy of the band gap. The basic equation which describes this relation is given as

S_ID / ID² = (δn/δn_t · 1/n ± α·μ)² · kB T λ N_ot / (A f)
Here, S_ID is the noise spectral density, ID the average drain current, δn/δn_t the variation of the free-carrier concentration due to carrier trapping, n the density of free channel electrons, α the coefficient describing the influence of the trapped carriers on the channel mobility, μ the carrier mobility, N_ot the density of oxide traps, kB the Boltzmann constant, T the temperature, A the channel area, λ the tunneling attenuation length and f the frequency.
For the extraction of TDDS data, an edge detection method as outlined in Section 5.1.2 is applied to the recovery data. As the charge emission probabilities are strongly modulated by the applied bias, the dwelling times of the defects at the recovery bias are practically identical to the points in time in the recovery trace where the charges are emitted on average. The extracted steps are plotted in a scatter map with the step height and the emission time as axes, as shown in Figure 5.6.
Plotting multiple repetitions of the measurement leads to the formation of clusters in this so-called spectral map. Each cluster corresponds to a single defect. The points in a cluster are distributed normally in step height and exponentially in time around the average emission time of the defect. By dividing the number of emissions in each cluster by the number of repetitions, the capture probability of the defect can be obtained for the applied stress conditions. The capture time at stress conditions can then be obtained by plotting the capture probability of the defect over the stress time, which is observed to follow an exponential distribution, as expected for a Markov process and shown in Figure 5.7.
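Since for a Markov process the capture probability follows P_c = 1 − exp(−t_s/τ_c), the capture time τ_c can be estimated from the cluster counts with a simple through-origin fit on the linearized data. A sketch with synthetic numbers (function name and values hypothetical):

```python
import math

def capture_time_from_probabilities(t_stress, p_capture):
    """Fit tau_c in P_c = 1 - exp(-t_s / tau_c) by least squares on -ln(1 - P_c)."""
    # -ln(1 - P_c) = t_s / tau_c is linear through the origin in t_s
    y = [-math.log(1.0 - p) for p in p_capture]
    return sum(t * t for t in t_stress) / sum(t * yy for t, yy in zip(t_stress, y))

# hypothetical capture probabilities from counting cluster points per repetition
t_stress = [1e-3, 1e-2, 1e-1]                     # stress times in s
p = [1 - math.exp(-t / 0.05) for t in t_stress]   # synthetic data, tau_c = 50 ms
tau_fit = capture_time_from_probabilities(t_stress, p)
print(tau_fit)  # ≈ 0.05
```

In practice, the probabilities come from the measured spectral maps, and stress times spanning several decades are needed to constrain the fit well.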