Conducting the experiments at controlled bias conditions with exact timing, and measuring the typically small currents with low noise, is a challenge throughout the entire characterization of a technology. Another key challenge, which will be discussed in this chapter, is the evaluation of the resulting data set in order to extract parameters for the defects. These parameters should furthermore serve as the basis for subsequently performed simulations and lifetime estimations and should lead to accurate results. For RTN and TDDS traces, this means analyzing the drain current traces for the step heights and transition times of the individual defects at a specific gate bias, whereas for CV measurements it means finding the defect densities at a specific frequency and gate bias. In the following, the typical measurement signals, their analysis, and their meaning for the defect parameters and distributions are discussed.
RTN and TDDS signals consist of a sequence of sampled values, e.g. of the drain current. The data points may be sampled linearly in time, i.e. in equidistant time intervals, or non-linearly, e.g. with the sampling interval increasing logarithmically. Some of the analysis methods presented in this section work on both types of data, while others require pre-processing or weighting of the data to yield correct results for non-linearly sampled data.
For RTN data, the method of choice depends on the signal-to-noise ratio (SNR) of the recorded data, the parameters to be extracted from the measurements, and the amount of data which has to be processed. In contrast to RTN data sets, TDDS signals from stress-recovery measurements have the advantage of a defined point where the recovery starts—and thus from which the emission time can be measured—and usually contain only a single emission step per defect. This allows for a much simpler data extraction using an edge-detection based method, as will be discussed at the end of this section.
Two of the most accessible methods for RTN and TDDS defect parameter extraction are the histogram method and the slightly more advanced time-lag method. The idea behind these methods is to analyze the occurrence of values in the recorded signals.
For the histogram method, as the name suggests, a histogram is created from the recorded data. Given a sufficiently high SNR, the individual levels in the signal appear as peaks in the histogram. The peaks themselves typically have a Gaussian shape, and their width is determined by the noise of the signal. If the method is applied to RTN signals, as is the most common use case, a total of 2^N peaks will be present for N active defects in the signal. For the signal shown in Figure 5.1a with two defects, this relation is visible in Figure 5.1b. The heights of the peaks are determined by the ratios of the capture and emission times of the defects.
The more advanced time-lag method works by creating a two-dimensional histogram or scatter plot of each pair of consecutive points. This results in a better separation between the peaks: the distance between the peaks is increased by a factor of √2, and points containing intermediate values—recorded because a switch occurred during the sampling period—are located on the off-diagonals. This is illustrated in Figure 5.1d. Four separate clusters are visible on the diagonal, and the most common transitions occur between the two lower and between the two upper clusters. From this it follows that the average time constant of the smaller defect is much shorter than that of the larger defect.
From the positions of the peaks in the histogram and the time-lag plots, the step heights of the defects visible in the signal can be extracted. If the peaks can be clearly separated, a combination of defect states can be assigned to each peak and the corresponding dwelling times can be extracted. If this is not possible, the defect occupancies can still be estimated by measuring the approximate areas of the peaks.
One of the main drawbacks of these methods is their reliance on the absolute values of the signal for obtaining the defect states. This is shown for the orange signal in Figure 5.1a, c, and e. A small drift of 1.5 µV/s was added to the 100 s long signal. Given the noise in the measurement, this is enough to obfuscate the number of defects in both the histogram and the time-lag plot. Another drawback is the relatively high SNR required for these methods to work. This can be mitigated using a recent improvement by Martin-Martinez et al., the weighted time-lag method. Despite their drawbacks, these methods are commonly used for the extraction of parameters from RTN signals and for fast visual inspection of RTN data.
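To make the two methods concrete, the following sketch (all parameter values and function names are hypothetical) simulates a noisy two-level RTN signal, builds the value histogram used by the histogram method, and forms the consecutive-sample pairs that a time-lag plot visualizes:

```python
import random

def simulate_rtn(n=20000, tau_c=50, tau_e=200, step=1.0, noise=0.1, seed=1):
    """Two-state RTN: discrete-time Markov chain with mean dwell times
    tau_c/tau_e (in samples) and Gaussian measurement noise."""
    rng = random.Random(seed)
    level, out = 0, []
    for _ in range(n):
        # switch with probability 1/tau in each sampling interval
        if level == 0 and rng.random() < 1.0 / tau_c:
            level = 1
        elif level == 1 and rng.random() < 1.0 / tau_e:
            level = 0
        out.append(level * step + rng.gauss(0.0, noise))
    return out

sig = simulate_rtn()

# Histogram method: with high SNR, Gaussian peaks appear around the two levels.
nbins = 40
lo, hi = min(sig), max(sig)
hist = [0] * nbins
for x in sig:
    k = min(int((x - lo) / (hi - lo) * nbins), nbins - 1)
    hist[k] += 1

# Time-lag method: pairs (x_i, x_{i+1}); most points cluster on the diagonal,
# off-diagonal points stem from switches during the sampling period.
pairs = list(zip(sig[:-1], sig[1:]))
on_diag = sum(1 for a, b in pairs if abs(a - b) < 0.5)
print(f"fraction of consecutive pairs on the diagonal: {on_diag/len(pairs):.3f}")
```

With low noise, the histogram shows two clearly separated peaks and almost all time-lag pairs fall on the diagonal; the off-diagonal fraction grows with the switching rate of the defect.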
The main goal of RTN and TDDS data analysis is often to obtain the positions and amplitudes of the discrete steps in the signal. Thus, a natural approach is to use an edge detection algorithm to extract the steps from the data. Compared to other methods, the advantage of this approach is that edge detection is less susceptible to drift or low frequency noise present in the measurement data. In the following, two edge detection methods—the BCSUM algorithm and the Canny algorithm—will be outlined and benchmarked against each other.
The bootstrapping and cumulative sum (BCSUM) algorithm [120, 121] is a recursive algorithm which can be used to detect changes in the mean of a signal. It can be used on both uniformly and non-uniformly sampled data with varying noise power. The general idea is to describe the measurement as two distinct constant signals separated by a step of height Δ at time τ. This is described by the mean shift model:

x(t) = μ + Δ·s(t − τ)
Here, μ is the mean of the first signal, Δ the step height, s the step function and τ the change point. The discrete measurement signal is described by a vector x of length N. Assuming that the samples are normally distributed around the mean value in each of the two partial signals, the distributions for the two parts are given by p1(x) and p2(x) with means μ1 and μ2 and variances σ1² and σ2². With this, a log-likelihood ratio can be defined which changes its sign depending on which part of the signal a sample more likely belongs to:

z_i = ln( p2(x_i) / p1(x_i) )
The cumulative log-likelihood for any slice from sample j to sample k of a signal is defined as

S(j,k) = z_j + z_{j+1} + … + z_k
With the assumed Gaussian distributions of the partial signals

p1,2(x) = 1/(√(2π)·σ1,2) · exp(−(x − μ1,2)² / (2σ1,2²)),
the log-likelihood ratio can, for a common variance σ², be written as

z_i = (μ2 − μ1)/σ² · (x_i − (μ1 + μ2)/2),
with expectation values μ1 and μ2 before and after the step, respectively. The change point index is thus found as

τ̂ = arg max_k |S(1,k)|,

with S(1,k) the cumulative log-likelihood up to sample k.
As the underlying statistical distributions of the signals are unknown a priori, bootstrapping—a statistical method based on random sampling with replacement—is used for parameter estimation. For this, a cumulative sum is defined in lieu of the log-likelihood sum:

C_k = (x_1 − x̄) + (x_2 − x̄) + … + (x_k − x̄)
where x̄ is the mean of the signal. The same sum C*_k is computed for signals x* randomly sampled from the original signal. From these, the spread

C*_diff = max_k C*_k − min_k C*_k

is obtained for a number of bootstraps. A sensitivity parameter α is further defined for the algorithm. If the spread C_diff of the original signal is above the (1 − α) quantile of the distribution of C*_diff, this is taken as an indication that a change point occurred in the signal, at the index where |C_k| of the original signal is largest. In this case the algorithm is recursively applied to the two split parts of the signal.
An example is given in Figure 5.2. The BCSUM algorithm is applied to the signal on the left. A step is found, and the signal is split into the parts shown in the center and on the right. For the part shown in the center, no further step is detected and the algorithm stops. For the part shown on the right, on the other hand, another step is found and the algorithm continues by analyzing the two splits of this signal.
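A simplified sketch of the idea—using the classic mean-based cumulative-sum statistic with shuffling as the bootstrap, rather than the full log-likelihood formulation of [120, 121]; all names and parameter values are hypothetical—could look as follows:

```python
import random

def cusum(x):
    """Cumulative sum of deviations from the mean; extreme near a change point."""
    m = sum(x) / len(x)
    c, out = 0.0, []
    for v in x:
        c += v - m
        out.append(c)
    return out

def find_steps(x, offset=0, n_boot=200, alpha=0.01, rng=random.Random(0)):
    """Recursively split the signal where a bootstrap test indicates a mean shift."""
    if len(x) < 4:
        return []
    c = cusum(x)
    spread = max(c) - min(c)
    # bootstrap: shuffling destroys any change point; compare the spreads
    exceed = 0
    for _ in range(n_boot):
        xs = x[:]
        rng.shuffle(xs)
        cs = cusum(xs)
        if max(cs) - min(cs) >= spread:
            exceed += 1
    if exceed / n_boot > alpha:      # spread not significant -> no change point
        return []
    k = max(range(len(c)), key=lambda i: abs(c[i]))  # change point estimate
    return (find_steps(x[:k + 1], offset, n_boot, alpha, rng)
            + [offset + k]
            + find_steps(x[k + 1:], offset + k + 1, n_boot, alpha, rng))

rng = random.Random(42)
sig = ([rng.gauss(0.0, 0.1) for _ in range(100)]
       + [rng.gauss(1.0, 0.1) for _ in range(100)])
detected = find_steps(sig)
print(detected)  # one step close to index 99
```

The sensitivity parameter alpha trades false detections against missed small steps, mirroring the role of the quantile test described above.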
The Canny filter is an algorithm commonly used in computer vision to detect edges in recorded images, published in 1986. Unlike the 2D images analyzed in computer vision, or even the 3D images analyzed in medical applications, RTN data are 1D signals, which considerably simplifies the filter. In his work, Canny derives the optimal filter for step edges based on three criteria: a low probability of error—equivalent to the SNR of the filter—a good localization of the edge position, and the suppression of multiple detections per edge. Based on this, he found that there is a trade-off between the SNR and the localization, with the SNR increasing proportionally to the square root of the filter width and the localization increasing inversely proportionally to it. He further showed that the optimal filter kernel can be closely approximated by the derivative of a Gaussian, which leads to improved performance. The first derivative of a Gaussian is given by
g'(t) = −A · t/σ² · exp(−t² / (2σ²)),

with some normalization constant A. The filter response r is the convolution of the kernel—truncated to a width w and discretized as g'[k]—and the signal x:

r[n] = Σ_k g'[k] · x[n − k]
In the case of a 1D signal, non-maximum suppression, i.e. keeping only the strongest response per edge, can be performed simply by finding the local extrema in the filter response:

m[n] = r[n] if |r[n]| ≥ |r[n−1]| and |r[n]| ≥ |r[n+1]|, and m[n] = 0 otherwise.
Finally, thresholding has to be applied to suppress responses caused by noise. The threshold θ can be selected either from a noise estimate obtained by evaluating the filter response, or manually by the user:

e[n] = m[n] if |m[n]| > θ, and e[n] = 0 otherwise,

where m[n] is the non-maximum-suppressed response.
Positive values in the thresholded response now correspond to positive edges, negative values to negative edges. The heights of the edges can be determined from the original signal or from the peaks in the filter response. An illustration of the method can be found in Figure 5.3.
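A minimal 1D Canny sketch along these lines (the kernel truncation width, σ, and threshold values are arbitrary choices for illustration):

```python
import math

def dog_kernel(sigma=3.0, width=4):
    """Discretized first derivative of a Gaussian, truncated at width*sigma."""
    half = int(width * sigma)
    return [-(t / sigma**2) * math.exp(-t * t / (2 * sigma**2))
            for t in range(-half, half + 1)]

def canny_1d(sig, sigma=3.0, threshold=0.2):
    k = dog_kernel(sigma)
    half = len(k) // 2
    # filter response: convolution of kernel and signal (zero-padded at the ends)
    r = [sum(k[j] * sig[i - (j - half)]
             for j in range(len(k)) if 0 <= i - (j - half) < len(sig))
         for i in range(len(sig))]
    edges = []
    for i in range(1, len(r) - 1):
        # non-maximum suppression: keep local extrema of the response only
        if abs(r[i]) >= abs(r[i - 1]) and abs(r[i]) >= abs(r[i + 1]):
            if abs(r[i]) > threshold:  # thresholding suppresses noise responses
                edges.append((i, 1 if r[i] > 0 else -1))
    return edges

edges = canny_1d([0.0] * 100 + [1.0] * 100 + [0.0] * 100)
print(edges)  # a positive edge near index 100 and a negative one near index 200
```

Widening the kernel (larger σ) improves the SNR of the response at the cost of edge localization, reflecting the trade-off discussed above.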
Once the steps in the signal are obtained, they can be binned by step height to assign them to individual defects. If all steps in a bin stem from one defect, the step height is simply the mean over all contributors. The charge capture time can be obtained by calculating the average time between positive and successive negative steps, and vice versa for the charge emission time. If different defects with similar step heights appear in the measurement data, extracting them is—in most cases1—not possible using this method. If such defects need to be characterized, extraction may be possible using a hidden Markov model (HMM) based approach.
1 Extraction may be possible if the defects show correlated behavior, in which case a state machine may be used to assign the steps to defects, see [BSC5].
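The dwell-time bookkeeping described above can be sketched as follows, assuming the detected steps are available as (time, sign) tuples and following the sign convention used here, where the interval from a positive step to the next negative step is a time-to-capture (all numbers are hypothetical):

```python
def dwell_times(steps):
    """Average capture/emission times from a chronological list of (time, sign) steps."""
    cap, emi = [], []
    for (t0, s0), (t1, s1) in zip(steps, steps[1:]):
        if s0 > 0 and s1 < 0:
            cap.append(t1 - t0)   # positive -> negative: time until capture
        elif s0 < 0 and s1 > 0:
            emi.append(t1 - t0)   # negative -> positive: time until emission
    def _mean(v):
        return sum(v) / len(v) if v else float("nan")
    return _mean(cap), _mean(emi)

# hypothetical step list: (time in s, sign of the step)
steps = [(10, +1), (12, -1), (30, +1), (33, -1), (50, +1)]
print(dwell_times(steps))  # (2.5, 17.5)
```

Note that for non-linearly sampled data, the times in the tuples must be the actual sample times, not the sample indices.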
To give an idea of how the BCSUM and the Canny algorithm perform when confronted with measurement data affected by noise, the algorithms are evaluated on three test cases. The first test case uses original data recorded in a low-noise environment, while the second and third test cases use the same data with increasing amounts of added Gaussian noise. The test cases are shown in Figure 5.4. The top plot always shows the data to be analyzed, and the two plots below show the steps extracted using the Canny and the BCSUM algorithm. The small plots to the right are histograms of the dwelling times of the smaller defect. As can be seen, the accuracy of both methods decreases considerably as the noise level approaches the step height of the defects. For the results published in [BSC5], [BSJ2] (shown in Section 6.1), and [BSJ6] (shown in Section 6.3), the author has used the Canny algorithm to evaluate the RTN data.
As stated earlier, defects are commonly described mathematically using Markov models. In the simple case of a two-state defect and negligible measurement noise, each state in the model can be associated with a unique level of the channel current. The effect of the defect being in the charged state can be determined simply by measuring the height of the observed current steps. Neutral defects are assumed to have no influence on the channel current. At any given gate bias, the capture and emission times, i.e. the charge transition times between the charged and neutral state, can be measured by averaging the dwelling times in the high and low current states. This gives the transition rates between the two states of the Markov model. Thus, the sequence of states of the Markov process is fully observable2 from the measurement and the parameters of the Markov model can be found.
For multi-state defect models, such as the 4-state model, this is generally not true. Since multiple states of the defect have the same effect on the channel current, the states of the defect cannot be known from the measurement—they are hidden. Even for the two-state model, when considering real data with measurement noise, unambiguously associating a point in the measurement with a defect state is not possible.
Assuming the measurement noise to be Gaussian, this scenario can be described using a hidden Markov model with Gaussian observations. Such a model is defined by the following parameters:
• The state vector s of the Markov model, of length N
• The matrix A of transition probabilities
• The vector μ of mean observations for each state, of length N
• The vector σ² of observation variances for each state, of length N
• The sequence of states Q of length T
• The sequence of observations O of length T—i.e. the measured data
Given the measured data and the underlying Markov chain, the hidden Markov model can be trained in order to provide the parameters of the model and the probabilities of measuring a certain observation for each state3. This can be done using the Baum-Welch algorithm, which finds a local maximum-likelihood solution of the problem. The Baum-Welch algorithm is an expectation-maximization algorithm based on the forward-backward algorithm. Apart from defect characterization, this method is used for DNA sequencing and for speech and text recognition. The algorithm iterates between an expectation step, where statistics are collected from the current model parameters and the observations, and a maximization step, where the model parameters are improved using those statistics.
• The transition probability matrix is discretized in time to yield the state transition probability distribution A. For equal time steps this is a matrix with elements

a_ij = P(q_{t+1} = s_j | q_t = s_i),

the probability of moving from state i to state j within one sampling interval.
• From the vectors of the mean and variance values of the Gaussian distribution assigned to each state, and the observations, the observation probability distribution B is calculated. These are the probabilities b_j(o_t) of an observation o_t being caused by each of the model states j. As we assume Gaussian noise on our observations, they are calculated from the respective normal PDFs with means μ_j and variances σ_j².
• The initial state distribution π gives the initial probabilities per state. π is commonly initialized to equal values, meaning that there is no preferred initial state of the model.
Expectation step: The expectation step is performed using the forward-backward algorithm:
• In the forward step, vectors α_t with the probabilities of arriving in state i at time t, given the observations up to t and the current parameters, are obtained:

α_t(i) = P(o_1, …, o_t, q_t = i | λ)

The values are calculated recursively from the start of the sequence:

α_{t+1}(j) = b_j(o_{t+1}) · Σ_i α_t(i) · a_ij
• In the backward step, vectors β_t with the probabilities of observing the rest of the sequence when starting from state i at time t, given again the observations and parameters, are obtained:

β_t(i) = P(o_{t+1}, …, o_T | q_t = i, λ)

The values are calculated recursively from the end of the sequence:

β_t(i) = Σ_j a_ij · b_j(o_{t+1}) · β_{t+1}(j)
• The probability γ_t(i) of being in state i at time t, given the model λ and the observation sequence O, is defined as

γ_t(i) = P(q_t = i | O, λ)

This is calculated from the product of the forward and backward probabilities to and from each state:

γ_t(i) = α_t(i) · β_t(i) / Σ_j α_t(j) · β_t(j)
• The probabilities of transitions between all states at all times are defined as

ξ_t(i,j) = P(q_t = i, q_{t+1} = j | O, λ),

which are calculated from the forward probability to the first state, the backward probability from the second state, and a_ij · b_j(o_{t+1})—the probability of the transition and of the observation after the transition:

ξ_t(i,j) = α_t(i) · a_ij · b_j(o_{t+1}) · β_{t+1}(j) / P(O | λ)
• From this, the new parameters of the model can be calculated using the Baum-Welch re-estimation equations:

π̂_i = γ_1(i),  â_ij = Σ_t ξ_t(i,j) / Σ_t γ_t(i)
For Gaussian observations, the observation probabilities are not re-estimated directly; instead, the means μ_j and variances σ_j² are calculated from the observation sequence weighted by the state posteriors:

μ_j = Σ_t γ_t(j) · o_t / Σ_t γ_t(j)
σ_j² = Σ_t γ_t(j) · (o_t − μ_j)² / Σ_t γ_t(j)

Afterwards, the observation probability distribution is recalculated as in Equation (5.14).
End: The algorithm always converges to a local maximum of the likelihood; the absolute or relative change of the likelihood between iterations is typically used as the convergence criterion. Once the algorithm has converged, the transition rates of the defect can be recovered from the transition probability matrix. The method may also be used to fit multiple defects by combining them into a single Markov chain. In this case, the transition matrix and the observation means and variances need to be calculated in a way that preserves the independence of the defects.
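The full iteration can be sketched compactly for Gaussian observations. The following self-contained toy implementation (all parameter values are hypothetical, and per-step scaling is used to avoid numerical underflow) trains a two-state model on a synthetic RTN-like trace:

```python
import math
import random

def gauss_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def baum_welch(obs, A, mu, var, pi, n_iter=30):
    """Baum-Welch for an HMM with Gaussian observations (scaled recursions)."""
    N, T = len(mu), len(obs)
    for _ in range(n_iter):
        B = [[gauss_pdf(o, mu[j], var[j]) for j in range(N)] for o in obs]
        # forward pass: alpha[t][i] ~ P(o_1..o_t, q_t = i), normalized per step
        alpha, scale = [], []
        cur = [pi[j] * B[0][j] for j in range(N)]
        for t in range(T):
            if t:
                cur = [B[t][j] * sum(alpha[t - 1][i] * A[i][j] for i in range(N))
                       for j in range(N)]
            s = sum(cur)
            scale.append(s)
            alpha.append([c / s for c in cur])
        # backward pass: beta[t][i] ~ P(o_{t+1}..o_T | q_t = i), same scaling
        beta = [[1.0] * N for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = [sum(A[i][j] * B[t + 1][j] * beta[t + 1][j] for j in range(N))
                       / scale[t + 1] for i in range(N)]
        # expectation step: state posteriors gamma and transition posteriors xi
        gamma = []
        for t in range(T):
            g = [alpha[t][i] * beta[t][i] for i in range(N)]
            z = sum(g)
            gamma.append([x / z for x in g])
        xi_sum = [[0.0] * N for _ in range(N)]
        for t in range(T - 1):
            xi = [[alpha[t][i] * A[i][j] * B[t + 1][j] * beta[t + 1][j]
                   for j in range(N)] for i in range(N)]
            z = sum(map(sum, xi))
            for i in range(N):
                for j in range(N):
                    xi_sum[i][j] += xi[i][j] / z
        # maximization step: Baum-Welch re-estimation of A, mu, var and pi
        for i in range(N):
            den = sum(gamma[t][i] for t in range(T - 1))
            for j in range(N):
                A[i][j] = xi_sum[i][j] / den
        for j in range(N):
            w = sum(gamma[t][j] for t in range(T))
            mu[j] = sum(gamma[t][j] * obs[t] for t in range(T)) / w
            var[j] = sum(gamma[t][j] * (obs[t] - mu[j]) ** 2 for t in range(T)) / w
        pi = gamma[0][:]
    return A, mu, var, pi

# synthetic two-state RTN trace: step height 1.0, Gaussian noise, hidden states
rng = random.Random(3)
obs, state = [], 0
for _ in range(2000):
    if rng.random() < (0.05 if state == 0 else 0.02):
        state = 1 - state
    obs.append(state + rng.gauss(0.0, 0.2))

A, mu, var, pi = baum_welch(obs, [[0.9, 0.1], [0.1, 0.9]],
                            [0.2, 0.8], [0.5, 0.5], [0.5, 0.5])
print([round(m, 2) for m in mu])  # means recovered close to the true levels 0 and 1
```

The recovered transition probabilities per sampling interval can then be converted into capture and emission times using the known sampling rate.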
As the computational cost of the calculation increases exponentially with the number of states, a better approach might be to use a factorial HMM (FHMM) to extract the defect parameters, as proposed by Puglisi et al. Implementations of HMMs are readily available in many programming languages, e.g. for Python [128, 129].
Similar to the histogram and time-lag methods, HMMs depend on the absolute values of the drain current. Such approaches are always affected by slow drift or low-frequency noise in the measurement data. To mitigate this, the iterative solver can be extended by a method for baseline correction. Suitable methods include spline smoothing, asymmetric least squares, and local regression. HMMs have been used by the author in [BSJ6] (shown in Section 6.3) to characterize a multi-state defect which produced aRTN, and in [BSJ5] to investigate correlated RTN signals.
2 Observable in the context of HMMs refers to being able to measure the state the model is in; this is different from observability in the context of control theory.
3 This is called the emission probability in the context of hidden Markov models—not to be confused with the charge emission probability of a defect.
Once the charge capture and emission times have been extracted at a number of gate voltages, the vertical position as well as the energy of a defect can be estimated. For this, the gate bias at the intersection point, where τ_c = τ_e, and the slope of the time constants over gate bias need to be calculated from the measurement data.
We start from general rates for capture and emission, which can be given analytically for a two-state model by:

k_c,e = k0_c,e · exp(−ε_c,e / (kB T))

Here, k0_c,e are prefactors and ε_c,e are the energy barriers for capture and emission. Taking the logarithm and the first derivative with respect to the gate voltage yields

∂ ln(k_c,e) / ∂VG = −(1 / kB T) · ∂ε_c,e / ∂VG
For this estimation, the bias dependence of the prefactors is neglected. Subtracting the equations for the capture and emission rates yields

∂ ln(k_c / k_e) / ∂VG = −(1 / kB T) · ∂E_T / ∂VG

with ∂(ε_c − ε_e) replaced by the change in defect level ∂E_T. Assuming inversion and approximating the oxide as free of charge, any change in gate bias should shift the defect level proportionally to the relative depth of the defect in the oxide:

|∂E_T / ∂VG| = q · x_T / t_ox
Now we can express the relative depth of the defect using the steepness of the transition rates or time constants:

x_T / t_ox = (kB T / q) · |∂ ln(τ_c / τ_e) / ∂VG|
Integration of Equation (5.33) further yields the trap energy
Finally, the integration constant can be found by evaluating Equation (5.35) at the intersection voltage of the charge capture and emission time characteristics, i.e. at the gate bias where τ_c = τ_e. This allows calculating the defect energy at zero oxide field:
with the surface potential φ_s and the flatband voltage V_fb. It has to be noted that this estimation has a number of shortcomings:
• It occasionally yields negative depths. In this case the defect might have been interacting with the gate instead of the channel, in which case the distance and the Fermi level should be referenced to the gate.
• It neglects the bias dependence of the prefactors. This usually leads to an overestimation of the distances and energies.
• It assumes a single transition barrier, i.e. it does not work for defects which cannot be modeled using a two-state model.
Nonetheless, in many cases it allows for a quick and reasonable estimation of defect parameters, which may in turn provide a suitable initial guess for more accurate numerical simulations. The author has employed the method in this fashion in [BSJ6] (shown in Section 6.3) and [BSJ2] (shown in Section 6.1).
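Under these simplifications, the extraction reduces to a slope fit of ln(τ_c/τ_e) over gate bias. The following sketch illustrates it; all bias points and time constants are made-up numbers, the function name is hypothetical, and signs are reduced to magnitudes as in the estimate above:

```python
import math

KB_T_OVER_Q = 0.0259  # thermal voltage at room temperature, in V

def trap_depth_and_crossing(vg, tau_c, tau_e):
    """Least-squares slope of ln(tau_c/tau_e) over V_G and its zero crossing."""
    y = [math.log(tc / te) for tc, te in zip(tau_c, tau_e)]
    n = len(vg)
    mv, my = sum(vg) / n, sum(y) / n
    slope = (sum((v - mv) * (yy - my) for v, yy in zip(vg, y))
             / sum((v - mv) ** 2 for v in vg))
    v_cross = mv - my / slope              # gate bias where tau_c = tau_e
    rel_depth = KB_T_OVER_Q * abs(slope)   # x_T / t_ox from the kT/q * slope relation
    return rel_depth, v_cross

# hypothetical measured time constants over gate bias (in V and s)
vg = [0.4, 0.5, 0.6, 0.7]
tau_c = [8.0, 2.0, 0.5, 0.125]   # capture accelerates with bias
tau_e = [0.125, 0.5, 2.0, 8.0]   # emission slows down with bias
rel_depth, v_cross = trap_depth_and_crossing(vg, tau_c, tau_e)
print(rel_depth, v_cross)
```

Such a quick estimate can serve as the initial guess for the more accurate numerical simulations mentioned above.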
As described in Section 4.1.1, RTN caused by a single defect appears as a Lorentzian in the frequency spectrum. By characterizing the individual Lorentzians of a full PSD, the average transition rates of the defects can be obtained. Varying the measurement voltage further allows finding the point where the noise energy of a defect is at its maximum, which means that the occupancy of the defect is at 50%. At this point, the step height of the defect can be calculated, which in turn allows calculating the occupancy at other voltages.
In devices with many active defects, the relation between the noise power calculated from the drain current and the number of defect states close to the Fermi level allows obtaining the defect density over the energy of the band gap. The basic equation which describes this relation is given as

S_ID / ID² = (δn/δn_t · 1/n ± α·μ)² · kB T λ N_ot / (A f)
Here, S_ID is the noise spectral density, ID the average drain current, δn/δn_t the variation of the free-carrier concentration due to carrier trapping, n the density of free channel electrons, α the coefficient describing the influence of the trapped carriers on the channel mobility, μ the carrier mobility, N_ot the density of oxide traps, kB the Boltzmann constant, T the temperature, A the channel area, λ the tunneling attenuation length and f the frequency.
For the extraction of TDDS data, an edge detection method as outlined in Section 5.1.2 is applied to the recovery data. As the charge emission probabilities are strongly modulated by the applied bias, the dwelling times of the defects at the recovery bias are practically identical to the points in time in the recovery trace where the charges are emitted on average. The extracted steps are plotted in a scatter map with the step height and the emission time as axes, as shown in Figure 5.6.
Plotting multiple repetitions of the measurement leads to the formation of clusters in this so-called spectral map. Each cluster corresponds to a single defect. The points in a cluster are distributed normally in step height and exponentially in time around the average emission time of the defect. By dividing the number of emissions in each cluster by the number of repetitions, the capture probability of the defect can be obtained for the applied stress conditions. The capture time at stress conditions can then be obtained by plotting the capture probability of the defect over the stress time, which is observed to follow an exponential distribution, as expected for a Markov process and shown in Figure 5.7.
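Since for a Markov process the capture probability follows P_c = 1 − exp(−t_s/τ_c), the capture time τ_c can be estimated from the cluster counts with a simple through-origin fit on the linearized data. A sketch with synthetic numbers (function name and values hypothetical):

```python
import math

def capture_time_from_probabilities(t_stress, p_capture):
    """Fit tau_c in P_c = 1 - exp(-t_s / tau_c) by least squares on -ln(1 - P_c)."""
    # -ln(1 - P_c) = t_s / tau_c is linear through the origin in t_s
    y = [-math.log(1.0 - p) for p in p_capture]
    return sum(t * t for t in t_stress) / sum(t * yy for t, yy in zip(t_stress, y))

# hypothetical capture probabilities from counting cluster points per repetition
t_stress = [1e-3, 1e-2, 1e-1]                     # stress times in s
p = [1 - math.exp(-t / 0.05) for t in t_stress]   # synthetic data, tau_c = 50 ms
tau_fit = capture_time_from_probabilities(t_stress, p)
print(tau_fit)  # ≈ 0.05
```

In practice, the probabilities come from the measured spectral maps, and stress times spanning several decades are needed to constrain the fit well.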