Recent trends in bias temperature instability

B. Kaczer
imec, Kapeldreef 75, B-3001 Leuven, Belgium

T. Grasser
Institute for Microelectronics, TU Wien, Austria

J. Franco
imec and ESAT Department, KU Leuven, Belgium

M. Toledano-Luque
imec and Dpto. Física Aplicada III, Universidad Complutense de Madrid, Spain

Ph. J. Roussel, M. Cho, and E. Simoen
imec, Kapeldreef 75, B-3001 Leuven, Belgium

G. Groeseneken
imec and ESAT Department, KU Leuven, Belgium

(Received 10 August 2010; accepted 25 October 2010; published 6 January 2011)

Several trends occurring in the past few years in our understanding of bias temperature instability (BTI) are reviewed. Among the most important is the shift toward analyzing BTI relaxation with the tools originally developed for describing low-frequency noise. This includes the interpretation of the time, temperature, voltage, and duty cycle dependences. It is shown that a wealth of information about gate oxide defect properties can be obtained from deeply scaled devices and correctly modeled based on nonradiative multiphonon theory. It is then shown how a detailed understanding of individual defect properties can help interpret the variability issues of future complementary metal-oxide semiconductor technologies. This is complemented by showing the most promising technological solutions for BTI. © 2011 American Vacuum Society. [DOI: 10.1116/1.3521505]

I. INTRODUCTION

Among the critical reliability issues facing present and future deeply downscaled complementary metal-oxide semiconductor (CMOS) devices is the so-called bias temperature instability (BTI). While BTI in n-channel field effect transistor (nFET) devices is generally ascribed to charge trapping in (the high-k portion of) the gate oxide, the interpretation of BTI in p-channel FET (pFET) devices still generates controversy.1–3 Although often relegated to a secondary issue in the past, the so-called BTI recovery is, as we have long argued, central to the correct understanding of BTI.4–7 In pFET devices, this phenomenon has previously been described by the backdiffusion of hydrogen and the repassivation of substrate/gate oxide interface states.8 The so-called reaction-diffusion model based on this assumption is still popular, especially in the design community,9 despite being inconsistent with some crucial observations.10

One of the most intriguing properties of BTI relaxation is the absence of a characteristic time scale, especially in pFETs, which suggests some type of dispersion in the underlying mechanism.5,7,11 In CMOS technologies, a response on many time scales is typical of low-frequency noise [and of its manifestation as random telegraph noise (RTN) in deeply scaled devices], suggesting that BTI recovery is, in fact, caused by the same defects.12,13 This connection is crucial for most arguments presented in this article.14

II. BRIEF OVERVIEW OF BTI

BTI results from the charging of defect states in the gate oxide and at its interface.2 The defects can be either pre-existing or generated during device operation. The trapped charge shifts device parameters such as the threshold voltage $V_{th}$, channel mobility, transconductance, and subthreshold slope, and generally decreases the FET’s drive current. The name derives from the phenomenon being strongly accelerated by temperature $T$ and gate bias $V_G$. BTI in nFET devices, which are typically biased in circuits at positive $V_G$, is referred to as positive BTI (PBTI), while negative BTI (NBTI) takes place in pFETs. Constant $V_G$ stress is often referred to as static or “DC” BTI, while periodically interrupted $V_G$ stress is called “AC” or dynamic BTI.

Electronic mail: kaczer@imec.be

III. STATIC BTI

Figure 1 illustrates the typical gradual shift of pFET threshold voltage $\Delta V_{th}$ during accelerated stress at elevated $T$. The stress data are typically measured at several $V_G$’s and $\Delta V_{th}$ is extrapolated to 10 years at the circuit operating voltage $V_{DD}$ (or $V_{DD}$+10%). The extrapolated $\Delta V_{th}$ must be below a given value (typically 30 or 50 mV) for the technology to qualify.
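The two-step extrapolation can be sketched as follows. The sketch assumes, as is common practice but not stated in this article, that $\Delta V_{th}$ follows a power law in stress time at each $V_G$ and that the extrapolated values follow a power law in $|V_G|$; the function names and the synthetic data are hypothetical, and recovery during the measurement delay (discussed next) is ignored.

```python
import math

TEN_YEARS_S = 10 * 365.25 * 24 * 3600  # 10 years in seconds


def fit_power_law(xs, ys):
    """Least-squares fit of y = A * x**m on log-log axes; returns (A, m)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    m = sxy / sxx
    return math.exp(my - m * mx), m


def extrapolate_dvth(stress_data, v_use, t_target=TEN_YEARS_S):
    """Hypothetical two-step extrapolation of dVth to t_target at v_use.

    stress_data maps |V_G| (V) -> (stress times in s, dVth in V), all at one T.
    Step 1: fit dVth = A * t**n per voltage and evaluate it at t_target.
    Step 2: fit those values as B * |V_G|**gamma and evaluate at v_use.
    """
    volts, dvth_at_target = [], []
    for vg, (times, dvth) in sorted(stress_data.items()):
        a, n = fit_power_law(times, dvth)
        volts.append(vg)
        dvth_at_target.append(a * t_target ** n)
    b, gamma = fit_power_law(volts, dvth_at_target)
    return b * v_use ** gamma


# Synthetic, noise-free data following dVth = 0.1 mV * t**0.2 * |V_G|**3.
times = [1.0, 10.0, 100.0, 1000.0]
data = {vg: (times, [1e-4 * t ** 0.2 * vg ** 3 for t in times])
        for vg in (1.6, 1.8, 2.0)}
dvth_10y = extrapolate_dvth(data, v_use=1.1)
print(f"extrapolated dVth: {dvth_10y * 1e3:.1f} mV, below 30 mV: {dvth_10y < 30e-3}")
```

The extrapolated value is then compared with the qualification criterion (typically 30 or 50 mV).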

This simple extrapolation procedure is, however, complicated by $\Delta V_{th}$ decreasing immediately after the stress bias is removed, as illustrated in Fig. 1. As discussed below, this recovery, or relaxation, component $R$ typically proceeds simultaneously on many time scales, which makes it difficult to determine its beginning or end and thus to separate it from the remaining nonrecoverable, or permanent, component $P$. This $\Delta V_{th}$ relaxation is therefore a crucial problem for BTI measurement, interpretation, and extrapolation, and understanding the recoverable component is central to unraveling the BTI mechanism.

IV. DYNAMIC BTI

In many CMOS applications, such as logic, the majority of the FETs are constantly switched and thus exposed to dynamic stress. Figure 2 documents that NBTI is present at frequencies up to the gigahertz range, i.e., there does not appear to be any “cut-off” time constant of the degradation mechanism above $\sim$1 ns. Furthermore, the AC bias signal reduces BTI with respect to the DC stress. This provides some additional reliability margin, which can be factored in during the application design phase.

In an arbitrary FET of an arbitrary digital circuit, the average probability of the signal being high can vary between 0% and 100%. However, the dependence of BTI on the duty cycle [called duty factor (DF) here] has seldom been studied experimentally. An NBTI $\Delta V_{th}$-DF dependence with an inflection point around DF $\sim$50%, first reported in Ref. 17, is shown in Fig. 3. Below we show that this distinctive shape is a fundamental feature of NBTI relaxation.

V. STATES WITH WIDELY DISTRIBUTED TIME SCALES: SIMILARITY BETWEEN BTI RELAXATION AND LOW-FREQUENCY NOISE

Long, log($t$)-like behavior of $\Delta V_{th}$ without a characteristic time scale is typically observed in both the initial portion of NBTI degradation and the recovery phase. Figure 4(a) shows that the rate of recovery $d\Delta V_{th}/dt_{relax}$ (Ref. 7), extracted from the log($t_{relax}$)-like NBTI $\Delta V_{th}$ relaxation transient after even a very short 0.1 s stress, follows $-1/t_{relax}$ for over 7 decades. Such behavior is a signature of states with discharging time constants covering as many decades.
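A minimal numerical check of this statement is sketched below. It assumes an ensemble of states with equal weights and log-uniformly distributed capture times $\tau_c$, with emission times $\tau_e = r\,\tau_c$ for a hypothetical constant ratio $r$. While $t_{relax}$ lies well inside the range of charged states, $\Delta V_{th}$ drops by a constant amount per decade of $t_{relax}$, i.e., $d\Delta V_{th}/dt_{relax} \propto -1/t_{relax}$.

```python
import math


def recoverable_dvth(t_relax, t_stress, ratio=1.0,
                     log10_tmin=-9.0, log10_tmax=9.0, per_decade=20):
    """Recoverable dVth (arbitrary units, 1 per decade of time constants).

    Hypothetical ensemble: capture times tau_c log-uniformly spaced between
    10**log10_tmin and 10**log10_tmax s, equal weights, and emission times
    tau_e = ratio * tau_c. Each state charges during the stress and then
    discharges exponentially during relaxation.
    """
    n_states = int(round((log10_tmax - log10_tmin) * per_decade)) + 1
    total = 0.0
    for i in range(n_states):
        tau_c = 10.0 ** (log10_tmin + i / per_decade)
        occupancy = -math.expm1(-t_stress / tau_c)                  # after stress
        total += occupancy * math.exp(-t_relax / (ratio * tau_c))   # after relaxation
    return total / per_decade


def log_slope(t_relax, t_stress, **kwargs):
    """d(dVth)/d(log10 t_relax) by central differences; -1 means log-like."""
    h = 0.01
    up = recoverable_dvth(t_relax * 10 ** h, t_stress, **kwargs)
    down = recoverable_dvth(t_relax * 10 ** -h, t_stress, **kwargs)
    return (up - down) / (2 * h)


# After a 0.1 s stress the slope stays near -1 per decade while the
# relaxation time is well inside the range of charged states.
for t in (1e-6, 1e-4, 1e-3):
    print(t, round(log_slope(t, 0.1), 3))
```

With $r = 1$ the log-like regime ends near $t_{relax} \approx t_{stress}$; a larger $r$ extends it, which is one way to obtain many decades of $1/t_{relax}$ behavior after a short stress.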

Incidentally, superposition of states with widely distributed time scales is the standard explanation of the $1/f$ noise spectra, which are clearly observed in our pFETs [Fig. 4(b)]. This obvious similarity leads us to argue that the same states with widely distributed time scales, in fact, play a fundamental role in both NBTI and noise measurements. As will be shown below, this connection is crucial for understanding the properties of the defects contributing to BTI.
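The standard superposition argument can be reproduced in a few lines: each state with time constant $\tau$ contributes a Lorentzian spectrum, and for log-uniformly distributed $\tau$ the sum is proportional to $1/f$ between the corner frequencies set by the shortest and longest $\tau$. The range of $\tau$ and the equal weighting below are illustrative assumptions.

```python
import math


def noise_psd(f, log10_tmin=-9.0, log10_tmax=6.0, per_decade=20):
    """PSD (arbitrary units) of a superposition of two-state fluctuators.

    Each fluctuator contributes a Lorentzian tau / (1 + (2*pi*f*tau)**2);
    the time constants tau are log-uniformly spaced with equal weights.
    """
    n = int(round((log10_tmax - log10_tmin) * per_decade)) + 1
    s = 0.0
    for i in range(n):
        tau = 10.0 ** (log10_tmin + i / per_decade)
        s += tau / (1.0 + (2.0 * math.pi * f * tau) ** 2)
    return s / per_decade


# Between the corner frequencies f * S(f) is flat, i.e. S(f) ~ 1/f;
# for this normalization f * S(f) approaches 1 / (4 ln 10).
for f in (1.0, 10.0, 100.0, 1000.0):
    print(f, round(f * noise_psd(f), 4))
```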
Fig. 4. (Color online) (a) A characteristic long, log-like $\Delta V_\text{th}$ relaxation trace is observed even after short (pulselike) NBTI stress. The rate of recovery $d\Delta V_\text{th}/dt_{\text{relax}}$, following $-1/t_{\text{relax}}$ for $\sim 7$ decades, is a signature of states with discharging time constants covering as many decades. (b) Gate-referred noise spectra measured on the same (unstressed) devices show a clear $1/f$ dependence, routinely explained by a superposition of states with widely distributed time scales.

VI. SEMIQUANTITATIVE MODEL FOR BTI RELAXATION

To visualize this common property, it is helpful to consider an equivalent circuit representing states with widely distributed time scales. We start by noting that neither NBTI relaxation nor $1/f$ noise measurements typically show maximum or minimum cut-off times. For simplicity, we therefore assume that the time constants are log-uniformly distributed from times much shorter than the switching time of a pFET to very long times comparable to the lifetime of a CMOS application. Such states with widely distributed time scales are then represented by “RC” elements in Fig. 5(a), with the total FET $\Delta V_\text{th}$ being proportional to the sum of voltages (“occupancies”) on all capacitors. We further assume that all RC elements have the same weight and can be partially occupied, which emulates the behavior of a large-area device.

We find that most properties of the recoverable component can be reproduced when the Ohmic resistors in Fig. 5(a) are replaced with a nonlinear component [simulated by two diodes with different parameters, see Fig. 5(b)], which emulates the different charging (i.e., capture) and discharging (i.e., emission) time constants of each defect. Such a circuit correctly reproduces the DF dependence (Fig. 6, cf. Fig. 3) as well as the loglike relaxation and the loglike initial phase of stress (not shown).
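The equivalent circuit of Fig. 5(b) can be emulated with a short sketch. It assumes, for simplicity, that each element only charges during the stress phase (time constant $\tau_c$) and only discharges during the relaxation phase ($\tau_e$), with a hypothetical fixed ratio $\tau_e/\tau_c$, and it evaluates $\Delta V_{th}$ right after the last stress phase. Parameter values and function names are illustrative, not those used for Fig. 6.

```python
import math


def element_occupancy(tau_c, tau_e, period, df, n_cycles):
    """Occupancy (0..1) of one RC element right after the last stress phase.

    Simplifying assumption: during the stress phase (fraction df of each
    period) the element only charges, with time constant tau_c; during the
    relaxation phase it only discharges, with time constant tau_e.
    """
    xa = df * period / tau_c            # stress-phase exponent
    xb = (1.0 - df) * period / tau_e    # relaxation-phase exponent
    # Periodic steady state p* = (1 - a) / (1 - a*b), a = e**-xa, b = e**-xb;
    # expm1 keeps the ratio accurate when the exponents are tiny.
    p_star = math.expm1(-xa) / math.expm1(-(xa + xb))
    # Starting from an empty element: p_N = p* * (1 - (a*b)**N).
    return p_star * -math.expm1(-n_cycles * (xa + xb))


def delta_vth_vs_df(df, period=1e-3, n_cycles=10_000, ratio=10.0,
                    log10_tmin=-9.0, log10_tmax=8.0, per_decade=10):
    """Normalized dVth (0..1): equal-weight sum over elements with
    log-uniform tau_c and a hypothetical fixed asymmetry tau_e = ratio*tau_c."""
    n = int(round((log10_tmax - log10_tmin) * per_decade)) + 1
    total = 0.0
    for i in range(n):
        tau_c = 10.0 ** (log10_tmin + i / per_decade)
        total += element_occupancy(tau_c, ratio * tau_c, period, df, n_cycles)
    return total / n


for df in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"DF = {df:.0%}: normalized dVth = {delta_vth_vs_df(df):.3f}")
```

At DF = 100% the sketch reduces to DC stress of the same total duration, which provides a convenient reference point when varying `ratio` or the range of $\tau_c$.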

VII. OBSERVING PROPERTIES OF INDIVIDUAL DEFECTS

Figure 7 shows two typical $\Delta V_\text{th}$ relaxation transients following positive $V_\text{G}$ stress on a single $70 \times 90$ nm$^2$ nFET (i.e., corresponding to PBTI). In contrast to the continuous relaxation curves obtained on large devices, a quantized $\Delta V_\text{th}$ transient is observed in deeply scaled devices: the relaxation proceeds in discrete voltage steps, each step corresponding to the discharging of a single oxide defect. Upon repeated perturbation, each defect shows up in the relaxation trace with a characteristic “fingerprint” consisting of its discharge, or *emission*, time and its voltage step.14

Fig. 5. (a) Equivalent circuit with exponentially increasing capacitances used to emulate defect states with widely distributed time scales such as those active in low-frequency noise. (b) The same circuit modified to account for charging (i.e., capture) and discharge (i.e., emission) time constants being voltage dependent, represented by asymmetric diodes. The sum of voltages on capacitors is assumed to be proportional to FET $\Delta V_\text{th}$.

Fig. 7. (Color online) Characteristic $\Delta V_\text{th}$ transients of a single $70 \times 90$ nm$^2$ 1 nm SiO$_2$/1.8 nm HfSiON nFET stressed at 25 °C and $V_\text{G}=2.8$ V for 184 ms. Four discrete drops are observed, indicating the existence of four active traps at the stress condition.

Figure 8(a) shows the two-dimensional histogram of the heights and emission times of the steps when the experiment was repeated 70 times at the same stress and relaxation conditions as in Fig. 7.14 Four clusters are clearly formed, corresponding to the four defects active within the time window of the experimental setup.

The emission times of each defect are stochastically distributed and follow an exponential distribution, which allows us to determine the average emission time \(\tau_e\). The capture time of each trap can be obtained by varying the stress (i.e., charging) time from 240 down to 2 ms. The intensity of a cluster decreases with decreasing stress time once the stress time becomes comparable to the characteristic capture time. Fitting the intensity to \(I = I_0[1 - \exp(-t_{\text{stress}}/\tau_c)]\) yields the average capture time \(\tau_c\). This technique is known as time dependent defect spectroscopy (TDDS).14
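A sketch of the two estimates follows, assuming that emission times are exponentially distributed (so the sample mean is the maximum-likelihood estimate of $\tau_e$) and that the cluster intensity grows with stress time as $I = I_0[1-\exp(-t_{stress}/\tau_c)]$. The grid search is an illustrative choice, not the authors' fitting procedure, and the data are synthetic.

```python
import math


def mean_emission_time(emission_times):
    """tau_e estimate: for exponentially distributed emission times, the
    maximum-likelihood estimate of the mean is the sample mean."""
    return sum(emission_times) / len(emission_times)


def fit_capture_time(stress_times, intensities,
                     log10_lo=-4.0, log10_hi=1.0, steps=400):
    """Fit I = I0 * (1 - exp(-t_stress / tau_c)) to cluster intensities.

    tau_c is scanned on a logarithmic grid; for each candidate the optimal
    I0 follows in closed form from linear least squares.
    Returns (tau_c, I0).
    """
    best = None
    for k in range(steps + 1):
        tau_c = 10.0 ** (log10_lo + k * (log10_hi - log10_lo) / steps)
        g = [-math.expm1(-t / tau_c) for t in stress_times]
        i0 = sum(gi * yi for gi, yi in zip(g, intensities)) / sum(gi * gi for gi in g)
        sse = sum((yi - i0 * gi) ** 2 for gi, yi in zip(g, intensities))
        if best is None or sse < best[0]:
            best = (sse, tau_c, i0)
    return best[1], best[2]


# Hypothetical, noise-free intensities generated with tau_c = 20 ms and
# I0 = 70, for stress times between 2 and 240 ms.
t_stress = [0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.24]
counts = [70 * -math.expm1(-t / 0.02) for t in t_stress]
tau_c, i0 = fit_capture_time(t_stress, counts)
print(f"tau_c = {tau_c * 1e3:.1f} ms, I0 = {i0:.1f}")
```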

In Fig. 8(b), an identical experiment was repeated on the same device at 50 °C. Note the large horizontal shift of the clusters toward shorter emission times with only a 25 °C temperature increase. The Arrhenius plots of the emission and capture times obtained at \(T\) from 10 to 50 °C (not shown) provide activation energies of 0.48 eV for emission and 0.25 eV for capture. Similarly thermally activated capture and emission times are also observed in both nFETs and pFETs (i.e., corresponding to NBTI) with conventional SiO\(_2\) gate oxides.14,23,24 We therefore conclude that in all these cases *both emission and capture in both electron and hole gate oxide traps are clearly thermally activated processes*. This experimental fact is incompatible with the direct elastic tunneling theories widely used in various oxide trap characterization techniques and calculations. Consequently, a new model that takes this thermal activation into account is required.
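For orientation, the Arrhenius relation $\tau = \tau_0 \exp[E_A/(k_B T)]$ links the quoted activation energies to the size of the observed shifts. The sketch below, assuming this simple single-exponential form, extracts $E_A$ from time constants at two temperatures and converts $E_A$ back into a speed-up factor; the function names are illustrative.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def activation_energy(tau1, t1_celsius, tau2, t2_celsius):
    """E_A in eV from time constants measured at two temperatures,
    assuming simple Arrhenius behavior tau = tau0 * exp(E_A / (k_B * T))."""
    inv_t1 = 1.0 / (t1_celsius + 273.15)
    inv_t2 = 1.0 / (t2_celsius + 273.15)
    return K_B_EV * math.log(tau1 / tau2) / (inv_t1 - inv_t2)


def acceleration_factor(e_a, t1_celsius, t2_celsius):
    """Factor by which tau shortens when heating from t1 to t2 (same model)."""
    inv_t1 = 1.0 / (t1_celsius + 273.15)
    inv_t2 = 1.0 / (t2_celsius + 273.15)
    return math.exp(e_a / K_B_EV * (inv_t1 - inv_t2))


# Activation energies quoted in the text, for a 25 -> 50 degC step.
for label, e_a in (("emission", 0.48), ("capture", 0.25)):
    print(label, round(acceleration_factor(e_a, 25.0, 50.0), 2))
```

For the 25 to 50 °C step, 0.48 eV corresponds to roughly a fourfold reduction of $\tau_e$, and 0.25 eV to roughly a twofold reduction of $\tau_c$.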

VIII. MODELING PROPERTIES OF INDIVIDUAL DEFECTS

A model of the above-described properties of individual gate oxide defects can be constructed by drawing on the above similarities with low-frequency noise and RTN.25 An example of the configuration coordinate diagram of the model is shown in the inset of Fig. 9. Four different configurations of the defect are considered.14 Two of them are electrically neutral, while the other two correspond to the singly positively charged state. In each charge state, the defect is represented by a double well, the first minimum being the equilibrium state and the other a secondary (metastable) minimum. The time dynamics of the defect can be described by a simple stochastic Markov process. Transitions involving charge transfer are described by (1) tunneling between the substrate and the defect and (2) nonradiative multiphonon (NMP) theory, which has often been applied to explain RTN.26,27 The introduction of NMP theory naturally explains the temperature dependence of both the capture and emission time constants observed in the previous section. The wide distribution of time scales is then readily described by a distribution of the overlaps of the potential wells (i.e., a distribution of “potential barriers”).14
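The Markov description can be illustrated with a generic Gillespie (stochastic simulation) sketch. The state labels, the connectivity of the four configurations, and all rates below are hypothetical placeholders; in the model they would follow from tunneling and NMP theory and depend on $V_G$ and $T$. For a two-state special case with mean capture time $\tau_c$ and mean emission time $\tau_e$, the fraction of time spent charged is $\tau_e/(\tau_c+\tau_e)$.

```python
import random


def simulate_ctmc(rates, start, n_jumps, seed=0):
    """Gillespie simulation of a continuous-time Markov chain.

    rates maps each state to {next_state: rate in 1/s}; every state needs
    at least one outgoing transition. Returns the fraction of time spent
    in each state.
    """
    rng = random.Random(seed)
    dwell = {state: 0.0 for state in rates}
    state = start
    for _ in range(n_jumps):
        out = rates[state]
        targets = list(out)
        weights = [out[t] for t in targets]
        dwell[state] += rng.expovariate(sum(weights))     # exponential dwell time
        state = rng.choices(targets, weights=weights)[0]  # pick the next state
    total_time = sum(dwell.values())
    return {s: d / total_time for s, d in dwell.items()}


# Hypothetical four-state defect: neutral "1" (stable) and "1'" (metastable),
# positive "2" (stable) and "2'" (metastable). Rates are placeholders only.
four_state = {
    "1":  {"2'": 50.0, "1'": 5.0},
    "2'": {"1": 20.0, "2": 100.0},
    "2":  {"2'": 10.0, "1'": 2.0},
    "1'": {"2": 1.0, "1": 200.0},
}
occupancy = simulate_ctmc(four_state, start="1", n_jumps=50_000)
print("fraction of time positively charged:",
      round(occupancy["2"] + occupancy["2'"], 3))
```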

The crucial extension of the NMP theory is the assumption that the relative positions of the potential wells change with gate bias,14 which quite naturally introduces the required strong \(V_G\) dependence. As documented in Fig. 9, the model successfully describes both the bias and the temperature dependences of the characteristic time constants. We also note that, unlike RTN analysis techniques, which only allow the defect behavior to be monitored in a rather narrow time window, TDDS can be used to study the defects’ capture and emission times over an extremely wide range.

We have previously argued that the phenomenon called NBTI relaxation in pFET devices is, in fact, just a different facet of the well-known low-frequency noise in these devices. While low-frequency noise corresponds to the channel/gate dielectric system being in dynamic equilibrium, NBTI relaxation corresponds to the perturbed channel/gate dielectric system returning to this equilibrium. Figure 10 illustrates this concept on a simulated example of a deeply scaled pFET containing only five active defects. In particular, it shows that the same defects can be responsible for RTN as well as for NBTI relaxation and the (initial phase of) NBTI stress.

IX. BTI DISTRIBUTION IN DEEPLY SCALED FETS

As CMOS devices scale toward atomic dimensions, device parameters become statistically distributed. Similarly, parameter shifts during device operation, once studied in terms of their average value only, will have to be described in terms of their distribution functions. Understanding the properties of individual defects helps to explain this distribution. Namely, much like in the case of RTN, we observe the $\Delta V_{th}$ downsteps due to individual discharging events to be exponentially distributed (Fig. 11). This exponential distribution of single-charge $\Delta V_{th}$ can be understood if nonuniformities in the pFET channel due to random dopant fluctuations are considered. A single discharging event routinely exceeds 15 mV in many devices, and in several devices it exceeded 30 mV, the NBTI lifetime criterion presently used by some groups. For comparison, a $\Delta V_{th}$ of less than 2 mV would be expected from a simple charge sheet approximation. The large observed step heights are due to the aggressively scaled dimensions of the pFETs used.
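The charge-sheet estimate and the size of the exponential tail can be checked with a short calculation. The sketch assumes a single elementary charge at the substrate/oxide interface spread uniformly over the gate area (charge sheet), with $C_{ox}$ computed from the EOT, and uses the dimensions and $\eta = 4.75$ mV of Fig. 11; the function names are hypothetical.

```python
import math

Q = 1.602176634e-19       # elementary charge, C
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
K_SIO2 = 3.9              # relative permittivity used to define EOT


def charge_sheet_step(width_m, length_m, eot_m):
    """dVth (V) for one elementary charge at the substrate/oxide interface,
    spread uniformly over the gate area: q / (C_ox * W * L)."""
    c_ox = K_SIO2 * EPS0 / eot_m   # oxide capacitance per unit area, F/m^2
    return Q / (c_ox * width_m * length_m)


def prob_step_exceeds(threshold_v, eta_v):
    """P(step > threshold) for exponentially distributed step heights."""
    return math.exp(-threshold_v / eta_v)


step = charge_sheet_step(90e-9, 35e-9, 0.8e-9)
print(f"charge-sheet estimate: {step * 1e3:.2f} mV")
for thr in (15e-3, 30e-3):
    print(f"P(step > {thr * 1e3:.0f} mV) = {prob_step_exceeds(thr, 4.75e-3):.4f}")
```

This gives about 1.2 mV for the charge-sheet step, consistent with the 2 mV bound above, and per-step exceedance probabilities of roughly 4% for 15 mV and 0.2% for 30 mV.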

Since the lateral locations of the charges are uncorrelated, the overall $\Delta V_{th}$ distribution can be readily expressed as a convolution of individual exponential distributions, with the cumulative distribution function (CDF) given by

$$F_n(\Delta V_{th}, \eta) = 1 - \frac{\Gamma(n, \Delta V_{th}/\eta)}{(n-1)!}, \tag{1}$$

where $n$ is the number of active defects in the device, $\eta$ is the average single-carrier step height, and $\Gamma(\cdot,\cdot)$ is the upper incomplete gamma function (for $n = 0$, $F_0 = 1$ for all $\Delta V_{th} \ge 0$). An actual population of stressed devices will consist of devices with different numbers $n$ of oxide defects, and that number will be Poisson distributed. The total $\Delta V_{th}$ distribution can therefore be obtained by summing the distributions $F_n$ weighted by the Poisson probability

$$F_N(\Delta V_{th}, \eta) = \sum_{n=0}^{\infty} \frac{e^{-N}N^n}{n!} F_n(\Delta V_{th}, \eta), \tag{2}$$

where $N$ is the mean number of defects in the FET gate oxide and is related to the oxide trap (surface) density $N_{ot}$ as $N = WLN_{ot}$ (note that $N$ need not be an integer). The CDF of Eq. (2) is plotted in Fig. 12 for several values of $N$. For comparison, measured total $\Delta V_{th}$ distributions for three different stress times from Ref. are fitted excellently by the derived analytical description.
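Equations (1) and (2) are straightforward to evaluate. The sketch uses the identity $\Gamma(n,x)/(n-1)! = e^{-x}\sum_{k=0}^{n-1} x^k/k!$ for integer $n \ge 1$, treats $n = 0$ as a device without any shift, and truncates the Poisson sum where its tail is negligible; the value $N = 5$ in the example is hypothetical.

```python
import math


def erlang_cdf(dv, eta, n):
    """Eq. (1): CDF of the sum of n independent exponential steps of mean eta,
    using Gamma(n, x)/(n-1)! = exp(-x) * sum_{k<n} x**k / k! for integer n."""
    if dv < 0:
        return 0.0
    if n == 0:
        return 1.0          # no defects: dVth = 0 with certainty
    x = dv / eta
    term, partial = 1.0, 0.0
    for k in range(n):
        if k > 0:
            term *= x / k   # term = x**k / k!
        partial += term
    return 1.0 - math.exp(-x) * partial


def total_cdf(dv, eta, mean_defects):
    """Eq. (2): Poisson(N)-weighted mixture of Eq. (1) over n.
    exp(-N) underflows for N above ~700, far beyond typical device values."""
    n_max = int(mean_defects + 10.0 * math.sqrt(mean_defects) + 20)  # truncate tail
    weight = math.exp(-mean_defects)       # Poisson probability of n = 0
    total = weight * erlang_cdf(dv, eta, 0)
    for n in range(1, n_max + 1):
        weight *= mean_defects / n         # Poisson probability of n
        total += weight * erlang_cdf(dv, eta, n)
    return total


# Example: eta = 4.75 mV (Fig. 11) and a hypothetical mean of N = 5 defects.
for dv_mv in (0, 10, 30, 50):
    print(dv_mv, "mV:", round(total_cdf(dv_mv * 1e-3, 4.75e-3, 5.0), 4))
```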

Fig. 10. (Color online) Simulated RTN, stress, and recovery behavior of a nanoscale device using a stochastic solution algorithm of the proposed model. (a) At the threshold voltage ($V_{th}$), the RTN is dominated by defect 5 with occasional contributions from defect 3. Defects 1, 2, and 4 remain neutral within the “simulation/experimental” window. (b) During stress ($V_{st}$), the capture times are dramatically reduced by the higher (more negative) gate voltage, and defects 3 and 5 become predominantly positively charged ($\tau_c < \tau_e$). Defects 1, 2, and 4 start producing RTN. (c) During recovery (back at $V_{th}$), the trapped charge is gradually lost and the dynamic equilibrium behavior is restored.

Fig. 11. (Color online) Histogram of individual NBTI transient step heights measured on 72 devices, showing a clear exponential distribution. The average $V_{th}$ shift $\eta$ corresponding to a single carrier discharge is 4.75 $\pm$ 0.30 mV in the pFETs with metallurgical length $L=35$ nm, width $W=90$ nm, and HfO$_2$ dielectrics with EOT = 0.8 nm.

The advantage of describing the total $\Delta V_{th}$ distribution in terms of Eq. (2) lies in its relative simplicity and the tangibility of its variables. Among other things, the analytical description allows NBTI threshold voltage shifts to be calculated for an arbitrarily large population of devices, a feat impossible through device simulations. This illustrates how a detailed understanding of the properties of individual defects can help explain real-world technological issues.

X. TECHNOLOGICAL SOLUTIONS

Once the underlying BTI mechanisms are understood, we can attempt to influence the defect properties to beneficial ends. Below we discuss two possible technological solutions, one for PBTI and one for NBTI.

XI. IMPROVING PBTI WITH RARE-EARTH INCORPORATION

PBTI was considered a minor problem in technologies based on SiO$_2$. It arose as a reliability issue when high-$k$ materials were incorporated into the gate stack. However, when rare earths were introduced to adjust the nFET initial threshold voltage, this issue was mitigated, as can be seen in Fig. 13. A significant reduction of PBTI is observed in planar nFETs with lanthanum with respect to a lanthanum-free reference.

Positive BTI in nFETs with high-$k$ materials like HfO$_2$ has been linked to oxygen vacancies, which produce a defect level in the upper part of the oxide band gap. Group III elements compensate unpaired electrons around the oxygen vacancy in HfO$_2$ and the defects are “passivated” by being pushed up toward the conduction band minimum. Such states are not easily accessible to nFET channel electrons, resulting in the significant reduction of negative charge capture in the stack and hence the reduction of PBTI.

XII. IMPROVING NBTI IN HIGH-MOBILITY SIGE PFETS

Reduction of gate-stack equivalent oxide thickness (EOT), which is one of the most efficient ways to improve FET performance, enhances NBTI due to increased oxide electric field. As a consequence, 10 year lifetime can be guaranteed for sub-1 nm EOT Si pFETs only at gate overdrive voltages far below the expected operating voltages (Fig. 14).

Another way to improve FET performance is the use of high-mobility substrates such as buried-channel SiGe, which acts as a quantum well (QW) for holes. Because the holes are confined below the Si cap, the cap lowers the inversion capacitance as compared to the accumulation capacitance. For these devices, it is therefore necessary to report the capacitance-equivalent thickness in inversion ($T_{inv}$), evaluated at $V_G=V_{th}+0.6$ V, which is affected by the thickness of the Si cap.

As can be seen from Fig. 14, SiGe-based gate stacks significantly increase the allowed operating gate overdrive while still guaranteeing a 10 year device lifetime, and at the moment they seem to be the only solution to the NBTI issue for sub-1 nm EOT devices. We have recently observed that increasing the Ge content in the channel and increasing the SiGe QW thickness both reduce NBTI. Most intriguingly, reducing the Si cap thickness also diminishes NBTI. The most likely hypothesis explaining all three trends appears to be the energetic decoupling of the buried channel from the gate oxide defects.

XIII. CONCLUSIONS

In this article, we have reviewed some of the shifts that have occurred in the past few years in our understanding of BTI. Among the most significant, we emphasize the analysis of BTI relaxation with tools originally developed for describing low-frequency noise. This includes the interpretation of the time, temperature, voltage, and duty cycle dependences of BTI. In step with the CMOS downsizing trend, we discussed how a detailed understanding of individual defect properties helps explain the variability of BTI in deeply scaled devices. This theme was complemented by showing the most promising technological solutions for BTI.

ACKNOWLEDGMENTS

This work was performed under the IMEC core partner affiliation program. M. Toledano-Luque’s stay was supported in part by the Spanish Ministry of Education and Science under Contract No. TEC2007-63318/MIC and the grant program “José Castillejo” (Grant No. JC2009/00052).