
10.2 The Bootstrapping and Cumulative Sum Algorithm

In general, the step detection algorithm has to extract steps of unknown height \( d \) from a signal with unknown mean value \( \mu (\tr ) \). Based on the mean shift model given by

(10.2) \begin{equation} r = \mu + d\sigma (\tr -\tau _e) \end{equation}

with the step function \( \sigma   \) and the emission time \( \tau _e \), a statistical treatment of this problem is possible using the bootstrapping and cumulative sum (BCSUM) algorithm [170].

The BCSUM detection algorithm can detect both positive and negative steps in a data set. The BCSUM procedure decides whether or not a discrete step exists in the underlying data samples, controlled by a detection sensitivity parameter \( \epsilon     \). In the following, cumulative sum charts [167, 168] are discussed first, then the bootstrapping mechanism is introduced, and finally the advanced detection method is presented.

10.2.1 Cumulative Sum Charts

First consider a series of independent measurement points \( \mathbf {r} = [r_0, r_1, \hdots ,r_{N-1}] \) with \( N \) the number of samples. The signal distributions before and after a change in the signal mean are given by \( p_0(r) \) and \( p_1(r) \), respectively, with their corresponding means \( \mu _0 \) and \( \mu _1 = \mu _0 + d \), where \( d \) is the step height. Introducing the log-likelihood ratio given by

(10.3) \begin{equation} s\sb {i} = \ln \frac {p_1(r\sb {i})}{p_0(r\sb {i})} \end{equation}

a change in the signal mean is reflected by a change of the sign of the mean value of the log-likelihood ratio \( S_m^n \) [168] of the data set \( [r_m, \hdots ,r_n] \), given by

(10.4) \begin{equation} S\sb {m}^n = \sum \sb {i=m}^n s\sb {i} \end{equation}

and illustrated in Figure 10.1.

Figure 10.1:  The log-likelihood ratio (bottom) is calculated for normally distributed data for \( N_1(0, 1) \), \( N_2(15, 1.5) \) and \( N_3(7.5, 0.5) \) (top). When the test signal \( r \) changes its underlying distribution function from \( N_1 \) to \( N_2 \), the resulting log-likelihood ratio \( s_{2,1} = \ln p_2(r)/p_1(r) \), where \( p_1 \) is the Gaussian distribution function for \( N_1 \) and \( p_2 \) the Gaussian distribution function for \( N_2 \), shows a change in the sign at the transition time instance, called the change point. When the underlying data distribution of \( r \) goes from \( N_2 \) to \( N_3 \), a sign change is observed in the log-likelihood ratio \( s_{3,2} = \ln p_3(r)/p_2(r) \) [MWC26].
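The sign change of the summed log-likelihood ratio can be reproduced with a minimal Python sketch (NumPy assumed; the function name and parameters are illustrative and not part of [170]), using the first two distributions of Figure 10.1:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood_ratio(r, mu0, sigma0, mu1, sigma1):
    # Pointwise log-likelihood ratio s_i = ln p1(r_i)/p0(r_i), Eq. (10.3),
    # for two Gaussian densities.
    lp0 = -0.5 * ((r - mu0) / sigma0) ** 2 - np.log(sigma0 * np.sqrt(2.0 * np.pi))
    lp1 = -0.5 * ((r - mu1) / sigma1) ** 2 - np.log(sigma1 * np.sqrt(2.0 * np.pi))
    return lp1 - lp0

# Test signal as in Figure 10.1: 200 samples from N1(0, 1) followed by
# 200 samples from N2(15, 1.5).
r = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(15.0, 1.5, 200)])

s = log_likelihood_ratio(r, 0.0, 1.0, 15.0, 1.5)
S = np.cumsum(s)  # S_0^n, Eq. (10.4)

# s_i is negative on average before the change and positive afterwards,
# so S_0^n first decreases, then increases: its minimum marks the change point.
print(int(np.argmin(S)))  # index close to 200
```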


Figure 10.2:  Illustration of the bootstrap and cumulative sum algorithm to detect one step in a very noisy data set. Even though the data set contains two steps, the algorithm has to be applied recursively because at most one change point, namely the biggest one, is detected per pass. From the initial data set \( \mathbf {r} \), a number \( B \) of bootstraps \( \mathbf {r}_b^\ast    \) are created. These bootstraps \( \mathbf {r}_b^\ast      \) are resampled versions of the initial data set \( \mathbf {r} \), obtained by resampling with replacement. Afterwards, the cumulative sum charts of the bootstrapped and the original data are created. The spans of the single cumulative sum (CSUM) charts of the bootstrapped data are collected in a list \( \mathbf {\Gamma } \), sorted from the minimum to the maximum. The decision threshold \( \gamma _\epsilon    \) equals the element of \( \mathbf {\Gamma } \) with index \( i=B\epsilon   \), where \( \epsilon \in [0,1] \) is called the detection sensitivity. Finally, the decision threshold \( \gamma _\epsilon     \) is compared to the span \( \gamma    \) of the CSUM chart obtained from the initial data set. For \( \gamma > \gamma _\epsilon      \) the initial data set contains a change point. In that case the initial data set is split at the change point and the algorithm is applied recursively to both subsequences. This divide and conquer procedure is continued until no further change points larger than the threshold are detected in the subsequences [MWC26].

Together with the assumption of normally distributed samples with the probability density function

(10.5) \begin{equation} p\sb {\mu ,\sigma }(r\sb {i}) = \frac {1}{\sigma \sqrt {2\pi }}\exp \biggl (-\frac {(r\sb {i} - \mu )^2}{2\sigma ^2}\biggr ), \end{equation}

the log-likelihood ratio for \( d=\mu _1-\mu _0>0 \) and \( \sigma =\sigma _0=\sigma _1 \) is

(10.6) \begin{equation} s\sb {i}^\uparrow = A\biggl (r\sb {i} - \mu - \frac {d}{2}\biggr ) \quad \text {with} \quad A = \frac {\mu _1 - \mu _0}{\sigma ^2}, \end{equation}
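For completeness, (10.6) follows from inserting the Gaussian density (10.5) into the log-likelihood ratio (10.3), with \( \mu =\mu _0 \) and \( \mu _1=\mu _0+d \):

```latex
s_i = \ln \frac{p_{\mu_1,\sigma}(r_i)}{p_{\mu_0,\sigma}(r_i)}
    = \frac{(r_i-\mu_0)^2 - (r_i-\mu_1)^2}{2\sigma^2}
    = \frac{(\mu_1-\mu_0)(2r_i-\mu_0-\mu_1)}{2\sigma^2}
    = \frac{\mu_1-\mu_0}{\sigma^2}\left(r_i-\mu_0-\frac{d}{2}\right)
```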

and the change point position is at the sample with index

(10.7) \begin{equation} i_{\tau _e}^\uparrow = \arg \min \sb {0<k<N-1} S_0^{k}.    \end{equation}

In the case of trapping events, a negative mean shift is obtained and the log-likelihood ratio for \( d<0 \) reads

(10.8) \begin{equation} s\sb {i}^\downarrow = A\biggl (r\sb {i} - \mu - \frac {d}{2}\biggr ) \quad \text {with} \quad A = \frac {\mu _1 - \mu _0}{\sigma ^2},   \end{equation}

with the change point at the sample index position

(10.9) \begin{equation} i_{\tau _e}^\downarrow = \arg \max \sb {0<k<N-1} S_0^{k}. \end{equation}

Since the sign of the mean shift is unknown, the algorithm has to detect both positive and negative changes. Combining both cases, \( d>0 \) and \( d<0 \), the log-likelihood ratio for normally distributed samples can be formulated as

(10.10) \begin{equation} s\sb {i} = r\sb {i} - \nu   \end{equation}

where the threshold value \( \nu   \) already absorbs the prefactor \( A \). With the choice \( A=1 \) the CSUM chart function is

(10.11) \begin{equation} \mathcal {C}\sb {k} = \sum \sb {i=0}^{k} \left (r\sb {i} - \nu \right ) \quad \text {with} \quad \nu = \frac {1}{N} \sum \sb {i=0}^{N-1} r\sb {i}.   \end{equation}

The sample index of the change point, covering both the positive and the negative mean shift, is given by

(10.12) \begin{equation} i_{\tau _e} = \arg \max \sb {0<k<N-1} \left |\mathcal {C}\sb {k}\right |.   \end{equation}
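Equations (10.11) and (10.12) amount to a few lines of Python (a minimal sketch, NumPy assumed; the names csum_chart and change_point are illustrative):

```python
import numpy as np

def csum_chart(r):
    # CSUM chart of Eq. (10.11): C_k = sum_{i=0}^{k} (r_i - nu),
    # with nu the sample mean of the whole data set.
    return np.cumsum(r - np.mean(r))

def change_point(r):
    # Eq. (10.12): the change point maximizes |C_k|, which covers both
    # positive and negative mean shifts.
    return int(np.argmax(np.abs(csum_chart(r))))

rng = np.random.default_rng(1)
# Negative mean shift (a trapping event): the mean drops from 5.0 to 3.0
# at sample index 300.
r = np.concatenate([rng.normal(5.0, 0.4, 300), rng.normal(3.0, 0.4, 200)])
print(change_point(r))  # index close to 300
```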

10.2.2 Bootstrapping

A frequently used method for parameter estimation from a set of data samples with unknown underlying distribution is bootstrapping. Its main advantage is that it is fully automatic, regardless of how complicated the mathematical model for the probability distribution is.

For a given set of samples \( \mathbf {r} \) of length \( N \), a bootstrap sample \( \mathbf {r}^\ast    \) is obtained by randomly choosing \( N \) samples from \( \mathbf {r} \) with replacement. Since bootstrapping is a resampling technique with replacement, each sample of \( \mathbf {r} \) may occur never, once, or several times in the bootstrap estimate \( \mathbf {r}^\ast   \) [171].

The statistical nature of bootstrapping requires a large number \( B \) of bootstrap samples \( \mathbf {r}\sb {b}^\ast      \), which inevitably makes the procedure computationally expensive.
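Resampling with replacement can be sketched in Python as follows (NumPy assumed; bootstrap_samples is an illustrative name):

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_samples(r, B):
    # Draw B bootstrap samples: for each sample, N indices are chosen
    # uniformly at random with replacement from the original data set.
    N = len(r)
    idx = rng.integers(0, N, size=(B, N))
    return r[idx]

r = np.arange(10, dtype=float)
boots = bootstrap_samples(r, B=5)
# Each row resamples r: individual values may appear zero, one,
# or several times.
print(boots.shape)  # (5, 10)
```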

10.2.3 The Algorithm

The BCSUM algorithm is a Monte Carlo based combination of bootstraps and cumulative sum charts to detect changes in the signal mean. The divide and conquer procedure to detect one change point in the measurement data set \( \mathbf {r} \) is:

  1. Calculate the \( B \) bootstrap samples \( \mathbf {r}_1^\ast , \mathbf {r}_2^\ast , \hdots , \mathbf {r}\sb {B}^\ast   \) from \( \mathbf {r} \).

  2. Create the cumulative sum chart \( \mathcal {C}\sb {b}^\ast       \) for each bootstrap sample \( \mathbf {r}\sb {b}^\ast   \) with \( b \in [1, B] \).

  3. Estimate the span of each chart \( \mathcal {C}\sb {b}^\ast    \) and collect the spans into a list \( \mathbf {\Gamma } \), sorted from the minimum to the maximum.

  4. With the detection sensitivity \( \epsilon     \) the decision threshold is obtained as \( \gamma _\epsilon = \mathbf {\Gamma }[\operatorname {floor}(B\epsilon )] \), where \( [\cdot ] \) denotes the list index operator. For a data set \( \mathbf {r} = [r_0, r_1, \ldots , r\sb {N-1}] \) the list index operator is

    (10.13) \begin{equation} \mathbf {r}[i] = r_i.   \end{equation}

  5. Create the CSUM chart \( \mathcal {C} \) from the initial data set and evaluate \( \gamma =\operatorname {span}\mathcal {C} \).

  6. A change point in the underlying sequence occurs if \( \gamma > \gamma _\epsilon      \). Split the data set at the change point position and recursively continue with step 1 for each subsequence until all change points are detected.
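The six steps above can be condensed into a short Python sketch (a simplified illustration under stated assumptions, not the reference implementation of [170]; NumPy assumed, and the parameter defaults, the min_len recursion guard, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def csum(r):
    # CSUM chart of Eq. (10.11).
    return np.cumsum(r - np.mean(r))

def bcsum(r, B=1000, eps=0.9, offset=0, min_len=4):
    # Recursive divide-and-conquer BCSUM sketch; returns detected
    # change-point indices relative to the full data set.
    r = np.asarray(r, dtype=float)
    N = len(r)
    if N < min_len:
        return []
    # Steps 1-3: bootstraps, their CSUM charts, sorted list of spans (Gamma).
    boots = r[rng.integers(0, N, size=(B, N))]
    charts = np.cumsum(boots - boots.mean(axis=1, keepdims=True), axis=1)
    spans = np.sort(charts.max(axis=1) - charts.min(axis=1))
    # Step 4: decision threshold gamma_eps = Gamma[floor(B * eps)].
    gamma_eps = spans[min(int(np.floor(B * eps)), B - 1)]
    # Step 5: span of the CSUM chart of the data itself.
    c = csum(r)
    gamma = c.max() - c.min()
    # Step 6: no change point -> stop; otherwise split and recurse.
    if gamma <= gamma_eps:
        return []
    k = int(np.argmax(np.abs(c)))
    return (bcsum(r[: k + 1], B, eps, offset, min_len)
            + [offset + k]
            + bcsum(r[k + 1 :], B, eps, offset + k + 1, min_len))

# Two steps buried in noise: mean 0 -> 6 at index 150, 6 -> 2 at index 300.
r = np.concatenate([rng.normal(0, 1, 150),
                    rng.normal(6, 1, 150),
                    rng.normal(2, 1, 150)])
print(bcsum(r, B=500, eps=0.99))  # detected change-point indices
```

Because at most one change point, the biggest, is reported per pass, the recursion on the two subsequences is what recovers the second, smaller step.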

Critical parameters for the detection outcome of the BCSUM algorithm are the number of bootstraps \( B \) and the detection sensitivity \( \epsilon     \). For statistical relevance, the bootstrapping and the calculation of \( \mathcal {C}\sb {b}^\ast    \) have to be performed many times; \( B=10^5 \) is recommended.

A choice of \( \epsilon \approx 1 \) means that only the most dominant changes in the mean are detected. For very fast events, which often span just two or three measurement data samples, the detection sensitivity is a crucial parameter. To extract small and very fast changes, \( \epsilon \approx 0.6 \) is recommended. Moreover, the position of the change point in the measurement data is directly accessible via the CSUM chart obtained from the initial data series.

Finally, it has to be noted that the algorithm can be applied to uniformly as well as non-uniformly sampled data. This property is very important in the context of the TDDS: to cover several decades in time, a compromise between the sampling rate and the accumulated recovery time is necessary in order to keep the amount of measurement data feasible.
