

A.1 Statistical Background

In this section, some well-established definitions of statistical terms are given. The following explanations are adapted from [47], but can be found in any basic textbook on statistical mathematics.

A.1.1 Mean

Suppose there is a random population (or data set) $ \mathbf{x}$ , where

$\displaystyle \mathbf{x}=\{x_1,\ldots,x_N\}$ (A.1)

and the mean of that population is given by

$\displaystyle \mu_{\mathbf{x}} = E \{\mathbf{x}\} = \frac{\sum_{i=1}^{N} x_i}{N}.$ (A.2)

A.1.2 Standard Deviation

For the standard deviation of a random variable $ \mathbf{x}$ we define:

$\displaystyle \sigma = \sqrt{\frac{1}{N} \overset{N}{\underset{i=1}{\sum}} (x_{i} -\mu_{\mathbf{x}} )^2}.$ (A.3)

In practice, the standard deviation of an entire population can rarely be computed. In most cases a sample standard deviation $ (s)$ is used to estimate the population standard deviation $ (\sigma{})$ . Given only a sample of values $ x_1,\ldots,x_n$ from some larger population $ x_1,\ldots,x_N$ with $ n < N$ , the sample (or estimated) standard deviation is given by:

$\displaystyle s = \sqrt{\frac{1}{n-1} \overset{n}{\underset{i=1}{\sum}} (x_i - \mu_\mathbf{x})^2}$ (A.4)

and the variance $ \operatorname{var}(\mathbf{x})$ is simply the standard deviation squared, which can be written as

$\displaystyle s^2 = \operatorname{var}(\mathbf{x}) = \frac{\sum_{i=1}^n (x_i - \mu_\mathbf{x}) (x_i - \mu_\mathbf{x}) }{(n-1)}.$ (A.5)

The reason for the division by $ (n-1)$ is that $ \operatorname{var}(\mathbf{x})$ is then an unbiased estimator for the variance $ \sigma{}^2$ of the underlying population (see [125] for a detailed description). In the following, Equation (A.4) is used for calculating the standard deviation.
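As an illustration, the following minimal Python sketch (using NumPy and a made-up sample) evaluates the mean of Equation (A.2), the sample standard deviation of Equation (A.4), and the variance of Equation (A.5), and checks the results against NumPy's built-in estimators:

import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical sample

mu = x.sum() / x.size                               # mean, Equation (A.2)
s = np.sqrt(((x - mu) ** 2).sum() / (x.size - 1))   # sample standard deviation, Equation (A.4)
var = s ** 2                                        # variance, Equation (A.5)

# ddof=1 selects the (n-1) normalisation used in Equations (A.4) and (A.5).
assert np.isclose(s, np.std(x, ddof=1))
assert np.isclose(var, np.var(x, ddof=1))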

Roughly speaking, the variance gives a measure of the spread of data in a population. Standard deviation and variance operate only on one-dimensional data, which is very limiting, since many populations have more than one dimension.

A.1.3 Covariance and Covariance Matrix

For multi-dimensional data sets it is useful to define a measure of how much the dimensions vary from their means with respect to each other. Covariance is such a measure and is defined for two data sets $ \mathbf{x}$ and $ \mathbf{y}$ by:

$\displaystyle \operatorname{cov}(\mathbf{x},\mathbf{y}) = \frac{\sum_{i=1}^n (x_i - \mu_\mathbf{x}) (y_i - \mu_\mathbf{y}) }{(n-1)}.$ (A.6)

It is straightforward to show that the covariance of a data set with itself is simply the variance of that data set. Additionally, $ \operatorname{cov}(\mathbf{x},\mathbf{y}) = \operatorname{cov}(\mathbf{y},\mathbf{x})$ holds, since multiplication is commutative.

For $ n$ -dimensional data sets it is possible to calculate

$\displaystyle \frac{n!}{(n-2)! \cdot 2}$ (A.7)

different covariance values (excluding the variance values, where $ \operatorname{cov}(\mathbf{x},\mathbf{x}) = \operatorname{var}(\mathbf{x})$ ); for a three-dimensional data set, for example, this gives $ 3!/(1! \cdot 2) = 3$ distinct covariances. These values can be written as a matrix, the so-called covariance matrix $ C^{n \times{} n}$ , defined by:

$\displaystyle C^{n \times{} n} = (c_{i,j}), \qquad c_{i,j} = \operatorname{cov}(Dim_i, Dim_j),$ (A.8)

where $ Dim_k$ is the $ k^{th}$ dimension. In the following the dimension is set to three; using the usual dimensions $ \mathbf{x}$ , $ \mathbf{y}$ , and $ \mathbf{z}$ , the covariance matrix reads

$\displaystyle C^{3 \times 3} := \begin{pmatrix} \operatorname{cov}(\mathbf{x},\mathbf{x}) & \operatorname{cov}(\mathbf{x},\mathbf{y}) & \operatorname{cov}(\mathbf{x},\mathbf{z}) \\ \operatorname{cov}(\mathbf{y},\mathbf{x}) & \operatorname{cov}(\mathbf{y},\mathbf{y}) & \operatorname{cov}(\mathbf{y},\mathbf{z}) \\ \operatorname{cov}(\mathbf{z},\mathbf{x}) & \operatorname{cov}(\mathbf{z},\mathbf{y}) & \operatorname{cov}(\mathbf{z},\mathbf{z}) \end{pmatrix}.$ (A.9)
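A small Python sketch of this construction (the three-dimensional data set is synthetic and only serves as an example) builds the covariance matrix of Equation (A.9); the matrix product of the centered data is equivalent to evaluating Equation (A.6) for every pair of dimensions:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))   # 100 samples of a hypothetical 3-D data set

mu = data.mean(axis=0)             # per-dimension means
centered = data - mu
C = centered.T @ centered / (data.shape[0] - 1)   # Equation (A.6) entrywise

# NumPy's built-in estimator (rowvar=False: the columns are the dimensions)
assert np.allclose(C, np.cov(data, rowvar=False))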

A.1.4 Eigendecomposition

The decomposition of a square matrix $ \mathbf{A}$ into eigenvalues and eigenvectors is known as eigendecomposition. Let $ \mathbf{A}$ be a linear transformation represented by a matrix. If there is a nonzero vector $ \vec{x} \in \Bbb{R}^n$ such that

$\displaystyle \mathbf{A}\vec{x}= \lambda \vec{x} \quad \Leftrightarrow \quad (\mathbf{A} - \lambda \mathbf{I})\vec{x}= 0$ (A.10)

for some scalar $ \lambda $ , then $ \lambda $ is called an eigenvalue of $ \mathbf{A}$ with corresponding eigenvector $ \vec{x}$ ; $ \mathbf{I}$ is again the identity matrix. Equation (A.10) has nontrivial solutions if and only if the determinant of $ (\mathbf{A} - \lambda \mathbf{I})$ vanishes:

$\displaystyle \operatorname{det}(\mathbf{A} - \lambda \mathbf{I}) = 0.$ (A.11)

This equation is known as the characteristic equation of $ \mathbf{A}$ , and the left-hand side is known as the characteristic polynomial.
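The following small sketch (with an arbitrarily chosen symmetric example matrix) computes an eigendecomposition numerically and verifies Equation (A.10) for each eigenpair; the eigenvalues returned are the roots of the characteristic polynomial of Equation (A.11):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # made-up symmetric matrix

eigvals, eigvecs = np.linalg.eig(A)  # columns of eigvecs are the eigenvectors

for k in range(len(eigvals)):
    v = eigvecs[:, k]
    assert np.allclose(A @ v, eigvals[k] * v)   # Equation (A.10)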

PCA uses the eigenvalues and eigenvectors of the covariance matrix to form a feature vector by ordering the eigenvalues from highest to lowest. The eigenvector with the highest eigenvalue thus gives the most significant component of the data set. If the lower eigenvalues and their corresponding eigenvectors are ignored, the dimensionality of the data set can be reduced. Based on these feature vectors a new data set is derived, in which the original data are expressed in terms of the feature vectors.

Basically, this is a transformation of the initial data set that expresses the data in terms of the patterns between them. This transformation is also known as the Karhunen-Loève (KL) transform, which was originally introduced as a series expansion for continuous random processes, or, for discrete random processes, as the Hotelling transform [126]. The following references are recommended for readers who are interested in PCA more deeply: a good introduction can be found in [47], a more general treatment is given in the book of Jolliffe [46], and [127] is suggested for applications of PCA to computer vision.
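To summarise the procedure, a minimal Python sketch of PCA as outlined above (using synthetic, correlated example data) forms the covariance matrix, orders the eigenvectors by decreasing eigenvalue, and projects the centered data onto the two most significant components:

import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 3))                    # hypothetical 3-D data set
data[:, 2] = 0.9 * data[:, 0] + 0.1 * data[:, 2]    # introduce correlation

centered = data - data.mean(axis=0)
C = np.cov(centered, rowvar=False)                  # covariance matrix, Equation (A.9)

eigvals, eigvecs = np.linalg.eigh(C)                # eigh: C is symmetric
order = np.argsort(eigvals)[::-1]                   # highest eigenvalue first
feature_vectors = eigvecs[:, order[:2]]             # keep the two most significant components

reduced = centered @ feature_vectors                # new 2-D representation of the data
print(reduced.shape)                                # (200, 2)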

