Although the time and frequency resolution problems are the result of a physical phenomenon (the Heisenberg uncertainty principle) and exist regardless of the transform used, it is possible to analyze any signal by using an alternative approach called the multiresolution analysis (MRA).

MRA is designed to give good time resolution and poor frequency resolution at high frequencies and good frequency resolution and poor time resolution at low frequencies. This approach makes sense especially when the signal at hand has high frequency components for short durations and low frequency components for long durations. Fortunately, the signals that are encountered in practical applications are often of this type. For example, the following shows a signal of this type. It has a relatively low frequency component throughout the entire signal and relatively high frequency components for a short duration somewhere around the middle.

Figure 3.1

The continuous wavelet transform was developed as an alternative approach to the short time Fourier transform to overcome the resolution problem. The wavelet analysis is done in a similar way to the STFT analysis, in the sense that the signal is multiplied with a function, the wavelet, similar to the window function in the STFT, and the transform is computed separately for different segments of the time-domain signal. However, there are two main differences between the STFT and the CWT:

- The Fourier transforms of the windowed signals are not taken; therefore, a single peak will be seen corresponding to a sinusoid, i.e., negative frequencies are not computed.
- The width of the window is changed as the transform is computed for every single spectral component, which is probably the most significant characteristic of the wavelet transform.

The continuous wavelet transform is defined as follows

$CWT_x^\psi(\tau,s) = \Psi_x^\psi(\tau,s) = \frac{1}{\sqrt{|s|}} \int x(t) \psi^* \left( \frac{t - \tau}{s} \right) dt$

Equation 3.1

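As a rough illustration of how Equation 3.1 can be evaluated on a computer, the sketch below approximates the integral by a Riemann sum on a uniform time grid. The function names (`cwt`, `mexican_hat`) and the choice of the Mexican Hat as the mother wavelet are illustrative assumptions, not part of the original text; the Mexican Hat is defined later in this chapter (Equation 3.16), used here without its normalization constant.

```python
import numpy as np

def mexican_hat(t):
    """Mexican Hat mother wavelet (sign-flipped second derivative of a
    Gaussian, sigma = 1), without the normalization constant."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def cwt(x, t, scales, translations, wavelet=mexican_hat):
    """Evaluate Equation 3.1 by direct numerical integration (Riemann sum)."""
    dt = t[1] - t[0]                 # uniform sampling step of the time grid
    W = np.zeros((len(scales), len(translations)))
    for i, s in enumerate(scales):
        for j, tau in enumerate(translations):
            # (1 / sqrt(|s|)) * integral of x(t) * psi((t - tau) / s) dt
            # (the Mexican Hat is real, so the conjugate is omitted)
            W[i, j] = np.sum(x * wavelet((t - tau) / s)) * dt / np.sqrt(abs(s))
    return W
```

The two nested loops sample the $(\tau, s)$ plane point by point; each value of $s$ fills one row of the time-scale plane.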

As seen in the above equation, the transformed signal is a function of two variables, $\tau$ and $s$, the translation and scale parameters, respectively; $\psi(t)$ is the transforming function, and it is called the mother wavelet.

The term wavelet means a small wave: the smallness refers to the condition that this (window) function is of finite length (compactly supported), and the wave refers to the condition that it is oscillatory. The term mother implies that the functions with different regions of support that are used in the transformation process are all derived from one main function, the mother wavelet; in other words, the mother wavelet is a prototype for generating the other window functions.

The term translation is used in the same sense as in the STFT: it is related to the location of the window as it is shifted through the signal, and it corresponds to time information in the transform domain. Instead of a frequency parameter, however, we have the scale parameter, defined as 1/frequency.

The parameter scale is similar to the scale used in maps: high scales correspond to a non-detailed, global view (of the signal), and low scales correspond to a detailed view. In terms of frequency, low frequencies (high scales) carry global information about the signal, which usually spans its entire duration, whereas high frequencies (low scales) carry detailed information about a hidden pattern in the signal, which usually lasts a relatively short time. Cosine signals at several different scales are shown in the following figure.

Figure 3.2

Fortunately in practical applications, low scales (high frequencies) do not last for the entire duration of the signal, unlike those shown in the figure, but they usually appear from time to time as short bursts, or spikes. High scales (low frequencies) usually last for the entire duration of the signal.

Scaling, as a mathematical operation, either dilates or compresses a signal. Larger scales correspond to dilated (or stretched out) signals and smaller scales correspond to compressed signals. All of the signals given in the figure are derived from the same cosine signal, i.e., they are dilated or compressed versions of the same function, each plotted for a different value of the scale parameter s.

In terms of mathematical functions, if f(t) is a given function, then f(st) corresponds to a compressed version of f(t) if s > 1 and to a dilated (expanded) version of f(t) if s < 1.

However, in the definition of the wavelet transform, the scaling term appears in the denominator, and therefore the opposite of the above statements holds: scales s > 1 dilate the signal, whereas scales s < 1 compress it.
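The dilation and compression behavior described above can be checked numerically. In this illustrative sketch (`zero_crossings` is an assumed helper name), the number of sign changes of a sampled cosine measures how fast it oscillates: f(st) with s = 2 oscillates twice as fast, while f(t/s) with s = 2 oscillates half as fast.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
f = lambda u: np.cos(2 * np.pi * 5 * u)   # a 5 Hz cosine observed on [0, 1)

def zero_crossings(y):
    """Count sign changes of a sampled waveform (2 per oscillation period)."""
    return int(np.sum(np.diff(np.sign(y)) != 0))

compressed = f(2 * t)   # f(st) with s = 2: compressed, oscillates twice as fast
dilated = f(t / 2)      # f(t/s) with s = 2: dilated, oscillates half as fast
```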

Interpretation of the above equation will be explained in this section. Let x(t) be the signal to be analyzed. The mother wavelet is chosen to serve as a prototype for all the windows in the process: all the windows that are used are dilated (or compressed) and shifted versions of the mother wavelet. The Morlet wavelet and the Mexican Hat function, both defined later in this chapter, are two commonly used examples.

Once the mother wavelet is chosen, the computation starts with s = 1, and the CWT is computed for all values of s, smaller and larger than 1. However, depending on the signal, a complete transform is usually not necessary; for all practical purposes, the signals are bandlimited, and therefore computation of the transform for a limited interval of scales is usually adequate.

For convenience, the procedure will be started from scale s = 1 and will continue for increasing values of s, i.e., the analysis will start from high frequencies and proceed towards low frequencies.

The wavelet is placed at the beginning of the signal, at the point which corresponds to time = 0. The wavelet function at scale 1 is multiplied by the signal and then integrated over all times. The result of the integration is then multiplied by the constant $1/\sqrt{|s|}$; this multiplication is for energy normalization purposes, so that the transformed signal has the same energy at every scale. The final result is the value of the transform at $\tau = 0$, $s = 1$ in the time-scale plane.

The wavelet at scale s = 1 is then shifted towards the right by $\tau$ to the location $t = \tau$, and the above equation is computed to get the transform value at $t = \tau$, $s = 1$ in the time-scale plane.

This procedure is repeated until the wavelet reaches the end of the signal.

Then, s is increased by a small value. Note that this is a continuous transform, and therefore both $\tau$ and s must be incremented continuously; if the transform is to be computed by a computer, however, both parameters are increased by a sufficiently small step size, which corresponds to sampling the time-scale plane.

The above procedure is repeated for every value of s. Every computation for a given value of s fills the corresponding single row of the time-scale plane. When the process is completed for all desired values of s, the CWT of the signal has been calculated.

The figures below illustrate the entire process step by step.

Figure 3.3

In Figure 3.3, the signal and the wavelet function are shown for four different values of $\boldsymbol \tau$. The signal is a truncated version of the signal shown in Figure 3.1. The scale value is 1, corresponding to the lowest scale (highest frequency); note how narrow the wavelet is at this scale.

If the signal has a spectral component that corresponds to the current value of s (which is 1 in this case), the product of the wavelet with the signal at the location where this spectral component exists gives a relatively large value. If the spectral component that corresponds to the current value of s is not present in the signal, the product will be relatively small, or zero.

The continuous wavelet transform of the signal in Figure 3.3 will yield large values for low scales around time 100 ms, and small values elsewhere. For high scales, on the other hand, the continuous wavelet transform will give large values for almost the entire duration of the signal, since low frequencies exist at all times.

Figure 3.4

Figure 3.5

Figures 3.4 and 3.5 illustrate the same process for the scales s = 5 and s = 20, respectively. Note how the window width changes with increasing scale (decreasing frequency). As the window width increases, the transform starts picking up the lower frequency components.

As a result, for every scale and for every time (interval), one point of the time-scale plane is computed. The computations at one scale construct the rows of the time-scale plane, and the computations at different scales construct the columns of the time-scale plane.

Now, let's take a look at an example and see what the wavelet transform really looks like. Consider the non-stationary signal in Figure 3.6, which is composed of four frequency components, at 30 Hz, 20 Hz, 10 Hz and 5 Hz, appearing one after the other.

Figure 3.6

Figure 3.7 is the continuous wavelet transform (CWT) of this signal. Note that the axes are translation and scale, not time and frequency. However, translation is strictly related to time, since it indicates where the mother wavelet is located; the translation of the mother wavelet can be thought of as the time elapsed since t = 0.

Figure 3.7

Note in Figure 3.7 that smaller scales correspond to higher frequencies, i.e., frequency decreases as scale increases; therefore, the portion of the graph with scales around zero actually corresponds to the highest frequencies in the analysis, and that with high scales corresponds to the lowest frequencies. Remember that the signal had its 30 Hz (highest frequency) components first, and these appear at the lowest scales, at translations of 0 to 30. Then comes the 20 Hz component, the second highest frequency, and so on. The 5 Hz component appears at the end of the translation axis (as expected), and at higher scales (lower frequencies), again as expected.

Figure 3.8

Now recall these resolution properties: unlike the STFT, which has a constant resolution at all times and frequencies, the WT has good time and poor frequency resolution at high frequencies, and good frequency and poor time resolution at low frequencies. Figure 3.8 shows the same WT as in Figure 3.7 from another angle to better illustrate the resolution properties: in Figure 3.8, lower scales (higher frequencies) have better scale resolution, i.e., there is less ambiguity about the exact value of the scale, which corresponds to poorer frequency resolution; higher scales have poorer scale resolution, which corresponds to better frequency resolution at the lower frequencies.

The axes in Figures 3.7 and 3.8 are normalized and should be evaluated accordingly. Roughly speaking, the 100 points on the translation axis correspond to 1000 ms, and the 150 points on the scale axis correspond to a frequency band of 40 Hz (the numbers on the translation and scale axes do not correspond to seconds and Hz, respectively; they are simply the number of samples in the computation).

In this section we will take a closer look at the resolution properties of the wavelet transform. Remember that the resolution problem was the main reason why we switched from STFT to WT.

The illustration in Figure 3.9 is commonly used to explain how time and frequency resolutions should be interpreted. Every box in Figure 3.9 corresponds to a value of the wavelet transform in the time-frequency plane. Note that each box has a certain non-zero area, which implies that the value of the transform at a single point in the time-frequency plane cannot be known: all the points in the time-frequency plane that fall into a box are represented by one value of the WT.

Figure 3.9

Let's take a closer look at Figure 3.9: the first thing to notice is that although the widths and heights of the boxes change, the area is constant. That is, each box represents an equal portion of the time-frequency plane, but gives different proportions to time and frequency. Note that at low frequencies, the heights of the boxes are shorter (which corresponds to better frequency resolution, since there is less ambiguity regarding the exact value of the frequency), but their widths are longer (which corresponds to poorer time resolution, since there is more ambiguity regarding the exact value of the time). At higher frequencies the widths of the boxes decrease, i.e., the time resolution gets better, and the heights of the boxes increase, i.e., the frequency resolution gets poorer.
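The constant-area property of the boxes can be verified numerically for a scaled wavelet family. The sketch below is an illustration under stated assumptions (a Mexican Hat wavelet, and RMS spreads as the measure of the box width and height): it computes the time spread and frequency spread of $\frac{1}{\sqrt{s}}\psi(t/s)$ at several scales, whose product should be independent of s.

```python
import numpy as np

def mexican_hat(t):
    """Mexican Hat shape (sign-flipped second derivative of a Gaussian)."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def box_sides(s, n=8192, half_width=80.0):
    """RMS time spread and RMS frequency spread of the wavelet at scale s."""
    t = np.linspace(-half_width, half_width, n, endpoint=False)
    dt = t[1] - t[0]
    psi = mexican_hat(t / s) / np.sqrt(s)          # the scaled wavelet
    e_t = np.sum(np.abs(psi) ** 2) * dt            # energy in time
    time_spread = np.sqrt(np.sum(t**2 * np.abs(psi) ** 2) * dt / e_t)
    psi_hat = np.fft.fft(psi) * dt                 # approximate Fourier transform
    f = np.fft.fftfreq(n, d=dt)
    df = 1.0 / (n * dt)
    e_f = np.sum(np.abs(psi_hat) ** 2) * df        # energy in frequency
    freq_spread = np.sqrt(np.sum(f**2 * np.abs(psi_hat) ** 2) * df / e_f)
    return time_spread, freq_spread

# The box area (time spread x frequency spread) is the same at every scale.
areas = [np.prod(box_sides(s)) for s in (1.0, 2.0, 4.0)]
```

As the scale doubles, the time spread doubles and the frequency spread halves, so their product stays fixed, exactly the picture Figure 3.9 describes.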

Before concluding this section, it is worthwhile to mention what the partition looks like in the case of the STFT. Recall that in the STFT the time and frequency resolutions are determined by the width of the analysis window, which is selected once for the entire analysis, i.e., both time and frequency resolutions are constant. Therefore the time-frequency plane consists of squares in the STFT case.

Regardless of the dimensions of the boxes, the areas of all boxes, both in the STFT and the WT, are the same, and are determined by Heisenberg's inequality. The area of a box is fixed for a given mother wavelet (or window function in the STFT); different mother wavelets or windows can give different aspect ratios, but the area cannot be reduced below the lower bound set by the uncertainty principle.

This section describes the main idea of wavelet analysis theory, which can also be considered the underlying concept of most signal analysis techniques. The FT defined by Fourier uses basis functions to analyze and reconstruct a function: every vector in a vector space can be written as a linear combination of the basis vectors of that space, i.e., by multiplying the basis vectors by some constants and summing the products. Analysis of the signal involves estimating these constants (the transform coefficients: Fourier coefficients, wavelet coefficients, etc.); synthesis, or reconstruction, corresponds to computing the linear combination.

All the definitions and theorems related to this subject can be found in Kaiser's book, A Friendly Guide to Wavelets, but an introductory-level knowledge of how basis functions work is necessary to understand the underlying principles of wavelet theory; a brief overview is therefore given below.


A basis of a vector space V is a set of linearly independent vectors such that any vector v in V can be written as a linear combination of these basis vectors. There may be more than one basis for a vector space; however, all of them have the same number of vectors, and this number is known as the dimension of the vector space. For example, in a two-dimensional space, the basis will have two vectors.

$v = \sum\limits_{k} \nu^k b_k$

Equation 3.2


Equation 3.2 shows how any vector $\boldsymbol v$ can be written as a linear combination of the basis vectors $\boldsymbol{b_k}$ and the corresponding coefficients $\nu^k$.

This concept, given in terms of vectors, can easily be generalized to functions, by replacing the basis vectors $\boldsymbol {b_k}$ with basis functions $\boldsymbol {\phi_k(t)}$, and the vector $\boldsymbol v$ with a function $\boldsymbol {f(t)}$, giving Equation $3.2_a$:

$f(t) = \sum\limits_{k} \mu_k \phi_k (t)$

Equation $3.2_a$


Complex exponential functions (sines and cosines) are the basis functions for the FT. Furthermore, they are orthogonal functions, a property which provides some desirable features for reconstruction.

Let f(t) and g(t) be two functions in $L^2 [a,b]$. ($L^2 [a,b]$ denotes the set of square integrable functions in the interval $[a,b]$). The inner product of two functions is defined by Equation 3.3:

$< f(t), g(t) > = \int_a^b f(t) \cdot g^*(t) dt$

Equation 3.3

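Equation 3.3 translates directly into a discrete approximation. The sketch below is illustrative (`inner_product` is an assumed helper name): it approximates the integral by a sum, and uses it to check the well-known orthogonality of sine and cosine over one full period.

```python
import numpy as np

def inner_product(f_vals, g_vals, dt):
    """Discrete approximation of Equation 3.3: <f, g> = integral of f(t) g*(t) dt."""
    return np.sum(f_vals * np.conj(g_vals)) * dt

t = np.linspace(0.0, 2 * np.pi, 10000, endpoint=False)
dt = t[1] - t[0]

ip_cross = inner_product(np.sin(t), np.cos(t), dt)  # ~0: sin and cos are orthogonal
ip_self = inner_product(np.sin(t), np.sin(t), dt)   # ~pi: the squared norm of sin
```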

According to the above definition of the inner product, the CWT can be thought of as the inner product of the test signal with the basis functions $\psi_{\tau, s}(t)$:

$CWT_x^\psi(\tau, s) = \Psi_x^\psi(\tau, s) = \int x(t) \cdot \psi^*_{\tau, s}(t) dt$

Equation 3.4


where,

$\psi_{\tau, s}(t) = \frac{1}{\sqrt{s}} \psi \left( \frac{t - \tau}{s} \right)$

Equation 3.5


This definition of the CWT shows that the wavelet analysis is a measure of similarity between the basis functions (wavelets) and the signal itself. Here the similarity is in the sense of similar frequency content. The calculated CWT coefficients refer to the closeness of the signal to the wavelet at the current scale.

This further clarifies the previous discussion on the correlation of the signal with the wavelet at a certain scale. If the signal has a major component at the frequency corresponding to the current scale, then the wavelet (the basis function) at the current scale will be similar or close to the signal at the particular location where this frequency component occurs; therefore, the CWT coefficient computed at this point in the time-scale plane will be a relatively large number.

Two vectors v and w are said to be orthogonal if their inner product is zero:

$< v, w > = \sum\limits_{n} v_n w^*_n = 0$

Equation 3.6


Similarly, two functions $f$ and $g$ are said to be orthogonal to each other if their inner product is zero:

$< f(t), g(t) > = \int_a^b f(t) \cdot g^*(t) \cdot dt = 0$

Equation 3.7


A set of vectors {$\boldsymbol{v_1, v_2, ....,v_n}$} is said to be orthonormal if the vectors are pairwise orthogonal and each has unit length. This can be expressed as:

$< v_m, v_n > = \delta_{mn}$

Equation 3.8


Similarly, a set of functions $\{\phi_k(t)\}$, $k=1,2,3,...,$ is said to be orthonormal if

$\int_a^b \phi_k(t) \cdot \phi^*_l(t) \cdot dt = 0$ $k \neq l$ (orthogonality cond.)

Equation 3.9


and

$\int_a^b | \phi_k(t) |^2 \, dt = 1$

Equation 3.10


or equivalently

$\int_a^b \phi_k(t) \cdot \phi_l^*(t) \cdot dt = \delta_{kl}$

Equation 3.11


where $\delta_{kl}$ is the Kronecker delta function, defined as:

$\delta_{kl} =
\left\{
\begin{array}{ll}
1, & k = l \\
0, & k \neq l\\
\end{array}
\right.$

Equation 3.12


As stated above, there may be more than one set of basis functions (or vectors). Among them, the orthonormal basis functions (or vectors) are of particular importance because of the nice properties they provide in finding these analysis coefficients. The orthonormal bases allow computation of these coefficients in a very simple and straightforward way using the orthonormality property.

For orthonormal bases, the coefficients, $\mu_k$, can be calculated as

$\mu_k = < f, \phi_k > = \int f(t) \cdot \phi_k^*(t) \cdot dt$

Equation 3.13


and the function f(t) can then be reconstructed by Equation $3.2_a$ by substituting the $\mu_k$ coefficients. This yields

$f(t) = \sum\limits_{k} \mu_k \phi_k(t) = \sum\nolimits_{k} < f, \phi_k > \phi_k(t)$

Equation 3.14

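Equations 3.13 and 3.14 can be demonstrated with finite-dimensional vectors, where the analogy is exact. The sketch below is illustrative: it builds an orthonormal basis of $R^n$ via a QR decomposition, computes the coefficients by inner products (Equation 3.13, real case), and reconstructs the vector by the linear combination of Equation 3.14.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# QR decomposition of a random matrix yields an orthonormal basis of R^n.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
basis = [Q[:, k] for k in range(n)]

f = rng.standard_normal(n)                          # the "signal" to analyze
mu = [np.dot(f, phi) for phi in basis]              # analysis: Equation 3.13 (real case)
f_rec = sum(m * phi for m, phi in zip(mu, basis))   # synthesis: Equation 3.14
```

The reconstruction is exact precisely because the basis is orthonormal, which is the "nice property" the text refers to.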

Orthonormal bases may not be available for every type of application, in which case a generalized version, biorthogonal bases, can be used. The term biorthogonal refers to two different bases that are orthogonal to each other, but where neither forms an orthogonal set by itself.

In some applications, however, biorthogonal bases also may not be available in which case frames can be used. Frames constitute an important part of wavelet theory, and interested readers are referred to Kaiser's book mentioned earlier.

Following the same order as in chapter 2 for the STFT, some examples of continuous wavelet transform are presented next. The figures given in the examples were generated by a program written to compute the CWT.

Before we close this section, I would like to include two mother wavelets commonly used in wavelet analysis. The Mexican Hat wavelet is defined as the second derivative of the Gaussian function given in Equation 3.15:

$w(t) = \frac{1}{\sqrt{2\pi} \cdot \sigma} e^{\frac{-t^2}{2 \sigma^2}}$

Equation 3.15


Taking the second derivative of this Gaussian gives:

$\psi(t) = \frac{1}{\sqrt{2 \pi} \cdot \sigma^3} \left( e^{\frac{-t^2}{2 \sigma^2}} \cdot \left( \frac{t^2}{\sigma^2} - 1 \right) \right)$

Equation 3.16


The Morlet wavelet is defined as

$w(t) = e^{i a t} \cdot e^{-\frac{t^2}{2\sigma}}$

Equation $3.16_a$


where $a$ is the modulation parameter and $\sigma$ is the scaling parameter that affects the width of the window.
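Both mother wavelets can be written down directly from Equations 3.16 and 3.16a. The sketch below is illustrative (the parameter values a = 5 and sigma = 1 are assumptions): it also checks numerically that the Mexican Hat has exactly zero mean, and the Morlet wavelet approximately zero mean, which is the admissibility requirement (Equation 3.19) discussed later in this chapter.

```python
import numpy as np

SIGMA = 1.0

def mexican_hat(t, sigma=SIGMA):
    """Equation 3.16: the second derivative of the Gaussian of Equation 3.15."""
    c = 1.0 / (np.sqrt(2 * np.pi) * sigma**3)
    return c * np.exp(-t**2 / (2 * sigma**2)) * (t**2 / sigma**2 - 1.0)

def morlet(t, a=5.0, sigma=SIGMA):
    """Equation 3.16a: a complex exponential under a Gaussian envelope."""
    return np.exp(1j * a * t) * np.exp(-t**2 / (2 * sigma))

t = np.linspace(-20.0, 20.0, 40001)
dt = t[1] - t[0]
mexh_mean = np.sum(mexican_hat(t)) * dt   # exactly zero in the continuum
morl_mean = np.sum(morlet(t)) * dt        # only approximately zero, for large a
```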

All of the examples given below correspond to real-life non-stationary signals. These signals are drawn from a database of signals that includes event-related potentials of normal people and of patients diagnosed with Alzheimer's disease. Since these are not test signals like simple sinusoids, it is not as easy to interpret them; they are shown here only to give an idea of how real-life CWTs look.

The following signal shown in Figure 3.11 belongs to a normal person.

Figure 3.11

and the following is its CWT. The numbers on the axes are of no importance to us; those numbers simply show that the CWT was computed at 350 translation and 60 scale locations on the translation-scale plane. The important point to note here is the fact that the computation is not a true continuous wavelet transform: it is computed at a finite number of locations, and is therefore only a discretized version of the CWT, which is explained later in this chapter.

Figure 3.12

Figure 3.13 plots the same transform from a different angle for better visualization.

Figure 3.13

Figure 3.14 plots an event-related potential of a patient diagnosed with Alzheimer's disease,

Figure 3.14

and Figure 3.15 illustrates its CWT:

Figure 3.15

and here is another view from a different angle

Figure 3.16

The continuous wavelet transform is a reversible transform, provided that Equation 3.18 is satisfied; fortunately, this is a very non-restrictive requirement. The transform is reversible even though the basis functions are, in general, not orthonormal. Reconstruction is possible by using the following reconstruction formula:

$x(t) = \frac{1}{C_\psi^2} \int_s \int_\tau \left[ \Psi^\psi_x(\tau, s) \frac{1}{s^2} \psi \left( \frac{t - \tau}{s} \right) \right] d\tau \cdot ds$

Equation 3.17


where $C_\psi$ is a constant that depends on the wavelet used. The success of the reconstruction depends on this constant, called the admissibility constant, satisfying the following admissibility condition:

$C_\psi = \left\{ 2 \pi \int_{-\infty}^{\infty} \frac{|\hat{\psi}(\xi)|^2}{|\xi|} d\xi \right\} ^{\frac{1}{2}} < \infty$

Equation 3.18


where $\hat{\psi}(\xi)$ is the FT of $\psi(t)$. Equation 3.18 implies that $\hat{\psi}(0) = 0$, which is

$\int \psi(t) \cdot dt = 0$

Equation 3.19


As stated above, Equation 3.19 is not a very restrictive requirement since many wavelet functions can be found whose integral is zero. For Equation 3.19 to be satisfied, the wavelet must be oscillatory.

In today's world, computers are used to do most computations (well,...ok... almost all computations). It is apparent that neither the FT, nor the STFT, nor the CWT can be practically computed by using analytical equations, integrals, etc. It is therefore necessary to discretize the transforms. As in the FT and STFT, the most intuitive way of doing this is simply sampling the time-frequency (scale) plane. Again intuitively, sampling the plane with a uniform sampling rate sounds like the most natural choice. However, in the case of WT, the scale change can be used to reduce the sampling rate.

At higher scales (lower frequencies), the sampling rate can be decreased, according to Nyquist's rule. In other words, if the time-scale plane needs to be sampled with a sampling rate of $\boldsymbol{N_1}$ at scale $\boldsymbol{s_1}$, the same plane can be sampled with a sampling rate of $\boldsymbol{N_2}$, at scale $\boldsymbol{s_2}$, where, $\boldsymbol{s_1 < s_2}$ (corresponding to frequencies $\boldsymbol{f_1 > f_2}$ ) and $\boldsymbol{N_2 < N_1}$. The actual relationship between $\boldsymbol{N_1}$ and $\boldsymbol{N_2}$ is

$N_2 = \frac{s_1}{s_2} N_1$

Equation 3.20


or

$N_2 = \frac{f_2}{f_1} N_1$

Equation 3.21


In other words, at lower frequencies the sampling rate can be decreased which will save a considerable amount of computation time.
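Equation 3.20 is simple enough to express as a one-line function. The sketch below is illustrative (the function and variable names are assumptions); it computes the reduced sampling rate permitted at a higher scale.

```python
def reduced_rate(n1, s1, s2):
    """Equation 3.20: sampling rate N2 at scale s2, given rate N1 at scale s1."""
    return (s1 / s2) * n1

# 64 samples at scale 2 -> only 16 samples needed at scale 8 (4x lower frequency)
n2 = reduced_rate(64, 2, 8)
```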

It should be noted at this time, however, that the discretization can be done in any way without any restriction as far as the analysis of the signal is concerned. If synthesis is not required, even the Nyquist criterion does not need to be satisfied. The restrictions on the discretization and the sampling rate become important if, and only if, signal reconstruction is desired. Nyquist's sampling rate is the minimum sampling rate that allows the original continuous-time signal to be reconstructed from its discrete samples.

As mentioned earlier, the wavelet $\boldsymbol{\psi(\tau,s)}$ satisfying Equation 3.18, allows reconstruction of the signal by Equation 3.17. However, this is true for the continuous transform. The question is: can we still reconstruct the signal if we discretize the time and scale parameters? The answer is "yes", under certain conditions (as they always say in commercials: certain restrictions apply !!!).

The scale parameter s is discretized first, on a logarithmic grid. The time parameter is then discretized with respect to the scale parameter, i.e., a different sampling rate is used for every scale. In other words, the sampling is done on the dyadic sampling grid shown in Figure 3.17:

Figure 3.17

Think of the area covered by the axes as the entire time-scale plane. The CWT assigns a value to the continuum of points on this plane; therefore, there are an infinite number of CWT coefficients. First consider the discretization of the scale axis. Among that infinite number of points, only a finite number are taken, using a logarithmic rule. The base of the logarithm depends on the user. The most common value is 2 because of its convenience. If 2 is chosen, only the scales 2, 4, 8, 16, 32, 64, etc. are computed. If the value were 3, the scales 3, 9, 27, 81, 243, etc. would have been computed. The time axis is then discretized according to the discretization of the scale axis. Since the discrete scale changes by factors of 2, the sampling rate for the time axis is reduced by a factor of 2 at every scale.

Note that at the lowest scale (s = 2), only 32 points of the time axis are sampled (for the particular case given in Figure 3.17). At the next scale value, s = 4, the sampling rate of time axis is reduced by a factor of 2 since the scale is increased by a factor of 2, and therefore, only 16 samples are taken. At the next step, s = 8 and 8 samples are taken in time, and so on.
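The dyadic grid just described can be sketched as follows (illustrative; the value of 32 time samples at the lowest scale matches the particular case described for Figure 3.17).

```python
def dyadic_grid(num_scales, n_lowest=32):
    """Number of time samples per scale on a base-2 (dyadic) grid: every
    time the scale doubles, the number of time samples is halved."""
    grid = {}
    scale, n = 2, n_lowest
    for _ in range(num_scales):
        grid[scale] = n
        scale *= 2
        n //= 2
    return grid
```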

Although it is called the time-scale plane, it is more accurate to call it the translation-scale plane, because "time" in the transform domain actually refers to the location of the wavelet, i.e., the amount by which it has been shifted.

Similar to the relationship between continuous Fourier transform, Fourier series and the discrete Fourier transform, there is a continuous wavelet transform, a semi-discrete wavelet transform (also known as wavelet series) and a discrete wavelet transform.

Expressing the above discretization procedure in mathematical terms, the scale discretization is $\boldsymbol{s = s_0^j}$, and the translation discretization is $\boldsymbol{\tau = k \cdot s_0^j \cdot \tau_0}$, where $\boldsymbol{s_0>1}$ and $\boldsymbol{\tau_0>0}$. Note how the translation discretization depends on the scale discretization through $\boldsymbol{s_0}$.

The continuous wavelet function

$\psi_{\tau, s}(t) = \frac{1}{\sqrt{s}} \psi \left( \frac{t - \tau}{s} \right)$

Equation 3.22


becomes

$\psi_{j, k}(t) = s_0^{\frac{-j}{2}} \psi \left( s_0^{-j} t - k \tau_0 \right)$

Equation 3.23


by inserting $\boldsymbol{s = s_0^{\, j}}$, and $\boldsymbol{\tau = k \cdot s_0^{\, j} \cdot \tau_0}$.

If $\boldsymbol{ \left\{ \psi_{(j, \, k)} \right\} }$ constitutes an orthonormal basis, the wavelet series transform becomes

$\Psi^{\psi_{\, j,k}}_x = \int x(t) \, \psi^*_{j, \, k}(t) \, dt$

Equation 3.24


or

$x(t) = c_\psi \sum\limits_{j} \sum\limits_{k} \Psi^{\psi_{\, j,k}}_x \, \psi_{\, j,k} (t)$

Equation 3.25


A wavelet series requires that $\boldsymbol{ {\psi_{(j, \, k)}} }$ be either orthonormal, biorthogonal, or a frame. If $\boldsymbol{ {\psi_{(j,k)}} }$ are not orthonormal, Equation 3.24 becomes

$\Psi^{\psi_{\, j,k}}_x = \int x(t) \, \hat{\psi}^*_{j, \, k}(t) \, dt$

Equation 3.26


where $\boldsymbol{\hat{\psi}_{j, \, k}^*(t)}$ is either the dual biorthogonal basis or the dual frame (the asterisk denotes the complex conjugate).

If $\boldsymbol{ \{ \psi_{(j, \, k)} \} }$ are orthonormal or biorthogonal, the transform will be non-redundant, whereas if they form a frame, the transform will be redundant. On the other hand, it is much easier to find frames than it is to find orthonormal or biorthogonal bases.

The following analogy may clarify this concept. Consider the whole process as looking at a particular object. The human eyes first determine the coarse view, which depends on the distance of the eyes to the object; this corresponds to adjusting the scale parameter $\boldsymbol{s_0^{-j}}$. When looking at a very close object, in great detail, $j$ is negative and large (low scale, high frequency). Moving the head (or eyes) very slowly and with very small increments corresponds to small values of $\tau = k \cdot s_0^j \cdot \tau_0$: when $j$ is negative and large, the translation steps are small, which matches the high sampling rate needed at low scales (high frequencies).

How low can the sampling rate be and still allow reconstruction of the signal? This is the main question to be answered to optimize the procedure. The most convenient value (in terms of programming) is found to be 2 for $s_0$ and 1 for $\tau_0$. Obviously, when the sampling rate is forced to be as low as possible, the number of available orthonormal wavelets is also reduced.

The continuous wavelet transform examples that were given in this chapter were actually the wavelet series of the given signals. The parameters were chosen depending on the signal. Since the reconstruction was not needed, the sampling rates were sometimes far below the critical value where $s_0$ varied from 2 to 10, and $\tau_0$ varied from 2 to 8, for different examples.

This concludes Part III of this tutorial. I hope you now have a basic understanding of what the wavelet transform is all about. There is one thing left to be discussed, however. Even though the discretized wavelet transform can be computed on a computer, this computation may take anywhere from a couple of seconds to a couple of hours depending on your signal size and the resolution you want. An amazingly fast algorithm is actually available to compute the wavelet transform of a signal. The discrete wavelet transform (DWT) is introduced in the final chapter of this tutorial, in Part IV.

Let's meet at the grand finale, shall we?

All Rights Reserved. This tutorial is intended for educational purposes only. Unauthorized copying, duplicating and publishing is strictly prohibited.

Robi Polikar

Rowan University

Phone: (856) 256 5372

polikar@rowan.edu