Sound Analysis
or
$$\sum_{k=1}^{P} a_k R(i - k) = -R(i), \qquad i = 1, \dots, P, \tag{29}$$
where
$$R(i) = \sum_{n} x(n)\, x(n - i) \tag{30}$$
is the signal autocorrelation.
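
As a minimal sketch, the autocorrelation (30) and the solution of the system (29) might be coded as follows in Python. The recursion used here is the standard Levinson-Durbin scheme for Toeplitz systems. The function names, the restriction of the sum to the samples of a single frame, and the sign convention of the reflection coefficients (which varies between texts) are assumptions of this example.

```python
def autocorrelation(x, max_lag):
    """R(i) = sum_n x(n) x(n - i) for i = 0..max_lag, as in equation (30).

    Assumption: the sum runs only over the samples available in the frame x.
    """
    n_samples = len(x)
    return [
        sum(x[n] * x[n - i] for n in range(i, n_samples))
        for i in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve sum_k a_k R(i - k) = -R(i), i = 1..order, as in equation (29).

    r is the list R(0)..R(order). Returns (a, k, err) where
    a = [a_1, ..., a_P], k holds the reflection coefficients and
    err is the energy of the final prediction error.
    """
    a = []        # coefficients a_1..a_{m-1} of the current order
    k = []        # reflection coefficients found so far
    err = r[0]    # prediction-error energy at order 0
    for m in range(1, order + 1):
        # acc = R(m) + sum_{j=1}^{m-1} a_j R(m - j)
        acc = r[m] + sum(a[j - 1] * r[m - j] for j in range(1, m))
        k_m = -acc / err
        # Order update: a_j <- a_j + k_m * a_{m-j}, then append a_m = k_m.
        a = [a[j] + k_m * a[m - 2 - j] for j in range(m - 1)] + [k_m]
        k.append(k_m)
        err *= 1.0 - k_m * k_m
    return a, k, err
```

A call such as `levinson_durbin(autocorrelation(frame, P), P)` returns the coefficients in the sign convention of (29), so that they can be used directly as the coefficients of A(z).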
In the z domain, equation (27) reduces to
$$E(z) = A(z)\, X(z) \tag{31}$$
where A(z) is the polynomial with coefficients $a_1, \dots, a_P$. In the case of voice signal
analysis, the filter 1/A(z) is called the allpole formant filter because, if the
proper order P is chosen, its magnitude frequency response follows the envelope
of the signal spectrum, with its broad resonances called formants. The filter
A(z) is called the inverse formant filter because it extracts from the voice signal
a residual resembling the vocal tract excitation. A(z) is also called a whitening
filter because it produces a residual having a flat spectrum. However, we dis-
tinguish between two kinds of residuals, both having a flat spectrum: the pulse
train and the white noise, the first being the idealized vocal-fold excitation for
voiced speech, the second being the idealized excitation for unvoiced speech. In
reality, the residual is neither one of the two idealized excitations. At the resyn-
thesis stage the choice is either to use an encoded residual, possibly choosing
from a code book of templates, or to choose one of the two idealized excitations
according to a voiced/unvoiced decision made by the analysis stage.
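
The roles of A(z) and 1/A(z) can be illustrated with a short Python sketch. It assumes the convention A(z) = 1 + a_1 z^{-1} + ... + a_P z^{-P}, which agrees with the sign of equations (29), and zero initial conditions at the start of the frame. All function names are hypothetical, and the two excitation generators are only the idealized signals described above, not an encoded residual.

```python
import random


def inverse_filter(x, a):
    """Whitening filter A(z): e(n) = x(n) + sum_k a_k x(n - k)."""
    e = []
    for n in range(len(x)):
        acc = x[n]
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:          # samples before the frame are taken as zero
                acc += a_k * x[n - k]
        e.append(acc)
    return e


def allpole_filter(e, a):
    """Formant filter 1/A(z): y(n) = e(n) - sum_k a_k y(n - k)."""
    y = []
    for n in range(len(e)):
        acc = e[n]
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc -= a_k * y[n - k]
        y.append(acc)
    return y


def pulse_train(length, period):
    """Idealized voiced excitation: a unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def white_noise(length, seed=0):
    """Idealized unvoiced excitation: Gaussian white noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]
```

Filtering an excitation with `allpole_filter` and then applying `inverse_filter` with the same coefficients returns the excitation, which is the sense in which A(z) is the inverse of the formant filter.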
When the target signal is periodic (voiced speech), a pitch detector can be
added to the analysis stage, so that the resynthesis can be driven by periodic
replicas of a basic pulse, with the correct inter-pulse period. Several techniques
are available for pitch detection, either using the residual or the target signal [53].
Although not particularly efficient, one possibility is to do a Fourier analysis
of the residual and estimate the fundamental frequency by the techniques of
section 4.1.5.
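
For illustration only, the following Python sketch estimates the pitch with a different and widely used time-domain technique, namely picking the peak of the autocorrelation of the residual (or of the target signal) within a plausible range of lags. It is not the Fourier-based method mentioned above. The function names and the default search range of 60 to 400 Hz are assumptions of this example.

```python
def estimate_period(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    max_lag = min(max_lag, len(signal) - 1)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        if value > best_value:      # strict comparison keeps the smallest lag on ties
            best_lag, best_value = lag, value
    return best_lag


def pitch_hz(signal, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency in Hz from the best period in samples."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    return sample_rate / estimate_period(signal, min_lag, max_lag)
```

The estimated period, in samples, is the inter-pulse period that a resynthesis driven by a pulse train would need.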
Summarizing, the information extracted in a frame by the analysis stage is:
· the prediction coefficients $a_1, \dots, a_P$;
· the residual e;
· pitch of the excitation residual;
· voiced/unvoiced information;
· signal energy (RMS amplitude).
These parameters, possibly modified, are used in the resynthesis, as explained
in section 5.1.3.
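
The per-frame parameters listed above could be collected in a small record, as in the following Python sketch. The class name `LpcFrame`, its field names, and the choice of representing the pitch as a period in samples are hypothetical and serve only to make the list concrete.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LpcFrame:
    """Parameters extracted from one analysis frame (hypothetical layout)."""
    coefficients: List[float]       # prediction coefficients a_1..a_P
    residual: List[float]           # residual e
    pitch_period: Optional[int]     # period in samples, None if unvoiced
    voiced: bool                    # voiced/unvoiced decision
    rms: float                      # signal energy as RMS amplitude


def rms_amplitude(x):
    """Root-mean-square amplitude of the samples in a frame."""
    return math.sqrt(sum(v * v for v in x) / len(x))
```

A resynthesis stage would then read, and possibly modify, the fields of such a record before driving the formant filter.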
The equations (29) are solved via the well-known Levinson-Durbin recur-
sion [53], which provides the reflection coefficients of the lattice realization of
the filter 1/A(z). As we mentioned in section 2.2.4, the reflection coefficients
are related to a piecewise-cylindrical model of the vocal tract. The LPC
analysis proceeds by frames lasting a few milliseconds. In each frame the signal