MATLAB and Audio Processing (1)


1. Acquisition of voice signals


A voice signal is an analog signal, so it must first be converted into a digital signal by sampling. In essence, sampling turns the continuous signal into a sequence of pulses or numbers. First record a short clip in WAV format with any recording software. Then read it into MATLAB with the audioread function, noting the sampling frequency and the number of samples it returns, and play it back with the sound function.
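MATLAB's [y, Fs] = audioread(filename) returns the samples scaled to the range -1 to 1 together with the sampling frequency. The Python sketch below is a rough stand-in that needs no MATLAB installation and uses only the standard-library wave module. It does not use a real recording. Instead it synthesizes a 440 Hz tone as a stand-in, writes it as 16-bit mono WAV in memory, and reads it back. The helpers make_test_wav and read_wav are hypothetical names chosen for illustration. Playback (MATLAB's sound(y, Fs)) is left out because the Python standard library has no portable audio output.

```python
import io
import math
import struct
import wave

def make_test_wav(fs=8000, duration=0.5, freq=440.0):
    """Hypothetical helper: a sine tone as 16-bit mono WAV bytes (stand-in for a recording)."""
    n = int(fs * duration)
    samples = [int(0.5 * 32767 * math.sin(2 * math.pi * freq * k / fs)) for k in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(fs)
        w.writeframes(struct.pack("<%dh" % n, *samples))
    return buf.getvalue()

def read_wav(data):
    """Hypothetical helper: return (samples scaled to [-1, 1), sampling frequency)."""
    with wave.open(io.BytesIO(data), "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        fs = w.getframerate()
        n = w.getnframes()
        raw = w.readframes(n)
    ints = struct.unpack("<%dh" % n, raw)
    return [v / 32768.0 for v in ints], fs

y, fs = read_wav(make_test_wav())
print(fs, len(y))  # 8000 4000: sampling frequency and number of samples
```

The two numbers printed correspond to what the text says to note after reading the file: the sampling frequency and the number of samples.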


2. Spectral characteristics of voice signals


In MATLAB, the fft function computes the fast Fourier transform of a signal, from which its spectral characteristics (such as its magnitude and phase spectra) can be obtained.
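For reference, fft computes the DFT below for a length-N sequence x[n]. When called as fft(x) with no length argument, N is the length of x. For a sampling frequency f_s, bin k corresponds to the frequency f_k. MATLAB indexes from 1, so element X(k+1) of the result holds bin k. For a real signal |X[k]| = |X[N-k]|, so the bins k = 0, ..., N/2 carry all of the magnitude information.

```latex
X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1

f_k = \frac{k\, f_s}{N}
```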

The fast Fourier transform (FFT) is a fast algorithm for computing the discrete Fourier transform (DFT). It improves on the direct DFT computation by exploiting properties of the DFT such as odd/even symmetry and real/imaginary symmetry. This reduces the cost for a length-N sequence from O(N^2) to O(N log N) operations.
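To show where the speed-up comes from, here is a recursive radix-2 decimation-in-time FFT in plain Python, checked against a direct O(N^2) DFT. It splits the input into even- and odd-indexed samples and reuses their half-length transforms through the twiddle-factor symmetry W_N^(k+N/2) = -W_N^k. This textbook sketch assumes the length is a power of two and is for illustration only. MATLAB's fft accepts arbitrary lengths and is far faster. The 1000 Hz test tone and 8 kHz sampling rate are assumed values.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_total = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total) for n in range(n_total))
            for k in range(n_total)]

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n_total = len(x)
    if n_total == 1:
        return [complex(x[0])]
    if n_total % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # half-length DFT of the even-indexed samples
    odd = fft(x[1::2])   # half-length DFT of the odd-indexed samples
    half = n_total // 2
    out = [0j] * n_total
    for k in range(half):
        t = cmath.exp(-2j * math.pi * k / n_total) * odd[k]  # twiddle factor W_N^k times odd part
        out[k] = even[k] + t
        out[k + half] = even[k] - t  # uses W_N^(k + N/2) = -W_N^k
    return out

# Usage: a 1000 Hz tone sampled at 8 kHz (assumed values), N = 256 samples.
fs, n_fft = 8000, 256
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(n_fft)]
mag = [abs(v) for v in fft(x)[: n_fft // 2]]
peak = max(range(len(mag)), key=mag.__getitem__)
print(peak, peak * fs / n_fft)  # 32 1000.0
```

The peak bin maps back to 1000 Hz through f_k = k * fs / N.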



3. Time domain analysis of voice signals


Time-domain analysis of a voice signal means analyzing the signal and extracting its time-domain parameters. In speech analysis, the time-domain waveform is the first and most intuitive thing to examine. The voice signal is itself a time-domain signal, so time-domain analysis is both the earliest and the most widely used method, and it works directly on the waveform. It is usually applied to the most basic parameter analysis and applications, such as speech segmentation, preprocessing, and coarse classification. Its characteristics are:


① It represents the voice signal intuitively, and its parameters have a clear physical meaning.

② It is relatively simple to implement and requires little computation.

③ Some important parameters of speech can be obtained.

④ Only general-purpose equipment such as an oscilloscope is needed, so it is relatively easy to use.


The time-domain parameters of a speech signal include short-time energy, the short-time zero-crossing rate, the short-time autocorrelation function, and the short-time average magnitude difference function (AMDF). Together they form the basic set of short-time parameters of speech. They are computed digitally, frame by frame, and a rectangular window or a Hamming window is generally applied to each frame.

Analysis of speech shows that when voiced sounds are produced, the energy is concentrated below about 3 kHz. Although the vocal tract has several formants, the spectrum of the glottal wave falls off at high frequencies. For unvoiced sounds, most of the energy lies at higher frequencies. A high frequency means a high average zero-crossing rate and a low frequency means a low one. Voiced speech can therefore be taken to have a low zero-crossing rate and unvoiced speech a high one. These are relative terms only; there is no precise numerical threshold.
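The short-time parameters listed above can be sketched in Python using only the standard library. A MATLAB version would follow the same steps. The 8 kHz sampling rate and the 40 ms frames with a 20 ms hop are assumptions for illustration. So are the test frames: a 200 Hz sine stands in for a voiced frame and low-level white noise for an unvoiced one. The helper names are also hypothetical. The AMDF here is normalized by the number of overlapping samples, which is one common convention among several.

The sketch computes Hamming-windowed short-time energy and the zero-crossing rate for both frames. It then estimates the pitch period of the voiced frame from the autocorrelation peak, using a rectangular window (the raw frame). Finally it checks that the AMDF dips to nearly zero at that lag.

```python
import math
import random

FS = 8000  # assumed sampling frequency in Hz

def hamming(n):
    """Hamming window of length n: w[k] = 0.54 - 0.46*cos(2*pi*k/(n-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def frames(x, frame_len, hop):
    """Split x into frames of frame_len samples, advancing hop samples each time."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def short_time_energy(frame, window):
    """Energy of one frame after multiplying it by the window."""
    return sum((s * w) ** 2 for s, w in zip(frame, window))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def autocorrelation(frame, max_lag):
    """Short-time autocorrelation R[k] = sum_n x[n] * x[n + k], k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def amdf(frame, max_lag):
    """Average magnitude difference D[k] = mean_n |x[n] - x[n + k]|, k = 0..max_lag."""
    n = len(frame)
    return [sum(abs(frame[i] - frame[i + k]) for i in range(n - k)) / (n - k)
            for k in range(max_lag + 1)]

# Synthetic stand-ins (not real speech): a 200 Hz tone for a voiced sound,
# low-level white noise for an unvoiced sound.
rng = random.Random(0)
voiced = [0.5 * math.sin(2 * math.pi * 200 * n / FS) for n in range(1600)]
unvoiced = [rng.uniform(-0.05, 0.05) for _ in range(1600)]

FRAME, HOP = 320, 160  # 40 ms frames with a 20 ms hop at 8 kHz
win = hamming(FRAME)
for name, sig in (("voiced", voiced), ("unvoiced", unvoiced)):
    f = frames(sig, FRAME, HOP)[0]
    print(name, round(short_time_energy(f, win), 3), round(zero_crossing_rate(f), 3))

# Pitch from the first voiced frame (rectangular window), searching 50 to 400 Hz.
frame0 = voiced[:FRAME]
r = autocorrelation(frame0, 160)
lag = max(range(20, 161), key=lambda k: r[k])
print("pitch estimate:", FS / lag, "Hz")  # 200.0 Hz (period of 40 samples)
d = amdf(frame0, 160)
print("AMDF at lag 40:", d[40])  # close to 0: the frame repeats every 40 samples
```

On these synthetic frames, the voiced frame shows much higher energy and a zero-crossing rate around 0.05, while the noise frame's rate is around 0.5. This matches the relative voiced/unvoiced behavior described above. Real speech would need thresholds tuned on data.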


4. Frequency domain analysis of voice signals


Frequency-domain analysis of a speech signal examines its frequency-domain characteristics. Broadly, it covers the spectrum, power spectrum, cepstrum, and spectral envelope of the signal. Commonly used methods include band-pass filter banks, the Fourier transform, and linear prediction. Speech is a non-stationary process, so the standard Fourier transform, which suits periodic, transient, or stationary random signals, cannot represent a speech signal directly. Over short intervals (typically tens of milliseconds), however, speech can be treated as approximately stationary. The short-time Fourier transform is therefore used to analyze its spectrum, and the resulting spectrum is called the "short-time spectrum".
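Here is a minimal short-time Fourier transform sketch in Python, using only the standard library. Each frame is multiplied by a Hamming window and transformed with a direct DFT, giving |X_i[k]| = |sum_m x[iH + m] w[m] e^(-j2πkm/N)| for frame i with hop H. Several values are assumptions chosen so that the tones fall exactly on DFT bins: the frame length N = 128, the hop H = 64, the 8 kHz sampling rate, and the test signal, which switches from 500 Hz to 1500 Hz halfway through. The helper names are hypothetical.

```python
import cmath
import math

FS = 8000  # assumed sampling frequency in Hz

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def dft_magnitude(frame):
    """|X[k]| for k = 0..N/2 using a direct DFT (acceptable for short frames)."""
    n = len(frame)
    return [abs(sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)))
            for k in range(n // 2 + 1)]

def stft(x, frame_len, hop):
    """Short-time spectra: window each frame, then take the DFT magnitude of each."""
    w = hamming(frame_len)
    return [dft_magnitude([s * wk for s, wk in zip(x[i:i + frame_len], w)])
            for i in range(0, len(x) - frame_len + 1, hop)]

def peak_hz(mag, n_fft):
    """Frequency in Hz of the largest bin, using f_k = k * FS / N."""
    k = max(range(len(mag)), key=mag.__getitem__)
    return k * FS / n_fft

# Non-stationary test signal: 500 Hz for the first 1024 samples, 1500 Hz after.
x = [math.sin(2 * math.pi * (500 if n < 1024 else 1500) * n / FS) for n in range(2048)]
N, HOP = 128, 64  # 16 ms frames with an 8 ms hop at 8 kHz
spectra = stft(x, N, HOP)
print(len(spectra), peak_hz(spectra[0], N), peak_hz(spectra[-1], N))  # 31 500.0 1500.0
```

A single DFT over all 2048 samples would show both peaks but not when each occurs. The frame-by-frame spectra do show the switch, which is the point of the short-time spectrum. A practical implementation would use an FFT for each frame rather than the O(N^2) DFT used here for brevity.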


