美文网首页
Matlab中MFCC的几种实现方式

Matlab中MFCC的几种实现方式

作者: 黑白格_0ca6 | 来源:发表于2019-10-13 20:59 被阅读0次

    相关的函数

    melbankm、mfcc_m、melcepst、cepstralFeatureExtractor、mfcc、HelperComputePitchAndMFCC、 melSpectrogram

    几种函数对比及说明

    • melbankm
      由Voicebox提供,在Mel频率上设计平均分布的滤波器,此函数与音频信号没有关系,只是做MFCC前对滤波器的设计。
    function [x,mc,mn,mx]=melbankm(p,n,fs,fl,fh,w)
    %MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)
    %
    % Inputs:
    %       p   number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
    %       n   length of fft
    %       fs  sample rate in Hz
    %       fl  low end of the lowest filter as a fraction of fs [default = 0]
    %       fh  high end of highest filter as a fraction of fs [default = 0.5]
    %       w   any sensible combination of the following:
    %             可取代Mel频率的选项:
    %             'b' = bark scale instead of mel
    %             'e' = erb-rate scale
    %             'l' = log10 Hz frequency scale
    %             'f' = linear frequency scale
    %
    %             'c' = fl/fh specify centre of low and high filters
    %             'h' = fl/fh are in Hz instead of fractions of fs
    %             'H' = fl/fh are in mel/erb/bark/log10
    %
    %             滤波器形状:
    %             't' = triangular shaped filters in mel/erb/bark domain (default)
    %             'n' = hanning shaped filters in mel/erb/bark domain
    %             'm' = hamming shaped filters in mel/erb/bark domain
    %
    %             'z' = highest and lowest filters taper down to zero [default]
    %             'y' = lowest filter remains at 1 down to 0 frequency and
    %                   highest filter remains at 1 up to nyquist freqency
    %
    %             'u' = scale filters to sum to unity
    %
    %             's' = single-sided: do not double filters to account for negative frequencies
    %
    %             输出滤波器组的响应曲线:
    %             'g' = plot idealized filters [default if no output arguments present]
    %
    % Note that the filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain.
    % Some people instead define an asymmetric triangular filter in the frequency domain.
    %
    %              If 'ty' or 'ny' is specified, the total power in the fft is preserved.
    %
    % Outputs:  x     a sparse matrix containing the filterbank amplitudes
    %                 If the mn and mx outputs are given then size(x)=[p,mx-mn+1]
    %                 otherwise size(x)=[p,1+floor(n/2)]
    %                 Note that the peak filter values equal 2 to account for the power
    %                 in the negative FFT frequencies.
    %           mc    the filterbank centre frequencies in mel/erb/bark滤波器中心频率
    %           mn    the lowest fft bin with a non-zero coefficient
    %           mx    the highest fft bin with a non-zero coefficient
    %                 Note: you must specify both or neither of mn and mx.mn与mx必须同时指定或者不指定
    %
    % =============================!用法举例(MFCC流程)==============================
    %
    % (a) Calcuate the Mel-frequency Cepstral Coefficients
    %
    %       f=rfft(s);                  % rfft() returns only 1+floor(n/2) coefficients去除虚数部分
    %       x=melbankm(p,n,fs);         % n is the fft length, p is the number of filters wanted
    %       z=log(x*abs(f).^2);         % multiply x by the power spectrum
    %       c=dct(z);                   % take the DCT
    %
    % (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently
    %
    %       f=fft(s);                        % n is the fft length, p is the number of filters wanted
    %       [x,mc,na,nb]=melbankm(p,n,fs);   % na:nb gives the fft bins that are needed
    %       z=log(x*(f(na:nb)).*conj(f(na:nb)));
    %
    % (c) Plot the calculated filterbanks
    %
    %      plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)')   % fs=sample frequency
    %
    % (d) Plot the idealized filterbanks (without output sampling)
    %
    %      melbankm(p,n,fs);
    

    该函数只是设计滤波器组,属于MFCC处理的一部分

    • mfcc_m
      由宋知用老师书中提供,涉及到归一化Mel滤波器组系数、归一化倒谱提升窗口
    bank=melbankm(p,frameSize,fs,0,0.5,'m');
    % 归一化Mel滤波器组系数
    bank=full(bank);
    bank=bank/max(bank(:));
    
    % 归一化倒谱提升窗口:对MFCC系数中某些谱线进行增强
    w = 1 + 6 * sin(pi * [1:p2] ./ p2);
    w = w/max(w);
    

    需要修正的地方:
    只有一阶差分系数
    滤波器选择后并不能只截取想要的部分
    归一化Mel滤波器组系数、归一化倒谱提升窗口有待考证

    • melcepst
      属于voicebox工具箱,现在官方已经不提供了,程序中调用了melbankm函数。
    function [c,tc]=melcepst(s,fs,w,nc,p,n,inc,fl,fh)
    %MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)
    %
    %
    % Simple use: (1) c=melcepst(s,fs)          % calculate mel cepstrum with 12 coefs, 256 sample frames
    %             (2) c=melcepst(s,fs,'E0dD')   % include log energy, 0th cepstral coef, delta and delta-delta coefs
    %
    % Inputs:
    %     s   speech signal
    %     fs  sample rate in Hz (default 11025)
    %     w   mode string (see below)
    %     nc  number of cepstral coefficients excluding 0'th coefficient [default 12] MFCC维数设定
    %     p   number of filters in filterbank [default: floor(3*log(fs)) =  approx 2.1 per ocatave] 滤波器数量
    %     n   length of frame in samples [default power of 2 < (0.03*fs)] 帧长
    %     inc frame increment [default n/2] 帧移
    %     fl  low end of the lowest filter as a fraction of fs [default = 0] 滤波器最低频率
    %     fh  high end of highest filter as a fraction of fs [default = 0.5] 滤波器最高频率,通过fs归一化
    %
    %     w   any sensible combination of the following:
    %               时域窗函数:
    %               'R'  rectangular window in time domain
    %               'N'  Hanning window in time domain
    %               'M'  Hamming window in time domain (default)
    %
    %               频域窗函数:
    %               't'  triangular shaped filters in mel domain (default)
    %               'n'  hanning shaped filters in mel domain
    %               'm'  hamming shaped filters in mel domain
    %
    %
    %               'p'  filters act in the power domain
    %               'a'  filters act in the absolute magnitude domain (default)
    %
    %               MFCC除12维基本参数之外的选择:
    %               '0'  include 0'th order cepstral coefficient
    %               'E'  include log energy
    %               'd'  include delta coefficients (dc/dt)
    %               'D'  include delta-delta coefficients (d^2c/dt^2)
    %
    %               滤波器频率设置:
    %               'z'  highest and lowest filters taper down to zero (default)
    %               'y'  lowest filter remains at 1 down to 0 frequency and
    %                    highest filter remains at 1 up to nyquist freqency
    %
    %              If 'ty' or 'ny' is specified, the total power in the fft is preserved.
    %
    % Outputs:  c     mel cepstrum output: one frame per row. Log energy, if requested, is the
    %                 first element of each row followed by the delta and then the delta-delta
    %                 coefficients.
    %           tc    fractional time in samples at the centre of each frame
    %                 with the first sample being 1.
    
    % ==================================设置默认参数=================================
    if nargin<2 fs=11025; end% 滤波器的最高频率
    if nargin<3 w='M'; end% hamming窗
    if nargin<4 nc=12; end% MFCC维数
    if nargin<5 p=floor(3*log(fs)); end% p个滤波器
    if nargin<6 n=pow2(floor(log2(0.03*fs))); end% n是一帧FFT后数据的长度
    if nargin<9
       fh=0.5;% 滤波器的最高频率,用fs归一化   
       if nargin<8
         fl=0;% 设计滤波器的最低频率
         if nargin<7
            inc=floor(n/2);
         end
      end
    end
    
    if isempty(w)
       w='M';
    end
    if any(w=='R')
       [z,tc]=enframe(s,n,inc);
    elseif any (w=='N')
       [z,tc]=enframe(s,hanning(n),inc);
    else
       [z,tc]=enframe(s,hamming(n),inc);
    end
    
    % =================================!理论核心部分=================================
    f=rfft(z.');
    [m,a,b]=melbankm(p,n,fs,fl,fh,w);% m为滤波器的频域响应
    pw=f(a:b,:).*conj(f(a:b,:));% 计算帧能量
    pth=max(pw(:))*1E-20;
    if any(w=='p')
       y=log(max(m*pw,pth));
    else
       ath=sqrt(pth);
       y=log(max(m*abs(f(a:b,:)),ath));
    end
    c=rdct(y).';% 得到13维系数
    
    nf=size(c,1);
    nc=nc+1;
    if p>nc
       c(:,nc+1:end)=[];% 当滤波器个数比所需维数多的时候,就将后面滤波器获得的参数删去
    elseif p<nc
       c=[c zeros(nf,nc-p)];% 滤波器个数少的时候,用0补齐
    end
    if ~any(w=='0')
       c(:,1)=[];
       nc=nc-1;
    end
    if any(w=='E')
       c=[log(max(sum(pw),pth)).' c];
       nc=nc+1;
    end
    
    % ===============================计算一阶和二阶差分==============================
    if any(w=='D')
      vf=(4:-1:-4)/60;
      af=(1:-1:-1)/2;
      ww=ones(5,1);
      cx=[c(ww,:); c; c(nf*ww,:)];
      vx=reshape(filter(vf,1,cx(:)),nf+10,nc);
      vx(1:8,:)=[];
      ax=reshape(filter(af,1,vx(:)),nf+2,nc);
      ax(1:2,:)=[];
      vx([1 nf+2],:)=[];
      if any(w=='d')
         c=[c vx ax];
      else
         c=[c ax];
      end
    elseif any(w=='d')
      vf=(4:-1:-4)/60;
      ww=ones(4,1);
      cx=[c(ww,:); c; c(nf*ww,:)];
      vx=reshape(filter(vf,1,cx(:)),nf+8,nc);
      vx(1:8,:)=[];
      c=[c vx];
    end
     
    % =======================如果不输出任何参数,就会输出语谱图==========================
    if nargout<1
       [nf,nc]=size(c);
    %    t=((0:nf-1)*inc+(n-1)/2)/fs;
       ci=(1:nc)-any(w=='0')-any(w=='E');
       imh = imagesc(tc/fs,ci,c.');
       axis('xy');
       xlabel('Time (s)');
       ylabel('Mel-cepstrum coefficient');
        map = (0:63)'/63;
        colormap([map map map]);
        colorbar;
    end
    
    1. melcepst默认得到12维MFCC参数,时域中用hamming窗,频域中用三角窗,最低频率为0,最高频率为采样频率的一半(采样定理),帧移为帧长的一半,帧长为2的次幂但是小于0.03*fs。
      E:包括对数能量
      0:包括0阶倒谱系数
      d:包括一阶差分
      D:包括二阶差分
    2. melcepst对参数'0'的处理
    if ~any(w=='0')
       c(:,1)=[];
       nc=nc-1;
    end
    

    如果不需要'0'阶系数,就将第一列删除,并得到13-1=12维数据,说明DCT后得到的是13维数据,默认将第一个元素,即0阶倒谱系数删去。第一维比后12维都大很多(直流项?)。


    默认12维参数
    DCT后13维参数('0')
    13维参数'E'
    • cepstralFeatureExtractor
      由Audio Toolbox提供,需要先将音频分帧,每一列作为一帧,再将每一帧依次输入至cepstralFeatureExtractor,所以输入的第一帧的delta与deltaDelta都是0。
    test = 'D:\DataBase\TIMIT\TRAIN\DR2\MARC0\SX108.WAV';
    [x, fs] = audioread(test);
    n=pow2(floor(log2(0.03*fs)));
    inc=floor(n/2);
    f = enframe(x,hamming(n),inc);
    cepFeatures = cepstralFeatureExtractor('SampleRate',fs,'LogEnergy','Replace');
    [coeffs, delta, deltaDelta]= cepFeatures(f(1,:)');
    

    参数设置中有FilterBankNormalization,选项为:Area,Bandwidth(默认),None,用于滤波器组的权重分配


    滤波器归一化

    cepstralFeatureExtractor类的部分代码:

    classdef (StrictDefaults)cepstralFeatureExtractor < dsp.private.SampleRateEngine
     %cepstralFeatureExtractor Cepstral Feature Extractor
     %   cepFeatures = cepstralFeatureExtractor returns a System object,
     %   cepFeatures, that calculates cepstral features. Columns of the input
     %   are treated as individual channels.
     %
     %   cepFeatures = cepstralFeatureExtractor('Name',Value, ...) returns a
     %   cepstralFeatureExtractor System object, cepFeatures, with each
     %   specified property name set to the specified value. You can specify
     %   additional name-value pair arguments in any order as
     %   (Name1,Value1,...NameN,ValueN).
     %
     %   step method syntax内置的step()函数:
     %
     %   [COEFFS,DELTA,DELTADELTA] = step(cepFeatures,X) returns the cepstral
     %   coefficients, the delta, and the delta-delta. The log energy is also
     %   returned in the COEFFS output based on the LogEnergy property. The
     %   DELTA and DELTADELTA are initialized as zero-vectors. X must be a
     %   real-valued, double-precision or single-precision matrix. Each column
     %   of X is treated as an independent channel.
     %
     %   System objects may be called directly like a function instead of using
     %   the step method. For example, y = step(obj,x) and y = obj(x) are
     %   equivalent.
     %   对象可以直接作为函数使用,所以step()与obj()功能一致
     %
     %   cepstralFeatureExtractor methods:
     %   step       - See above description for use of this method
     %   release    - Allow property values and input characteristics to change
     %   clone      - Create cepstralFeatureExtractor object with same property 
     %                values
     %   isLocked   - Locked status (logical)
     %   <a href="matlab:help matlab.System/reset   ">reset</a>      - Reset the internal states to initial conditions
     %   getFilters - Get filterbank used to calculate the cepstral 
     %                coefficients
     %
     %   cepstralFeatureExtractor properties:
     %   FilterBank  - Filter bank ('Mel'/'Gammatone')
     %   InputDomain - Domain of input signal
     %   NumCoeffs   - Number of coefficients to return
     %   FFTLength   - FFT length
     %   LogEnergy   - Log energy usage ('Append'/'Replace'/'Ignore')
     %   SampleRate  - Sample rate (Hz)
     %
     %   Advanced properties:
     %   BandEdges               - Band edges of mel filter bank (Hz)
     %   FilterBankNormalization - Normalize filter bank
     %   FilterBankDesignDomain  - Domain for mel filter bank design
     %   FrequencyRange          - Gammatone filter bank frequency range
        
        %#codegen
        properties
            %SampleRate Input sample rate (Hz)
            % Specify the sampling rate of the input in Hertz as a real, finite
            % numeric scalar. The default is 16000 Hz. This property is 
            % tunable.
            SampleRate = 16000;
        end
        
        properties (Constant, Hidden)
            % SampleRateSet is used to setup the choices for SampleRate
            SampleRateSet = matlab.system.SourceSet({'PropertyOrMethod', ...
                'SystemBlock', 'InheritSampleRate', 'getInheritedSampleRate',true});
        end
        
        properties (Nontunable)
            %BandEdges Band edges of Mel filter bank (Hz)
            % Specify the band edges of the mel filter bank as a monotonically
            % increasing vector in the range [0,fs/2]. The number of band edges
            % must be in the range [4,160]. The default band edges are spaced
            % linearly for the first ten and then logarithmically thereafter.
            % This property applies when FilterBank is 'Mel'.
            % 只有是Mel的时候,BandEdges属性才有用
            BandEdges = cepstralFeatureExtractor.getDefaultBandEdges();
            %FFTLength FFT length 默认FFT长度是输入的行数,所以做好分帧!
            FFTLength = [];
            %NumCoeffs Number of coefficients to return 默认MFCC维数13
            NumCoeffs = 13;
            %InputDomain Domain of the input signal 默认输入数据是时域的
            InputDomain = 'Time';
            %FilterBankNormalization Filter bank normalization 默认以带宽设置滤波器权重
            FilterBankNormalization = 'Bandwidth';
            %LogEnergy Log energy usage 默认log能量参数是有的
            LogEnergy = 'Append';
        end
    ---------------------------------------------------------略-----------------------------------------------------------
    end
    
    • mfcc
      由Audio Toolbox提供,最低频率不是0,它用的是cepstralFeatureExtractor函数
    [audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
    [coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs);
    
    function varargout = mfcc(x, fs, varargin)
    %MFCC Extract the mfcc, log-energy, delta, and delta-delta of audio signal
    %   coeffs = MFCC(audioIn,fs) returns the mel-frequency cepstral
    %   coefficients over time for the audio input. Columns of the input are
    %   treated as individual channels. coeffs is returned as an L-by-M-by-N
    %   array.
    %       L - Number of frames the audio signal is partitioned into.
    %           This is determined by the WINDOWLENGTH and OVERLAPLENGTH 
    %           properties.
    %       M - Number coefficients returned per frame.
    %           This is determined by the NUMCOEFFS property.
    %       N - Number of channels.
    %       
    %   'WindowLength' defaults to round(0.030 * fs).
    %   'OverlapLength' defaults to round(fs*0.02).
    %   'NumCoeffs'  If not specified, the number of coefficients is 13.
    %   'FFTLength'  By default, the FFT length is set to the WINDOWLENGTH.
    %   'DeltaWindowLength' The default is 2.
    %   coeffs = MFCC(...,'LogEnergy',LOGENERGY) specifies if and how the log
    %   energy is used. Specify log energy as a character vector:
    %       'Append'  - Adds the log-energy as the first element of the
    %                   returned coefficients vector. This is the default
    %                   setting.
    %       'Replace' - Replaces the zeroth coefficient (first element of
    %                   coeffs) with the log-energy.
    %       'Ignore'  - Ignores and does not return the log-energy.
    
    % =========================验证输入数据的格式=============================
    validateRequiredInputs(x, fs)
    
    params =  audio.internal.MFCCValidator(fs,size(x,1),varargin{:});% 输入默认的参数
    
    hopLength = params.WindowLength - params.OverlapLength;% 帧移
    
    % ==========================创建mfcc提取object============================
    mfccObject = cepstralFeatureExtractor( ...
        'SampleRate',              fs, ...
        'FFTLength',               params.FFTLength, ...
        'NumCoeffs',               params.NumCoeffs, ...
        'LogEnergy',               params.LogEnergy);
    
    % ====================验证所需要的mfcc维数比滤波器个数少===================
    numValidBands = sum(mfccObject.BandEdges <= floor(fs/2)) - 2;
    coder.internal.errorIf(numValidBands < params.NumCoeffs, ...
        'audio:mfcc:BadNumCoeffs', ...
        numValidBands,fs);
    
    % ==========================mfcc参数获取=================================
    [nRow,nChan] = size(x);% 一般都是单通道,audiorea读取到的是一列数据
    N            = params.WindowLength;
    numHops      = floor((nRow-N)/hopLength) + 1;
    
    y            = audio.internal.buffer(x,N,hopLength);
    c            = mfccObject(y);% mfccObject是cepstralFeatureExtractor类,所以,与cepstralFeatureExtractor求解方法一样
    c2           = reshape(c , size(c,1) , size(c,2)/nChan   , nChan );
    coeffs       = permute(c2 , [2 1 3]);% 将第1维与第2维转置,因为cepstralFeatureExtractor得到的特征是列排的
    
    varargout{1} = coeffs;
    
    %=========================一阶差分====================================
    if nargout > 1
        delta        = audio.internal.computeDelta(coeffs,params.DeltaWindowLength);
        varargout{2} = delta;
    end
    
    % ============================二阶差分=================================
    if nargout > 2
        deltaDelta   = audio.internal.computeDelta(delta,params.DeltaWindowLength);
        varargout{3} = deltaDelta;
    end
    
    % -------------------------------------------------------------------------
    % Output sample stamp -----------------------------------------------------
    if nargout > 3
        varargout{4} = ...
            cast(((0:(numHops-1))*hopLength + params.WindowLength)','like',x);
    end
    
    end
    
    % -------------------------------------------------------------------------
    % Validate required inputs
    % -------------------------------------------------------------------------
    function validateRequiredInputs(x,fs)
    validateattributes(x,{'single','double'},...
        {'nonempty','2d','real'}, ...
        'mfcc','audioIn')
    validateattributes(fs,{'single','double'}, ...
        {'nonempty','positive','real','scalar','nonnan','finite'}, ...
        'mfcc','fs');
    end
    

    默认有40个滤波器,得到14维参数,相当于melcepst中的'E0',只是melcepst的最低频率从0Hz开始;delta与deltaDelta的第一行都是0;loc是每一帧的开始位置。
    如何使delta与deltaDelta的首行不为0?设置DeltaWindowLength 参数即可

    差分的计算 40组滤波器的频带范围

    从滤波器组设置可以看出,每个滤波器的起点是上个滤波器带宽的中点。

    • HelperComputePitchAndMFCC
      查看源码后,发现使用的是mfcc函数


      HelperComputePitchAndMFCC
    • melSpectrogram
      output的第一维是Number of bandpass filters in filterbank,默认为32个滤波器;第二维是Number of frames in spectrogram,即帧数。
      它不可以计算差分,只是spectrogram的一个小分支,若取40个滤波器,得到的结果与mfcc相近,只是需要转置一下

    几种实现方式的对比

    实现方式 MFCC 频谱图
    mfcc
    cepstralFeatureExtractor
    melcepst
    melSpectrogram

    结论

    可见,cepstralFeatureExtractor与mfcc所用算法基本一致,只是cepstralFeatureExtractor分帧求取,melcepst与它们的第2维数据有数量级的差异,暂时认为是滤波器归一化的原因。在mfcc中,log能量是作为额外系数默认附加的,通常Matlab会提供最好的性能,所以暂时按默认选项进行。melSpectrogram默认32个滤波器,mfcc默认40个滤波器,且melSpectrogram不能计算差分,所以mfcc总的来说,更合适作为以后的计算使用。

    相关文章

      网友评论

          本文标题:Matlab中MFCC的几种实现方式

          本文链接:https://www.haomeiwen.com/subject/dcjcmctx.html