March 18, 15
Slide overview
Presented at 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Online divergence switching for superresolution-based nonnegative matrix factorization," Proceedings of 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014), pp.485-488, Hawaii, USA, March 2014 (Student Paper Award).
http://d-kitamura.net/links_en.html
2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing
Speech Analysis (2), 2PM2-2
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Factorization
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura (Nara Institute of Science and Technology, Japan)
Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan)
Hirokazu Kameoka (The University of Tokyo, Japan)
Outline
• 1. Research background
• 2. Conventional methods
– Nonnegative matrix factorization
– Supervised nonnegative matrix factorization
– Directional clustering
– Hybrid method
• 3. Proposed method
– Online divergence switching for hybrid method
• 4. Experiments
• 5. Conclusions
Research background
• Music signal separation technologies have received much attention.
– Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) is a very active research area.
• The separation performance of supervised NMF (SNMF) degrades markedly when the mixture contains many sources.
• We have proposed a new hybrid separation method for stereo music signals.
Research background
• Our proposed hybrid method
[Figure: block diagram. Input stereo signal (L, R) → spatial separation method (directional clustering) → SNMF-based separation method (superresolution-based SNMF) → separated signal]
Research background
• The optimal divergence criterion in superresolution-based SNMF depends on the spatial conditions of the input signal.
• Our aim in this presentation
– We propose a new separation scheme for this hybrid method that separates the target signal with high accuracy under any type of spatial condition.
NMF [Lee, et al., 2001]
• NMF
– is a sparse representation algorithm.
– can extract significant features from the observed matrix.
[Figure: the observed matrix (spectrogram; frequency × time, amplitude values) is approximated by the product of a basis matrix (spectral patterns; frequency × basis) and an activation matrix (time-varying gains; basis × time)]
• Ω: Number of frequency bins
• 𝑇: Number of time frames
• 𝐾: Number of bases
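The decomposition on the NMF slide can be restated compactly as below. The matrix names Y (observed matrix), F (basis matrix), and G (activation matrix) are assumed here for illustration, since the symbols did not survive in this transcript; Ω, T, and K are the sizes listed on the slide.

```latex
\boldsymbol{Y} \approx \boldsymbol{F}\boldsymbol{G}, \qquad
\boldsymbol{Y} \in \mathbb{R}_{\ge 0}^{\Omega \times T},\quad
\boldsymbol{F} \in \mathbb{R}_{\ge 0}^{\Omega \times K},\quad
\boldsymbol{G} \in \mathbb{R}_{\ge 0}^{K \times T}
```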
Optimization in NMF
• The basis matrix and the activation matrix are optimized by minimizing the divergence between the observed matrix and their product.
– [Equation: cost function, the sum over all entries of the divergence between the observed matrix and the product of the variable matrices]
• The Euclidean distance (EUC-distance) and the Kullback-Leibler divergence (KL-divergence) are often used as the divergence in the cost function.
• In NMF-based separation, a cost function based on the KL-divergence achieves high separation performance.
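A minimal sketch of this optimization, assuming the standard multiplicative update rules of Lee and Seung for the two divergences. The names `Y`, `F`, `G`, `nmf`, `euc_distance`, and `kl_divergence` are illustrative choices, not taken from the slides, and the iteration count and random initialization are arbitrary.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def euc_distance(Y, X):
    """Squared Euclidean distance summed over all entries."""
    return float(np.sum((Y - X) ** 2))


def kl_divergence(Y, X):
    """Generalized KL divergence summed over all entries."""
    return float(np.sum(Y * np.log((Y + EPS) / (X + EPS)) - Y + X))


def nmf(Y, n_bases, divergence="kl", n_iter=200, seed=0):
    """Factorize nonnegative Y (freq x time) into F (freq x bases) and G (bases x time)."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    F = rng.random((n_freq, n_bases)) + EPS
    G = rng.random((n_bases, n_time)) + EPS
    for _ in range(n_iter):
        if divergence == "euc":
            # Multiplicative updates that do not increase the EUC-distance
            F *= (Y @ G.T) / (F @ G @ G.T + EPS)
            G *= (F.T @ Y) / (F.T @ F @ G + EPS)
        elif divergence == "kl":
            # Multiplicative updates that do not increase the KL-divergence
            F *= ((Y / (F @ G + EPS)) @ G.T) / (G.sum(axis=1)[None, :] + EPS)
            G *= (F.T @ (Y / (F @ G + EPS))) / (F.sum(axis=0)[:, None] + EPS)
        else:
            raise ValueError("divergence must be 'euc' or 'kl'")
    return F, G
```

Calling `nmf(Y, n_bases=8, divergence="kl")` on a nonnegative magnitude spectrogram returns the spectral patterns and their time-varying gains; switching the `divergence` argument changes only the cost being minimized, not the model.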
SNMF [Smaragdis, et al., 2007]
• SNMF utilizes sample sounds of the target.
– Training process: construct the trained (supervised) basis matrix of the target sound from its sample sounds (e.g., a musical scale). This basis matrix serves as a spectral dictionary.
– Separation process: with the supervised basis matrix fixed, decompose the mixed signal into the target signal and the other signal by optimizing the remaining matrices.
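The two processes can be sketched as follows, assuming plain KL-divergence multiplicative updates. This is an illustration of the train-then-fix idea only: the function names and the matrices `H` and `U` (bases and activations of the non-target sounds) are hypothetical, and any extra terms used in the authors' SNMF are left out.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def train_bases(Y_train, n_bases, n_iter=100, seed=0):
    """Training process: learn supervised bases F from sample sounds of the target."""
    rng = np.random.default_rng(seed)
    F = rng.random((Y_train.shape[0], n_bases)) + EPS
    G = rng.random((n_bases, Y_train.shape[1])) + EPS
    for _ in range(n_iter):
        R = Y_train / (F @ G + EPS)
        F *= (R @ G.T) / (G.sum(axis=1)[None, :] + EPS)
        R = Y_train / (F @ G + EPS)
        G *= (F.T @ R) / (F.sum(axis=0)[:, None] + EPS)
    return F


def separate(Y_mix, F, n_other, n_iter=100, seed=0):
    """Separation process: F stays fixed; only G, H, and U are optimized."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y_mix.shape
    G = rng.random((F.shape[1], n_time)) + EPS
    H = rng.random((n_freq, n_other)) + EPS
    U = rng.random((n_other, n_time)) + EPS
    for _ in range(n_iter):
        R = Y_mix / (F @ G + H @ U + EPS)
        G *= (F.T @ R) / (F.sum(axis=0)[:, None] + EPS)
        R = Y_mix / (F @ G + H @ U + EPS)
        H *= (R @ U.T) / (U.sum(axis=1)[None, :] + EPS)
        R = Y_mix / (F @ G + H @ U + EPS)
        U *= (H.T @ R) / (H.sum(axis=0)[:, None] + EPS)
    return F @ G, H @ U  # target estimate, other-signal estimate
```

Because `F` is never written to in `separate`, the target estimate `F @ G` can only be built from the spectral patterns learned in training.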
Problem of SNMF
• The separation performance of SNMF degrades markedly when many interference sources exist.
[Figure: separation results for a two-source case and a five-source case, with residual components marked]
Directional clustering [Araki, et al., 2007]
• Directional clustering
– utilizes differences between channels as a separation cue.
– is equivalent to binary masking in the spectrogram domain.
[Figure: each time-frequency bin of the stereo input spectrogram is labeled left (L), center (C), or right (R); the entry-wise product of the spectrogram and a binary mask (1 for bins of the chosen direction, 0 otherwise) gives the separated signal, shown for the center direction]
• Problems
– Cannot separate sources in the same direction.
– Artificial distortion arises owing to the binary masking.
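The binary-masking view can be illustrated with a deliberately simplified stand-in: each time-frequency bin is assigned to the nearest of a few given panning angles computed from the channel magnitudes. This is not the clustering algorithm of Araki et al.; the function name, the angle feature, and the fixed `centers` are assumptions made for the sketch.

```python
import numpy as np


def directional_masks(L, R, centers):
    """Assign each time-frequency bin to the nearest panning direction.

    L, R    : magnitude spectrograms of the left and right channels (freq x time)
    centers : panning angles in radians, 0 = fully left, pi/2 = fully right
    Returns binary masks with shape (len(centers), freq, time).
    """
    theta = np.arctan2(R, L)  # per-bin panning angle in [0, pi/2]
    centers = np.asarray(centers, dtype=float)
    # Index of the closest center for every bin
    labels = np.argmin(np.abs(theta[None, :, :] - centers[:, None, None]), axis=0)
    # One binary mask per direction; exactly one mask is 1 at each bin
    return (labels[None, :, :] == np.arange(len(centers))[:, None, None]).astype(float)
```

With `centers = [0, np.pi / 4, np.pi / 2]`, the entry-wise product `masks[1] * spectrogram` keeps only the bins labeled as center. Bins that belong to other directions become zeros, which is where the distortion noted above comes from.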
Hybrid method [D. Kitamura, et al., 2013]
• We have proposed a new SNMF called superresolution-based SNMF and a hybrid method that uses it.
• The hybrid method consists of directional clustering and superresolution-based SNMF.
[Figure: hybrid method. Stereo input (L, R) → spatial separation (directional clustering) → spectral separation (superresolution-based SNMF)]
Superresolution-based SNMF
• This SNMF reconstructs the spectrogram obtained from directional clustering using supervised basis extrapolation.
[Figure: input spectrogram (target direction and other directions) → directional clustering → separated cluster with chasms → superresolution-based SNMF → reconstructed spectrogram; axes: time, frequency]
Superresolution-based SNMF
• Spectral chasms arise owing to directional clustering.
– Treat these chasms as unseen observations.
– Extrapolate the fittest bases.
[Figure: separated cluster with chasms (time × frequency) and the supervised bases]
Superresolution-based SNMF
[Figure, top: observed spectrogram → directional clustering (binary masking) → separated cluster → superresolution-based SNMF (extrapolation using supervised spectral bases) → reconstructed data; axes: time, frequency]
[Figure, bottom: frequency of source component versus direction (left, center, right) for target and interference. (a) Input signal, (b) after directional clustering, (c) after superresolution-based SNMF, with extrapolated components]
Decomposition model and cost function
• [Equation: decomposition model; the supervised bases are fixed]
• [Equation: cost function with a penalty term and a regularization term]
• Notation: index matrix obtained from directional clustering; entries of the matrices; binary complement of the index entries; weighting parameters; Frobenius norm.
• The divergence is defined at all grid points except the chasms by using the index matrix.
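The last point, evaluating the divergence only outside the chasms, can be sketched as a masked sum. This covers the data term only; the penalty and regularization terms are left out because their exact form is not recoverable from this transcript. The function name and the convention that `index` holds 1 for observed bins and 0 for chasms are assumptions.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero


def masked_divergence(Y, X, index, kind="kl"):
    """Sum the entry-wise divergence between Y and the model X where index == 1."""
    if kind == "euc":
        d = (Y - X) ** 2
    elif kind == "kl":
        d = Y * np.log((Y + EPS) / (X + EPS)) - Y + X
    else:
        raise ValueError("kind must be 'euc' or 'kl'")
    # Chasm bins (index == 0) contribute nothing to the cost
    return float(np.sum(index * d))
```

Since chasm bins never enter the cost, the model is free to take any value there, which is what lets the supervised bases fill those bins in.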
Update rules
• We can obtain the update rules for optimizing the three variable matrices.
[Equation: update rules]
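As a rough illustration of what such update rules look like, the sketch below applies the well-known weighted-NMF multiplicative updates to the masked EUC-distance with fixed supervised bases. These are not the update rules from the slide: the regularization and penalty terms are omitted, and the names `G`, `H`, `U`, and `masked_snmf_euc` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def masked_snmf_euc(Y, index, F, n_other, n_iter=100, seed=0):
    """Fit Y ~ F G + H U on observed bins only (index == 1), keeping F fixed."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = Y.shape
    G = rng.random((F.shape[1], n_time)) + EPS
    H = rng.random((n_freq, n_other)) + EPS
    U = rng.random((n_other, n_time)) + EPS
    IY = index * Y  # observed data with chasm bins zeroed
    costs = []
    for _ in range(n_iter):
        # The model is recomputed before each update so every step uses current values
        G *= (F.T @ IY) / (F.T @ (index * (F @ G + H @ U)) + EPS)
        H *= (IY @ U.T) / ((index * (F @ G + H @ U)) @ U.T + EPS)
        U *= (H.T @ IY) / (H.T @ (index * (F @ G + H @ U)) + EPS)
        costs.append(float(np.sum(index * (Y - F @ G - H @ U) ** 2)))
    return G, H, U, costs
```

After fitting, `F @ G` is defined at every bin, including the chasms; the values it takes there are the extrapolated components.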
Consideration for optimal divergence
• Separation performance of conventional SNMF: KL-divergence is better than EUC-distance.
• However, for superresolution-based SNMF: KL-divergence ? EUC-distance
– The optimal divergence depends on the amount of spectral chasms.
Consideration for optimal divergence
• Superresolution-based SNMF has two tasks: signal separation and basis extrapolation.
• Abilities of each divergence
– Signal separation: KL-divergence (very good), EUC-distance (good)
– Basis extrapolation: KL-divergence (poor), EUC-distance (good)
Consideration for optimal divergence
• A spectrum decomposed by NMF with KL-divergence tends to be sparser than one decomposed by NMF with EUC-distance.
[Figure: example basis spectra for EUC-distance and KL-divergence; x-axis: frequency, 0 to 5 kHz; y-axis: amplitude, -10 to 0 dB]
• A sparse basis is not suitable for extrapolation from observable data.
Consideration for optimal divergence
• The optimal divergence for superresolution-based SNMF depends on the amount of spectral chasms because of the trade-off between separation and extrapolation abilities.
[Figure: separation, extrapolation, and total performance plotted against sparseness, from KL-divergence (sparse; strong sparseness) to EUC-distance (anti-sparse; weak sparseness), with example basis spectra (frequency in kHz, amplitude in dB)]
Consideration for optimal divergence
• The optimal divergence for superresolution-based SNMF depends on the amount of spectral chasms.
– If there are many chasms, the extrapolation ability is required, so EUC-distance should be used.
– If there are no chasms, the separation ability is required, so KL-divergence should be used.
[Figure: two time-frequency spectrograms, one with many chasms and one without]
Hybrid method for online input data
• When we consider applying the hybrid method to online input data…
[Figure: observed spectrogram → directional clustering (binary mask) → online binary-masked spectrogram; axes: time, frequency]
Hybrid method for online input data
• We divide the online spectrogram into several blocks.
[Figure: the spectrogram is split along the time axis into blocks, and superresolution-based SNMF is applied to the blocks in parallel]
Online divergence switching
• We calculate the rate of chasms in each block.
– Rate below the threshold value (few chasms): superresolution-based SNMF with KL-divergence
– Rate above the threshold value (many chasms): superresolution-based SNMF with EUC-distance
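The switching rule can be sketched as below, assuming the rate of chasms is the fraction of masked-out bins in a block and that blocks are cut along the time axis. The function name, the block length, and the threshold value are hypothetical; the slides do not give the values used.

```python
import numpy as np


def choose_divergences(index, block_len, threshold):
    """Pick a divergence for each time block from its rate of chasms.

    index     : binary matrix (freq x time), 1 = observed bin, 0 = chasm
    block_len : number of time frames per block
    threshold : chasm rate above which EUC-distance is chosen
    Returns a list of (start_frame, chasm_rate, divergence_name) tuples.
    """
    choices = []
    for start in range(0, index.shape[1], block_len):
        block = index[:, start:start + block_len]
        chasm_rate = 1.0 - float(block.mean())  # fraction of chasm bins in the block
        divergence = "euc" if chasm_rate > threshold else "kl"
        choices.append((start, chasm_rate, divergence))
    return choices
```

Each block would then be passed to superresolution-based SNMF with the selected divergence, so a block with few chasms favors separation (KL-divergence) and a block with many chasms favors extrapolation (EUC-distance).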
Procedure of proposed method
Experimental conditions
• We used stereo-panning signals.
• Each mixture consisted of four instruments generated by a MIDI synthesizer.
• We used the same type of MIDI sounds as the target instruments as supervision for the training process.
– Supervision sound: notes spanning two octaves that cover all the notes of the target signal
[Figure: panning layout of the four sources (1 to 4), including the target source, between left, center, and right]
Experimental conditions
• We compared three methods.
– Hybrid method using only EUC-distance-based SNMF (Conventional method 1)
– Hybrid method using only KL-divergence-based SNMF (Conventional method 2)
– Proposed hybrid method that switches the divergence to the optimal one (Proposed method)
• We used the signal-to-distortion ratio (SDR) as the evaluation score.
– SDR indicates the total separation accuracy, which includes both the quality of the separated target signal and the degree of separation.
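For intuition about the score, a simplified energy-ratio form of SDR is sketched below. It treats everything in the estimate that differs from the reference as distortion. This is an assumption made for illustration and is probably not the exact evaluation procedure behind the reported scores; the function name is hypothetical.

```python
import numpy as np


def simple_sdr(reference, estimate):
    """Ratio in dB between the reference energy and the energy of the estimation error."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate  # all deviation from the reference counts as distortion
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))
```

A higher value means the separated signal is closer to the true target. The function is undefined for a perfect estimate, where the error energy is zero.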
Experimental result
• Average SDR scores for each method, where the four instruments are shuffled into 12 combinations.
[Figure: bar chart of SDR [dB] (axis from 8.0 to 10.0; higher is better) for Conventional method 1, Conventional method 2, and Proposed method]
• The proposed method outperforms the other methods.
Conclusions
• We propose a new divergence switching scheme for superresolution-based SNMF.
• The method separates online input signals using the optimal divergence in NMF.
• The proposed method can be used under any type of spatial condition of the sources and separates the target signal with high accuracy.
Thank you for your attention!