March 18, 15

Slide overview

Presented at the 5th International Conference on 3D Systems and Applications (3DSA 2013), an international conference.

Daichi Kitamura, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo, Yu Takahashi, "Regularized superresolution-based binaural signal separation with nonnegative matrix factorization," Proceedings of 5th International Conference on 3D Systems and Applications (3DSA 2013), S10-4, Osaka, Japan, June 2013.

http://d-kitamura.net/links_en.html

1.

Regularized Superresolution-Based Binaural Signal Separation with Nonnegative Matrix Factorization
Daichi Kitamura, Hiroshi Saruwatari, Yusuke Iwao, Kiyohiro Shikano (Nara Institute of Science and Technology, Nara, Japan)
Kazunobu Kondo, Yu Takahashi (Yamaha Corporation Research & Development Center, Shizuoka, Japan)

2.

Outline
• 1. Research background
• 2. Conventional method
  – Nonnegative matrix factorization
  – Penalized supervised nonnegative matrix factorization
  – Directional clustering
  – Hybrid method
• 3. Proposed method
  – Regularized superresolution-based nonnegative matrix factorization
• 4. Experiments
• 5. Conclusions

4.

Background
• Music signal separation technologies have received much attention.
  Applications:
  – Automatic music transcription
  – 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) has been a very active area of research.
• The extraction performance of NMF degrades markedly when the mixture contains many sources.
We propose a new method for multichannel signal separation with NMF that utilizes both the spectral and spatial cues contained in mixtures of multiple instruments.

6.

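NMF
• NMF is a type of sparse representation algorithm that decomposes a nonnegative matrix into two nonnegative matrices. [D. D. Lee, et al., 2001]
  $$\boldsymbol{Y} \simeq \boldsymbol{F}\boldsymbol{G}$$
  – $\boldsymbol{Y}$ ($\Omega \times T$): observed matrix (spectrogram)
  – $\boldsymbol{F}$ ($\Omega \times K$): basis matrix (spectral bases)
  – $\boldsymbol{G}$ ($K \times T$): activation matrix (time-varying gains)
  – $\Omega$: number of frequency bins, $T$: number of frames, $K$: number of bases
[Figure: a spectrogram (frequency vs. time) decomposed into spectral bases (frequency vs. amplitude) and their activations (amplitude vs. time)]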
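The decomposition on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the squared Euclidean cost and the multiplicative updates of Lee and Seung; the slides do not state which divergence was used, and the function name `nmf`, the iteration count, and the toy data are hypothetical.

```python
import numpy as np

def nmf(Y, K, n_iter=500, seed=0, eps=1e-12):
    """Approximate a nonnegative matrix Y (Omega x T) by F @ G.

    Assumption: squared Euclidean cost with multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, K)) + eps    # basis matrix (spectral bases)
    G = rng.random((K, n_frames)) + eps  # activation matrix (time-varying gains)
    for _ in range(n_iter):
        # Each factor is scaled by a nonnegative ratio, so it stays nonnegative.
        G *= (F.T @ Y) / (F.T @ F @ G + eps)
        F *= (Y @ G.T) / (F @ G @ G.T + eps)
    return F, G

# Toy "spectrogram": 16 frequency bins, 40 frames, built from 2 hidden bases.
rng = np.random.default_rng(1)
Y = rng.random((16, 2)) @ rng.random((2, 40))
F, G = nmf(Y, K=2)
rel_err = np.linalg.norm(Y - F @ G) / np.linalg.norm(Y)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Each column of `F` plays the role of one spectral basis, and the matching row of `G` says how strongly that basis is active in each frame.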

7.

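Penalized Supervised NMF (PSNMF)
• In PSNMF, the following decomposition is addressed under the condition that the supervised basis matrix $\boldsymbol{F}$ is known in advance. [Yagi, et al., 2012]
  $$\boldsymbol{Y} \simeq \boldsymbol{F}\boldsymbol{G} + \boldsymbol{H}\boldsymbol{U}$$
  Here $\boldsymbol{G}$ is the activation matrix of $\boldsymbol{F}$, and $\boldsymbol{H}$ and $\boldsymbol{U}$ are the basis and activation matrices of the other sounds.
• Training process: the supervised bases $\boldsymbol{F}$ of the target sound are trained from the supervision sound.
• Separation process: fix the trained bases $\boldsymbol{F}$ and update $\boldsymbol{G}$, $\boldsymbol{H}$, and $\boldsymbol{U}$. During the update, $\boldsymbol{H}$ is forced to become uncorrelated with $\boldsymbol{F}$.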

8.

Penalized Supervised NMF (PSNMF) (continued)
• Problem of PSNMF: when the signal includes many sources, the extraction performance degrades markedly.
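The separation process of a PSNMF-style method can be sketched as follows. This is an illustration under stated assumptions, not the update rules of [Yagi, et al., 2012]: it assumes a squared Euclidean cost, writes the other sounds as `H @ U`, and uses `mu * ||F^T H||_F^2` as the penalty that decorrelates `H` from the fixed supervised bases `F`. The names `psnmf`, `penalized_cost`, and `mu` are hypothetical.

```python
import numpy as np

def penalized_cost(Y, F, G, H, U, mu):
    """Assumed cost: squared error plus mu * ||F^T H||_F^2."""
    return np.sum((Y - F @ G - H @ U) ** 2) + mu * np.sum((F.T @ H) ** 2)

def psnmf(Y, F, n_other, mu=1.0, n_iter=300, seed=0, eps=1e-12):
    """Fit Y ~= F @ G + H @ U with the supervised bases F held fixed."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps  # activations of the target bases
    H = rng.random((n_freq, n_other)) + eps       # bases for the other sounds
    U = rng.random((n_other, n_frames)) + eps     # activations of H
    for _ in range(n_iter):
        G *= (F.T @ Y) / (F.T @ (F @ G + H @ U) + eps)
        # The extra denominator term mu * F F^T H shrinks the parts of H
        # that overlap with F, which pushes F^T H toward zero.
        H *= (Y @ U.T) / ((F @ G + H @ U) @ U.T + mu * (F @ (F.T @ H)) + eps)
        U *= (H.T @ Y) / (H.T @ (F @ G + H @ U) + eps)
    return G, H, U

# Toy mixture: the target occupies the low bins, the interference the high bins.
rng = np.random.default_rng(2)
F = np.vstack([rng.random((6, 2)), np.zeros((6, 2))])   # stands in for trained bases
H_true = np.vstack([np.zeros((6, 2)), rng.random((6, 2))])
Y = F @ rng.random((2, 30)) + H_true @ rng.random((2, 30))

G0, H0, U0 = psnmf(Y, F, n_other=2, n_iter=0)   # random initialization only
G, H, U = psnmf(Y, F, n_other=2)
cost_init = penalized_cost(Y, F, G0, H0, U0, mu=1.0)
cost_final = penalized_cost(Y, F, G, H, U, mu=1.0)
target_estimate = F @ G                          # extracted target spectrogram
```

Only `F @ G` is kept as the extracted target; `H @ U` absorbs everything the supervised bases cannot explain.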

9.

Directional Clustering
• Directional clustering can estimate the sources and their directions in a multichannel signal. [Araki, et al., 2007] [Miyabe, et al., 2009]
• This method separates sources using the spatial information in the observed signal.
[Figure: source components plotted against the L-ch and R-ch input signals, with one centroid vector per cluster]

10.

Directional Clustering (continued)
• Problem of directional clustering: this method cannot separate sources located in the same direction.
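One way to picture directional clustering is as k-means on the direction of each time-frequency bin. The sketch below is a simplified stand-in, not the algorithm of the cited papers: it assumes that each bin is dominated by one source, describes the bin by the unit-norm vector of its L-ch and R-ch amplitudes, and clusters those vectors by cosine similarity. The function name and the toy panning gains are hypothetical.

```python
import numpy as np

def directional_clustering(YL, YR, n_clusters=3, n_iter=20, eps=1e-12):
    """Assign every time-frequency bin to one direction cluster."""
    A = np.stack([np.abs(YL).ravel(), np.abs(YR).ravel()], axis=1)
    A = A / (np.linalg.norm(A, axis=1, keepdims=True) + eps)  # unit-norm vectors
    # Initial centroid vectors at evenly spaced angles between left and right.
    angles = np.linspace(0.0, np.pi / 2, n_clusters + 2)[1:-1]
    C = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    for _ in range(n_iter):
        labels = np.argmax(A @ C.T, axis=1)      # most similar centroid
        for c in range(n_clusters):
            members = A[labels == c]
            if len(members):
                m = members.sum(axis=0)
                C[c] = m / (np.linalg.norm(m) + eps)
    labels = np.argmax(A @ C.T, axis=1)
    return labels.reshape(YL.shape), C

# Toy stereo amplitudes: each bin belongs to a left, center, or right source.
rng = np.random.default_rng(3)
gains = np.array([[1.0, 0.2], [0.7, 0.7], [0.2, 1.0]])  # (L, R) gain per source
src = rng.integers(0, 3, size=(8, 10))                   # dominant source per bin
mag = rng.random((8, 10)) + 0.1
YL = gains[src, 0] * mag
YR = gains[src, 1] * mag

labels, centroids = directional_clustering(YL, YR)
center_mask = (labels == 1).astype(float)   # binary mask of the center cluster
center_L = center_mask * YL                 # Hadamard product with the mask
center_R = center_mask * YR
```

Two sources with the same gain pair would fall into the same cluster, which is the limitation named on the slide.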

11.

Hybrid method
• The conventional hybrid method applies PSNMF after directional clustering. [Iwao, et al., 2012]
• This method consists of two techniques.
  – Directional clustering (spatial separation)
  – PSNMF (source separation)
[Figure: conventional hybrid method; stereo input (L, R) → directional clustering → PSNMF]

12.

Problem of hybrid method
• The signal extracted by the hybrid method suffers from considerable distortion caused by the binary masking in directional clustering.
• The signal in the target direction, which is obtained by directional clustering, has many spectral chasms.
• The resolution of the spectrogram is degraded.
[Figure: directional clustering as binary masking; the input spectrogram is multiplied element-wise (Hadamard product) by a binary mask (1: target direction, 0: other direction) to give the separated cluster; axes are frequency and time]

14.

Proposed hybrid method
• Conventional hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) → PSNMF on each channel → ISTFT → mixing → extracted signal
• Proposed hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) and index of the center cluster → superresolution-based SNMF on each channel → ISTFT → mixing → extracted signal
• We employ a new supervised NMF algorithm as an alternative to the conventional PSNMF in the hybrid method.

15.

Regularized superresolution-based NMF
• In the proposed supervised NMF, the spectral chasms are treated as unseen observations using an index matrix.
[Figure: the separated cluster (frequency vs. time) with its chasms, next to the binary index matrix, whose entries are 1 for observed time-frequency bins and 0 for chasms]

16.

Regularized superresolution-based NMF
• The spectrogram of the target sound is reconstructed using better-matched bases because the chasms are treated as unseen.
• The components of the target sound lost after directional clustering can be extrapolated using the supervised bases.
[Figure: superresolution using supervised bases; the separated cluster with chasms and the supervised bases yield the reconstructed spectrogram (frequency vs. time)]
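The idea of treating chasms as unseen can be sketched by weighting the NMF cost with the index matrix, so that only observed bins influence the activations, and then reading the missing bins off the reconstruction. This is a reduced illustration under stated assumptions: squared Euclidean cost, supervised bases `F` only (no bases for other sounds), and hypothetical names and toy data.

```python
import numpy as np

def masked_activations(Y, I, F, n_iter=1000, seed=0, eps=1e-12):
    """Fit G so that F @ G matches Y only where the index matrix I is 1."""
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y.shape[1])) + eps
    IY = I * Y
    for _ in range(n_iter):
        # Bins with I == 0 (chasms) enter neither the numerator nor the
        # denominator, so they are ignored rather than fitted as zeros.
        G *= (F.T @ IY) / (F.T @ (I * (F @ G)) + eps)
    return G

rng = np.random.default_rng(4)
F = rng.random((24, 3)) ** 3                    # hypothetical supervised bases
S = F @ rng.random((3, 30))                     # clean target spectrogram
I = (rng.random(S.shape) < 0.6).astype(float)   # 1: observed, 0: chasm
Y = I * S                                       # separated cluster with chasms

G = masked_activations(Y, I, F)
S_hat = F @ G                                   # reconstruction fills the chasms
chasm = I == 0
err = np.linalg.norm((S_hat - S)[chasm]) / np.linalg.norm(S[chasm])
print(f"relative error inside the chasms: {err:.4f}")
```

Leaving the chasms at zero would give a relative error of exactly 1 on those bins, so any value of `err` well below 1 reflects extrapolation by the bases.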

17.

Regularized superresolution-based NMF
• Signal flow of the proposed hybrid method
[Figure (a) Observed spectra: frequency of source components against direction (left, center, right), with the target source in the center direction]

18.

Regularized superresolution-based NMF
• Signal flow of the proposed hybrid method
[Figure: (a) observed spectra → directional clustering → (b) after directional clustering; only the target direction is kept, and the center sources lose some of their components]

19.

Regularized superresolution-based NMF
• Signal flow of the proposed hybrid method
[Figure (b) After directional clustering: the center sources lose some of their components]

20.

Regularized superresolution-based NMF
• Signal flow of the proposed hybrid method
[Figure: (b) after directional clustering → superresolution-based SNMF → (c) after superresolution-based SNMF; the target source is extrapolated]

21.

Regularized superresolution-based NMF
• The basis extrapolation has an underlying problem.
• If the time-frequency spectra are almost unseen in the spectrogram, which means that the entries of the index matrix are almost all zero, a large extrapolation error may occur.
• It is necessary to regularize the extrapolation.
[Figure: a separated cluster containing an almost unseen frame, and a spectrogram (frequency in kHz vs. time in s) showing the resulting extrapolation error, which incorrectly modifies the activation]

22.

Regularized superresolution-based NMF
• We propose two types of regularization.
  – Regularization of the temporal continuity (with respect to the previous frame)
  – Regularization of the norm minimization
  [Equations: definitions of the two regularization terms]
  – $\boldsymbol{I}$: index matrix
  – $i_{\omega,t}$: entry of the index matrix $\boldsymbol{I}$
  – $f_{\omega,k}$: entry of the matrix $\boldsymbol{F}$
  – $g_{k,t}$: entry of the matrix $\boldsymbol{G}$
  – $\bar{\cdot}$: binary complement
• The intensity of these regularizations is proportional to the number of chasms in each frame.
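The exact regularization terms are not recoverable from this transcript, so the sketch below only illustrates the stated principle that each frame is penalized in proportion to its number of chasms. Both penalty forms are assumptions chosen for illustration: a squared difference from the previous frame weighted by the chasm count, and the energy of `F @ G` inside the chasms. All function names are hypothetical.

```python
import numpy as np

def chasm_counts(I):
    """Number of chasms per frame, using the binary complement of I."""
    return (1 - I).sum(axis=0)

def temporal_continuity(G, I):
    """Assumed form: chasm-weighted squared change from the previous frame."""
    diff = G[:, 1:] - G[:, :-1]
    return float(np.sum(chasm_counts(I)[1:] * diff ** 2))

def norm_minimization(F, G, I):
    """Assumed form: energy of the reconstruction F @ G inside the chasms."""
    return float(np.sum((1 - I) * (F @ G) ** 2))

# Tiny example: 3 frequency bins, 2 frames, 1 basis.
I = np.array([[1, 0],
              [0, 0],
              [1, 1]])
F = np.array([[1.0], [2.0], [3.0]])
G = np.array([[1.0, 3.0]])

counts = chasm_counts(I)             # [1, 2]: the second frame has more chasms
r_time = temporal_continuity(G, I)   # 2 * (3 - 1)**2 = 8.0
r_norm = norm_minimization(F, G, I)  # 3**2 + 2**2 + 6**2 = 49.0
```

With either form, a frame that is almost entirely unseen receives the strongest penalty, which is where the extrapolation is least constrained by data.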

23.

Regularized superresolution-based NMF
• The cost function in regularized superresolution-based NMF is defined using the index matrix and contains the following terms.
  [Equation: cost function]
  – Regularization term
  – Weighting parameter
  – Penalty term that forces the supervised bases and the bases of the other sounds to become uncorrelated with each other

24.

Regularized superresolution-based NMF
• The update rules that minimize the cost function are obtained as follows:
  [Equations: update rules]
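Since the update rules themselves are not recoverable from this transcript, the following is a hedged sketch of what such updates can look like, not the authors' derivation. It assumes a squared Euclidean fit on observed bins, a norm-minimization regularizer `lam * ||(1 - I) * (F @ G)||^2`, and a decorrelation penalty `mu * ||F^T H||^2`, and applies the usual multiplicative-update recipe (negative gradient part over positive gradient part). The names `lam`, `mu`, `cost`, and `regularized_snmf` are hypothetical.

```python
import numpy as np

def cost(Y, I, F, G, H, U, lam, mu):
    X = F @ G + H @ U
    fit = np.sum(I * (Y - X) ** 2)               # error on observed bins only
    reg = lam * np.sum((1 - I) * (F @ G) ** 2)   # assumed norm-minimization term
    pen = mu * np.sum((F.T @ H) ** 2)            # assumed decorrelation penalty
    return fit + reg + pen

def regularized_snmf(Y, I, F, n_other, lam=0.1, mu=1.0,
                     n_iter=300, seed=0, eps=1e-12):
    """Fit I*Y ~= I*(F @ G + H @ U) with F fixed and chasms regularized."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps
    H = rng.random((n_freq, n_other)) + eps
    U = rng.random((n_other, n_frames)) + eps
    IY, I_bar = I * Y, 1 - I
    for _ in range(n_iter):
        X = F @ G + H @ U
        G *= (F.T @ IY) / (F.T @ (I * X) + lam * (F.T @ (I_bar * (F @ G))) + eps)
        X = F @ G + H @ U
        H *= (IY @ U.T) / ((I * X) @ U.T + mu * (F @ (F.T @ H)) + eps)
        X = F @ G + H @ U
        U *= (H.T @ IY) / (H.T @ (I * X) + eps)
    return G, H, U

# Toy separated cluster: target plus a same-direction interferer, with chasms.
rng = np.random.default_rng(5)
F = rng.random((20, 3)) ** 3                    # hypothetical supervised bases
mixture = F @ rng.random((3, 25)) + rng.random((20, 2)) @ rng.random((2, 25))
I = (rng.random(mixture.shape) < 0.6).astype(float)
Y = I * mixture

G0, H0, U0 = regularized_snmf(Y, I, F, n_other=2, n_iter=0)
G, H, U = regularized_snmf(Y, I, F, n_other=2)
cost_init = cost(Y, I, F, G0, H0, U0, lam=0.1, mu=1.0)
cost_final = cost(Y, I, F, G, H, U, lam=0.1, mu=1.0)
target_estimate = F @ G                         # extrapolated target spectrogram
```

Setting `lam=0` removes the regularizer and leaves a plain index-weighted supervised NMF, which corresponds in spirit to the unregularized variant compared in the experiments.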

26.

Evaluation experiment
• We compared four methods.
  – Conventional hybrid method using PSNMF (Conventional method)
  – Proposed hybrid method using superresolution-based NMF without regularization (Proposed method 1)
  – Proposed hybrid method using superresolution-based NMF with regularization of the temporal continuity (Proposed method 2)
  – Proposed hybrid method using superresolution-based NMF with regularization of the norm minimization (Proposed method 3)
[Figure: signal flows of the conventional hybrid method (PSNMF) and the proposed hybrid method (superresolution-based SNMF)]

27.

Evaluation experiment
• We used stereo-panning signals and binaural-recorded signals containing four instruments, oboe (Ob.), flute (Fl.), trombone (Tb.), and piano (Pf.), generated by a MIDI synthesizer.
• The sources are mixed with equal power.
• The target source is always located in the center direction (no. 1).
• We used the same type of MIDI sounds as the target instruments as supervision for the training process.
• Supervision sound: notes spanning two octaves that cover all the notes of the target signal.
[Figure: spatial layout of sources no. 1 to 4 between left, center, and right; the target source is no. 1]

28.

Experimental results (panning signal)
• Average SDR, SIR, and SAR scores for each method, where the four instruments are shuffled into 12 combinations.
  – SDR: quality of the separated target sound
  – SIR: degree of separation between the target and other sounds
  – SAR: absence of artificial distortion
[Figure: bar charts of SDR [dB], SIR [dB], and SAR [dB] for each method; higher is better]
  – Proposed method 1: no regularization
  – Proposed method 2: regularization of temporal continuity
  – Proposed method 3: regularization of norm minimization
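For readers unfamiliar with the three scores, the sketch below computes them with the standard decomposition of an estimate into a target part, an interference part, and an artifact part. It is a simplified gain-only variant that assumes time-aligned reference sources and allows no filtering distortion; the slides do not say which evaluation toolkit was used, and the function name and toy signals are hypothetical.

```python
import numpy as np

def separation_scores(estimate, sources, target_index=0):
    """Gain-only SDR, SIR, SAR in dB; sources has shape (n_sources, n_samples)."""
    S = np.asarray(sources, dtype=float)
    est = np.asarray(estimate, dtype=float)
    s = S[target_index]
    s_target = (est @ s) / (s @ s) * s                 # projection onto the target
    coef, *_ = np.linalg.lstsq(S.T, est, rcond=None)   # projection onto all sources
    e_interf = S.T @ coef - s_target                   # part explained by other sources
    e_artif = est - S.T @ coef                         # part explained by no source

    def db(num, den):
        return 10 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

    sdr = db(s_target, e_interf + e_artif)
    sir = db(s_target, e_interf)
    sar = db(s_target + e_interf, e_artif)
    return sdr, sir, sar

# Toy case: two orthogonal sources, an estimate with leakage and a small artifact.
sources = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]])
estimate = np.array([1.0, 0.1, 0.01, 0.0])
sdr, sir, sar = separation_scores(estimate, sources)
print(f"SDR {sdr:.2f} dB, SIR {sir:.2f} dB, SAR {sar:.2f} dB")
```

In this toy case the leakage of 0.1 from the second source limits SIR to 20 dB, while the much smaller artifact keeps SAR near 40 dB, and SDR reflects both error types together.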

29.

Experimental results (binaural signal)
• Average SDR, SIR, and SAR scores for each method, where the four instruments are shuffled into 12 combinations.
  – SDR: quality of the separated target sound
  – SIR: degree of separation between the target and other sounds
  – SAR: absence of artificial distortion
[Figure: bar charts of SDR [dB], SIR [dB], and SAR [dB] for each method; higher is better]
  – Proposed method 1: no regularization
  – Proposed method 2: regularization of temporal continuity
  – Proposed method 3: regularization of norm minimization

30.

Conclusions
• We propose a new supervised NMF algorithm, a superresolution-based method, for use in the hybrid method to separate stereo or binaural signals.
• The proposed hybrid method separates the target signal with higher performance than the conventional method.
• The regularization of norm minimization is effective for the proposed supervised NMF algorithm.
Thank you for your attention!