Blind source separation based on independent low-rank matrix analysis and its extensions

1.

Blind source separation based on independent low-rank matrix analysis and its extensions
Daichi Kitamura, Project Research Associate, The University of Tokyo, Japan
Visit to Ohio State University, December 15th, 2017

2.

Self introduction
• Name: Daichi Kitamura
• Age: 27 (born in 1990)
  – Born in Kagawa, Japan
• Background:
  – NAIST, Japan
    • Master's degree (received in 2014)
  – SOKENDAI, Japan
    • Ph.D. degree (received in 2017)
  – The University of Tokyo, Japan
    • Project Research Associate
• Research topics
  – Acoustic signal processing, statistical signal processing, audio source separation, etc.
[Figure: map of Japan marking Tokyo (Univ. Tokyo) and Kagawa (place of birth)]

3.

Contents
• Background
  – Blind source separation (BSS) for audio signals
  – Motivation
• Preliminaries
  – Frequency-domain independent component analysis (FDICA)
  – Independent vector analysis (IVA)
  – Itakura–Saito nonnegative matrix factorization (ISNMF)
• Independent Low-Rank Matrix Analysis (ILRMA)
  – Employ low-rank TF structures of each source in BSS
  – Gaussian source model with TF-varying variance
  – Relationship between ILRMA and multichannel NMF
  – Theoretical extension of ILRMA for better optimization
• Conclusion

5.

Background
• Blind source separation (BSS) for audio signals
  – separates the original audio sources
  – does not require prior information about the recording conditions
    • locations of mics and sources, room geometry, timbres, etc.
  – can be applied to many audio applications
• Consider only the “determined” situation (# of mics = # of sources)
[Figure: recording mixture → BSS → separated guitar; sources → mixing system → observed → demixing system → estimated]

6.

History of BSS for audio signals
• Basic theories and their evolution (only popular methods are depicted)
  – ICA-based line: independent component analysis (ICA), frequency-domain ICA (FDICA), many permutation solvers for FDICA, independent vector analysis (IVA), auxiliary-function-based IVA (AuxIVA), time-varying Gaussian IVA
  – NMF-based line: nonnegative matrix factorization (NMF), application of NMF to many tasks, generative models in NMF, many extensions of NMF, Itakura–Saito NMF (ISNMF), multichannel NMF
  – Both lines lead to independent low-rank matrix analysis (ILRMA)
[Figure: timeline; year axis ticks: 1994, 1998, 1999, 2006, 2009, 2011, 2012, 2013, 2016]

7.

Motivation of ILRMA
• Conventional BSS techniques based on ICA
  – ☺ Minimum distortion (linear demixing)
    • In each frequency bin and time frame, the observed signal is the frequency-wise mixing matrix applied to the source signals, and the estimated signal is the frequency-wise demixing matrix applied to the observed signal
  – ☺ Relatively fast and stable optimization
    • FastICA [A. Hyvärinen, 1999], natural gradient [S. Amari, 1996], and auxiliary function technique [N. Ono+, 2010], [N. Ono, 2011]
  – ☹ Cannot use a “specific” assumption about the sources
    • Only assumes a non-Gaussian p.d.f. for the sources
  – ☹ Permutation problem is crucial and still difficult to solve
    • IVA often fails, causing a “block permutation problem” [Y. Liang+, 2012]
• Better to use a “specific source model” in the TF domain
  – Independent low-rank matrix analysis (ILRMA) employs a low-rank property

9.

Related methods: ICA
• Independent component analysis (ICA) [P. Comon, 1994]
  – estimates the demixing matrix without knowing the mixing matrix
  – Source model (scalar)
    • Each source is non-Gaussian, and the sources are mutually independent
  – Spatial model
    • Mixing system is a time-invariant matrix
• Mixing system in audio signals
  – Convolutive mixture with room reverberation
[Figure: sources (source model) → mixing matrix (spatial model) → observed → demixing matrix → estimated]

10.

Related methods: FDICA
• Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
  – estimates a frequency-wise demixing matrix
  – Source model (scalar)
    • Each source component is complex-valued, non-Gaussian, and mutually independent
  – Spatial model
    • Frequency-wise mixing matrix is time-invariant
      – Instantaneous mixture in each frequency band
      – A.k.a. rank-1 spatial model [N.Q.K. Duong, 2010]
• Permutation problem?
  – Order of estimated signals cannot be determined by ICA
  – Alignment of frequency-wise estimated signals is required
    • Many permutation solvers were proposed
[Figure: spectrograms (frequency bin vs. time frame) with a separate ICA applied in each frequency bin: ICA 1, ICA 2, …, ICA I]

11.

Permutation problem
• FDICA requires signal alignment across all frequency bins
  – Order of estimated signals cannot be determined by ICA*
[Figure: Source 1 and Source 2 → Observed 1 and Observed 2 → ICA applied to all frequency components over time → permutation solver → Estimated signal 1 and Estimated signal 2]
*Signal scale must also be restored by applying a back-projection technique

12.

Related methods: IVA
• Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
  – extends ICA to a multivariate probabilistic model that treats each source-wise frequency vector as a vector variable
  – Permutation-free estimation of the demixing matrix is achieved!
  – Source model (vector)
    • Each source vector is multivariate, spherical, complex-valued, non-Gaussian, and mutually independent
  – Spatial model
    • Mixing system is a time-invariant matrix (rank-1 spatial model)
[Figure: source vectors (multivariate non-Gaussian dist., components have higher-order correlations) → mixing matrix → observed vectors → demixing matrix → estimated vectors]

13.

Higher-order correlation assumed in IVA
• Spherical multivariate distribution [T. Kim+, 2007]
  – Two mutually independent Laplace distributions: x1 and x2 are mutually independent
  – Spherical Laplace distribution: the probability depends only on the norm, so x1 and x2 have higher-order correlation
• Why spherical distribution?
  – Frequency bands that have similar activations will be merged together as one source, which avoids the permutation problem
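
The difference between the two bivariate models can be checked numerically. The sketch below is a simplified illustration, not taken from the slides: it uses real-valued, unnormalized densities (the function names are hypothetical) and tests the identity f(x1, x2) f(0, 0) = f(x1, 0) f(0, x2), which must hold whenever a density factorizes into independent marginals.

```python
import math

def independent_laplace(x1, x2):
    """Unnormalized density of two mutually independent Laplace variables."""
    return math.exp(-abs(x1) - abs(x2))

def spherical_laplace(x1, x2):
    """Unnormalized spherical Laplace density: depends only on the norm."""
    return math.exp(-math.hypot(x1, x2))

def factorizes_at(density, x1, x2, tol=1e-12):
    """Check f(x1,x2) f(0,0) == f(x1,0) f(0,x2), a necessary condition
    for the density to be a product g(x1) h(x2)."""
    lhs = density(x1, x2) * density(0.0, 0.0)
    rhs = density(x1, 0.0) * density(0.0, x2)
    return abs(lhs - rhs) < tol

print(factorizes_at(independent_laplace, 1.0, 2.0))  # True
print(factorizes_at(spherical_laplace, 1.0, 2.0))    # False
```

The spherical density fails the check, so x1 and x2 are statistically dependent under it even though the density has no preferred direction.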

14.

Comparison of source models
• Frequency-domain ICA (FDICA) [P. Smaragdis, 1998]
  – Scalar r.v.s: each source obeys a non-Gaussian distribution, and the sources are mutually independent
  – The mixture is close to a Gaussian signal because of the CLT
  – Update the separation filter so that the estimated signals obey the assumed non-Gaussian distribution
• Independent vector analysis (IVA) [A. Hiroe, 2006], [T. Kim, 2006]
  – Vector (multivariate) r.v.s: each source obeys a non-Gaussian spherical distribution, and the sources are mutually independent
  – Update the separation filter so that the estimated signals obey the assumed non-Gaussian distribution
[Figure: observed STFT (frequency vs. time) → demixing matrix → estimated STFT; the current empirical distribution is compared with the assumed source distribution]

15.

Related method: NMF
• Nonnegative matrix factorization (NMF) [D. D. Lee, 1999]
  – Low-rank decomposition with nonnegative constraint
    • Limited number of nonnegative bases and their coefficients
  – Spectrogram is decomposed in acoustic signal processing
    • Frequently appearing spectral patterns and their activations
[Figure: nonnegative matrix (power spectrogram, frequency vs. time) approximated by the product of a basis matrix (spectral patterns) and an activation matrix (time-varying gains); the dimensions are the # of freq. bins, # of time frames, and # of bases]

16.

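Related method: ISNMF
• ISNMF [C. Févotte, 2009]
  – Equivalent to a model in which each complex-valued observed signal follows a circularly symmetric complex Gaussian distribution with a nonnegative variance
  – The observed signal can be decomposed into components using the “stable property” of the Gaussian distribution
    • If each component is defined as a complex Gaussian variable, the variance is also decomposed!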
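
The equations on this slide were lost in extraction. As a hedged reconstruction, the standard ISNMF generative model can be written as follows; the symbols x_{ij} (observed complex spectrogram), c_{ijk} (latent components), t_{ik} (bases), and v_{kj} (activations) are assumed notation, not necessarily the slide's own.

```latex
x_{ij} = \sum_{k=1}^{K} c_{ijk}, \qquad
c_{ijk} \sim \mathcal{N}_{\mathbb{C}}\!\left(0,\; t_{ik} v_{kj}\right)
\;\;\Longrightarrow\;\;
x_{ij} \sim \mathcal{N}_{\mathbb{C}}\!\left(0,\; \sum_{k=1}^{K} t_{ik} v_{kj}\right),
\qquad
-\log p(X \mid T, V) = \sum_{i,j}\left[\frac{|x_{ij}|^{2}}{\sum_{k} t_{ik} v_{kj}} + \log \sum_{k} t_{ik} v_{kj}\right] + \mathrm{const.}
```

Minimizing this negative log-likelihood over T and V is the same as minimizing the Itakura–Saito divergence between the power spectrogram |x_{ij}|^2 and the low-rank model, which is why the variance is "also decomposed".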

17.

Related method: ISNMF
• Power spectrogram corresponds to variances in the TF plane
  – Complex Gaussian distribution with TF-varying variance
  – If we marginalize over time or frequency, the distribution becomes non-Gaussian even though each TF bin follows a Gaussian distribution
[Figure: power spectrogram (frequency bin vs. time frame); grayscale shows the value of the variance, from small to large power]
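
The claim about marginalization can be illustrated with a small calculation. This sketch rests on one known fact and one assumption: for a circular complex Gaussian with variance r, E|x|^2 = r and E|x|^4 = 2 r^2; and the marginal over time is treated as an equal-weight mixture over the listed variances. The function name is hypothetical. A ratio E|x|^4 / (E|x|^2)^2 above 2 indicates a heavier tail than any single complex Gaussian.

```python
def normalized_fourth_moment(variances):
    """E|x|^4 / (E|x|^2)^2 for x drawn from a circular complex Gaussian
    whose variance is picked uniformly at random from `variances`."""
    n = len(variances)
    second = sum(variances) / n                        # E|x|^2
    fourth = sum(2.0 * r * r for r in variances) / n   # E|x|^4 (2 r^2 per Gaussian)
    return fourth / second ** 2

# Constant variance: exactly Gaussian.
print(normalized_fourth_moment([3.0, 3.0, 3.0]))
# Time-varying variance: the marginal is super-Gaussian (ratio above 2).
print(normalized_fourth_moment([1.0, 9.0]))
```

The ratio equals 2 only when all variances are identical, so any TF-varying variance yields a non-Gaussian marginal.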

18.

Comparison of low-rankness
[Figure: spectrograms of drums, guitar, vocals, and speech]

19.

Comparison of low-rankness
• Low-rankness (simplicity of a matrix)
  – can be measured by the cumulative singular value (CSV)
  – Drums and guitar are quite low-rank
    • Also, vocals and speech are low-rank to some extent
  – A music spectrogram can be modeled by only a few patterns
[Figure: CSV curves with a 95% line; number of bases when the CSV reaches 95%: 7, 29, and around 90 (spectrogram size is 1025 x 1883)]
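
A minimal sketch of how such a measure could be computed, assuming the CSV is the running sum of singular values normalized by their total (the slide does not give the exact definition, and `bases_to_reach` is a hypothetical helper name):

```python
import numpy as np

def bases_to_reach(spectrogram, ratio=0.95):
    """Number of singular values needed for the normalized cumulative
    singular value (CSV) to reach `ratio`."""
    s = np.linalg.svd(spectrogram, compute_uv=False)  # sorted, descending
    csv = np.cumsum(s) / np.sum(s)
    return int(np.searchsorted(csv, ratio) + 1)

# Synthetic example: a nonnegative matrix built from two spectral patterns.
rng = np.random.default_rng(0)
low_rank = rng.uniform(size=(64, 2)) @ rng.uniform(size=(2, 100))
print(bases_to_reach(low_rank))   # at most 2 for this rank-2 matrix
print(bases_to_reach(np.eye(4)))  # 4: every direction carries equal weight
```

A smaller count means that fewer spectral patterns are needed to explain most of the spectrogram, which is the sense of "low-rank" used on this slide.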

21.

Extension of source model in IVA
• Source model in IVA
  – Spherical multivariate Laplace distribution
    • has a frequency-uniform scale
    • Higher-order correlation among frequency bins
  – Equivalent to NMF with one flat basis
• Replace the source model assumed in ICA or IVA
• Source model in ISNMF [C. Févotte+, 2009]
  – NMF with an arbitrary number of bases
    • can represent complicated TF structures
  – can learn the “co-occurrence” structure in the TF domain for each source
    • Low-rank co-occurrence is captured as the variance
  – The source-wise structure can be estimated by ISNMF
[Figure: frequency vs. time illustrations of the IVA and ISNMF source models]

22.

Extension of source model in IVA
• Source model in IVA
  – Frequency vector (I-dimensional)
  – Spherical Laplace dist. (bivariate case shown)
  – Frequency-uniform scale
• Replace the source model assumed in ICA or IVA
• Source model in ISNMF [C. Févotte+, 2009]
  – Time-frequency matrix (IJ-dimensional)
  – Zero-mean complex Gaussian in each TF bin
  – Time-frequency-varying variance
  – Low-rank decomposition with NMF

23.

Cost function in ILRMA and partitioning function
• Negative log-likelihood in ILRMA
  – The source model is replaced: the IVA model is swapped for the ISNMF model
  – The terms shared with the cost function in ICA estimate the demixing matrix (update rules in ICA)
  – The terms shared with the cost function in ISNMF estimate the low-rank source model (update rules in ISNMF)
  – All the variables can easily be optimized by alternating updates
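
The cost function itself did not survive extraction. As a hedged reconstruction based on the standard ILRMA formulation without the partitioning function, and with assumed symbols (x_{ij}: observed vector at frequency bin i and time frame j, W_i: demixing matrix with rows w_{i,n}^H, t_{ik,n} and v_{kj,n}: NMF variables of source n), it can be written as:

```latex
\mathcal{L} = \sum_{i=1}^{I}\sum_{j=1}^{J}\left[\sum_{n=1}^{N}\left(\frac{|y_{ij,n}|^{2}}{r_{ij,n}} + \log r_{ij,n}\right) - 2\log\left|\det W_i\right|\right] + \mathrm{const.},
\qquad
y_{ij,n} = w_{i,n}^{\mathsf{H}} x_{ij},
\qquad
r_{ij,n} = \sum_{k=1}^{K} t_{ik,n} v_{kj,n}.
```

With r_{ij,n} fixed, the terms involving W_i have the form of an ICA cost; with W_i fixed, the terms involving r_{ij,n} form an ISNMF cost for each estimated source, which is what makes the alternating optimization natural.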

24.

Update rules of ILRMA
• ML-based iterative update rules
  – Update rule for the spatial model (demixing matrix) is based on iterative projection [N. Ono, 2011]
    • It uses a one-hot vector that has 1 at the element of the source being updated
  – Update rules for the source model (NMF variables) are based on the MM algorithm
  – Pseudo code is available at
    • http://d-kitamura.net/pdf/misc/AlgorithmsForIndependentLowRankMatrixAnalysis.pdf
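
The update formulas were lost in extraction, so the following is only a minimal NumPy sketch under stated assumptions: it follows the commonly published ILRMA updates without the partitioning function (MM updates with exponent 1/2 for the NMF variables, iterative projection for the demixing matrix), and it omits the scale normalization and back-projection steps. The array layouts and function names are hypothetical choices made for this example; the author's pseudo code at the URL above remains the reference.

```python
import numpy as np

def ilrma_cost(X, W, T, V):
    """Negative log-likelihood up to constants.
    X: (I, J, N) complex mixture STFT, W: (I, N, N) demixing matrices
    (row n is w_n^H), T: (N, I, K) bases, V: (N, K, J) activations."""
    Y = np.einsum('inm,ijm->ijn', W, X)        # y_{ij,n} = w_{i,n}^H x_{ij}
    R = np.einsum('nik,nkj->ijn', T, V)        # r_{ij,n} = sum_k t v
    logdet = np.linalg.slogdet(W)[1]           # log|det W_i| per frequency
    return float(np.sum(np.abs(Y) ** 2 / R + np.log(R))
                 - 2 * X.shape[1] * np.sum(logdet))

def ilrma_step(X, W, T, V, eps=1e-12):
    """One sweep over all sources; W, T, and V are updated in place."""
    I, J, N = X.shape
    P = np.abs(np.einsum('inm,ijm->ijn', W, X)) ** 2
    for n in range(N):
        Pn = P[:, :, n]
        # MM updates of the NMF source model (exponent 1/2).
        R = T[n] @ V[n]
        T[n] *= np.sqrt(((Pn / R**2) @ V[n].T) / ((1.0 / R) @ V[n].T))
        np.maximum(T[n], eps, out=T[n])
        R = T[n] @ V[n]
        V[n] *= np.sqrt((T[n].T @ (Pn / R**2)) / (T[n].T @ (1.0 / R)))
        np.maximum(V[n], eps, out=V[n])
        R = T[n] @ V[n]
        # Iterative projection for row n of every W_i.
        U = np.einsum('ij,ijm,ijl->iml', 1.0 / R, X, X.conj()) / J
        e_n = np.tile(np.eye(N, dtype=complex)[:, n:n + 1], (I, 1, 1))
        w = np.linalg.solve(W @ U, e_n)[:, :, 0]      # (W_i U_{i,n})^{-1} e_n
        scale = np.sqrt(np.einsum('im,iml,il->i', w.conj(), U, w).real)
        W[:, n, :] = (w / scale[:, None]).conj()      # store w^H as row n
    return W, T, V

# Toy usage on random data (not a real mixture): the cost should go down.
rng = np.random.default_rng(0)
I, J, N, K = 3, 40, 2, 2
X = rng.standard_normal((I, J, N)) + 1j * rng.standard_normal((I, J, N))
W = np.tile(np.eye(N, dtype=complex), (I, 1, 1))
T = rng.uniform(0.5, 1.5, (N, I, K))
V = rng.uniform(0.5, 1.5, (N, K, J))
cost_before = ilrma_cost(X, W, T, V)
for _ in range(5):
    ilrma_step(X, W, T, V)
cost_after = ilrma_cost(X, W, T, V)
print(cost_after < cost_before)
```

The demixing matrices start from the identity and the NMF variables from random values. Each sub-update is designed not to increase the cost, which is why comparing the cost before and after a few sweeps is a reasonable sanity check for a sketch like this.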

25.

Optimization process in ILRMA
• Demixing matrix and source model are updated alternately
  – Precise modeling of the low-rank TF structures improves the estimation accuracy of the demixing matrix
[Figure: loop between estimating the demixing matrix (mixture → separated) and estimating the source model (NMF variables updated on each separated signal)]

26.

Comparison of source models
• FDICA source model: non-Gaussian scalar variable
• IVA source model: non-Gaussian vector variable with higher-order correlation
• ILRMA source model: non-Gaussian matrix variable with low-rank time-frequency structure
[Figure: rank of the TF matrix of the mixture compared with the rank of the TF matrix of each source]

27.

Multichannel extension of NMF
• Multichannel NMF [A. Ozerov+, 2010], [H. Sawada+, 2013]
  – Observed multichannel signal: the multichannel vector in each time-frequency slot gives an instantaneous spatial covariance
  – Spatial model: spatial covariances of each source (spatial property of each source)
  – Source model: partitioning function, basis matrix (spectral patterns), and activation matrix (gains), i.e., timbre patterns of all sources

28.

Relationship b/w ILRMA and multichannel NMF
• Difference b/w ILRMA and multichannel NMF?
  – Source distribution: complex Gaussian distribution (same)
  – ILRMA assumes a rank-1 spatial covariance
  – Multichannel NMF assumes a full-rank spatial covariance
• Assumption: rank-1 spatial model
  – Spatial covariance of each source is a rank-1 matrix formed from the source-wise steering vector
  – Equivalent to the instantaneous mixing assumption

29.

Relationship b/w ILRMA and multichannel NMF
• Multichannel NMF with rank-1 spatial model
  – Substitute the rank-1 spatial model into the cost function
  – Transform the variables (from the mixing system to the demixing matrix)
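
The substitution on this slide was lost in extraction. A hedged sketch of the standard argument, with assumed symbols (a_{i,n}: steering vector of source n, A_i = (a_{i,1}, …, a_{i,N}), W_i = A_i^{-1}, D_{ij} = diag(r_{ij,1}, …, r_{ij,N}), y_{ij} = W_i x_{ij}), is:

```latex
\begin{aligned}
\mathsf{X}_{ij} &= \sum_{n=1}^{N} r_{ij,n}\, a_{i,n} a_{i,n}^{\mathsf{H}} = A_i D_{ij} A_i^{\mathsf{H}}, \\
x_{ij}^{\mathsf{H}} \mathsf{X}_{ij}^{-1} x_{ij}
  &= \left(A_i^{-1} x_{ij}\right)^{\mathsf{H}} D_{ij}^{-1} \left(A_i^{-1} x_{ij}\right)
   = \sum_{n=1}^{N} \frac{|y_{ij,n}|^{2}}{r_{ij,n}}, \\
\log\det \mathsf{X}_{ij}
  &= 2\log\left|\det A_i\right| + \sum_{n=1}^{N} \log r_{ij,n}
   = -2\log\left|\det W_i\right| + \sum_{n=1}^{N} \log r_{ij,n}.
\end{aligned}
```

Summing the last two lines over all i and j gives a cost of the same form as the ILRMA negative log-likelihood, which is the sense in which multichannel NMF with a rank-1 spatial model and ILRMA coincide when A_i is invertible (the determined case).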

30.

Relationship b/w MNMF, IVA, and ILRMA
• From the multichannel NMF side,
  – The rank-1 spatial model is introduced, which transforms the problem from estimating the mixing system to estimating the demixing matrix
• From the IVA side,
  – Increase the number of spectral bases in the source model
[Figure: plane with the spatial model (limited to flexible) on one axis and the source model (limited to flexible) on the other; IVA reaches ILRMA via the NMF source model, and multichannel NMF reaches ILRMA via the rank-1 spatial model]

31.

Experimental evaluation
• Conditions
  – Source signals: music signals obtained from SiSEC, convolved with impulse responses (two microphones and two sources)
  – Window length: 512 ms Hamming window
  – Shift length: 128 ms (1/4 shift)
  – Number of bases: 30 per source
  – Evaluation score: improvement of signal-to-distortion ratio (SDR)
[Figure: recording layout for impulse response E2A (reverberation time: 300 ms) with Source 1 and Source 2; labels: 2 m, 50, 50, 5.66 cm]

32.

Result example
• Ultimate NZ tour (Guitar and Synthesizer, 14 s)
[Figure: bar chart of SDR improvement [dB] (0 to 20, higher is better) for guitar and synth. with IVA, multichannel NMF, and ILRMA]

33.

Results: bearlin-roads
• Ultimate NZ tour (Guitar and Synthesizer, 14 s)
[Figure: SDR improvement [dB] (-2 to 12, higher is better) vs. iteration steps (0 to 400) for IVA, MNMF, ILRMA without Z, and ILRMA with Z; annotated times: 15.1 s, 60.7 s, 11.5 s, 7647.3 s]

34.

Subjective evaluation
• Thurstone’s pairwise comparison
  – Speech separation and music separation tasks
  – 10 males and 4 females
[Figure: subjective score (-1.2 to 1.6) for speech signals and music signals with IVA, multichannel NMF, and ILRMA]

35.

Demonstration: music source separation
• Music source separation
  – Listen carefully for the three parts in the mixture: guitar, vocal, and keyboard
  – Another demo is available at http://d-kitamura.net/en/index_en.html
[Figure: mixture of guitar, vocal, and keyboard → source separation → separated guitar, vocal, and keyboard]

36.

Best optimization balance?
• “Alternating update” of the spatial model (ICA) and the source model (NMF) is used in ILRMA
  – Sometimes the optimization in ILRMA is trapped in a poor solution (local minimum)
• There may exist a best optimization balance b/w the ICA and NMF models that avoids local minima
[Figure: loop between the ICA update (demixing matrix) and the NMF update (low-rank source model); initialization labeled “Identity and Randomized”]

37.

Controlling optimization speed
• How can the optimization speed be controlled while ensuring the convergence of the algorithm?
  – Parametric majorization-equalization (ME) algorithm
  – Apply parametric ME to the NMF optimization to find the best balance between ICA and NMF
• Find the best balance of optimization speeds between NMF and ICA
[Figure: loop between the ICA update and the NMF update, initialized with “Identity and Randomized”; the NMF update becomes controllable by parametric ME]

38.

Majorization-based optimization algorithm
• NMF optimization is based on a majorizer-based algorithm (a.k.a. auxiliary function technique)
  – Majorization-minimization (MM) algorithm [D. R. Hunter+, 2000]

41.

Majorization-based optimization algorithm
• NMF optimization is based on a majorizer-based algorithm (a.k.a. auxiliary function technique)
  – Majorization-equalization (ME) algorithm [C. Févotte+, 2011]
[Figure: comparison of update steps, labeled “Fast” and “Slow”]

42.

Majorization-based optimization algorithm
• NMF optimization is based on a majorizer-based algorithm (a.k.a. auxiliary function technique)
  – Parametric ME algorithm [Y. Mitsui+, 2017]

43.

Parametric-ME-based NMF optimization
• Comparison of NMF update rules
  – Update rules of the basis matrix: MM algorithm, ME algorithm, and parametric ME algorithm
  – Only the exponent is different
  – The optimization speed of the NMF model can be controlled by this exponent
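
The three update rules were lost in extraction. The sketch below illustrates the idea on a plain single-channel ISNMF, under assumptions that should be checked against the original papers: the multiplicative update is raised to an exponent written here as `p` (a hypothetical symbol), `p = 0.5` is taken to be the MM update for the Itakura–Saito divergence, and `p = 1.0` is taken to be the ME update. The function names are also hypothetical.

```python
import numpy as np

def is_divergence(X, R):
    """Itakura-Saito divergence between a power spectrogram X and model R."""
    ratio = X / R
    return float(np.sum(ratio - np.log(ratio) - 1.0))

def isnmf(X, n_bases, p=0.5, n_iter=50, seed=0, eps=1e-12):
    """ISNMF with multiplicative updates raised to the exponent p.
    Returns bases T, activations V, and the divergence after each iteration."""
    rng = np.random.default_rng(seed)
    T = rng.uniform(0.1, 1.0, (X.shape[0], n_bases))
    V = rng.uniform(0.1, 1.0, (n_bases, X.shape[1]))
    history = []
    for _ in range(n_iter):
        R = T @ V
        T = np.maximum(T * (((X / R**2) @ V.T) / ((1.0 / R) @ V.T)) ** p, eps)
        R = T @ V
        V = np.maximum(V * ((T.T @ (X / R**2)) / (T.T @ (1.0 / R))) ** p, eps)
        history.append(is_divergence(X, T @ V))
    return T, V, history

# Toy power spectrogram (strictly positive random values).
X = np.random.default_rng(1).uniform(0.1, 1.0, (8, 12))
for p in (0.25, 0.5, 1.0):
    _, _, history = isnmf(X, n_bases=3, p=p, n_iter=30)
    print(p, history[0], history[-1])
```

A smaller exponent shrinks each multiplicative step toward 1, so the NMF model changes more slowly per iteration; that is the knob the slide refers to when it says the optimization speed can be controlled.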

44.

Parametric-ME-based ILRMA
• ILRMA of 2000 trials with various random seeds
[Figure: results for ultimate_nz_tour, with NMF optimization speed ranging from slow to fast]

45.

Parametric-ME-based ILRMA
• ILRMA of 2000 trials with various random seeds
[Figure: results for another_dreamer-the_ones_we_love, with NMF optimization speed ranging from slow to fast]

46.

Parametric-ME-based ILRMA
• Slower NMF optimization (small value of the exponent) tends to provide better results in ILRMA
  – But, why? We don’t know!
• Conjecture
  – In the beginning of ILRMA, the NMF model is “random”
    • Not reliable
  – The demixing matrix can be updated without the source model to some extent (because even IVA works well)
    • Statistical independence between sources is very powerful
[Figure: stages from initialization through independence-based separation and precise modeling of source structure to improved separation; at each stage the two models are labeled “Updated” or “Slowly updated”]

48.

Conclusion
• Independent low-rank matrix analysis (ILRMA)
  – Permutation-free ICA-based blind source separation
  – Assumption
    • Statistical independence between sources
    • Low-rank time-frequency structure of each source
  – Equivalent to multichannel NMF
    • when the mixing assumption is valid
• Ongoing work!
  – Relaxation of the rank-1 spatial model
  – Extension of the source generative model
  – Semi/full-supervised ILRMA, user-guided ILRMA
  – Collaboration with deep neural networks…
    • Independent deeply learned matrix analysis (IDLMA)
    • Maybe submitted to the next EUSIPCO…?

49.

Conclusion
• Independent low-rank matrix analysis (ILRMA)
  – will be published by Springer in March 2018!
  – Audio Source Separation (Signals and Communication Technology), 1st ed., 2018, edited by Shoji Makino
  – Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, and Hiroshi Saruwatari, “Determined blind source separation with independent low-rank matrix analysis”
  – Search on Amazon.com!

50.

Conclusion
• Independent low-rank matrix analysis (ILRMA)
  – will be presented at ICASSP 2018 as a tutorial session!
    • Title (tentative): Blind Audio Source Separation on Tensor Representation
  – Presenters: Hiroshi Sawada, Nobutaka Ono, Hirokazu Kameoka, Daichi Kitamura
Thank you so much for your attention!
