[DL Reading Group] A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music

October 26, 2018

#deep learning #Deep Learning #Latent Vector Model #Music Structure #ICML 2018 #Google Brain

Slide overview

2018/10/26
Deep Learning JP:
http://deeplearning.jp/seminar-2/

Text of each page

DEEP LEARNING JP [DL Papers] “A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music (ICML2018)” Naoki Nonaka http://deeplearning.jp/ 1

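Bibliographic information • Venue: ICML 2018 • Authors: Adam Roberts et al. (Google Brain) • Citations: 13 (as of 2018/10/24) • Authors' implementation: https://github.com/tensorflow/magenta/tree/master/magenta/models/music_vae (Figures and tables are taken from the paper being presented.)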

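Background: Consider applying deep generative models to creative work. Requirements: 1. Interpolation in the latent space is possible. 2. Arithmetic on latent vectors is possible.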

Background https://www.youtube.com/watch?v=G5JT16flZwM

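Background: Consider applying deep generative models to creative work. Focus on the VAE, which allows sampling from a single latent variable z, and consider applying it to sequential (time-series) data.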

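Background. Problem: this worked for short sequences but not for long ones. Solution: use a hierarchical decoder (validated on music data). Requirements for the model: 1. It can model sequences. 2. Interpolation in the latent space is possible. 3. Arithmetic on latent vectors is possible.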

Related work • VAE with an encoder-decoder model: Bowman, Samuel R., et al. "Generating sentences from a continuous space." arXiv preprint arXiv:1511.06349 (2015). • Others • Models that use an RNN as the decoder • WaveNet • Models that handle stochastic transitions

Model • Bidirectional Encoder • Hierarchical Decoder

Model • Bidirectional Encoder: a bidirectional LSTM is used to obtain the latent variable z.
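The slide above only states that a bidirectional LSTM yields z. As a minimal sketch of how such an encoder head is commonly wired in a VAE, the code below assumes that the final forward and backward LSTM states are concatenated, mapped by two linear layers to a mean and a (softplus-transformed) standard deviation, and that z is drawn with the reparameterization trick. The function names, the toy weights, and the layer layout are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def linear(weights, bias, x):
    """Compute y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softplus(v):
    """log(1 + exp(v)); keeps the standard deviation positive."""
    return math.log1p(math.exp(v))


def sample_latent(h_forward, h_backward, w_mu, b_mu, w_sigma, b_sigma, rng):
    """Sample z = mu + sigma * eps from the concatenated final LSTM states.

    h_forward / h_backward stand in for the final states of the two LSTM
    directions (hypothetical inputs; no LSTM is run here).
    """
    h = h_forward + h_backward  # list concatenation of both directions
    mu = linear(w_mu, b_mu, h)
    sigma = [softplus(v) for v in linear(w_sigma, b_sigma, h)]
    z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
    return mu, sigma, z


# Toy usage with made-up states and weights (2-dimensional latent space).
rng = random.Random(0)
mu, sigma, z = sample_latent(
    h_forward=[1.0, 0.0],
    h_backward=[0.0, 1.0],
    w_mu=[[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]],
    b_mu=[0.0, 0.0],
    w_sigma=[[0.0] * 4, [0.0] * 4],
    b_sigma=[0.0, 0.0],
    rng=rng,
)
```

Sampling through mu and sigma rather than directly from a distribution object keeps the draw differentiable with respect to the encoder outputs, which is the usual reason for this formulation.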

10.

Model • Hierarchical Decoder

11.

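Model • Hierarchical Decoder: The input sequence x is assumed to be divisible into U non-overlapping subsequences (each subsequence contains the same number of elements). U embedding vectors c are obtained from the latent variable z.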
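The segmentation assumed on this slide can be illustrated with a few lines of Python. The helper name `split_into_subsequences` is hypothetical, and the example sizes (256 time steps split into U = 16 segments, i.e. one segment per bar at 16 steps per bar) are an assumption chosen to match the 16-bar setting, not a value stated on the slide.

```python
def split_into_subsequences(x, num_segments):
    """Split x into num_segments non-overlapping, equal-length subsequences."""
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    if len(x) % num_segments != 0:
        raise ValueError("sequence length must be divisible by num_segments")
    seg_len = len(x) // num_segments
    return [x[i * seg_len:(i + 1) * seg_len] for i in range(num_segments)]


# Toy usage: 256 placeholder time steps, U = 16 segments of 16 steps each.
steps = list(range(256))
segments = split_into_subsequences(steps, 16)
```

Each segment would then be paired with one embedding vector c, so the decoder only has to model a short span at a time.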

12.

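Model • Hierarchical Decoder 1. Give c to the decoder (RNN) as its initial state. 2. Feed c together with the previous output to the RNN as input and obtain an output. 3. Repeat until outputs for the full length of the subsequence are obtained.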

13.

Model • Hierarchical Decoder: Generating a subsequence from each c is repeated until the whole sequence has been generated.
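The decoding procedure described on the preceding slides can be sketched as two nested loops. This is a structural sketch only: `conductor_step`, `init_decoder_state`, and `decoder_step` are hypothetical callables standing in for the RNN that turns z into embeddings c and for the lower-level decoder RNN. Initialising the upper level directly from z and carrying the previous output across segment boundaries are assumptions made for the sketch.

```python
def hierarchical_decode(z, num_segments, seg_len, conductor_step,
                        init_decoder_state, decoder_step, start_token):
    """Generate num_segments * seg_len outputs from a latent code z.

    conductor_step(state) -> (new_state, c)
    init_decoder_state(c) -> decoder state
    decoder_step(state, c, prev_output) -> (new_state, output)
    """
    outputs = []
    conductor_state = z      # assumption: upper level starts from z
    prev = start_token
    for _ in range(num_segments):
        # One embedding vector c per subsequence.
        conductor_state, c = conductor_step(conductor_state)
        # Step 1: the decoder state is re-initialised from c.
        state = init_decoder_state(c)
        for _ in range(seg_len):
            # Step 2: the input is c together with the previous output.
            state, prev = decoder_step(state, c, prev)
            outputs.append(prev)
        # Step 3 is the inner loop bound: seg_len outputs per c.
    return outputs


# Toy usage with integer stand-ins instead of real RNN cells.
generated = hierarchical_decode(
    z=1,
    num_segments=2,
    seg_len=3,
    conductor_step=lambda s: (s + 1, s * 100),
    init_decoder_state=lambda c: c,
    decoder_step=lambda state, c, prev: (state + 1, state),
    start_token=0,
)
```

In the toy run the outputs restart from each new c (100, 101, 102, then 200, 201, 202), which shows that every subsequence depends on the latent code only through its own embedding.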

14.

Model • Hierarchical Decoder: Generating a subsequence from each c is repeated until the whole sequence has been generated.

15.

Model • Bidirectional Encoder • Hierarchical Decoder

16.

Data: 1.5 million MIDI files were collected from the web, and the data matching the following conditions were extracted: • 2-bar melody • 16-bar melody • 2-bar drum • 16-bar drum • 16-bar trio (melody, bass, drum)

17.

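Evaluation • (Sanity check on short sequences) • Comparison of reconstruction error • Interpolation experiments • Evaluation of attribute vectors • Test with human evaluators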

18.

Check whether a flat model (encoder-decoder VAE) can model musical sequence data -> it reconstructs the data with high accuracy.

19.

Reconstruction error is evaluated under teacher forcing (next-step prediction) and under sampling -> a large gap between teacher forcing and sampling indicates posterior collapse.
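The comparison on this slide can be made concrete with a small helper. The sketch assumes that reconstruction quality is measured as per-step accuracy against the input sequence; the function names and the toy note sequences are invented for illustration and are not results from the paper.

```python
def step_accuracy(target, predicted):
    """Fraction of time steps where the prediction matches the target."""
    if len(target) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not target:
        raise ValueError("sequences must not be empty")
    return sum(t == p for t, p in zip(target, predicted)) / len(target)


def accuracy_gap(target, teacher_forced, sampled):
    """Teacher-forcing accuracy minus sampling accuracy.

    A large positive gap suggests the decoder relies on the ground-truth
    history rather than on the latent code.
    """
    return step_accuracy(target, teacher_forced) - step_accuracy(target, sampled)


# Toy usage with made-up MIDI pitches.
target = [60, 62, 64, 65]
teacher_forced = [60, 62, 64, 67]   # 3 of 4 steps correct
sampled = [60, 62, 60, 67]          # 2 of 4 steps correct
gap = accuracy_gap(target, teacher_forced, sampled)
```

Under teacher forcing the decoder always sees the correct previous step, so it can score well even when z carries little information; under sampling it must rely on z, which is why the gap serves as a diagnostic.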

20.

Flat decoder. 2-bar: the gap is small (3 to 6%). 16-bar: the gap is large (27 to 32%). -> Posterior collapse occurs with the flat decoder.

21.

Hierarchical decoder: the gap between teacher forcing and sampling improves (5 to 11%).

22.

The same trend holds for the trio data (the gap is smaller with the hierarchical decoder than with the flat one). The proposed method mitigates posterior collapse.

23.

Interpolation is used to check • whether the transition is smooth • whether it is semantically meaningful

24.

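Interpolation is used to check • whether the transition is smooth • whether it is semantically meaningful. Two measures are used: the trajectory of the Hamming distance, and the relative loss under a language model.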

25.

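Interpolation is used to check • whether the transition is smooth • whether it is semantically meaningful. Two measures are used: the trajectory of the Hamming distance, and the relative loss under a language model. The hierarchical model follows a trajectory closer to Data than the flat model does -> it transitions more smoothly.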
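To make the Hamming-distance measure concrete, the sketch below pairs a latent-space interpolation function with a normalized Hamming distance between note sequences. Spherical interpolation is used here as an assumption about how the latent path is formed, and decoding each interpolated point back into a sequence is left out because it requires a trained model. All names and toy values are hypothetical.

```python
import math


def slerp(z0, z1, t):
    """Spherical interpolation between two non-zero latent vectors."""
    dot = sum(a * b for a, b in zip(z0, z1))
    norm0 = math.sqrt(sum(a * a for a in z0))
    norm1 = math.sqrt(sum(b * b for b in z1))
    cos_omega = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    omega = math.acos(cos_omega)
    if omega < 1e-8:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]
    sin_omega = math.sin(omega)
    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


def normalized_hamming(seq_a, seq_b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


# Toy usage: midpoint between two orthogonal latent vectors, and the
# distance between two made-up decoded sequences.
midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
distance = normalized_hamming([60, 62, 64, 65], [60, 62, 0, 0])
```

For a smooth interpolation, the distance from the starting sequence would be expected to grow steadily as t moves from 0 to 1 rather than jumping at a single point.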

26.

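Interpolation is used to check • whether the transition is smooth • whether it is semantically meaningful. Two measures are used: the trajectory of the Hamming distance, and the relative loss under a language model. With the hierarchical model, the language-model loss is lower -> the outputs are more music-like.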

27.

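Five automatically computable attributes are used to check whether vector arithmetic is possible: - C diatonic - Note density - Average interval - 16th-note syncopation - 8th-note syncopation. Vectors for addition (left) and subtraction (right).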
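The vector arithmetic checked on this slide can be sketched as follows. The sketch assumes that an attribute vector is the difference between the mean latent code of sequences that show the attribute strongly and the mean latent code of sequences that show it weakly. How those two groups are selected is not stated on the slide, and the helper names and toy latent codes are invented for illustration.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def attribute_vector(latents_with, latents_without):
    """Mean latent with the attribute minus mean latent without it."""
    with_mean = mean_vector(latents_with)
    without_mean = mean_vector(latents_without)
    return [a - b for a, b in zip(with_mean, without_mean)]


def shift_latent(z, attribute, scale=1.0):
    """Add (scale > 0) or subtract (scale < 0) an attribute vector."""
    return [zi + scale * ai for zi, ai in zip(z, attribute)]


# Toy usage with made-up 2-dimensional latent codes.
dense = [[1.0, 2.0], [3.0, 4.0]]     # e.g. high note density
sparse = [[0.0, 0.0], [2.0, 2.0]]    # e.g. low note density
density_vec = attribute_vector(dense, sparse)
z_more = shift_latent([0.0, 0.0], density_vec, scale=1.0)
z_less = shift_latent([0.0, 0.0], density_vec, scale=-1.0)
```

Decoding `z_more` and `z_less` with a trained model would then be expected to raise or lower the attribute while keeping the rest of the sequence recognisable.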

28.

C Diatonic • Middle: original data • Top: addition • Bottom: subtraction. 16th-note syncopation • Middle: original data • Top: addition • Bottom: subtraction

29.

Human test: evaluators choose the best among Flat / Hierarchical (proposed method) / Real, and the number of times each was chosen is compared (192 choices in total).

30.

Conclusion • Making the decoder hierarchical made it possible to – model long sequences – prevent posterior collapse • The hierarchical model can generate music of high quality, both quantitatively and qualitatively
