Lecture 17: Efficient GAN, Video, and Point Cloud [TinyML]

Lecture 17: Efficient GAN, Video, and Point Cloud
Ryota Murai
AI Vision Lab., Tokyo Polytechnic University

Today's topics
• Efficient GANs
  • GAN Compression
  • AnyCost GAN
  • Differentiable Augmentation for data-efficient GANs
• Efficient video understanding
  • TSM
• Efficient point cloud understanding
  • PVCNN / SPVCNN
  • BEVFusion
MIT 6.5940: TinyML and Efficient Deep Learning Computing

4.

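Basic concept of generative models
A generative model learns from training data (for example, cat photos) and generates new samples. Training brings the distribution of the generated data closer to the distribution of the training data.
https://www.pakutaso.com/animal/cat/index.html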

5.

GAN basics (Goodfellow et al., 2014)
Generative Adversarial Networks (GANs)
The discriminator is trained to tell real images from fake ones, while the generator is trained to produce images that fool the discriminator.
https://sthalles.github.io/intro-to-gans/
https://developers.google.com/machine-learning/gan/gan_structure
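The adversarial training described on this slide can be sketched with binary cross-entropy losses. This is a minimal sketch, assuming the discriminator outputs a probability in (0, 1) that its input is real. The function names are illustrative, and the non-saturating generator loss is one common choice rather than something the slide specifies.

```python
import math

def discriminator_loss(d_real, d_fake):
    """BCE loss for D: push D(real) toward 1 and D(fake) toward 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating loss for G: push D(fake) toward 1, i.e. fool D."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Illustrative discriminator outputs (assumed values, not measured results).
strong_d = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
fooled_d = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.8, 0.9])
```

The two losses pull in opposite directions: when the discriminator is fooled its loss rises while the generator's loss falls.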

6.

GAN basics
Example: learning to generate handwritten digit images from the MNIST dataset.
[Figure: generator samples at initialization (random noise) and during training]
https://sthalles.github.io/intro-to-gans/

7.

GAN basics: conditional vs. unconditional
• Unconditional GAN: generates from random noise
• Conditional GAN: a label is given as input, which adds controllability to generation
  • Class labels, segmentation maps, strokes, etc.
CycleGAN (Zhu et al., ICCV 2017), GauGAN (NVIDIA, 2022)
https://junyanz.github.io/CycleGAN/
https://www.youtube.com/watch?v=p5U4NgVGAwg

8.

Challenge of GANs: high computational cost
• Generative models are far more computationally expensive than discriminative models, as measured in multiply-accumulate operations (MACs)
GAN Compression: Efficient Architectures for Interactive Conditional GANs [Li et al., CVPR 2020]
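One way to see where the MACs come from is to count them for a single convolution layer. This is a rough sketch using the standard MAC formula for a 2D convolution. The layer shapes below are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
def conv2d_macs(c_in, c_out, kernel, h_out, w_out):
    """MACs of one 2D convolution: every output element needs c_in * k * k MACs."""
    return c_in * c_out * kernel * kernel * h_out * w_out

# Hypothetical layers: same channels and kernel, different feature-map size.
classifier_layer = conv2d_macs(c_in=64, c_out=64, kernel=3, h_out=56, w_out=56)
generator_layer = conv2d_macs(c_in=64, c_out=64, kernel=3, h_out=256, w_out=256)
ratio = generator_layer / classifier_layer
```

Because the cost scales with the output resolution, a generator that keeps high-resolution feature maps pays far more per layer than a classifier that downsamples early.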

10.

GAN Compression: compressing conditional GANs
1. Knowledge distillation
  • Train the student G(x) so that its output matches that of the teacher G'(x)
  • Add a loss that also matches the intermediate feature maps
2. Search for a combination of channel widths that is smaller yet performs well
3. Fine-tune the best configuration
GAN Compression: Efficient Architectures for Interactive Conditional GANs [Li et al., CVPR 2020]
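The distillation step (step 1) can be sketched as an output-matching term plus a feature-matching term. This is a simplified sketch: it assumes student and teacher features already have the same shape (flattened to lists here), whereas a narrower student would need a mapping layer first. The names `distillation_loss` and `feat_weight` and the choice of L1 and MSE are illustrative assumptions.

```python
def l1(a, b):
    """Mean absolute difference between two flat lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def mse(a, b):
    """Mean squared difference between two flat lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(student_out, teacher_out, student_feats, teacher_feats,
                      feat_weight=1.0):
    """Output-matching term plus intermediate feature-matching term."""
    output_term = l1(student_out, teacher_out)
    feature_term = sum(mse(s, t) for s, t in zip(student_feats, teacher_feats))
    return output_term + feat_weight * feature_term

teacher_out = [0.2, 0.8, 0.5]
teacher_feats = [[1.0, 0.0], [0.5, 0.5]]
perfect = distillation_loss(teacher_out, teacher_out, teacher_feats, teacher_feats)
worse = distillation_loss([0.0, 1.0, 0.5], teacher_out,
                          [[0.0, 0.0], [0.5, 0.5]], teacher_feats)
```

A student that reproduces the teacher exactly gets zero loss; any deviation in the output or in an intermediate layer increases it.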

11.

GAN Compression: compressing conditional GANs
GAN Compression: Efficient Architectures for Interactive Conditional GANs [Li et al., CVPR 2020]

13.

AnyCost GANs: motivation
GANs can be used for image editing, but generation is too slow for interactive applications.
Image generation is controlled by changing values in the latent space.
Anycost GANs for Interactive Image Synthesis and Editing [Lin et al., CVPR 2021]

14.

AnyCost GANs: motivation
Ray tracing provides fast previews by reducing the amount of light (rays) that is computed.
Can the same approach be applied to GANs?

15.

AnyCost GANs: motivation
Ray tracing provides fast previews by reducing the amount of light (rays) that is computed.
Can the same approach be applied to GANs?
Use a lightweight model for previews and the full model for the final output.
Anycost GANs for Interactive Image Synthesis and Editing [Lin et al., CVPR 2021]

16.

AnyCost GANs
Train the model so that it produces consistent outputs across different resolutions and different numbers of channels.
Anycost GANs for Interactive Image Synthesis and Editing [Lin et al., CVPR 2021]

17.

AnyCost GANs
Train the model so that it produces consistent outputs across different resolutions and different numbers of channels.
Sample a single resolution at random.
Anycost GANs for Interactive Image Synthesis and Editing [Lin et al., CVPR 2021]

19.

AnyCost GANs
Train the model so that it produces consistent outputs across different resolutions and different numbers of channels.
The channel configuration is one-hot encoded and processed by a fully connected layer.
Anycost GANs for Interactive Image Synthesis and Editing [Lin et al., CVPR 2021]
https://hanlab.mit.edu/projects/anycost-gan
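The ideas on the AnyCost GAN slides (sampling a resolution, running with fewer channels, and one-hot encoding the channel configuration) can be sketched as follows. This is a toy sketch, not the authors' implementation: the candidate lists, the "keep the first channels" slicing rule, and a single global channel ratio (instead of a per-layer configuration) are all simplifying assumptions.

```python
import random

RESOLUTIONS = [128, 256, 512, 1024]       # hypothetical candidate resolutions
CHANNEL_RATIOS = [0.25, 0.5, 0.75, 1.0]   # hypothetical candidate width ratios

def sample_config(rng):
    """Pick one resolution and one channel ratio for the current training step."""
    return rng.choice(RESOLUTIONS), rng.choice(CHANNEL_RATIOS)

def slice_channels(weight, ratio):
    """Sub-network weights: keep the first rows (outputs) and columns (inputs)."""
    out_keep = max(1, int(len(weight) * ratio))
    in_keep = max(1, int(len(weight[0]) * ratio))
    return [row[:in_keep] for row in weight[:out_keep]]

def one_hot_channel_config(ratio):
    """One-hot code of the chosen ratio, to be fed to a fully connected layer."""
    return [1.0 if r == ratio else 0.0 for r in CHANNEL_RATIOS]

rng = random.Random(0)
resolution, ratio = sample_config(rng)
full_weight = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
sub_weight = slice_channels(full_weight, 0.5)
encoding = one_hot_channel_config(0.5)
```

Because the sub-network shares its weights with the full network, the same trained model can serve both the fast preview and the final render.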

20.

AnyCost GANs
Anycost GANs for Interactive Image Synthesis and Editing [Lin et al., CVPR 2021]
https://hanlab.mit.edu/projects/anycost-gan

22.

Data-efficient GANs
Building a large-scale dataset is very costly, and GAN quality degrades substantially when training data is limited.
Differentiable Augmentation for Data-Efficient GAN Training [Zhao et al., NeurIPS 2020]

23.

Data-efficient GANs
• The less data there is, the more the discriminator overfits, which degrades the generator's performance
Differentiable Augmentation for Data-Efficient GAN Training [Zhao et al., NeurIPS 2020]

24.

Data-efficient GANs
• The less data there is, the more the discriminator overfits, which degrades the generator's performance
• In general, data augmentation is effective against overfitting
• How should it be applied in the GAN setting?
Differentiable Augmentation for Data-Efficient GAN Training [Zhao et al., NeurIPS 2020]

25.

Differentiable augmentation
• Approach 1: augment only the real images
  • Problem: the effects of the applied augmentations show up in the generated images
  • The generator in effect learns: "Imitating color jitter seems to get my images recognized as real."
Differentiable Augmentation for Data-Efficient GAN Training [Zhao et al., NeurIPS 2020]

26.

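Differentiable augmentation
• Approach 2: augment only the discriminator's inputs (augmentation is applied when updating the discriminator but not when updating the generator)
  • Problem: the accuracy becomes unbalanced, which makes training difficult
  • The images the generator produces differ from the images the discriminator sees, so the generator cannot learn how to counter the discriminator
Differentiable Augmentation for Data-Efficient GAN Training [Zhao et al., NeurIPS 2020]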

27.

Differentiable augmentation
• Approach 3: Differentiable Augmentation
  • Apply the augmentation in the forward and backward passes of both the discriminator and the generator
Differentiable Augmentation for Data-Efficient GAN Training [Zhao et al., NeurIPS 2020]
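The structure of Approach 3 can be sketched by applying the same augmentation T to every image the discriminator sees, in both the discriminator loss and the generator loss. This is a toy sketch on 1-D "images". The brightness-shift augmentation, the `toy_discriminator`, and the loss form are illustrative assumptions. In a real framework T must be built from differentiable operations so that gradients flow through it back to the generator.

```python
import math
import random

def augment(image, rng):
    """T: random brightness shift. It is differentiable (d output / d input = 1)."""
    shift = rng.uniform(-0.5, 0.5)
    return [pixel + shift for pixel in image]

def toy_discriminator(image):
    """Hypothetical D: sigmoid of the mean pixel value."""
    mean = sum(image) / len(image)
    return 1.0 / (1.0 + math.exp(-mean))

def d_loss(discriminator, real, fake, rng):
    """D update: D sees T(real) and T(fake), never the raw images."""
    p_real = discriminator(augment(real, rng))
    p_fake = discriminator(augment(fake, rng))
    return -math.log(p_real) - math.log(1.0 - p_fake)

def g_loss(discriminator, fake, rng):
    """G update: the fake image also passes through T before reaching D."""
    return -math.log(discriminator(augment(fake, rng)))

rng = random.Random(0)
loss_d = d_loss(toy_discriminator, [1.0, 1.0], [-1.0, -1.0], rng)
loss_g = g_loss(toy_discriminator, [-1.0, -1.0], rng)
shifted = augment([0.0, 1.0], rng)
```

Since real and generated images are augmented identically in every loss, the generator is never rewarded for imitating the augmentation itself (the failure mode of Approach 1), and it trains against the same view that the discriminator sees (the failure mode of Approach 2).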

28.

Differentiable augmentation
• Approach 3: Differentiable Augmentation
  • GAN quality improved, especially when training data is scarce
Differentiable Augmentation for Data-Efficient GAN Training [Zhao et al., NeurIPS 2020]

29.

Differentiable augmentation
• Approach 3: Differentiable Augmentation
  • Resists overfitting even with only about 100 images
Differentiable Augmentation for Data-Efficient GAN Training [Zhao et al., NeurIPS 2020]

31.

Efficient video understanding: background
The volume of video keeps growing, so highly efficient video processing is essential both in the cloud and on edge devices.

32.

Efficient video understanding: background
• Many applications
  • Action recognition
  • Scene understanding
  • Video retrieval
  • Future-frame prediction for autonomous driving
  • etc.
https://github.com/open-mmlab/mmaction2

33.

Efficient video understanding: prior work (2D CNN-based)
Process each frame with a CNN and aggregate the per-frame scores (e.g. by average or max).
Representative example: TSN (Temporal Segment Networks)
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition [Wang et al., ECCV 2016]
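The per-frame aggregation described on this slide can be sketched in a few lines. This is a minimal sketch, assuming each sampled frame has already been scored by a 2D CNN. The function name `segment_consensus` and the example scores are illustrative.

```python
def segment_consensus(frame_scores, mode="avg"):
    """Aggregate per-frame class scores [num_frames][num_classes] into one vector."""
    num_classes = len(frame_scores[0])
    columns = [[frame[c] for frame in frame_scores] for c in range(num_classes)]
    if mode == "avg":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown consensus mode: {mode}")

# Three frames, two classes (made-up scores for illustration).
scores = [[0.1, 0.9], [0.3, 0.7], [0.8, 0.2]]
avg_scores = segment_consensus(scores, mode="avg")
max_scores = segment_consensus(scores, mode="max")
```

Note that both aggregations ignore frame order, which is why this family of methods cannot capture temporal structure.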

34.

Efficient video understanding: prior work (2D CNN-based)
Methods based on two-stream CNNs: feed frame images and optical flow into two separate streams and fuse the results of both.
Optical flow: a vector representation of object motion. It is relatively slow to compute.
Two-Stream Convolutional Networks for Action Recognition in Videos [Simonyan et al., NeurIPS 2014]

35.

Efficient video understanding: prior work (2D CNN-based)
2D CNN + post-fusion (e.g. LSTM)
Low-level information tends to be lost in the CNN part.
Long-term Recurrent Convolutional Networks for Visual Recognition and Description [Donahue et al., CVPR 2016]

36.

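Efficient video understanding: prior work (2D CNN-based)
• Pros
  • Low computational cost, and image recognition models can be reused
• Cons
  • A per-frame 2D CNN carries no temporal information, so results on video benchmarks are poor
  • Optical flow is slow to compute
  • Late-fusion approaches cannot model low-level temporal relationships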

37.

Efficient video understanding: prior work (3D CNN-based)
C3D (Convolutional 3D): convolves along the temporal dimension as well as the spatial dimensions, so spatiotemporal information is preserved.
Learning Spatiotemporal Features with 3D Convolutional Networks [Tran et al., ICCV 2015]

38.

Efficient video understanding: prior work (3D CNN-based)
I3D (Inflated 3D ConvNet):
• Initialize with the weights of a 2D CNN (GoogLeNet) and inflate them to 3D
• Train on Kinetics, a large-scale dataset
→ These two techniques make deeper architectures feasible
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [Carreira et al., CVPR 2017]
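The inflation step can be sketched as repeating a 2D kernel along the time axis and rescaling it by the temporal size, so that a video made of one repeated frame produces the same response as the original 2D filter. This is a small sketch on a single 2x2 kernel. The helper names are illustrative.

```python
def inflate_kernel(kernel_2d, time_size):
    """Repeat a k x k kernel time_size times along time, dividing by time_size."""
    return [[[w / time_size for w in row] for row in kernel_2d]
            for _ in range(time_size)]

def response_2d(kernel_2d, patch):
    """Dot product of a 2D kernel with an image patch of the same size."""
    return sum(w * p
               for k_row, p_row in zip(kernel_2d, patch)
               for w, p in zip(k_row, p_row))

def response_3d(kernel_3d, clip):
    """Dot product of a 3D kernel with a clip (one patch per time step)."""
    return sum(response_2d(k, frame) for k, frame in zip(kernel_3d, clip))

kernel = [[1.0, 2.0], [3.0, 4.0]]
patch = [[1.0, 0.0], [0.5, 2.0]]
static_clip = [patch, patch, patch]       # the same frame repeated 3 times

r2d = response_2d(kernel, patch)
r3d = response_3d(inflate_kernel(kernel, 3), static_clip)
```

With this initialization the 3D network starts out behaving like the pretrained image model on static input and only has to learn the temporal part.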

39.

Efficient video understanding: prior work (3D CNN-based)
• Pros
  • Can jointly model spatiotemporal information
  • Can model information from low level to high level
• Cons
  • Model size and computation increase

40.

Efficient video understanding: prior work (3D CNN-based)
• Pros
  • Can jointly model spatiotemporal information
  • Can model information from low level to high level
• Cons
  • Model size and computation increase
→ Can we get 3D CNN performance at 2D CNN cost?

41.

Efficient video understanding: Temporal Shift Module (TSM)
• Offline TSM model
  • Applications: action recognition, fall detection, video recommendation, etc.
  • Models information from both preceding and following frames
TSM: Temporal Shift Module for Efficient Video Understanding [Lin et al., ICCV 2019]

42.

Efficient video understanding: Temporal Shift Module (TSM)
• Online TSM model
  • Applications: autonomous driving, etc.
  • Propagates information in one direction only, from past to future
TSM: Temporal Shift Module for Efficient Video Understanding [Lin et al., ICCV 2019]
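The offline (bidirectional) and online (past-to-future only) variants described on these slides can be sketched as a shift of a fraction of the channels along the time axis. This is a minimal sketch on features stored as `[time][channel]` lists. The `fold_div=8` fraction and the zero padding at the clip boundaries are assumptions of this sketch.

```python
def temporal_shift(frames, fold_div=8, online=False):
    """Shift 1/fold_div of the channels in time; frames is [T][C]."""
    num_frames, num_channels = len(frames), len(frames[0])
    fold = num_channels // fold_div
    out = [list(frame) for frame in frames]   # untouched channels are copied
    for t in range(num_frames):
        for c in range(fold):
            # First fold: take the value from the previous frame (past -> now).
            out[t][c] = frames[t - 1][c] if t > 0 else 0.0
        if not online:
            for c in range(fold, 2 * fold):
                # Second fold: take the value from the next frame (future -> now).
                out[t][c] = frames[t + 1][c] if t < num_frames - 1 else 0.0
    return out

# 3 frames x 8 channels; the value encodes (frame, channel) for easy inspection.
frames = [[float(10 * (t + 1) + c) for c in range(8)] for t in range(3)]
offline = temporal_shift(frames, online=False)
online = temporal_shift(frames, online=True)
```

The shift itself needs no multiplications: it only moves data, so the 2D convolutions that follow can mix information across neighboring frames at no extra arithmetic cost. The online variant never reads a future frame, which is what makes it usable on a live stream.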

43.

Efficient video understanding: Temporal Shift Module (TSM)
• 3x faster than ECO-family models and up to 6x faster than I3D-family models
• Achieves better accuracy on the Something-Something dataset
TSM: Temporal Shift Module for Efficient Video Understanding [Lin et al., ICCV 2019]

44.

Efficient video understanding: Temporal Shift Module (TSM)
• Higher throughput improves real-time inference
TSM: Temporal Shift Module for Efficient Video Understanding [Lin et al., ICCV 2019]
https://developer.nvidia.com/embedded/community/jetson-projects/tsm_online

45.

Efficient video understanding: Temporal Shift Module (TSM)
[Figure: scaling up and scaling down]
Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos [Lin et al., 2019]

46.

Efficient video understanding: Temporal Shift Module (TSM)
Fast training thanks to low FLOPs and low I/O time
Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos [Lin et al., 2019]

49.

3D sensors

50.

3D point cloud
Point cloud: a set of points in three-dimensional space
https://github.com/alvinwan/pc2vid

51.

3D point cloud
Applications: autonomous driving, AR, etc.
https://learnopencv.com/3d-lidar-object-detection/
https://www.youtube.com/watch?v=et34mpAxaJc

52.

3D point cloud: challenges
• Far sparser than images
• Stored irregularly in memory
• Cannot be processed directly by standard CNNs
• Must be deployed on resource-constrained edge devices, as in autonomous driving
https://learnopencv.com/3d-lidar-object-detection/
https://www.youtube.com/watch?v=et34mpAxaJc

54.

Point-Voxel CNN
• Using the point cloud directly as input
  • Handles geometric information directly and is accurate, but extracting local spatial structure is difficult
• Converting the point cloud to a 3D grid (voxels) and processing it with a 3D CNN
  • Handles spatial neighborhood relationships well, but memory cost grows
→ We want the best of both
Point-Voxel CNN for Efficient 3D Deep Learning [Liu et al., NeurIPS 2019]
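The voxel route and its memory cost can be sketched as follows. This is a toy sketch, assuming point coordinates are already normalized to [0, 1) and each point carries one scalar feature. Averaging the features that fall in a voxel is one common choice, and the function names are illustrative.

```python
def voxelize(points, features, resolution):
    """Average the features of all points that fall into the same voxel."""
    sums, counts = {}, {}
    for (x, y, z), feature in zip(points, features):
        key = (int(x * resolution), int(y * resolution), int(z * resolution))
        sums[key] = sums.get(key, 0.0) + feature
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

def dense_grid_cells(resolution):
    """Number of cells a dense 3D grid must store at a given resolution."""
    return resolution ** 3

points = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)]
features = [1.0, 3.0, 5.0]
voxels = voxelize(points, features, resolution=2)
growth = dense_grid_cells(128) / dense_grid_cells(64)
```

Two nearby points are merged into one voxel at low resolution, which loses detail, while doubling the resolution to avoid this multiplies the dense grid size by eight. This is the tension that motivates combining a coarse voxel branch with a point branch.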

55.

Point-Voxel CNN
Point-Voxel CNN for Efficient 3D Deep Learning [Liu et al., NeurIPS 2019]

58.

Point-Voxel CNN: limitation
Information loss occurs.
Point-Voxel CNN for Efficient 3D Deep Learning [Liu et al., NeurIPS 2019]

59.

Sparse Point-Voxel Convolution (SPVConv)
Convolving sparsely (only over occupied voxels) reduces information loss.
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution [Tang et al., ECCV 2020]
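The idea of convolving only where data exists can be sketched with a dictionary of occupied voxels. This is a toy sketch of a sparse 3x3x3 convolution with one scalar feature per voxel, not the SPVConv implementation. Producing outputs only at occupied voxels, and the all-ones kernel, are assumptions made for illustration.

```python
from itertools import product

def sparse_conv3d(voxels, weights):
    """Convolve over occupied voxels only; empty neighbors are skipped."""
    out = {}
    for (i, j, k) in voxels:
        total = 0.0
        for (di, dj, dk), w in weights.items():
            neighbor = (i + di, j + dj, k + dk)
            if neighbor in voxels:           # only occupied cells cost any work
                total += w * voxels[neighbor]
        out[(i, j, k)] = total
    return out

# 3x3x3 kernel with all weights equal to 1 (illustrative values).
weights = {offset: 1.0 for offset in product((-1, 0, 1), repeat=3)}
voxels = {(0, 0, 0): 1.0, (0, 0, 1): 2.0, (5, 5, 5): 4.0}
out = sparse_conv3d(voxels, weights)
```

Storage and computation scale with the number of occupied voxels rather than with the cube of the grid resolution, so a finer grid becomes affordable.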

62.

Multi-sensor setups introduce view discrepancies
Integrating multi-view information requires a shared space that minimizes information loss.
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [Liu et al., ICRA 2023]

63.

Bird's-Eye View (BEV) fusion
Preserves both the semantic density of cameras and the geometric structure of LiDAR.
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [Liu et al., ICRA 2023]
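Once both sensors are expressed on the same BEV grid, fusion can be as simple as combining the features cell by cell. This is a minimal sketch, assuming camera and LiDAR features have already been projected onto an identical `[H][W]` grid. Channel-wise concatenation is one simple fusion operator, and the function name and values are illustrative.

```python
def fuse_bev(camera_bev, lidar_bev):
    """Concatenate per-cell channels: [H][W][C_cam] + [H][W][C_lidar]."""
    return [[cam_cell + lidar_cell
             for cam_cell, lidar_cell in zip(cam_row, lidar_row)]
            for cam_row, lidar_row in zip(camera_bev, lidar_bev)]

# A 1 x 2 BEV grid: 2 camera channels and 1 LiDAR channel per cell.
camera_bev = [[[1.0, 2.0], [3.0, 4.0]]]
lidar_bev = [[[9.0], [8.0]]]
fused = fuse_bev(camera_bev, lidar_bev)
```

Because every cell keeps the channels of both sensors, downstream task heads can draw on camera semantics and LiDAR geometry at the same location.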

65.

Bird's-Eye View (BEV) fusion
Supports multiple tasks, such as vehicle detection and lane segmentation, using multiple types of sensors.
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [Liu et al., ICRA 2023]
