[DL Papers] Tracking Everything Everywhere All at Once

October 20, 2023

#Deep Learning #Takahiro Maeda #OmniMotion #Tracking #NeRF

Text of each slide

DEEP LEARNING JP [DL Papers] Tracking Everything Everywhere All at Once Presenter: Takahiro Maeda D3 (Toyota Technological Institute) http://deeplearning.jp/

Contents
1. Bibliographic information
2. Overview
3. Background
4. Proposed method
5. Experimental results
6. Discussion and impressions

1. Bibliographic information
ICCV 2023, Best Student Paper, Project Page
* Unless otherwise noted, material is quoted from the presented paper and its video.

https://omnimotion.github.io/

2. Overview
• Proposes OmniMotion, a method that estimates dense tracking over long time spans

3. Background
Tracking
• Long-term, robust to occlusion
• Sparse
Learning to Track: Online Multi-Object Tracking by Decision Making, Yu Xiang et al., ICCV'15
Optical Flow
• Short-term, vulnerable to occlusion
• Dense
https://medium.com/building-autonomous-flightsoftware/math-behind-optical-flow-1c38a25b1fe8
What we want: tracking that is long-term, robust to occlusion, and dense

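4. Proposed method (1/3)
• Tracking is achieved by mapping every frame into a single canonical space
– Each frame is mapped by its own frame-specific invertible transformation 𝒯𝑖
– The NeRF training scheme is used
– Density 𝜎 and color 𝒄 are estimated in the canonical space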
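To make the role of the invertible transformation concrete, here is a minimal sketch of mapping a 3D point from frame i to frame j through the canonical space, i.e. x_j = T_j^{-1}(T_i(x_i)). The slide only says that T_i is invertible; the single affine coupling layer used here, the function names, and the hand-picked parameters are all assumptions made for illustration, not the authors' network.

```python
import math

def coupling_forward(point, params):
    """T_i: map a frame-local 3D point to the canonical space.

    One affine coupling layer (an assumed, illustrative choice):
    x and y pass through unchanged and decide how z is scaled and shifted.
    """
    x, y, z = point
    log_scale = math.tanh(params["a"] * x + params["b"] * y)
    shift = params["c"] * x + params["d"] * y
    return (x, y, z * math.exp(log_scale) + shift)

def coupling_inverse(point, params):
    """T_i^{-1}: exact inverse of coupling_forward."""
    x, y, w = point
    # x and y are unchanged by the forward pass, so the same scale and
    # shift can be recomputed and undone exactly.
    log_scale = math.tanh(params["a"] * x + params["b"] * y)
    shift = params["c"] * x + params["d"] * y
    return (x, y, (w - shift) * math.exp(-log_scale))

def map_between_frames(point_i, params_i, params_j):
    """x_j = T_j^{-1}(T_i(x_i)): frame i -> canonical space -> frame j."""
    return coupling_inverse(coupling_forward(point_i, params_i), params_j)

# Hypothetical parameters for two frames (in practice these are learned).
params_i = {"a": 0.3, "b": -0.2, "c": 0.5, "d": 0.1}
params_j = {"a": -0.1, "b": 0.4, "c": -0.3, "d": 0.2}

x_i = (0.5, -1.0, 2.0)
x_j = map_between_frames(x_i, params_i, params_j)
x_back = map_between_frames(x_j, params_j, params_i)  # round trip to frame i
```

Because every step is exactly invertible, mapping i → j → i recovers the original point up to floating-point error. A single coupling layer leaves x and y untouched; a practical model would stack several layers that alternate which coordinates are transformed.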

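4. Proposed method (2/3)
• How tracking is realized
– The target pixel is followed through the invertible transformations to the target frame
– The points 𝒙𝑖 sampled along the ray of the target pixel 𝒑𝑖 are mapped to 𝒙𝑗, and these mapped points are summed, weighted according to the density 𝜎, to obtain 𝒙̂𝑗
– 𝒙̂𝑗 is projected to obtain 𝒑̂𝑗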
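The density-weighted sum above can be sketched as follows. The equation itself did not survive in the slide text, so this assumes standard NeRF-style alpha compositing (alpha = 1 - exp(-sigma), weighted by accumulated transmittance). The orthographic `project` function and all names are illustrative assumptions.

```python
import math

def composite_mapped_points(mapped_points, densities):
    """Density-weighted sum of mapped ray samples (NeRF-style compositing).

    mapped_points: list of (x, y, z) samples x_j, ordered near to far.
    densities: non-negative sigma values, one per sample.
    """
    transmittance = 1.0  # fraction of the ray not yet absorbed
    x_hat = [0.0, 0.0, 0.0]
    for point, sigma in zip(mapped_points, densities):
        alpha = 1.0 - math.exp(-sigma)  # opacity of this sample
        weight = transmittance * alpha
        for axis in range(3):
            x_hat[axis] += weight * point[axis]
        transmittance *= 1.0 - alpha
    return tuple(x_hat)

def project(point):
    """Assumed fixed orthographic camera: drop depth to get a 2D pixel."""
    return (point[0], point[1])

# Three samples along one ray after mapping to frame j. The second sample
# lies on a nearly opaque surface, so it dominates the estimate.
samples_j = [(0.0, 0.0, 1.0), (1.0, 2.0, 2.0), (4.0, 4.0, 3.0)]
sigmas = [0.0, 50.0, 1.0]

x_hat_j = composite_mapped_points(samples_j, sigmas)
p_hat_j = project(x_hat_j)
```

In this example the empty first sample contributes nothing and the sample behind the opaque surface is almost fully occluded, so the predicted position lands essentially on the second sample. This is how the representation can keep a position estimate for a point even while it is hidden in some frames.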

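4. Proposed method (3/3)
• Training details
– Loss function terms:
– Agreement with existing optical flow
– Image reconstruction error
– Smoothness constraint
• Training takes 3 to 4 hours for a 20 to 30 second video (according to the authors)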
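As a rough illustration of how the three loss terms could be combined, the sketch below uses an L1 flow term, a mean-squared photometric term, and an acceleration penalty for smoothness. The slide names only the three terms; the specific norms, the weights `w_pho` and `w_reg`, and every function name are assumptions for illustration.

```python
def flow_loss(predicted_flow, reference_flow):
    """Mean absolute error against a precomputed optical-flow estimate."""
    diffs = [abs(p - r) for p, r in zip(predicted_flow, reference_flow)]
    return sum(diffs) / len(diffs)

def photometric_loss(rendered_rgb, observed_rgb):
    """Mean squared error between rendered and observed pixel color."""
    sq = [(r - o) ** 2 for r, o in zip(rendered_rgb, observed_rgb)]
    return sum(sq) / len(sq)

def smoothness_loss(x_prev, x_curr, x_next):
    """Penalize acceleration of a 3D point over three consecutive frames."""
    accel = [p - 2.0 * c + n for p, c, n in zip(x_prev, x_curr, x_next)]
    return sum(abs(a) for a in accel) / len(accel)

def total_loss(l_flow, l_photo, l_smooth, w_pho=10.0, w_reg=0.5):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return l_flow + w_pho * l_photo + w_reg * l_smooth

# Toy values for a single query pixel.
l_flow = flow_loss((1.0, 2.0), (1.5, 2.0))
l_photo = photometric_loss((0.5, 0.5, 0.5), (0.5, 0.5, 0.7))
l_smooth = smoothness_loss((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.3, 0.0, 0.0))
loss = total_loss(l_flow, l_photo, l_smooth)
```

A point moving at constant velocity incurs no smoothness penalty, so the acceleration term discourages jittery trajectories without penalizing motion itself.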

5. Experimental results (1/4)

5. Experimental results (2/4)
• Robust to occlusion as well

5. Experimental results (3/4)
• An interactive demo is available
– https://omnimotion.github.io/

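5. Experimental results (4/4)
• Limitations
– Thin objects and fast non-rigid motion are difficult to track
• This is because the optical flow that supports training breaks down in these cases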

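6. Discussion and impressions
• Discussion
– Because dense tracking is possible over quite long time spans, many follow-up applications are likely to appear
– The method does not separate camera motion from object motion
– Because frames are mapped to the canonical space by invertible transformations, every frame is related to the canonical space by an isomorphism
• Not applicable to videos for which this assumption does not hold
• Insights from invertible models can be leveraged
– Because the model has a NeRF-like structure, NeRF acceleration techniques can be applied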