[DL Reading Group] Masked World Models for Visual Control


December 9, 2022

#@deep learning jp #Deep Learning #Masked World Models #Masked Autoencoder #Visual Control #Reinforcement Learning

Slide overview

Deep Learning JP
http://deeplearning.jp/seminar-2/

Slide text

DEEP LEARNING JP [DL Papers] Masked World Models for Visual Control
Koki Yamane, University of Tsukuba
http://deeplearning.jp/

Bibliographic information
Title: Masked World Models for Visual Control
Authors: Younggyo Seo (1,2), Danijar Hafner (2,3,4), Hao Liu (2), Fangchen Liu (2), Stephen James (2), Kimin Lee (3), Pieter Abbeel (2)
Affiliations: (1) KAIST (2) UC Berkeley (3) Google Research (4) University of Toronto
Venue: CoRL 2022
Website: https://sites.google.com/view/mwm-rl
Overview:
⚫ Uses a Masked Autoencoder (MAE) for visual representation learning in a world model
⚫ Acquires task-relevant representations by also predicting rewards
2023/10/1

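Prior work: World Models [Ha+ 2018]
Learn a simulator of the environment and use it for reinforcement learning with high sample efficiency.
◼ Vision (V) Model
  - Compresses images into latent variables
  - e.g. VAE, contrastive learning
◼ Memory (M) Model
  - Learns how the latent variables change over time
  - An RNN keeps a memory of the latent sequence
◼ Controller (C) Model
  - Predicts actions from the latent variables
  - Once the world model has been learned, the policy can be as simple as a linear model
D. Ha and J. Schmidhuber. World models. In Advances in Neural Information Processing Systems, 2018.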
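The V-M-C split can be illustrated with a toy rollout loop. This is a minimal sketch under loud assumptions: `vision_encode` (chunk averaging) and `memory_step` (a tanh recurrence) are hypothetical stand-ins for the trained VAE and RNN, and only the linear controller mirrors the slide's point that C can be a simple linear map over the latent `z` and the memory state `h`.

```python
import math

def vision_encode(observation, latent_dim=2):
    """Hypothetical V model: compress an observation into a latent z by
    averaging equal-size chunks (a trained VAE encoder in the real method)."""
    chunk = len(observation) // latent_dim
    return [sum(observation[i * chunk:(i + 1) * chunk]) / chunk for i in range(latent_dim)]

def memory_step(h, z, action, decay=0.9):
    """Hypothetical M model: a tanh recurrence that folds the latent and the
    action into the memory state h (an RNN in the real method)."""
    return [math.tanh(decay * h_i + z_i + action) for h_i, z_i in zip(h, z)]

def linear_controller(z, h, weights, bias):
    """C model: one linear layer over the concatenation [z; h],
    squashed with tanh to give a scalar action in (-1, 1)."""
    features = z + h  # list concatenation, i.e. [z; h]
    return math.tanh(sum(w * x for w, x in zip(weights, features)) + bias)

def rollout(observations, weights, bias, latent_dim=2):
    """For each observation: encode with V, act with C, update memory with M."""
    h = [0.0] * latent_dim
    actions = []
    for obs in observations:
        z = vision_encode(obs, latent_dim)
        action = linear_controller(z, h, weights, bias)
        actions.append(action)
        h = memory_step(h, z, action)
    return actions

actions = rollout([[0.2, 0.4, 0.6, 0.8], [1.0, 1.0, 0.0, 0.0]],
                  weights=[0.5, -0.5, 0.1, 0.1], bias=0.0)
print(actions)
```

Because all representation learning sits in V and M, the controller here has only `len(z) + len(h) + 1` parameters, which is what makes such a simple policy feasible.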

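Background: the object-vanishing problem
Simply training an autoencoder on reconstruction error does not yield task-relevant representations.
◼ Gap between image representation learning and the task
  - With reconstruction-based learning such as a VAE, the loss can still decrease when small-area elements are ignored
  - The task, however, depends on only a small part of the information, such as the position of the target object
◼ Training cost
  - Training the image model and the state-transition model jointly amounts to an RNN over high-dimensional data, which increases computation
Okada, Masashi, and Tadahiro Taniguchi. "DreamingV2: Reinforcement Learning with Discrete World Models without Reconstruction." arXiv preprint arXiv:2203.00494 (2022).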
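The loss argument on this slide can be checked with a few lines of arithmetic. The sketch below (frame size, object size, and the 0.05 error level are arbitrary illustrative choices, not values from the paper) compares two reconstructions of a 64x64 frame containing a 2x2 object: one that drops the object entirely, and one that keeps it but is off by 0.05 at every pixel.

```python
def make_frame(size, obj_top_left=None, obj_size=2, obj_value=1.0):
    """Black size x size frame, optionally with a square object of obj_size pixels."""
    frame = [[0.0] * size for _ in range(size)]
    if obj_top_left is not None:
        r0, c0 = obj_top_left
        for r in range(r0, r0 + obj_size):
            for c in range(c0, c0 + obj_size):
                frame[r][c] = obj_value
    return frame

def mse(a, b):
    """Pixel-wise mean squared error between two frames of equal shape."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)

target = make_frame(64, obj_top_left=(30, 40))        # scene with a 2x2 object
dropped = make_frame(64)                               # reconstruction that erases the object
shifted = [[v + 0.05 for v in row] for row in target]  # keeps the object, small error everywhere

print(mse(target, dropped))  # → 0.0009765625 (= 4 / 4096)
print(mse(target, shifted))  # ≈ 0.0025
```

Dropping the object costs only 4/4096 ≈ 0.001 in MSE, less than the uniformly slightly-off reconstruction, so pure pixel MSE can favor a model that erases exactly the object the task depends on.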

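Prior work: Masked Autoencoder (MAE) [He+ 2021]
Pre-trains a ViT on a masked-reconstruction task.
◼ The image is split into patches, and most of them (75%) are masked before the image is fed to the ViT
◼ Loss function
  - Reconstruction error (MSE) on the masked patches
◼ Achieves high accuracy on image classification
K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick. Masked autoencoders are scalable vision learners. arXiv preprint arXiv:2111.06377, 2021.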
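The masking step and the masked-only loss can be sketched with the standard library. Assumptions for illustration: a 224x224 image cut into 16x16 patches (196 patches), each patch reduced to 4 values instead of 16x16x3 pixels, and hypothetical helper names `random_mask` and `masked_mse`; the ViT encoder and decoder are omitted.

```python
import random

def random_mask(num_patches, mask_ratio=0.75, seed=0):
    """Split patch indices into visible and masked lists, MAE-style."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(rng.sample(range(num_patches), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_patches) if i not in masked_set]
    return visible, masked

def masked_mse(target_patches, predicted_patches, masked_ids):
    """Mean squared error computed only over the masked patches."""
    per_patch = []
    for i in masked_ids:
        t, p = target_patches[i], predicted_patches[i]
        per_patch.append(sum((x - y) ** 2 for x, y in zip(t, p)) / len(t))
    return sum(per_patch) / len(per_patch)

# A 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
num_patches = (224 // 16) ** 2
visible, masked = random_mask(num_patches)
print(len(visible), len(masked))  # → 49 147

# Predictions that are wrong on visible patches but exact on masked ones give
# zero loss, because visible patches do not contribute to the loss.
targets = [[float(i)] * 4 for i in range(num_patches)]  # toy 4-value "patches"
masked_set = set(masked)
preds = [list(t) if i in masked_set else [-1.0] * 4 for i, t in enumerate(targets)]
print(masked_mse(targets, preds, masked))  # → 0.0
```

In MAE itself only the visible patches are passed to the encoder, which is part of what makes the high 75% ratio cheap to train.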

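Proposed method: Masked World Models (MWM)
Uses a Masked Autoencoder (MAE) for visual representation learning in the world model.
◼ Masking is applied to intermediate-layer features rather than directly to the image (perhaps to avoid erasing objects entirely?)
◼ Predicts the reward in addition to reconstructing the image (so that the representation emphasizes reward-relevant information)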
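The two changes MWM makes to MAE can be sketched together. This is a toy under stated assumptions: a single 2x2 average-pooling step (`conv_stem`) stands in for MWM's convolutional layers, the ViT encoder and decoder are omitted, the reconstruction target is taken to be every pixel of the frame, and `reward_weight` is a hypothetical hyperparameter; none of these names come from the paper.

```python
import random

def conv_stem(image, stride=2):
    """Toy stand-in for early conv layers: average-pool non-overlapping
    stride x stride windows; each pooled cell becomes one feature token."""
    size = len(image)
    tokens = []
    for r in range(0, size, stride):
        for c in range(0, size, stride):
            window = [image[r + dr][c + dc] for dr in range(stride) for dc in range(stride)]
            tokens.append(sum(window) / len(window))
    return tokens

def mask_feature_tokens(tokens, mask_ratio=0.75, seed=0):
    """Randomly mask a fraction of the *feature* tokens (not raw pixel patches)."""
    rng = random.Random(seed)
    num_masked = int(len(tokens) * mask_ratio)
    masked = set(rng.sample(range(len(tokens)), num_masked))
    visible = {i: tok for i, tok in enumerate(tokens) if i not in masked}
    return visible, masked

def mwm_style_loss(pixel_target, pixel_pred, reward_target, reward_pred, reward_weight=1.0):
    """Pixel reconstruction MSE plus a weighted reward-prediction squared error.
    reward_weight is a hypothetical hyperparameter for illustration."""
    flat_t = [v for row in pixel_target for v in row]
    flat_p = [v for row in pixel_pred for v in row]
    recon = sum((t - p) ** 2 for t, p in zip(flat_t, flat_p)) / len(flat_t)
    reward = (reward_target - reward_pred) ** 2
    return recon + reward_weight * reward

image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # 4x4 toy frame
tokens = conv_stem(image)                                         # 2x2 feature map -> 4 tokens
visible, masked = mask_feature_tokens(tokens)
print(len(tokens), len(visible), len(masked))  # → 4 1 3
print(mwm_style_loss(image, image, reward_target=1.0, reward_pred=0.5))  # → 0.25
```

With real (overlapping) convolutions, neighboring visible tokens can still carry information about a masked location, which is one plausible reading of the slide's guess that feature masking avoids erasing objects; the non-overlapping pooling used here does not reproduce that effect.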

Experiments
Experiments in three simulated environments:
◼ Meta-world
◼ RLBench
◼ DeepMind Control Suite

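Results
◼ Both performance and sample efficiency improve over the previous method (DreamerV2)
◼ The gap is pronounced on tasks where small objects matter
◼ On tasks without small objects, performance is comparable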

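Results: ablation studies
◼ 75% feature masking combined with reward prediction gives the best performance
◼ Masking features rather than the raw image improves performance
◼ A 75% mask ratio performs best
◼ Reward prediction improves performance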

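Results: comparison of predicted images
Compared with DreamerV2, MWM predicts object positions correctly: the proposed method keeps track of where objects are, while the existing method lets them vanish.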

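Summary
◼ Uses a Masked Autoencoder (MAE) for visual representation learning in the world model
◼ Masking is applied to intermediate features rather than directly to the image
◼ Acquires task-relevant representations by predicting rewards
◼ Substantially improves performance over DreamerV2 on tasks involving small objects
◼ Comments
  - Including task information in the loss function is important
  - One concern is that the latent variables become task-dependent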