April 07, 23
2023/4/7
Deep Learning JP
http://deeplearning.jp/seminar-2/
DL Reading Group Materials
Segment Anything
Shohei Taniguchi, Matsuo Lab
Segment Anything
Bibliographic information
Authors: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick
Overview
• SAM: a foundation model for segmentation released by Meta
• SA-1B: a dataset of 11 million images annotated with over 1 billion masks, also released
Overview: Segment Anything Model (SAM)
• A model that can generate object masks from a wide variety of prompts: points, text, regions, and more
Overview: Segment Anything Model (SAM)
• Edge prediction and text-to-mask also work reasonably well zero-shot
Outline
• Task: Promptable segmentation
• Model: Segment Anything Model
• Data: Data engine
• Experiments
• Summary
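Background
• Large language models have advanced remarkably in recent years
  ‣ Given a prompt, they can generate language flexibly
  ‣ Performance keeps improving with scale (scaling laws)
➡ Can we do the same thing in computer vision?
https://j.gifs.com/Y7mBPW.gif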
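Task: Promptable Segmentation
• Unlike conventional segmentation tasks, the object to segment is specified by a prompt
  ‣ Points, regions, text, etc.
• Because a prompt can be ambiguous, there is not necessarily a single correct mask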
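To make the task definition concrete, here is a minimal sketch of a prompt container that accepts points with foreground/background labels, an optional box, and optional text. The `Prompt` class, its fields, and the `is_ambiguous` heuristic are hypothetical illustrations, not SAM's actual API; the heuristic simply encodes the slide's point that a single click can correspond to more than one valid mask.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


@dataclass
class Prompt:
    """Hypothetical prompt container for promptable segmentation."""
    points: List[Point] = field(default_factory=list)
    point_labels: List[int] = field(default_factory=list)  # 1 = foreground, 0 = background
    box: Optional[Tuple[float, float, float, float]] = None  # (x0, y0, x1, y1)
    text: Optional[str] = None

    def __post_init__(self) -> None:
        if len(self.points) != len(self.point_labels):
            raise ValueError("each point needs exactly one label")
        if any(label not in (0, 1) for label in self.point_labels):
            raise ValueError("point labels must be 0 (background) or 1 (foreground)")
        if self.box is not None:
            x0, y0, x1, y1 = self.box
            if x1 < x0 or y1 < y0:
                raise ValueError("box must satisfy x0 <= x1 and y0 <= y1")
        if not (self.points or self.box is not None or self.text):
            raise ValueError("a prompt needs at least one point, a box, or text")

    def is_ambiguous(self) -> bool:
        # Heuristic (assumption): a lone click could mean a part, an object, or a whole
        return len(self.points) == 1 and self.box is None and self.text is None
```

For example, `Prompt(points=[(120.0, 80.0)], point_labels=[1])` is flagged as ambiguous, which is exactly the case where a model should return several candidate masks rather than one.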
Model: Segment Anything Model (SAM)
• The architecture is fairly simple
  1. Embed the image and the prompt separately
  2. Generate masks from the embeddings with a Transformer-based decoder
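Model: Segment Anything Model (SAM)
• Image encoder
  ‣ Embeds the image into a feature map
  ‣ Internally a ViT
  ‣ This is the most computationally expensive part, but at inference time the image features can be cached, so prompts can be edited in real time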
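The encode-once, decode-many pattern behind the real-time claim can be sketched with toy components. `CachedPredictor`, `dummy_encoder`, and `dummy_decoder` are hypothetical stand-ins (the "prompt" here is just a scalar threshold); the official `segment_anything` package follows a similar pattern with `SamPredictor.set_image` followed by repeated `predict` calls.

```python
from typing import Callable, List, Optional, Sequence


class CachedPredictor:
    """Toy predictor: encode the image once, then decode many prompts cheaply."""

    def __init__(self, image_encoder: Callable, mask_decoder: Callable) -> None:
        self.image_encoder = image_encoder  # expensive (a ViT in SAM)
        self.mask_decoder = mask_decoder    # lightweight, run once per prompt
        self._embedding: Optional[object] = None

    def set_image(self, image) -> None:
        # Run the heavy encoder once and keep its output
        self._embedding = self.image_encoder(image)

    def predict(self, prompt):
        if self._embedding is None:
            raise RuntimeError("call set_image() before predict()")
        # Only the cheap decoder runs here, so interactive edits stay fast
        return self.mask_decoder(self._embedding, prompt)


# Dummy components that count how often each one is called
calls = {"encoder": 0, "decoder": 0}


def dummy_encoder(image: Sequence[float]) -> List[float]:
    calls["encoder"] += 1
    return [2.0 * v for v in image]  # stand-in "features"


def dummy_decoder(embedding: List[float], threshold: float) -> List[bool]:
    calls["decoder"] += 1
    return [v > threshold for v in embedding]  # stand-in "mask"


predictor = CachedPredictor(dummy_encoder, dummy_decoder)
predictor.set_image([0.1, 0.4, 0.9])
masks = [predictor.predict(t) for t in (0.5, 1.0, 1.5)]
```

After three prompts the encoder has run once and the decoder three times, which is the cost profile that makes interactive editing feasible.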
Model: Segment Anything Model (SAM)
• Prompt encoder (points, box)
  ‣ Embeds the prompt
  ‣ Converts coordinates into positional encodings and adds learnable embedding parameters to them
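A toy version of "positional encoding plus learnable embedding" for point and box-corner prompts is sketched below. It imitates random Fourier features (a fixed Gaussian projection followed by sin/cos), which is the style of encoding used in the official implementation, at a tiny hypothetical dimension. `PointPromptEncoder`, `EMBED_DIM`, and the type names are assumptions; the per-type vectors are randomly initialized here and would be trained in practice.

```python
import math
import random
from typing import List, Tuple

EMBED_DIM = 8  # toy size; must be even (half sin, half cos)


class PointPromptEncoder:
    """Sketch: Fourier-feature positional encoding + learnable type embeddings."""

    def __init__(self, image_size: Tuple[int, int], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.width, self.height = image_size
        half = EMBED_DIM // 2
        # Fixed random Gaussian projection: row 0 for x, row 1 for y
        self.gaussian = [[rng.gauss(0.0, 1.0) for _ in range(half)] for _ in range(2)]
        # "Learnable" per-type vectors (random init here; trained in a real model)
        self.type_embed = {
            label: [rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)]
            for label in ("background", "foreground", "box_top_left", "box_bottom_right")
        }

    def positional_encoding(self, x: float, y: float) -> List[float]:
        # Normalize to [0, 1], then map to [-1, 1]
        u = 2.0 * (x / self.width) - 1.0
        v = 2.0 * (y / self.height) - 1.0
        proj = [2.0 * math.pi * (u * gx + v * gy) for gx, gy in zip(*self.gaussian)]
        return [math.sin(p) for p in proj] + [math.cos(p) for p in proj]

    def encode(self, x: float, y: float, label: str) -> List[float]:
        # Same location, different prompt type -> different embedding
        pe = self.positional_encoding(x, y)
        return [a + b for a, b in zip(pe, self.type_embed[label])]
```

A box would be encoded as two calls, one per corner, with the `box_top_left` and `box_bottom_right` labels, so the decoder can tell the corners apart.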
Model: Segment Anything Model (SAM)
• Prompt encoder (text)
  ‣ Embeds the prompt
  ‣ Uses CLIP's text encoder
Model: Segment Anything Model (SAM)
• Prompt encoder (mask)
  ‣ Embeds the prompt
  ‣ Applies convolutions to the mask and adds the result element-wise to the image embedding
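The mask path can be sketched as "downscale with strided convolutions, then add to the image embedding". This single-channel toy uses two 2×2 stride-2 convolutions (a 4× reduction) with a GELU in between; the real encoder uses many channels, adds normalization, and ends with a 1×1 convolution to the embedding dimension, all omitted here. Function names and kernel values are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def conv2x2_stride2(x: Grid, kernel: Grid, bias: float = 0.0) -> Grid:
    """Single-channel 2x2 convolution with stride 2 (halves height and width)."""
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("input height and width must be even")
    return [
        [
            sum(kernel[di][dj] * x[i + di][j + dj] for di in range(2) for dj in range(2)) + bias
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def gelu(v: float) -> float:
    # tanh approximation of GELU
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def embed_mask_prompt(mask: Grid, image_embedding: Grid, k1: Grid, k2: Grid) -> Grid:
    # Downscale the mask 4x with two strided convs
    feat = conv2x2_stride2(mask, k1)
    feat = [[gelu(v) for v in row] for row in feat]
    feat = conv2x2_stride2(feat, k2)
    if len(feat) != len(image_embedding) or len(feat[0]) != len(image_embedding[0]):
        raise ValueError("downscaled mask must match the image-embedding resolution")
    # Element-wise sum with the image embedding
    return [[a + b for a, b in zip(fr, er)] for fr, er in zip(feat, image_embedding)]
```

With an 8×8 mask and a 2×2 embedding the shapes line up after the two convolutions, mirroring how a mask at 4× the embedding resolution is brought down to the embedding grid before the sum.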
Model: Segment Anything Model (SAM)
• Mask decoder
  ‣ Outputs mask candidates
  ‣ Internally a Transformer decoder
  ‣ Outputs three candidates to cope with prompt ambiguity
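Having three candidates raises two practical questions: which one to train on, and which one to return. Per the SAM paper, training backpropagates only the lowest loss among the candidates, and the model also predicts an IoU score per mask that can be used to rank them at inference. The helpers below are a hypothetical sketch of those two selection rules, operating on plain numbers and flattened 0/1 masks.

```python
from typing import List, Sequence, Tuple


def training_loss_over_candidates(losses: Sequence[float]) -> Tuple[int, float]:
    """Return (index, loss) of the best candidate; only this one would receive gradients."""
    best = min(range(len(losses)), key=lambda i: losses[i])
    return best, losses[best]


def pick_mask_at_inference(candidates: Sequence[List[int]], predicted_iou: Sequence[float]) -> List[int]:
    """Choose the candidate whose predicted IoU (the model's confidence) is highest."""
    if len(candidates) != len(predicted_iou):
        raise ValueError("need one predicted IoU per candidate mask")
    best = max(range(len(candidates)), key=lambda i: predicted_iou[i])
    return candidates[best]
```

Taking the minimum during training lets each output slot specialize (for example subpart, part, whole) instead of averaging toward one blurry answer.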
Model: Segment Anything Model (SAM)
• Training
  ‣ Trained with a combination of focal loss and dice loss
  ‣ Prompts are sampled randomly
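The training objective can be sketched in plain Python over flattened per-pixel probabilities. The 20:1 focal-to-dice weighting follows the ratio reported in the SAM paper; `alpha=0.25` and `gamma=2` are the common RetinaNet focal-loss defaults and are an assumption here, as are all function names. `sample_point_prompt` only draws a first foreground click uniformly from the ground-truth mask; the paper's interactive simulation also draws follow-up points from error regions, which this sketch omits.

```python
import math
import random
from typing import List, Sequence, Tuple

EPS = 1e-6


def focal_loss(probs: Sequence[float], targets: Sequence[int],
               alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Mean binary focal loss over pixels (alpha/gamma values are assumptions)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, EPS), 1.0 - EPS)  # avoid log(0)
        p_t = p if t == 1 else 1.0 - p
        alpha_t = alpha if t == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_loss(probs: Sequence[float], targets: Sequence[int]) -> float:
    """1 - soft Dice coefficient between predicted probabilities and the binary target."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * inter + EPS) / (sum(probs) + sum(targets) + EPS)


def mask_loss(probs: Sequence[float], targets: Sequence[int],
              focal_weight: float = 20.0, dice_weight: float = 1.0) -> float:
    # Linear combination of the two losses (20:1 focal-to-dice by default)
    return focal_weight * focal_loss(probs, targets) + dice_weight * dice_loss(probs, targets)


def sample_point_prompt(mask: List[List[int]], rng: random.Random) -> Tuple[int, int]:
    """Sample one foreground (x, y) point uniformly from a ground-truth mask."""
    fg = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not fg:
        raise ValueError("mask has no foreground pixels")
    return rng.choice(fg)
```

Focal loss down-weights easy pixels, while dice loss measures overlap at the whole-mask level, so the two complement each other on masks with heavy foreground/background imbalance.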
Data: Data Engine
• SAM itself is used for annotation
  ‣ Model-in-the-loop
• Annotation proceeds in three stages
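Data: Data Engine
1. Annotators correct masks predicted by SAM
• SAM is first pre-trained on other, existing datasets
• Once enough data has been collected, it is used to retrain SAM
• Annotators label as much as they can, moving on to the next image once a single mask takes more than 30 seconds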
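Data: Data Engine
2. Annotators label objects other than those SAM predicted
• Finer, less prominent parts get annotated
• SAM is again retrained on the newly added data
• By the end of this stage, 10.2 million masks have been collected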
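One way to picture stage 2: pre-fill the confident masks SAM produces, then show annotators only what remains. The helpers below are a hypothetical sketch of that bookkeeping on tiny binary masks; the "uncovered fraction" is an illustrative measure, not a statistic from the paper.

```python
from typing import List

Mask = List[List[int]]


def union_masks(masks: List[Mask], height: int, width: int) -> Mask:
    """Pixel-wise OR of all automatically predicted masks."""
    covered = [[0] * width for _ in range(height)]
    for m in masks:
        for y in range(height):
            for x in range(width):
                if m[y][x]:
                    covered[y][x] = 1
    return covered


def remaining_region(auto_masks: List[Mask], height: int, width: int) -> Mask:
    """Pixels no confident automatic mask covers: where annotators should look."""
    covered = union_masks(auto_masks, height, width)
    return [[1 - v for v in row] for row in covered]


def uncovered_fraction(auto_masks: List[Mask], height: int, width: int) -> float:
    """Share of the image not yet covered by any confident automatic mask."""
    covered = union_masks(auto_masks, height, width)
    return 1.0 - sum(map(sum, covered)) / (height * width)
```

As SAM improves between retraining rounds, the remaining region shrinks, which is why annotators end up spending their time on finer, less prominent objects.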
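Data: Data Engine
3. Annotation from SAM's predictions alone
• After stage 2, SAM is accurate enough that its predictions can be used as annotations almost as-is
• Masks with high model confidence are kept, and duplicates are removed with NMS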
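The "keep confident masks, then remove duplicates with NMS" step can be sketched as a confidence filter followed by greedy non-maximum suppression. This version compares masks by mask IoU for simplicity (the released `SamAutomaticMaskGenerator` applies NMS to the masks' bounding boxes instead), and both thresholds are illustrative values, not the paper's settings.

```python
from typing import List, Sequence

Mask = List[List[int]]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two binary masks of equal size."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 0.0


def filter_and_nms(masks: Sequence[Mask], scores: Sequence[float],
                   score_thresh: float = 0.8, iou_thresh: float = 0.7) -> List[int]:
    """Return indices of kept masks: drop low-confidence ones, then greedy NMS by mask IoU."""
    # Most confident first, after discarding anything below the confidence threshold
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a mask only if it does not heavily overlap an already-kept one
        if all(mask_iou(masks[i], masks[j]) <= iou_thresh for j in kept):
            kept.append(i)
    return kept
```

Sorting by confidence first guarantees that, among near-duplicates, the mask the model is most sure about is the one that survives.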
Data: SA-1B
• The final dataset has 1.1 billion masks on 11 million images
• Compared with existing datasets, it has many more masks per image
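As a quick check of the per-image density, dividing the two totals on this slide gives the average number of masks per image:

```latex
\frac{1.1 \times 10^{9}\ \text{masks}}{1.1 \times 10^{7}\ \text{images}} = 100\ \text{masks per image on average}
```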
Data: SA-1B
• Mask locations are also less biased
• In existing datasets, masks are heavily concentrated near the image center
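Location bias of this kind is typically shown by plotting where mask centers fall after normalizing by image size. A hypothetical sketch of that computation is below: `normalized_center` returns a mask's centroid as fractions of width and height, and `center_histogram` bins those centroids on a coarse grid, where a center-biased dataset would pile up in the middle cell.

```python
from typing import List, Tuple

Mask = List[List[int]]


def normalized_center(mask: Mask) -> Tuple[float, float]:
    """Centroid of a binary mask as (x, y) fractions of image width and height."""
    h, w = len(mask), len(mask[0])
    # Use pixel centers (+0.5) so a single middle pixel maps to exactly (0.5, 0.5)
    pts = [(x + 0.5, y + 0.5) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("empty mask")
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    return cx / w, cy / h


def center_histogram(masks: List[Mask], bins: int = 3) -> List[List[int]]:
    """Count normalized mask centers in a bins x bins grid (row = y, column = x)."""
    hist = [[0] * bins for _ in range(bins)]
    for m in masks:
        cx, cy = normalized_center(m)
        col = min(int(cx * bins), bins - 1)
        row = min(int(cy * bins), bins - 1)
        hist[row][col] += 1
    return hist
```

Normalizing by image size makes images of different resolutions comparable, so the histogram reflects placement within the frame rather than absolute pixel positions.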
Experiments: Mask prediction from point prompts
• On many benchmarks, SAM outperforms existing models zero-shot
• Zero-shot: SAM is not fine-tuned on any of the evaluation datasets
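Point-prompt mask quality is usually reported as mean IoU between predicted and ground-truth masks. A minimal sketch of that metric on binary masks is below; the function names are hypothetical, and treating two empty masks as a perfect match is a convention chosen here, not something the slides specify.

```python
from typing import List, Sequence

Mask = List[List[int]]


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of a predicted and a ground-truth binary mask."""
    inter = sum(p & g for rp, rg in zip(pred, gt) for p, g in zip(rp, rg))
    union = sum(p | g for rp, rg in zip(pred, gt) for p, g in zip(rp, rg))
    return inter / union if union else 1.0  # both empty: perfect match (a convention)


def mean_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average per-mask IoU over an evaluation set."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equally many, non-empty predictions and ground truths")
    return sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds)
```

Because each mask contributes equally regardless of size, small objects weigh as much as large ones in this average.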
Experiments: Other zero-shot capabilities
• Text-to-mask
• Edge prediction
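Experiments: Ablation study
• Analysis of how performance changes with data size and model size
• For data size, performance appears to mostly saturate at around 1 million images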
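Summary
• Proposed SAM, a foundation model for segmentation that can be controlled with prompts
• Also released the SA-1B dataset, collected model-in-the-loop using SAM
• A demo is publicly available: https://segment-anything.com/demo
• Prompting looks likely to become a general-purpose approach for vision as well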