April 24, 2025
Slide overview
Material for the DL paper reading group (DL輪読会)
DEEP LEARNING JP [DL Papers]
SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks
Presenter: Jeremy Siburian, Matsuo Lab M1
http://deeplearning.jp/
Paper Information
Paper Title: SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks
Authors: Yijie Guo, Bingjie Tang, Iretiayo Akinola, Dieter Fox, Abhishek Gupta, Yashraj Narang
Conference: International Conference on Learning Representations (ICLR) 2025 (Spotlight)
TLDR: SRSA (Skill Retrieval and Adaptation) is a novel framework that solves contact-rich robotic assembly tasks by predicting and retrieving the most transferable policy from a pre-existing skill library and fine-tuning it, enabling efficient learning and adaptation to new tasks. Policies trained with SRSA in simulation achieve a 90% mean success rate when deployed in the real world.
Paper Link: https://arxiv.org/abs/2503.04538
Introduction & Related Works
• Humans excel at solving new tasks with few demonstrations or trial-and-error interactions.
• Achieving comparable data efficiency is crucial for deploying robots in diverse real-world environments.
• SOTA approach: foundation models or generalist policies that span multiple tasks.
Physical Intelligence. (2025). π0.5: a VLA with Open-World Generalization. https://www.physicalintelligence.company/download/pi05.pdf
Despite these efforts, solving new tasks in contact-rich environments, such as robotic assembly, remains underexplored.
Introduction & Related Works
Robotic Assembly Tasks
• AutoMate: a dataset of 100 interpenetration-free assemblies that can be simulated in robotics simulators, together with simulation environments for all 100 assemblies.
Tang, B., Akinola, I., Xu, J., Wen, B., Handa, A., Van Wyk, K., ... & Narang, Y. (2024). AutoMate: Specialist and generalist assembly policies over diverse geometries. arXiv preprint arXiv:2407.08028.
• Limitation: primarily focuses on learning specialist (i.e., single-task) policies from scratch, without leveraging prior experience or knowledge from related tasks.
Introduction & Related Works
Question: How can we solve novel assembly tasks by reusing skills from previously solved assembly tasks?
Retrieval-Based Policy Learning
• Many studies have explored techniques for utilizing datasets from other tasks for pretraining, such as visual pretraining (Parisi et al., 2022; Nair et al., 2022; Xiao et al., 2022) and multi-task imitation learning (Jang et al., 2022; Ebert et al., 2021; Shridhar et al., 2022).
• Previous works in robotic manipulation primarily study data retrieval for general pick-and-place manipulation tasks (Nasiriany et al., 2022; Belkhale et al., 2024; Shao et al., 2021; Zha et al., 2024).
• SRSA: investigates a transfer-success predictor for policy retrieval.
Embedding Learning for Tasks and Skills
• Task embedding learning: sharing knowledge across tasks can significantly enhance learning efficiency on new tasks.
• Skill embedding learning, on the other hand, leverages unstructured prior experience (i.e., temporally extended actions that encapsulate useful behaviors) and repurposes it to solve downstream tasks.
• SRSA: selects a single relevant skill for a new task, integrating multiple embedding-learning approaches.
Introduction & Related Works
Summary of Contributions
1. Skill retrieval method
2. Skill adaptation method
3. Continual learning with SRSA
SRSA tackles new assembly tasks by leveraging a skill library of diverse specialist policies.
Problem Setup
Problem setting: solving a new target task by leveraging pre-existing skills from a skill library.
Proposal: first retrieve a skill (i.e., a policy) for the most relevant prior task (Skill Retrieval), then rapidly and effectively adapt to the target task by fine-tuning the retrieved skill (Skill Adaptation).
SRSA – (1) Skill Retrieval
To effectively retrieve skills from prior tasks that are useful for a new target task T, we need a way to estimate how well a source policy will transfer to T.
Key Intuitions
1. If the same policy achieves similar success rates on two tasks, the tasks likely have similar dynamics and initial-state distributions.
2. Fine-tuning a source policy on a target task whose dynamics resemble the source task's will be efficient.
Metric for retrieval → zero-shot transfer success
Proposal: learn a function F that predicts the zero-shot transfer success for any pair of source policy and target task.
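The retrieval step can be sketched as follows: score every (source skill, target task) pair with the predictor F and retrieve the skill with the highest predicted zero-shot success. This is a minimal NumPy illustration, not the paper's implementation; the linear scorer stands in for the learned MLP, and all feature vectors are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_transfer_success(task_feat, source_feat, W):
    # Stand-in for the learned predictor F (an MLP in the paper); here a fixed
    # linear score squashed to [0, 1] purely for illustration.
    z = np.concatenate([task_feat, source_feat]) @ W
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder features for the prior-task skills and the new target task.
skill_library = {f"task_{i}": rng.normal(size=8) for i in range(5)}
target_feat = rng.normal(size=8)
W = rng.normal(size=16)

scores = {name: predict_transfer_success(target_feat, feat, W)
          for name, feat in skill_library.items()}
best = max(scores, key=scores.get)  # retrieved source skill = highest predicted success
```

The argmax over predicted success replaces any hand-designed similarity heuristic: the predictor itself decides which prior skill is most transferable.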
SRSA – (1) Skill Retrieval
Learning Task Features for Transfer Success Predictor
A strong featurization of both source policies and target tasks is needed for efficient learning of F.
Features: part geometry, interaction dynamics, and expert actions for solving the task.
SRSA – (1) Skill Retrieval
Learning Task Features for Transfer Success Predictor
Proposal: jointly capture features of geometry, dynamics, and expert actions to represent each task.
SRSA – (1) Skill Retrieval
Learning Task Features for Transfer Success Predictor
Geometry features are learned from point-cloud input using a PointNet autoencoder.
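As a rough sketch of the PointNet-style geometry encoder (layer sizes and random weights here are illustrative assumptions, and the autoencoder's decoder and reconstruction loss are omitted): a shared per-point MLP followed by a symmetric max-pool yields a permutation-invariant embedding of the part geometry.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative untrained weights of a shared per-point MLP (3 -> 64 -> 128).
W1, W2 = rng.normal(size=(3, 64)), rng.normal(size=(64, 128))

def encode_geometry(points):
    h = np.maximum(points @ W1, 0.0)  # shared per-point MLP, layer 1 (ReLU)
    h = np.maximum(h @ W2, 0.0)       # shared per-point MLP, layer 2 (ReLU)
    return h.max(axis=0)              # symmetric max-pool -> permutation-invariant code

cloud = rng.normal(size=(1024, 3))    # stand-in point cloud of an assembly part
emb = encode_geometry(cloud)

# Key property: shuffling the points leaves the embedding unchanged.
shuffled = cloud[rng.permutation(len(cloud))]
```

The max-pool is what makes the embedding order-independent, which matters because a point cloud has no canonical point ordering.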
SRSA – (1) Skill Retrieval
Learning Task Features for Transfer Success Predictor
• Learning features for dynamics and expert actions is challenging because expert demonstrations are not available for new tasks.
• Procedurally generating assembly demonstrations for new tasks is intractable (the narrow-passage problem).
Proposal → generate disassembly trajectories in simulation (which can be reversed to serve as assembly demonstrations).
SRSA – (1) Skill Retrieval
Learning Task Features for Transfer Success Predictor
Dynamics and expert-action features are learned from transition segments using state-prediction and action-reconstruction objectives.
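A minimal sketch of what such an objective might look like (an assumption about the exact loss formulation; the slides only name the state-prediction and action-reconstruction terms): encode a (state, action) segment into one feature vector, then train decoder heads to predict the next state and reconstruct the expert action.

```python
import numpy as np

rng = np.random.default_rng(2)
states = rng.normal(size=(16, 10))   # segment of states s_t (placeholder data)
actions = rng.normal(size=(16, 4))   # expert actions a_t from the demonstration

def segment_embedding(states, actions, W_enc):
    # Encode the whole (state, action) segment into a single feature vector.
    seg = np.concatenate([states, actions], axis=1)
    return np.tanh(seg @ W_enc).mean(axis=0)

W_enc = rng.normal(size=(14, 32))
z = segment_embedding(states, actions, W_enc)

# Decoder heads (illustrative): predict s_{t+1} from (z, s_t, a_t), and
# reconstruct a_t from (z, s_t); training would minimize both MSE terms.
W_dyn = rng.normal(size=(32 + 10 + 4, 10))
W_act = rng.normal(size=(32 + 10, 4))
pred_next = np.concatenate([np.tile(z, (15, 1)), states[:-1], actions[:-1]], axis=1) @ W_dyn
recon_act = np.concatenate([np.tile(z, (16, 1)), states], axis=1) @ W_act
loss = np.mean((pred_next - states[1:]) ** 2) + np.mean((recon_act - actions) ** 2)
```

Minimizing the combined loss pushes the segment embedding z to carry both dynamics information (what the next state will be) and expert-behavior information (what action the expert took).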
SRSA – (1) Skill Retrieval
Training the Transfer Success Predictor
• The geometry, dynamics, and expert-action features are concatenated to form the task features.
• Training → pass the concatenated task features through an MLP to predict the transfer success.
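The training step can be sketched as a small regression problem: concatenated task features in, observed zero-shot success rate out, fit by mean-squared error. Feature sizes, the single hidden layer, and the synthetic data below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 24))       # concatenated (geometry + dynamics + action) features
y = rng.uniform(0.0, 1.0, size=200)  # zero-shot transfer success labels (synthetic here)

W1 = rng.normal(size=(24, 32)) * 0.1  # hidden layer of a tiny MLP
w2 = rng.normal(size=32) * 0.1        # scalar output head
lr = 1e-2

def forward(X):
    h = np.maximum(X @ W1, 0.0)  # ReLU hidden activations
    return h @ w2, h

mse_init = np.mean((forward(X)[0] - y) ** 2)

for _ in range(500):  # plain full-batch gradient descent on the MSE loss
    pred, h = forward(X)
    err = pred - y
    grad_w2 = h.T @ err / len(X)
    grad_h = np.outer(err, w2) * (h > 0)  # backprop through the ReLU
    grad_W1 = X.T @ grad_h / len(X)
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1

mse = np.mean((forward(X)[0] - y) ** 2)
```

At retrieval time, the trained MLP scores every prior skill against the new task's features, so no rollouts of the candidate policies are needed.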
SRSA – (1) Skill Retrieval
Inferring Transfer Success for Retrieval
• Training: learn the transfer success predictor F with training data from a set of prior tasks.
• Testing: evaluate F on test tasks to select a source policy from the prior tasks.
• 100 tasks in AutoMate → 90 prior tasks (skill library) and 10 test tasks.
• Baseline retrieval strategies: Signature, Behavior, Forward, Geometry.
Overall, SRSA retrieves source policies that obtain around 10% higher success rates on the test tasks.
SRSA – (2) Skill Adaptation
Ultimate Goal: solve the new task as an RL problem.
Approach
1. Initialize a policy network with the retrieved skill.
2. Fine-tune the policy on the target task with PPO and self-imitation learning (SIL).
Experimental Settings
• Dense-reward setting
• Sparse-reward setting
Experimental Results
• In both sparse-reward and dense-reward settings, SRSA outperforms the AutoMate baseline.
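The self-imitation term added to PPO can be sketched as follows (an illustrative formulation in the spirit of SIL, not the paper's code; the PPO update itself is omitted): stored actions are imitated only on transitions whose achieved return exceeded the critic's value estimate.

```python
import numpy as np

rng = np.random.default_rng(4)
# Placeholder data for a batch drawn from the replay buffer.
returns = rng.uniform(0.0, 1.0, size=32)     # Monte-Carlo returns R
values = rng.uniform(0.0, 1.0, size=32)      # critic estimates V(s)
log_probs = rng.normal(-1.0, 0.3, size=32)   # log pi(a|s) for the stored actions

# Clipped advantage: only better-than-expected transitions contribute,
# so the policy imitates its own past successes.
advantage = np.maximum(returns - values, 0.0)
sil_loss = -np.mean(log_probs * advantage)   # advantage-weighted behavior cloning
```

Because the advantage is clipped at zero, unsuccessful past behavior is simply ignored rather than penalized, which is what stabilizes fine-tuning in the sparse-reward setting.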
SRSA – (2) Skill Adaptation
Real-World Deployment of SRSA
• Deploy the policies fine-tuned in simulation in the real world.
• SRSA achieves an average success rate of 90% on 5 real-world tasks, higher than the AutoMate baseline (54%).
SRSA – (3) Continual Learning w/ Skill Library Expansion
• Learn various tasks in a sequential fashion, expanding the skill library with each newly acquired skill.
• Start from a skill library of 10 tasks → obtain 10 new policies for 10 new tasks → repeat the process over 9 iterations.
• Result: SRSA stays efficient as the skill library and target tasks evolve (fewer training epochs to reach a high success rate).
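The expansion loop above can be sketched schematically; `retrieve` and `adapt` below are placeholder callables standing in for SRSA's skill retrieval and PPO + SIL fine-tuning, and the toy library contents are purely illustrative.

```python
def continual_srsa(library, task_batches, retrieve, adapt):
    # One round per batch: retrieve the best source skill for each new task,
    # fine-tune it, then add the resulting specialists back into the library.
    for batch in task_batches:
        new_skills = {}
        for task in batch:
            source = retrieve(library, task)       # SRSA skill retrieval
            new_skills[task] = adapt(source, task)  # fine-tune on the new task
        library.update(new_skills)                  # expand for the next round
    return library

# Toy stand-ins: one seed skill, three rounds of one new task each.
library = {"task_0": "policy_0"}
batches = [[f"task_{i}"] for i in range(1, 4)]
retrieve = lambda lib, task: next(iter(lib.values()))
adapt = lambda source, task: f"policy_for_{task}"
library = continual_srsa(library, batches, retrieve, adapt)
```

The key property is that later rounds retrieve from an ever-larger library, which is why adaptation keeps getting cheaper as more tasks are solved.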
Ablation Study
Effect of Skill Retrieval
• SRSA compared to SRSA-Geom (geometry-based skill retrieval).
• Result: SRSA improves adaptation efficiency over SRSA-Geom.
Effect of Self-Imitation Learning
• SRSA compared to SRSA-noSIL (without self-imitation learning).
• Result: SRSA outperforms the variant in learning stability (the variant suffers more fluctuation and a larger standard deviation).
Effect of a Generalist Policy
• Does fine-tuning a generalist policy (SRSA-Gen) outperform fine-tuning a selected specialist policy?
• Result: SRSA-Gen provides a weaker initialization than SRSA, and adaptation is less efficient.
Conclusion
Summary
This work proposes SRSA, a pipeline for retrieving and adapting specialist policies to solve new assembly tasks by combining skill retrieval with policy fine-tuning and self-imitation learning.
Limitations
• Rotational/helical-motion assemblies (e.g., nut-and-bolt) are not addressed.
• Primarily focused on specialist (single-task) policies.
• Real-world success rates still fall short of the 99+% required for industry-level deployment (sim-to-real gap).
Future Extensions
• How to utilize existing policies for new tasks beyond assembly.
• Extend SRSA to more complex tasks in other domains (e.g., tool use).