A Coarse-to-Fine Multi-modal Detection Framework Based on Deep Learning for Robotic Coating Tasks

Abstract

Automated robotic systems are evolving rapidly in industrial manufacturing. In robotic seal coating scenarios, RGB-D based perception adapts well to flexible demands but struggles to achieve high accuracy.

To address this limitation, we propose a multi-modal visual detection system for robotic seam gluing that employs a coarse-to-fine strategy across 2D and 3D data modalities. The framework integrates two main components: a 2D Seam Segmentation & Classification Network (2D-SSCN) for coarse seam identification at the image level, and a 3D Seam Refinement & Transformation Network (3D-SRTN) for precise seam refinement at the point cloud level. Specifically, 3D-SRTN first optimizes seam points locally and then applies a global transformation to minimize cumulative coordinate errors. A post-processing procedure then generates smooth 6-DoF paths.
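As a rough illustration of the geometric steps around the two networks, the sketch below lifts a 2D seam mask into 3D with a pinhole camera model and then turns an ordered list of seam points into smoothed 6-DoF waypoints. It is not the authors' implementation: the function names, the moving-average smoother, the frame convention (x along the seam, z pointing into the surface), and the assumption that per-point surface normals are available are all illustrative choices, and the learned 2D-SSCN and 3D-SRTN stages are not modeled.

```python
import numpy as np

def backproject_mask(mask, depth, fx, fy, cx, cy):
    """Lift masked pixels to 3D camera coordinates (pinhole model, hypothetical helper)."""
    valid = mask.astype(bool) & (depth > 0)      # ignore pixels without depth
    v, u = np.nonzero(valid)                     # row (v) and column (u) indices
    z = depth[v, u]
    return np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)

def smooth_path(points, window=5):
    """Moving-average smoothing of ordered seam points; window is assumed odd."""
    pad = window // 2
    padded = np.pad(points, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    cols = [np.convolve(padded[:, i], kernel, mode="valid") for i in range(3)]
    return np.stack(cols, axis=1)                # same length as the input

def frames_from_path(points, normals, eps=1e-9):
    """Build one rotation matrix per waypoint with columns (x, y, z)."""
    x = np.gradient(points, axis=0)              # tangent along the seam
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    z = -normals                                 # tool approaches against the normal
    z = z - np.sum(z * x, axis=1, keepdims=True) * x   # make z orthogonal to x
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    y = np.cross(z, x)                           # completes a right-handed frame
    return np.stack([x, y, z], axis=2)

# Synthetic example: a straight seam on a plane facing the camera.
pts = np.stack([np.linspace(0, 0.1, 20), np.zeros(20), np.full(20, 0.5)], axis=1)
normals = np.tile([0.0, 0.0, 1.0], (20, 1))
path = smooth_path(pts)
R = frames_from_path(path, normals)              # shape (20, 3, 3)
```

Each waypoint pairs a position from `path` with an orientation from `R`, which together give the six degrees of freedom. A real pipeline would also need to order the back-projected points along the seam and express the poses in the robot base frame.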

Additionally, we introduce the RGB-D Coating Dataset, tailored to this task, which provides both 2D and 3D annotations for RGB-D coating perception. Ablation studies validate the effectiveness of the individual modules, while comparative experiments demonstrate the overall superiority of our system.

Video

Method Overview

Architecture of Our Framework

SFRM

APTM

Results

The results of 2D-SSCN

The results of 3D-SRTN

Real-world Experiments