u/Duke__390

I am working on a peg-in-hole robotic assembly thesis with a Doosan M1013, ROS2, and an eye-in-hand RGB-D camera. The upstream perception system gives a coarse hole/block pose from stationary RGB-D cameras. Based on prior measurements and error propagation, the pre-insertion position uncertainty may be around 3–5 mm on average and up to 7–11 mm in the worst case, with about 1–2° of angular error.

I want to train a contact-rich insertion policy using vision + force/torque + proprioception, starting from a pre-insert pose about 5–20 mm above the hole. The task should eventually generalize across several cross-section geometries.

For people who have worked on force-guided or vision-force peg-in-hole insertion: is this initial error range realistic for an RL/contact policy to handle directly, or would you recommend adding a TCP-camera visual refinement step before starting the RL policy?

I am especially interested in practical experience with:

- ±5 mm vs ±10 mm initial xy error
- 1–2° orientation error
- force/torque-based local search after first contact
- sim-to-real transfer difficulty
- whether eye-in-hand visual refinement is worth the extra time
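
To make the local-search item concrete: one common form of search after first contact is an Archimedean spiral in the xy plane, centred on the contact point. Below is a minimal sketch of the waypoint generation only. The 1 mm pitch, 10 mm maximum radius, and 0.5 mm step are illustrative values, not tuned or measured ones, and `spiral_waypoints` is a hypothetical helper, not part of ROS2 or any Doosan API.

```python
import math

def spiral_waypoints(max_radius_mm=10.0, pitch_mm=1.0, step_mm=0.5):
    """Return (x, y) offsets in mm along an Archimedean spiral.

    The spiral is r = b * theta with b = pitch / (2*pi), so the radius
    grows by pitch_mm per full turn. All defaults are assumed values.
    """
    b = pitch_mm / (2.0 * math.pi)
    points = [(0.0, 0.0)]  # start at the first-contact point
    theta = 0.0
    while True:
        r = b * theta
        # Local arc length is sqrt(r^2 + b^2) * dtheta, so this choice of
        # dtheta keeps consecutive waypoints roughly step_mm apart.
        theta += step_mm / math.sqrt(r * r + b * b)
        r = b * theta
        if r > max_radius_mm:
            break
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

if __name__ == "__main__":
    pts = spiral_waypoints()
    print(len(pts), "waypoints, last radius (mm):",
          round(math.hypot(*pts[-1]), 2))
```

In a setup like this, each offset would be added to the contact pose and commanded under compliance, and the search would stop when the axial force drops or the z position advances. Those stop conditions are assumptions about a typical implementation, not something I have tested.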

I am new to this field. Kindly help me out.

Peg-in-hole Insertion using Sensor Fusion & RL