Peg-in-hole Insertion using Sensor Fusion & RL
I am working on a peg-in-hole robotic assembly thesis with a Doosan M1013, ROS 2, and an eye-in-hand RGB-D camera. The upstream perception system gives a coarse hole/block pose from stationary RGB-D cameras. Based on prior measurements and error propagation, the pre-insertion position uncertainty is around 3–5 mm on average and up to 7–11 mm in the worst case, with about 1–2° of angular error.
I want to train a contact-rich insertion policy using vision + force/torque + proprioception, starting from a pre-insert pose about 5–20 mm above the hole. The task should eventually generalize across several cross-section geometries.
For people who have worked on force-guided or vision-force peg-in-hole insertion: is this initial error range realistic for an RL/contact policy to handle directly, or would you recommend adding a visual refinement step with the eye-in-hand (TCP) camera before starting the RL policy?
I am especially interested in practical experience with:
- ±5 mm vs ±10 mm initial xy error
- 1–2° orientation error
- force/torque-based local search after first contact
- sim-to-real transfer difficulty
- whether eye-in-hand visual refinement is worth the extra time
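
To make the local-search item concrete, here is a minimal sketch of one common strategy, an Archimedean spiral search in the xy plane. It is only an illustration under assumptions: the function name `spiral_search_offsets` and its parameters are hypothetical, the spiral is just one possible search pattern, and the robot interface (compliance control, contact detection) is deliberately left out.

```python
import math


def spiral_search_offsets(max_radius_mm, pitch_mm, step_mm):
    """Return (dx, dy) offsets in mm along an Archimedean spiral r = b * theta.

    max_radius_mm: largest xy error the search should cover (assumed value).
    pitch_mm: radial distance between successive turns of the spiral.
    step_mm: approximate arc length between successive waypoints.
    """
    if max_radius_mm <= 0 or pitch_mm <= 0 or step_mm <= 0:
        raise ValueError("all parameters must be positive")

    b = pitch_mm / (2.0 * math.pi)  # radius gained per radian
    theta = 0.0
    offsets = [(0.0, 0.0)]  # start at the nominal hole position

    while True:
        r = b * theta
        # Arc length element is ds = sqrt(r^2 + b^2) * dtheta, so advance
        # theta by roughly step_mm of arc length (first-order approximation).
        theta += step_mm / math.hypot(r, b)
        r = b * theta
        if r > max_radius_mm:
            break
        offsets.append((r * math.cos(theta), r * math.sin(theta)))

    return offsets


if __name__ == "__main__":
    # Hypothetical numbers: cover a 10 mm worst-case xy error with 1 mm pitch.
    waypoints = spiral_search_offsets(max_radius_mm=10.0, pitch_mm=1.0, step_mm=0.5)
    print(len(waypoints), "waypoints, last:", waypoints[-1])
```

In use, each offset would be added to the nominal hole xy position while the peg presses down with a small, limited force, and the search would stop once the z position drops or the lateral forces change in a way that indicates the peg has found the hole. The pitch would need to be chosen relative to the peg/hole clearance and chamfer, which is itself an assumption to tune.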
I am new to this field, so any advice or pointers would be much appreciated.