A Correct-and-Certify Approach to Self-Supervise Object Pose Estimators via Ensemble Self-Training

Jingnan Shi

Massachusetts Institute of Technology

Rajat Talak

Massachusetts Institute of Technology

Dominic Maggio

Massachusetts Institute of Technology

Luca Carlone

Massachusetts Institute of Technology

Paper ID 76

Session 10. Robot Perception

Poster Session Thursday, July 13

Poster 12

Abstract: Real-world robotics applications demand object pose estimation methods that work reliably across a variety of scenarios. Modern learning-based approaches require large labeled datasets and tend to perform poorly outside the training domain. Our first contribution is to develop a robust corrector module that corrects pose estimates using depth information, thus enabling existing methods to better generalize to new test domains; the corrector operates on semantic keypoints (but is also applicable to other pose estimators) and is fully differentiable. Our second contribution is an ensemble self-training approach that simultaneously trains multiple pose estimators in a self-supervised manner. Our ensemble self-training architecture uses the robust corrector to refine the output of each pose estimator; then, it evaluates the quality of the outputs using observable correctness certificates; finally, it uses the observably correct outputs for further training, without requiring external supervision. As an additional contribution, we propose small improvements to a regression-based keypoint detection architecture, to enhance its robustness to outliers; these improvements include a robust pooling scheme and a robust centroid computation. Experiments on the YCBV and TLESS datasets show that the proposed ensemble self-training performs on par with or better than fully supervised baselines while not requiring 3D annotations on real data.
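
To illustrate the certify-then-select step described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical certificate that accepts a pose when every observed depth point lies within a tolerance `eps` of the posed object model; the paper's actual certificates and its differentiable corrector are not reproduced here. The names `apply_pose`, `is_observably_correct`, and `select_pseudo_labels` are illustrative only.

```python
import math


def apply_pose(R, t, points):
    """Rigidly transform 3D points: p -> R p + t (R is a 3x3 nested list)."""
    return [
        tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def is_observably_correct(R, t, model_points, depth_points, eps):
    """Hypothetical certificate (an assumption, not the paper's definition):
    accept the pose if every observed depth point is within eps of some
    point of the posed model."""
    posed = apply_pose(R, t, model_points)
    return all(
        min(math.dist(q, p) for p in posed) < eps
        for q in depth_points
    )


def select_pseudo_labels(estimates, model_points, depth_points, eps):
    """Keep only the ensemble members whose pose passes the certificate.

    estimates maps an estimator name to its (R, t) output; the returned
    dict holds the poses that could serve as pseudo-labels for further
    training, with no ground-truth annotation involved.
    """
    return {
        name: pose
        for name, pose in estimates.items()
        if is_observably_correct(pose[0], pose[1], model_points, depth_points, eps)
    }
```

In a self-training loop built on this sketch, only the poses returned by `select_pseudo_labels` would be fed back as training targets, so an estimator whose output disagrees with the depth observation contributes nothing for that sample.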