PointSt3R: Point Tracking through 3D Grounded Correspondence

University of Bristol, Meta

Abstract

Recent advances in foundational 3D reconstruction models, such as DUSt3R and MASt3R, have shown great potential for 2D and 3D correspondence in static scenes. In this paper, we propose to adapt them for point tracking through 3D grounded correspondence. We first demonstrate that these models are competitive point trackers when restricted to static points, which dominate current point tracking benchmarks (+33.5% on EgoPoints vs. CoTracker2). We then combine the reconstruction loss with training for dynamic correspondence and a visibility head, fine-tuning MASt3R for point tracking using a relatively small amount of synthetic data. Importantly, we train and evaluate only on pairs of frames where one contains the query point, effectively removing any temporal context. Using a mix of dynamic and static point correspondences, we achieve competitive or superior point tracking results on four datasets (e.g. competitive on TAP-Vid-DAVIS: 73.8 δavg / 85.8% occlusion acc. for PointSt3R vs. 75.7 / 88.3% for CoTracker2; and significantly outperforming CoTracker3 on EgoPoints, 61.3 vs. 54.2, and RGB-S, 87.0 vs. 82.8). We also present results on 3D point tracking, along with ablations on training datasets and the percentage of dynamic correspondences.
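The pairwise protocol described above (each target frame is paired independently with the frame containing the query point, so no temporal context is available) can be sketched as follows. `match_query_point` is a hypothetical placeholder standing in for the fine-tuned MASt3R correspondence and visibility heads; it is not the paper's actual API.

```python
import numpy as np

def track_pairwise(frames, query_frame_idx, query_xy, match_query_point):
    """Track a query point by pairing every frame independently with the
    query frame, mirroring the paper's pairwise training/evaluation setup
    (no temporal context). `match_query_point` is a stand-in for the
    fine-tuned model: given (query_frame, target_frame, query_xy) it
    returns (predicted_xy, visible_flag)."""
    tracks, visibility = [], []
    query_frame = frames[query_frame_idx]
    for target_frame in frames:
        # Each prediction depends only on this single frame pair.
        xy, vis = match_query_point(query_frame, target_frame, query_xy)
        tracks.append(np.asarray(xy, dtype=float))
        visibility.append(bool(vis))
    return np.stack(tracks), np.array(visibility)
```

Because every pair is processed independently, the loop could be run in any order or in parallel; nothing is carried over between frames.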

Performance

Tracking accuracy δavg for PointSt3R and MASt3R compared to popular point tracking methods. Performance is reported across four established benchmarks using the "first" query mode. We also report dynamic-point-only results for three datasets.
Model       |        All Points                         |    Dynamic Points
            | TAP-Vid-DAVIS  RoboTAP  RGB-S  EgoPoints  |  RoboTAP  RGB-S  EgoPoints
PIPs++      |      69.1       63.0    77.8     36.9     |   62.7    51.3     20.0
CoTracker2  |      75.7       70.6    83.3     35.5     |   66.9    53.7     20.2
CoTracker3  |      76.7       78.8    82.8     54.2     |   74.6    65.2     35.8
MASt3R      |      38.5       71.6    73.2     53.5     |   42.5    39.4     12.3
PointSt3R   |      73.8       78.6    87.0     61.3     |   69.4    61.6     31.4
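The δavg metric reported in the table follows the standard TAP-Vid definition: for each pixel threshold in {1, 2, 4, 8, 16}, compute the fraction of visible ground-truth points whose prediction falls within that threshold, then average across the five thresholds. A minimal sketch:

```python
import numpy as np

def delta_avg(pred, gt, visible, thresholds=(1, 2, 4, 8, 16)):
    """TAP-Vid position accuracy: fraction of visible points predicted
    within each pixel threshold, averaged over the thresholds.
    pred, gt: (N, 2) pixel coordinates; visible: (N,) boolean mask."""
    dist = np.linalg.norm(pred - gt, axis=-1)  # (N,) pixel errors
    dist = dist[visible]                       # score visible points only
    if dist.size == 0:
        return float("nan")
    return float(np.mean([(dist <= t).mean() for t in thresholds]))
```

For example, a prediction that is consistently 3 px off scores 0.6, since it passes only the 4, 8, and 16 px thresholds.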

Acknowledgments

This work is supported by the EPSRC Doctoral Training Programme, EPSRC UMPIRE (EP/T004991/1) and the EPSRC Programme Grant VisualAI (EP/T028572/1). We acknowledge the use of the Isambard-AI National AI Research Resource (AIRR), funded by DSIT [ST/AIRR/I-A-I/1023]. Meta served only in an advisory capacity; no experiments were conducted at or by Meta.

BibTeX

@article{guerrier2025pointst3r,
  title={{PointSt3R}: Point Tracking through 3D Grounded Correspondence},
  author={Guerrier, Rhodri and Harley, Adam W. and Damen, Dima},
  journal={arXiv},
  year={2025}
}