This paper presents PhoneticXEUS, a phone recognition model trained on extensive multilingual data. It achieves state-of-the-art performance, with 17.7% PFER on multilingual tasks and 10.6% PFER on accented English speech. The study identifies the key factors that affect multilingual phone recognition performance: data scale, architecture, and training objectives. Through controlled ablations across 100+ languages, it quantifies the effect of self-supervised learning (SSL) representations and analyzes how error patterns relate to language families and articulatory features. All data and code are openly released for further research.
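
To make the reported metric more concrete, the following is a minimal sketch of how a feature-based phone error rate could be computed, assuming PFER is an edit distance in which a substitution costs the fraction of articulatory features that differ between two phones. The summary above does not give the paper's exact definition, so this is one plausible formulation and not the authors' implementation. The feature table `FEATURES` and the functions `sub_cost` and `feature_error_rate` are hypothetical names introduced purely for illustration.

```python
from typing import Dict, List, Tuple

# Toy articulatory feature table (hypothetical, for illustration only).
# Feature order: (voiced, nasal, labial, coronal, continuant)
FEATURES: Dict[str, Tuple[int, ...]] = {
    "p": (0, 0, 1, 0, 0),
    "b": (1, 0, 1, 0, 0),
    "m": (1, 1, 1, 0, 0),
    "t": (0, 0, 0, 1, 0),
    "d": (1, 0, 0, 1, 0),
    "n": (1, 1, 0, 1, 0),
    "s": (0, 0, 0, 1, 1),
    "z": (1, 0, 0, 1, 1),
}


def sub_cost(a: str, b: str) -> float:
    """Fraction of features that differ between phones a and b (0.0 to 1.0)."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def feature_error_rate(ref: List[str], hyp: List[str]) -> float:
    """Feature-weighted edit distance, normalized by reference length.

    Assumption: insertions and deletions cost 1.0 each; substitutions
    cost the fraction of differing features.
    """
    if not ref:
        raise ValueError("reference sequence must not be empty")
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimal cost of aligning ref[:i] with hyp[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)  # delete all i reference phones
    for j in range(1, m + 1):
        dp[0][j] = float(j)  # insert all j hypothesis phones
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1.0,  # deletion
                dp[i][j - 1] + 1.0,  # insertion
                dp[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]),
            )
    return dp[n][m] / n
```

Under these assumptions, confusing `p` with `b` (a voicing-only difference) is penalized far less than deleting a phone outright, which is the motivation for reporting a feature-based rate instead of a plain phone error rate.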
