2025
Genetic Similarity Clustering Using the UK Biobank as a Reference Dataset
When Ngoc-Quynh Le and colleagues tested their genetic ancestry clustering algorithm on CARTaGENE data, it consistently assigned participants to 19 worldwide ancestry categories. In CARTaGENE, 519 participants were correctly clustered as Middle Eastern, with 81% precision and 97% recall. This illustrates that using the UK Biobank as a reference can successfully identify ancestry in smaller cohorts such as CARTaGENE.
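
As a rough illustration of what the reported figures imply, the sketch below back-calculates approximate error counts from the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). It assumes that the 519 participants are the true positives (TP) for the Middle Eastern category and that the percentages are rounded, so the results are estimates only. The function name `implied_counts` is hypothetical and is not part of the authors' method.

```python
def implied_counts(true_positives, precision, recall):
    """Estimate false positives and false negatives from TP, precision, and recall.

    precision = TP / (TP + FP)  ->  TP + FP = TP / precision
    recall    = TP / (TP + FN)  ->  TP + FN = TP / recall
    """
    predicted_positive = true_positives / precision  # everyone assigned to the cluster
    actual_positive = true_positives / recall        # everyone who truly belongs to it
    return {
        "false_positives": round(predicted_positive - true_positives),
        "false_negatives": round(actual_positive - true_positives),
    }


# Assumption: 519 is the true-positive count; 0.81 and 0.97 are rounded values.
counts = implied_counts(519, 0.81, 0.97)
print(counts)
```

Under these assumptions, roughly 122 participants would have been placed in the Middle Eastern cluster incorrectly, and roughly 16 who belonged there would have been missed. Because the published percentages are rounded, the true counts may differ by a few individuals.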