Genotyping and Allele Calling of Complex Regions of the Human Genome

Principal Investigator: Dr. Philip Awadalla

Affiliation: Ontario Institute for Cancer Research

Start Year: 2021

While most of the human genome is identical among individuals, some regions differ. These variable regions often have important biological effects, because different genetic variants can differ in how well they perform a biological function. To identify which genetic variants an individual carries, their DNA is compared to a gold-standard reference genome, and differences between the individual and the reference are identified. This is typically done with short-read sequencing data, in which the DNA has been broken into millions of small fragments that are reassembled using the reference as a guide. However, in regions that contain many variants, the DNA can be too different from the reference for the fragments to be placed reliably, and specific genetic variants cannot be identified. This is especially true for individuals of African and Asian ancestry, who have more differences from the gold-standard reference genome and who are understudied in genomic research compared to individuals of European ancestry. We are developing a computational approach that improves the ability to identify these genetic variants, and we will test and validate it on selected CanPath participants of diverse ethnicities. To evaluate the accuracy of our approach, we will compare our predictions with the variants identified by modern long-read sequencing technology, which uses longer pieces of DNA that can be compared to the reference genome more accurately. We hypothesize that our approach will accurately identify genetic variants in complex regions using short-read sequencing data, allowing elusive variants to be identified in the thousands of individuals for whom short-read sequencing data are available.