The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. To find the best haploid genome representation, we constructed consensus genomes at the pan-human, superpopulation, and population levels, using variant information from The 1000 Genomes Project Consortium. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of approximately two to three when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensuses resulted in little to no further improvement over the pan-human consensus, suggesting a limit to the utility of incorporating population-specific variation. Replacing the reference with consensus genomes impacts functional analyses, such as differential expression of isoforms, genes, and splice junctions.

In 2003, 15 years of work culminated with the International Human Genome Sequencing Consortium publishing the first finished version of the Human Reference Genome (International Human Genome Sequencing Consortium 2004). Despite its utility and continuous improvements over the years, it is still not without flaws, primarily the lack of variation information. Around 93% of the current GRCh38 assembly is composed of DNA from just 11 individuals (International Human Genome Sequencing Consortium 2001). Because such a large portion of the reference comes from such a small pool of individuals, it does not adequately represent the vast diversity present in the human population (Chen and Butte 2010; Rosenfeld et al.). To explore and capture human diversity, researchers have continued sequencing thousands of genomes; The 1000 Genomes Project Consortium, for example, sequenced 2504 individuals across 26 populations. Overall, it was estimated that approximately 3000 genomes would be necessary to capture the most common variants (Ionita-Laza et al. 2009); however, structural variation present in the human population has challenged this estimate (Berlin et al.). One particularly glaring example was shown in a recent construction of an African pan-genome, which contained almost 300 million bases of DNA not seen in GRCh38 (Sherman et al.). This lack of variation information negatively affects all kinds of genomic analyses that use the reference, such as disease studies and GWAS analyses (Chen and Butte 2010; Rosenfeld et al.).

Although graph genomes are theoretically capable of encapsulating all observed variation information (Church et al. … 2021), it remains challenging to use these tools for large-scale expression analyses such as RNA-seq quantification. However, despite the ubiquity of RNA-seq alignment and quantification, the improvements in mapping from using a more diverse … We previously proposed using a consensus genome to inherently capture common variation while still retaining the structure and functionality of the current reference assembly (Ballouz et al.). A consensus genome is a linear haploid genome that incorporates population variation information by replacing all minor alleles in the reference genome with the major allele of that variant (Fig. 1A). Because allele frequencies must be defined with respect to a population, a consensus genome is representative of the population used to define the major and minor alleles. Prior work has shown that using a consensus genome can have positive effects on … (… 2019), and the construction of population-specific consensus genomes has been a major goal of multiple projects (Cho et al.). Additionally, replacing the current reference genome with a consensus genome in existing analysis pipelines is straightforward because the consensus genome is still a linear haploid sequence.

Figure 1. Construction of the consensus genome with major allele replacements. (A) Construction of a consensus genome: the minor allele in the reference is replaced by the most common (major) allele in the … (B) Visual representation of the individuals used to construct consensus genomes of varying population specificity. (D) Number of major alleles for each population consensus genome that were replaced in the reference. (E) Number of SNPs and indels shared between different combinations of the pan-human, superpopulation, and population consensus genomes.
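The text describes a construction rule: replace each minor reference allele with the major allele, where "major" is defined relative to a chosen set of individuals. That rule can be sketched in Python. This is a minimal, hypothetical illustration rather than the authors' pipeline. It assumes fully called diploid genotypes, VCF-style alleles in which indels include the preceding anchor base, and 0-based positions. The names `Variant`, `major_allele`, and `build_consensus` are illustrative, as is the toy data. Passing different sample lists shows how the same variant sites can yield different pan-human and population-level consensus sequences.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Variant:
    """One variant site; allele index 0 is REF and 1..n index into `alts`."""
    pos: int               # 0-based start of REF in the reference sequence
    ref: str               # reference allele
    alts: Tuple[str, ...]  # alternate alleles (SNPs or VCF-style indels)


def major_allele(variant, site_genotypes, samples):
    """Most common allele across the diploid genotypes of `samples`.

    `site_genotypes` maps sample name -> (allele_index, allele_index).
    REF wins ties (an assumption of this sketch), so the reference is only
    replaced when an alternate allele is strictly more frequent.
    """
    counts = Counter()
    for sample in samples:
        counts.update(site_genotypes[sample])
    alleles = (variant.ref, *variant.alts)
    best = 0
    for idx in range(1, len(alleles)):
        if counts[idx] > counts[best]:
            best = idx
    return alleles[best]


def build_consensus(reference, variants, genotypes, samples):
    """Linear haploid consensus in which every site carries its major allele.

    Sites are applied left to right while copying the untouched reference
    between them, so length-changing indels never shift later coordinates.
    A site overlapping an earlier one is skipped (a simplifying assumption).
    """
    pieces, cursor = [], 0
    for v in sorted(variants, key=lambda site: site.pos):
        if v.pos < cursor:
            continue  # overlaps an already-applied site
        if reference[v.pos:v.pos + len(v.ref)] != v.ref:
            raise ValueError(f"REF mismatch at position {v.pos}")
        pieces.append(reference[cursor:v.pos])
        pieces.append(major_allele(v, genotypes[v], samples))
        cursor = v.pos + len(v.ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


# Hypothetical toy data: four diploid samples split into two populations.
reference = "ACGTACGTAC"
snp = Variant(1, "C", ("T",))
ins = Variant(4, "A", ("AGG",))
deletion = Variant(7, "TA", ("T",))
genotypes = {
    snp: {"s1": (1, 1), "s2": (1, 0), "s3": (0, 0), "s4": (0, 0)},
    ins: {"s1": (1, 1), "s2": (1, 1), "s3": (1, 0), "s4": (1, 1)},
    deletion: {"s1": (0, 0), "s2": (0, 1), "s3": (1, 1), "s4": (1, 1)},
}
populations = {"pan": ["s1", "s2", "s3", "s4"], "pop1": ["s1", "s2"], "pop2": ["s3", "s4"]}

for name, samples in populations.items():
    print(name, build_consensus(reference, [snp, ins, deletion], genotypes, samples))
# pan ACGTAGGCGTC
# pop1 ATGTAGGCGTAC
# pop2 ACGTAGGCGTC
```

Copying the reference between sites, instead of editing the string in place, keeps later coordinates valid after insertions and deletions. Keeping REF on ties and skipping overlapping sites are simplifications chosen for this sketch. The text does not specify how such cases are handled.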