BACKGROUND: The genome of SARS-CoV-2 is susceptible to mutations during viral replication because of errors introduced by the RNA-dependent RNA polymerase. These mutations enable SARS-CoV-2 to evolve into new strains. Viral quasispecies emerge from de novo mutations that occur in individual patients. In combination, these sets of viral mutations provide distinct genetic fingerprints that reveal patterns of transmission and have utility in contact tracing.
METHODS: Leveraging thousands of sequenced SARS-CoV-2 genomes, we performed a viral pangenome analysis to identify conserved genomic sequences. We used a rapid and highly efficient computational approach that relies on k-mers, short tracts of sequence, instead of conventional sequence alignment. Using this method, we annotated viral mutation signatures that were associated with specific strains. Based on these highly conserved viral sequences, we developed a rapid and highly scalable targeted sequencing assay to identify mutations, detect quasispecies variants, and identify mutation signatures from patients. These results were compared to the pangenome genetic fingerprints.
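As an illustration of the alignment-free idea described above, the following is a minimal sketch of how conserved k-mers might be found across a set of genomes. It is not the indexing method used in the study: the function names, the `min_fraction` threshold, and the toy sequences are all assumptions made for the example.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers (length-k substrings) in one sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(genomes, k, min_fraction=1.0):
    """Return k-mers present in at least `min_fraction` of the genomes.

    Each genome contributes a k-mer at most once, so the count for a k-mer
    is the number of genomes that contain it.
    """
    counts = Counter()
    for genome in genomes:
        counts.update(kmers(genome, k))
    needed = min_fraction * len(genomes)
    return {kmer for kmer, n in counts.items() if n >= needed}


# Toy sequences (hypothetical, not real viral data); the second one
# carries a single-base substitution relative to the other two.
toy_genomes = [
    "ACGTACGGTTCA",
    "ACGTACGATTCA",
    "ACGTACGGTTCA",
]
shared = conserved_kmers(toy_genomes, k=4)
```

In this toy case, every k-mer overlapping the substituted base drops out of `shared`, while the k-mers on either side remain. Conserved flanking sequence of this kind is what one would look for when placing primers around a frequent mutation.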
RESULTS: We built a k-mer index for thousands of SARS-CoV-2 genomes and identified conserved genomic regions and the landscape of mutations across these genomes. We delineated mutation profiles spanning common genetic fingerprints (the combination of mutations in a viral assembly) and combinations of mutations that appear in only a small number of patients. We developed a targeted sequencing assay by selecting primers from the conserved viral genome regions to flank frequent mutations. Using a cohort of 100 SARS-CoV-2 clinical samples, we identified genetic fingerprints consisting of strain-specific mutations seen across populations and de novo quasispecies mutations localized to individual infections. We compared the mutation profiles of the viral samples undergoing analysis with the features of the pangenome.
CONCLUSIONS: We conducted an analysis of viral mutation profiles that provide the basis of genetic fingerprints. Our study linked pangenome analysis with targeted deep sequencing of SARS-CoV-2 clinical samples. We identified quasispecies mutations occurring within individual patients and determined their general prevalence when compared to over 70,000 other strains. Analysis of these genetic fingerprints may provide a way of conducting molecular contact tracing.
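The comparison of a sample's mutations against a reference collection can be sketched as follows, assuming each genetic fingerprint is represented as a set of mutation labels. The function name, the mutation labels, and the reference sets are hypothetical placeholders, and the study's actual comparison may differ.

```python
def mutation_prevalence(sample_fingerprint, reference_fingerprints):
    """Map each mutation in the sample to the fraction of reference
    fingerprints that also carry it."""
    total = len(reference_fingerprints)
    return {
        mutation: sum(mutation in ref for ref in reference_fingerprints) / total
        for mutation in sample_fingerprint
    }


# Hypothetical reference fingerprints, one set of mutation labels per genome.
reference_fingerprints = [
    {"A100G", "C250T"},
    {"A100G"},
    {"A100G", "G900A"},
    {"C250T"},
]

# Hypothetical sample: one widespread mutation and one unseen in the references.
sample = {"A100G", "T777C"}

prevalence = mutation_prevalence(sample, reference_fingerprints)

# Mutations absent from every reference are candidates for de novo
# (patient-specific) variants in this simplified view.
private = sorted(m for m, frac in prevalence.items() if frac == 0)
```

Here a mutation carried by most references behaves like a strain-level marker, while one absent from all of them is flagged as a candidate patient-specific variant.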