Next-generation sequencing (NGS) has actualized human papillomavirus (HPV) virome profiling for in-depth investigation of viral evolution and pathogenesis. However, viral computational analysis remains a bottleneck due to semantic discrepancies between computational tools and curated reference genomes. To address this, we developed and tested automated workflows for HPV taxonomic profiling and visualization using a customized papillomavirus database in the CLC Microbial Genomics Module. HPV genomes from Papilloma Virus Episteme were customized and incorporated into CLC "ready-to-use" workflows for stepwise data processing to include: (1) Taxonomic Analysis, (2) Estimate Alpha/Beta Diversities, and (3) Map Reads to Reference. Low-grade (n = 95) and high-grade (n = 60) Pap smears were tested with the following collective runtimes: Taxonomic Analysis (36 min), Alpha/Beta Diversities (5 s), and Map Reads (45 min). Tabular output conversion to visualizations entailed 1-2 keystrokes. Biodiversity analysis between low- (LSIL) and high-grade squamous intraepithelial lesions (HSIL) revealed loss of species richness and gain of dominance by HPV-16 in HSIL. Integrating clinically relevant, taxonomized HPV reference genomes within automated workflows proved to be an ultra-fast method of virome profiling. The entire process, named "HPV DeepSeq," provides a simple, accurate, and practical means of NGS data analysis for a broad range of applications in viral research.

Keywords: HPV genotyping; bioinformatics; cervical cancer; deep sequencing; human papillomavirus; metagenome; next generation sequencing; taxonomic classification; virome.

Taxonomic profiling results based on the HPV Reference Index. (A) Abundance of HPV genotypes found in individual LSIL and HSIL samples is shown as stacked bars. HPV type-specific carcinogenicity (carcinogenic, possibly carcinogenic, and not carcinogenic) is color-coded in shades of red, blue, and green, respectively (legend). Deep sequencing of HPV E6/E7 amplicons derived from each LSIL (n = 95) or HSIL (n = 60) sample identified 32 unique HPV genotypes, with the top 20 shown (legend), and quantitated their composition (%) based on the abundance (n) of mapped reads relative to total mapped reads. (B) HPV genotype composition of samples grouped by cytological grade, i.e., LSIL and HSIL. Visualization and comparison of grouped samples (stacked bars) revealed the dominant genotypes in LSIL and HSIL as HPV-39 (46%) and HPV-16 (69%), respectively, with significant changes in proportional composition (Baggerley’s test, *p-value (Bonferroni) < 0.001). (C) Sunburst plots visualize hierarchical data outwardly from parent to child nodes. Here, sunburst plots reveal distinct differences in the HPV communities across seven taxonomic ranks, specifically in the last two ranks (genus/species and genotypes), between LSIL and HSIL.

Clustered heat map of HPV abundance among LSIL/HSIL samples. (A) The heat map represents two-way hierarchical clustering of HPV over LSIL/HSIL samples (n = 155) and clustering of LSIL/HSIL samples over HPV. The dissimilarity measure, i.e., Pearson’s distance (1 - |Pearson correlation|), quantifies the dissimilarity in the variables of interest, i.e., HPV type-specific abundance, between individual samples. The agglomerative clustering method identifies the originating, most similar pair of clusters (gray shade). In this dataset, single (pure) infections of HPV-16 in HSIL and HPV-39 in LSIL of high abundance (rectangle) are the closest clusters. From this point, cluster divergence toward the left part of the heat map reveals increasingly heterogeneous HPV infections, predominantly in LSIL samples. The color scale shows Pearson’s distance between 0 (black) and 2 (green), indicating similar and dissimilar correlation coefficients, respectively. (B) Aggregated heat map of LSIL (n = 95) and HSIL (n = 60) samples reveals dissimilar, groupwise HPV profiles useful for data simplification and taxonomy development. The prevailing (abundant) genotypes for LSIL or HSIL are shown in red.

(A) The HPV E6/E7 (660 bp) gene segment highlighted in blue on the circular prototypical HPV-16 genome (GenBank ID: K02718) is the target used for amplicon sequencing and genotyping. The Map Reads to Reference workflow output displays the reads mapped onto the linearized HPV reference genome. Zooming in from the whole genome window (top) allows viewing of the sequences down to the nucleotide level (bottom). The color-coding legend defines the corresponding read types and nucleotide mismatches. (B) NGS paired-sequence file size for each of the 155 study samples.
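The Pearson’s distance and the "most similar pair" step described for the clustered heat map can be illustrated with a minimal Python sketch. It is not the CLC implementation: the function names, the `absolute` switch, and the read counts are hypothetical and chosen only for illustration. With `absolute=True` the distance follows the form 1 - |r| given in the caption and lies in [0, 1]; with `absolute=False` it is the signed variant 1 - r, which spans [0, 2].

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length abundance profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pearson_distance(x, y, absolute=True):
    """Return 1 - |r| (absolute=True) or the signed variant 1 - r."""
    r = pearson_r(x, y)
    return 1 - (abs(r) if absolute else r)


def closest_pair(profiles, absolute=True):
    """Return the two sample names with the smallest Pearson's distance,
    i.e. the pair an agglomerative method would merge first."""
    return min(
        combinations(sorted(profiles), 2),
        key=lambda pair: pearson_distance(
            profiles[pair[0]], profiles[pair[1]], absolute
        ),
    )


# Hypothetical mapped-read counts per sample, ordered as (HPV-16, HPV-39, HPV-51).
profiles = {
    "sample_1": [90, 5, 5],
    "sample_2": [80, 10, 10],
    "sample_3": [5, 90, 5],
}
print(closest_pair(profiles))
```

In this toy input the two HPV-16-dominated samples have proportional profiles, so their distance is approximately zero and they form the first merged pair; a full agglomerative method would then repeat the search using a linkage rule between clusters.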