Written by 7:56 pm AI, Discussions

**Enhancing Genome Precision with Genomic Artificial Intelligence**

Artificial intelligence (AI) is reshaping genomics as we know it. The capability of AI to parse and…

Whole genome sequencing (WGS) and whole exome sequencing (WES) are becoming more common as sequencing costs fall. Sequencing, however, is only the first step. Understanding a genome fully, from base calls to variant interpretation, requires analyzing sequencing data with accelerated compute, data science, and AI. A significant challenge remains.

Individual genomes are highly complex. The National Human Genome Research Institute estimates that an individual’s 3-billion-nucleotide genome sequence typically contains 4 million single-nucleotide variants (SNVs), 600,000 insertion/deletion variants, and 25,000 structural variants involving over 20 million nucleotides. The clinical significance of most of these variants remains largely unknown. Can genomic AI help identify the few clinically significant variants in this vast dataset?

Genomic AI

AI techniques excel when large volumes of structured data can be combined with validated outcomes for training. Recent population-level sequencing initiatives and validation datasets such as NIST Genome in a Bottle have given rise to a new category of AI known as genomic AI. It has the potential to significantly speed up the analysis, decoding, and interpretation of sequencing data, provided that the data is meticulously curated across the entire range of problems, from alignment to interpretation.

If the requisite tools become more precise, user-friendly, and cost-effective, DNA sequencing holds immense promise for guiding healthcare and therapy decisions. Illumina sees genomic AI as an emerging tool that can improve accuracy by delivering a comprehensive genome, complete with annotation and interpretation, to complement traditional analysis methods and established biological knowledge. To achieve this goal, the company is integrating genomic AI into its software products, drawing on its vast data resources and AI expertise.

Three key areas illustrate the value of this technology: variant calling, pathogenicity prediction, and variant interpretation.

Enhancing Variant Calling Accuracy with AI

The DRAGEN™ secondary analysis pipeline improves variant calling accuracy across a broader portion of the human genome, so that these improvements apply to a large and diverse sample population. In the 2020 PrecisionFDA Truth Challenge V2, the hardware-accelerated DRAGEN analysis won the Difficult-to-Map Regions and All Benchmark Regions categories.

Building on this success, Illumina has integrated powerful and efficient machine learning (ML) techniques that significantly boost performance.

“DRAGEN-ML seamlessly integrates with the existing Bayesian Variant Calling framework, elevating the reproducibility to unprecedented levels and resolving challenges in the most complex genome regions.” This machine learning approach improves sensitivity and genotyping accuracy, filtering out over 50% of false positive calls and recovering low-confidence false negatives. Rami Mehio, Head of Software and Informatics at Illumina, credits Illumina’s extensive internal data and numerous collaborations with making it possible to model how sequencing reads align to a reference genome. “Our team has heavily relied on machine learning to continually enhance DRAGEN’s mapping sensitivity.”

The latest version, DRAGEN v4.2, uses enhanced machine learning to achieve variant detection with an analytical accuracy of 99.84%, reducing both false positive and false negative rates. On the PrecisionFDA Truth Challenge V2 benchmark data, Illumina currently reports the highest variant calling accuracy across all benchmark regions compared with other options.
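Accuracy comparisons of this kind are commonly expressed as precision, recall, and F1 against a truth set. As a minimal sketch, the following shows how removing half of the false positives shifts those metrics. The function name and every count are illustrative assumptions, not DRAGEN results or benchmark data.

```python
def calling_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative variant call counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only; not taken from any benchmark.
baseline = calling_metrics(tp=9_900, fp=100, fn=100)

# The same call set after a filter removes half of the false positives.
filtered = calling_metrics(tp=9_900, fp=50, fn=100)

for name, metrics in (("baseline", baseline), ("filtered", filtered)):
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

In this toy case, filtering false positives raises precision and F1 while leaving recall unchanged; recovering false negatives would raise recall in the same way.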

The team is further investing in machine learning algorithms for applications in RNA analysis, clinical pipelines, methylation analysis, and extensive variant calling for future DRAGEN platform releases, providing a comprehensive solution for genetic analysis.

Predicting Pathogenicity Across Species

Currently, only 0.1% of the millions of protein-coding variants in the human genome are annotated in clinical databases, and most of those are still classified as variants of uncertain significance (VUS).

To address this challenge, Illumina researchers developed PrimateAI-3D, a three-dimensional convolutional neural network that predicts variant effects and is trained on non-human primate variants and 3D protein structures. The model rests on the assumption that variants common in non-human primates are unlikely to cause disease in humans. It has been validated on six clinical benchmarks based on real-world patient cohorts, where it identified disease-causing variants more accurately.
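The labeling idea behind this kind of training set can be sketched in a few lines. This is a simplified illustration only: the `PrimateVariant` type, the gene and variant names, and the 1% frequency cutoff are all hypothetical, and the actual PrimateAI-3D data pipeline is not described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimateVariant:
    gene: str
    protein_change: str      # missense change in protein notation
    allele_frequency: float  # frequency within the sequenced primate species


# Hypothetical cutoff for "common"; the article does not state the value used.
COMMON_AF_THRESHOLD = 0.01


def presumed_benign(variants, threshold=COMMON_AF_THRESHOLD):
    """Keep variants common enough in primates to be treated as
    presumed-benign training examples for human variant prediction."""
    return [v for v in variants if v.allele_frequency >= threshold]


# Made-up records for illustration.
observed = [
    PrimateVariant("GENE_A", "p.Ala10Thr", 0.12),
    PrimateVariant("GENE_A", "p.Arg55Cys", 0.0004),
    PrimateVariant("GENE_B", "p.Val7Ile", 0.03),
]
training_benign = presumed_benign(observed)
```

Rare variants are left out rather than labeled pathogenic, because rarity alone says little about whether a variant causes disease.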

The PrimateAI-3D project, published in Science, was part of a large international collaboration that sequenced 809 individuals from 233 primate species and compiled a catalog of common missense variants. These WGS data supplied the millions of primate variants used to train PrimateAI-3D.

PrimateAI-3D also makes it possible to estimate the pathogenicity of rare coding variants in over 450,000 UK Biobank individuals, improving rare-variant association studies and genetic risk prediction for common diseases and complex traits. By stratifying missense variants in rare-variant burden tests, PrimateAI-3D enabled the discovery of 73% more significant gene-phenotype associations than other existing variant interpretation algorithms.
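To see why stratifying by a pathogenicity score can sharpen a gene-level association, consider a minimal sketch of a collapsing-style test: each sample counts as a carrier if it has at least one variant scoring above a cutoff, and carrier status is then compared with phenotype using a one-sided Fisher exact test. All sample IDs, variant IDs, scores, and cutoffs here are invented for illustration, and real analyses typically use regression models with covariates rather than this simplified test.

```python
from math import comb


def carrier_status(genotypes, scores, cutoff):
    """Map each sample to True if it carries a variant with score >= cutoff."""
    return {
        sample: any(scores[v] >= cutoff for v in carried)
        for sample, carried in genotypes.items()
    }


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]], where a = affected
    carriers, b = unaffected carriers, c = affected non-carriers,
    d = unaffected non-carriers."""
    n, carriers, affected = a + b + c + d, a + b, a + c
    upper = min(carriers, affected)
    favorable = sum(
        comb(carriers, k) * comb(n - carriers, affected - k)
        for k in range(a, upper + 1)
    )
    return favorable / comb(n, affected)


def burden_p_value(genotypes, scores, phenotype, cutoff):
    """Collapse variants to carrier status, then test against phenotype."""
    status = carrier_status(genotypes, scores, cutoff)
    a = sum(status[s] and phenotype[s] for s in status)
    b = sum(status[s] and not phenotype[s] for s in status)
    c = sum(not status[s] and phenotype[s] for s in status)
    d = sum(not status[s] and not phenotype[s] for s in status)
    return fisher_one_sided(a, b, c, d)


# Toy data (hypothetical): rare variants in one gene across four samples.
scores = {"v1": 0.9, "v2": 0.2, "v3": 0.8}
genotypes = {"s1": ["v1"], "s2": ["v3"], "s3": ["v2"], "s4": []}
phenotype = {"s1": True, "s2": True, "s3": False, "s4": False}

p_all_variants = burden_p_value(genotypes, scores, phenotype, cutoff=0.0)
p_high_score = burden_p_value(genotypes, scores, phenotype, cutoff=0.5)
```

In this toy data, counting only high-scoring variants gives a smaller p-value than counting every variant, because the low-scoring variant carried by an unaffected sample no longer dilutes the signal.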

Moreover, PrimateAI-3D enables the creation of rare-variant polygenic risk scores (PRS) that are more adaptable to diverse cohorts and ancestry groups and do not rely on training data, which is critical for generalizing results to populations beyond those of European descent.

Illumina has made the PrimateAI-3D deep learning scores and the primate population variant database publicly available for research purposes through its software products, so that the genomics community can explore them further.

SpliceAI, another deep learning model released by Illumina scientists, focuses on identifying pathogenic variants in the non-coding genome, complementing PrimateAI-3D’s role in coding variants. By improving the identification of disease-causing variants in non-coding DNA, clinical sequencing can extend beyond the exome to the entire genome, a significant step toward helping patients and families.

Accelerating Variant Interpretation with AI

Emedgene™, a tertiary analysis software, uses Explainable AI (XAI) to prioritize the variants most likely to solve a case. Users can draw on Emedgene’s XAI algorithms while maintaining full control over the analysis process. XAI aims to be reliable, secure, transparent, and efficient.

Emedgene combines its XAI with a suite of automation features to streamline end-to-end germline analysis workflows and reduce their touchpoints for genetic disease applications and assays, spanning genomes, exomes, targeted panels, and gene panels. This variant interpretation platform significantly reduces the time per case for large-scale screening projects involving hereditary cancer and other genetic diseases.

By mimicking scientific workflows and providing detailed explanations of key variants along with the relevant evidence, XAI in Emedgene reduces the time spent per case by 50% to 75%. According to Ray Louie, PhD, Associate Director at Greenwood Genetic Center, “Emedgene’s Explainable AI (XAI) simplifies the highly complex task of variant prioritization, enabling us to process more tests efficiently.”

In a Baylor Genetics study of a 180-sample cohort, Emedgene identified the reported variants as potential case solvers in 96% of all cases, including 98.4% of trio cases and 93.0% of single proband cases. The study highlighted how Emedgene can help genetic laboratories streamline operations by effectively prioritizing candidate variants.

With decades of internal development and numerous collaborations behind it, Illumina is well positioned to use vast datasets for training new genomic AI algorithms. This wealth of data, combined with Illumina’s products and talent, accelerates the evolution of genomic AI into a more advanced genome analysis tool.

Last modified: January 9, 2024