With a new tool at their disposal, researchers are delving into the intricate gene regulatory mechanisms involved in both healthy and disordered physiological processes. A deep learning toolkit developed by researchers at the University of California, San Diego (UCSD) and other institutions is poised to change how genomics projects are carried out. The software, named “EUGENe” (Elucidating the Utility of Genomic Elements with Neural Nets), is described in a report titled “Predictive analyses of regulatory sequences with EUGENe,” published in Nature Computational Science.
According to the report, EUGENe comprises a collection of modules and subpackages for extracting and transforming sequence data, instantiating and training models, and evaluating model behavior after training. Its primary aim, the researchers emphasize, is to streamline the end-to-end implementation of deep learning workflows in regulatory genomics.
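The extraction and transformation stages mentioned above can be illustrated with a minimal sketch. It does not use EUGENe's own API; the function names `parse_fasta` and `one_hot` and the toy FASTA text are hypothetical, chosen only to show what reading sequence records and encoding them for a model might look like.

```python
from typing import Dict, List

ALPHABET = "ACGT"  # assumed DNA alphabet; column order of the encoding


def parse_fasta(text: str) -> Dict[str, str]:
    """Extract step (illustrative): read FASTA-formatted text into {name: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first token of the header as the record name.
            header = line[1:].split()
            name = header[0] if header else f"record_{len(records) + 1}"
            records[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped, so append and normalize case.
            records[name] += line.upper()
    return records


def one_hot(seq: str) -> List[List[int]]:
    """Transform step (illustrative): one indicator row per base.

    Bases outside ALPHABET (for example N) become an all-zero row.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


fasta_text = """>seq1 toy example
ACGT
NA
>seq2
ttga
"""

records = parse_fasta(fasta_text)
encoded = {name: one_hot(seq) for name, seq in records.items()}
```

In a real workflow the encoded sequences would then be batched and passed to a neural network for training and evaluation, which are the later stages the report describes.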
Deep learning is not new to genomics: it has already contributed significantly to identifying DNA and RNA protein-binding motifs, predicting chromatin states, and assessing regulatory activity. However, designing and implementing deep learning workflows for bioinformatics studies remains challenging because of nuances specific to bioinformatics data. The authors also noted that the lack of standardization across script implementations hampers flexibility and reproducibility.
Adam Klie, a PhD student at the UCSD School of Medicine and the software’s creator, developed EUGENe to address these challenges based on his own experience. He highlighted how time-consuming current tools can be, since they require extensive programming and data manipulation, and contrasted them with EUGENe’s user-friendly approach. Researchers can use the program to explore characteristics of their datasets and to simulate how outcomes change after modifications.
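One common way to simulate outcomes after modifications is in silico mutagenesis: change each base in turn and record how a model's prediction shifts. The sketch below is an assumption-laden illustration rather than EUGENe's implementation; `toy_model` is a hypothetical stand-in for a trained network that simply counts occurrences of the motif TATA.

```python
from typing import Callable, Dict, Tuple

ALPHABET = "ACGT"  # assumed DNA alphabet


def toy_model(seq: str) -> float:
    """Hypothetical stand-in for a trained network: counts TATA occurrences."""
    return float(sum(seq[i:i + 4] == "TATA" for i in range(len(seq) - 3)))


def in_silico_mutagenesis(
    seq: str, model: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Score every single-base substitution relative to the unmodified sequence.

    Returns {(position, alternative_base): model(mutant) - model(original)}.
    """
    baseline = model(seq)
    effects: Dict[Tuple[int, str], float] = {}
    for pos, ref in enumerate(seq):
        for alt in ALPHABET:
            if alt == ref:
                continue  # skip the reference base itself
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = model(mutant) - baseline
    return effects


effects = in_silico_mutagenesis("GGTATAGG", toy_model)

# Positions whose substitution lowers the score mark the bases the model relies on.
sensitive_positions = sorted({pos for (pos, _), delta in effects.items() if delta < 0})
```

With this toy scorer, only substitutions inside the TATA motif lower the score, so `sensitive_positions` picks out the motif's coordinates. Swapping in a real trained model would require only that it accept a sequence and return a number.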
In a validation exercise, the researchers tested EUGENe by replicating findings from three regulatory genomics studies that used diverse data types: RNA-binding protein specificity data, ChIP-sequencing data from the ENCODE project, and plant promoter assays. EUGENe reproduced the results across these different data formats, eliminating the need to combine multiple analysis platforms.
Hannah Carter, PhD, an associate professor at the UCSD School of Medicine and a co-author, emphasized the importance of reproducible analyses in scientific research, particularly in studies involving complex data. She praised EUGENe for its adaptability to various DNA sequencing data types and deep learning models, and expects it to accelerate genomics research and encourage the development of shared tools within the research community.
While EUGENe currently focuses on DNA and RNA data, the researchers acknowledge the need to expand its capabilities to handle protein formats and bidirectional sources. Future enhancements will add support for further data types, such as single-cell sequencing data, broadening its utility in genomics research and making it more accessible to the medical community.