
Researchers Develop DEGU AI Tool for Enhanced Genomic Prediction Accuracy


Artificial intelligence (AI) tools, specifically deep neural networks (DNNs), are increasingly used to predict genomic experiment results. A significant challenge with these advanced tools, however, is accurately assessing the certainty of their predictions.

To address this crucial issue, Cold Spring Harbor Laboratory (CSHL) Associate Professor Peter Koo, former CSHL postdoc Jessica Zhou, and graduate student Kaeli Rizzo developed DEGU (Distilling Ensembles for Genomic Uncertainty-aware models). DEGU aims to improve the efficiency and accuracy of DNN predictions.

The Challenge with Traditional Ensemble Methods

Traditionally, researchers train an ensemble of models, often around 10, and use deep ensemble learning to gauge where the models' predictions agree and disagree. While effective, this approach becomes progressively more computationally expensive as model sizes grow, making it difficult to manage and scale.
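The ensemble idea above can be sketched in a few lines. This is an illustrative toy, not DEGU itself: the "models" here are simple noisy linear predictors standing in for independently trained DNNs, and all names are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for a trained DNN: each "model" is a linear
# predictor whose weight varies slightly, mimicking the variation among
# independently trained networks.
def make_model(seed):
    w = np.random.default_rng(seed).normal(1.0, 0.05)
    return lambda x: w * x

# An ensemble of 10 models, as in the traditional approach.
ensemble = [make_model(seed) for seed in range(10)]

def ensemble_predict(x):
    """Return the ensemble's mean prediction and its spread.

    The standard deviation across models is a simple measure of
    disagreement, i.e. predictive uncertainty.
    """
    preds = np.array([model(x) for model in ensemble])
    return preds.mean(), preds.std()

mean, std = ensemble_predict(2.0)
```

Every query requires running all 10 models, which is what makes this approach costly as each model grows.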

DEGU's Innovative Approach

DEGU is built on "deep ensemble distribution distillation," a method that trains a single model to learn the ensemble's overall prediction distribution, condensing multiple ensemble models into one more manageable tool.
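A minimal sketch of the distillation step, under simplifying assumptions: the ensemble's per-input mean and standard deviation serve as targets, and a single lightweight "student" is fit to reproduce both. A real DEGU student would be a DNN with separate output heads for prediction and uncertainty; here a least-squares fit keeps the example self-contained, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ensemble predictions on 50 training inputs: 10 linear "models"
# with slightly different weights, standing in for trained DNNs.
xs = np.linspace(0.0, 10.0, 50)
ensemble_preds = np.array([w * xs for w in rng.normal(1.0, 0.05, size=10)])

# Distillation targets: what the ensemble predicts, and how much the
# models disagree, at each input.
target_mean = ensemble_preds.mean(axis=0)
target_std = ensemble_preds.std(axis=0)

# Fit one student per target with least squares; afterward, the
# ensemble is no longer needed.
mean_coef = np.polyfit(xs, target_mean, 1)
std_coef = np.polyfit(xs, target_std, 1)

def student(x):
    """Single distilled model: returns (prediction, uncertainty)."""
    return np.polyval(mean_coef, x), np.polyval(std_coef, x)

mu, sigma = student(5.0)
```

After distillation, one forward pass through the student yields both a prediction and an uncertainty estimate, replacing 10 separate model evaluations.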

Models trained using DEGU have consistently demonstrated improved predictions and better explanations for those predictions, all while requiring significantly less computational power.

Enhanced Efficiency and Interpretability

For instance, rather than analyzing 10 separate models, DEGU lets researchers work with a single model one-tenth the size that retains similar predictive capability. This simplification not only improves efficiency but also makes it easier to interpret the factors driving predictions and their associated uncertainties.

Future Directions and Impact

The Koo lab is actively working to further enhance DEGU's efficiency and expand its accessibility to a broader research community. The overarching goal is to improve model reliability for downstream applications, potentially reducing the time and cost associated with laboratory experiments by minimizing the pursuit of uncertain predictions.