Scientists create powerful new tool to explore the microbial universe

A digital tree built from DNA-like connections represents how scientists trace the ancestry of microbes. Credit: The Biodesign Institute/Arizona State University.

Microbes are everywhere. They live in our bodies, in soil, in the oceans, and even in the air we breathe.

These microscopic organisms play a vital role in human health, agriculture, climate regulation, and the balance of ecosystems.

Yet even with powerful modern DNA sequencing technologies, scientists still struggle to identify many microbes and understand how they are related to one another.

Now, researchers at Arizona State University have reported two advances that make studying the microbial world easier, more accurate, and far more scalable: a new method for choosing marker genes and a major update to a widely used software library.

Together, these advances strengthen the foundation of research in microbiomes, disease tracking, environmental science, and emerging areas such as precision medicine.

The work is led by Qiyun Zhu, an assistant professor in ASU’s School of Life Sciences and a researcher at the Biodesign Center for Fundamental and Applied Microbiomics.

Zhu and his colleagues describe their findings in two separate studies published in the journals Nature Communications and Nature Methods.

Understanding how microbes are related is essential for many areas of science. Accurate microbial family trees help researchers track how disease-causing organisms evolve, monitor the spread of harmful strains, and understand how microbial communities respond to environmental changes like pollution or climate warming.

They are also crucial for gut microbiome research, which links microbial balance to digestion, immunity, and overall health.

To build these family trees, scientists rely on “marker genes,” specific stretches of DNA that act like genetic signposts passed down through generations. For decades, researchers depended on a small, fixed set of traditional marker genes. But the rise of metagenomics has changed the landscape.

Instead of studying one organism at a time, scientists can now sequence all the DNA from an environment at once, revealing millions of microbial genomes.

Many of these genomes are incomplete or uneven in quality, so a given marker gene may simply be missing from a genome, which makes a fixed set of traditional markers unreliable.

To address this challenge, Zhu’s team helped develop a new method called TMarSel, short for Tree-based Marker Selection. Rather than relying on a predefined gene list, TMarSel automatically searches through thousands of candidate gene families and selects the combination that produces the most reliable evolutionary tree.

The system evaluates how widespread each gene is, how much useful information it contains, and how well it contributes to a stable picture of microbial relationships. This flexible, data-driven approach allows scientists to build accurate family trees even when working with massive datasets and imperfect genomes.
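To make the idea of data-driven marker selection more concrete, here is a toy sketch in Python. It is not TMarSel's actual algorithm: it assumes a simple presence/absence table (the gene-family and genome names are hypothetical) and greedily picks gene families that spread markers across genomes, so that incomplete genomes still receive some coverage. The real method also weighs how much evolutionary information each family carries and how it affects the stability of the tree, which this sketch leaves out.

```python
from typing import Dict, List, Set


def select_markers(presence: Dict[str, Set[str]], genomes: List[str], k: int) -> List[str]:
    """Greedily choose up to k gene families, favoring genomes that have few markers so far.

    presence maps a gene-family ID to the set of genomes that carry it.
    Simplified, hypothetical heuristic for illustration; not TMarSel's scoring.
    """
    markers_per_genome = {g: 0 for g in genomes}
    remaining = dict(presence)  # shallow copy so the caller's dict is not modified
    chosen: List[str] = []

    def score(family: str) -> float:
        # Each genome carrying the family contributes 1 / (1 + markers it already has),
        # so a family that fills gaps in poorly covered genomes scores higher.
        return sum(1.0 / (1 + markers_per_genome[g])
                   for g in remaining[family] if g in markers_per_genome)

    for _ in range(min(k, len(remaining))):
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        chosen.append(best)
        for g in remaining.pop(best):
            if g in markers_per_genome:
                markers_per_genome[g] += 1
    return chosen


# Hypothetical example: four genomes, four candidate gene families.
genomes = ["A", "B", "C", "D"]
presence = {
    "fam1": {"A", "B", "C"},
    "fam2": {"A", "B"},
    "fam3": {"C", "D"},
    "fam4": {"D"},  # a family found only in genome D
}
print(select_markers(presence, genomes, k=2))  # ['fam1', 'fam3']
```

The second pick is fam3 rather than fam2: both families appear in two genomes, but fam3 reaches genome D, which had no marker yet. Favoring poorly covered genomes is one simple way to keep imperfect genomes from dropping out of the tree.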

Zhu is also a lead developer of scikit-bio, a large open-source software library used by researchers around the world. Scikit-bio provides the computational tools needed to analyze complex biological data, especially microbiome data.

Biological datasets are notoriously difficult to work with. They are huge, sparse, and filled with interconnected features that standard data-analysis software is not designed to handle. Scikit-bio addresses this problem by offering a robust toolkit that helps scientists compare microbial communities, measure diversity, analyze genetic sequences, build evolutionary trees, and prepare data for machine learning.
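As a rough illustration of what "measuring diversity" and "comparing microbial communities" mean in practice, the sketch below computes two standard ecology metrics on made-up taxon counts: Shannon diversity within one sample and Bray-Curtis dissimilarity between two samples. Scikit-bio offers these and many other metrics through `skbio.diversity.alpha_diversity` and `skbio.diversity.beta_diversity`; the plain-Python functions here are simplified stand-ins (natural-log Shannon, minimal input checks), not the library's implementation.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln(p_i)), using the natural log.

    Taxa with zero counts are skipped, which matters for sparse microbiome tables
    where most taxa are absent from most samples.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no observations")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(u: Sequence[int], v: Sequence[int]) -> float:
    """Bray-Curtis dissimilarity: 0 means identical composition, 1 means no shared taxa."""
    if len(u) != len(v):
        raise ValueError("samples must list the same taxa in the same order")
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


# Made-up counts for four taxa in three samples (illustrative only).
sample_a = [10, 10, 0, 0]
sample_b = [0, 0, 5, 5]
sample_c = [10, 10, 10, 10]

print(round(shannon(sample_a), 3))                # 0.693 (two equally abundant taxa)
print(round(shannon(sample_c), 3))                # 1.386 (four equally abundant taxa)
print(round(bray_curtis(sample_a, sample_b), 3))  # 1.0 (no taxa in common)
print(round(bray_curtis(sample_a, sample_c), 3))  # 0.333
```

Real analyses run metrics like these over tables with thousands of taxa and samples, which is where a tested, well-documented library becomes essential.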

The project is openly available and community-driven, with contributions from more than 80 developers worldwide. Its reliability and careful documentation have made it one of the most widely used tools in modern biological research, cited in tens of thousands of scientific studies spanning medicine, ecology, cancer research, and climate science.

As DNA sequencing becomes faster and cheaper, scientists are uncovering an ever-growing flood of microbial data. Tools like TMarSel and scikit-bio help ensure that this information can be turned into meaningful insight rather than overwhelming noise.

By combining evolutionary biology with advanced software design, Zhu and his colleagues are helping scientists around the world better understand the invisible organisms that shape life on Earth.