From the College of Natural Sciences

Supercomputing Helps Deepen Understanding of Life


Making sense out of unprecedented quantities of digital information is the focus of today's Big Data in Biology Symposium at The University of Texas.

The scale at which data are generated today was, until recently, unimaginable. A decade ago, sequencing the human genome took eight years, thousands of researchers and about $1 billion. Now it can be done in a few days at a cost approaching $1,000: all three billion base pairs of the genetic code each human harbors, delivered in roughly the same time, and at about the same cost, as some packages from Amazon Prime.

The ease with which we can acquire biological information is enabling some of the most exciting advances of our time, from precision cancer medicines to a deeper understanding of how the bacteria in our gut keep us healthy. But before any of that can happen, trillions upon trillions of data points must be processed into meaningful information by powerful supercomputers, like the ones available at UT Austin.

Supercomputing, or high-performance computing, is the practice of aggregating computing power to achieve performance far beyond that of an average desktop computer.

"In the life sciences, none of the recent technological advances would have been possible without it," says Hans Hofmann, Director of the Center for Computational Biology and Bioinformatics, which is hosting today's conference. "These are calculations that can't be done on your laptop or a piece of paper. At least not in a human's lifetime."


One of the world's most powerful high-performance computers is located in Austin at UT's Texas Advanced Computing Center (TACC). Called Stampede, it is ranked 7th on the global list of supercomputers. TACC offers around 12 petaflops of computing and visualization performance, which, in recent years, has powered discoveries by numerous scientists at UT Austin.

Tracing the evolution of the nervous system

High-performance computers' capacity to store data can also enable new breakthroughs.

"When people think about supercomputers they usually think about CPU time," says Ben Liebeskind, a graduate student in the Ecology, Evolution and Behavior program. "But storage is a huge deal, especially for all that data being produced, and you need supercomputers for that. Otherwise, all that knowledge is wasted."

Liebeskind's graduate research relied predominantly on genetic information generated by other people and made publicly available on GenBank, a vast public repository of DNA sequence data.

Taking open-access genomics data from 40 animal species, Liebeskind used high-performance computing to fit complex evolutionary models. His aim was to shed light on the mystery of how animals evolved nervous systems. "It was always assumed that nervous systems had evolved once," he says. "They're complex, so it makes sense that it wouldn't happen lots of times." But with the dawn of the genomics revolution, evidence began to emerge suggesting that might not be the case, and that the nervous system may instead have evolved independently, multiple times.

"I found that similar nervous systems are actually the result of convergence, which suggests that the nervous system evolved more than once," he says. "There's no way that discovery would have been possible without whole-genome data and the supercomputers that can store and analyze it."

Race against Ebola

For many, the fundamental benefit of high-performance computing comes down to speed. An analysis that takes months on a laptop can be run in a day on a supercomputer, and in some situations, like developing a safe and effective vaccine for Ebola, this can make all the difference.

In early 2015, mathematical biologists at UT Austin began helping the U.S. Centers for Disease Control and Prevention (CDC) develop trials for an Ebola vaccine in Sierra Leone. Using statistical tests, they aimed to determine which trial design, out of two possible options, would give the best chance of finding a vaccine that would prevent another devastating outbreak.

"At the time, Ebola cases in West Africa were starting to decline, which was obviously very good," says Spencer Fox, a graduate student working on the project. "But in terms of a vaccine trial, this is an issue because there needs to be enough cases to really test if a vaccine is going to work. That's why we had to deliver results to the CDC before it was too late in the epidemic."

To see how effective a trial design was at detecting whether a vaccine did or did not work, the scientists fit about 500 million models, which would have taken the equivalent of 250 days of computing time on a single TACC node. By running them on hundreds of nodes simultaneously instead, a technique known as parallelization, the analyses were done overnight.
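The speedup comes from the fact that each simulated trial is independent, so the work parallelizes almost perfectly: every node fits its own slice of the models at the same time. A minimal sketch of the pattern using Python's standard multiprocessing module; the trial-simulation function and its case rates below are hypothetical stand-ins, not the team's actual models:

```python
import multiprocessing as mp
import random

def simulate_trial(seed):
    """Hypothetical stand-in for one vaccine-trial simulation:
    draws case counts in a vaccinated and a control group and
    reports whether the design detects a protective effect.
    The real models were far more complex."""
    rng = random.Random(seed)
    vaccinated_cases = sum(rng.random() < 0.02 for _ in range(1000))
    control_cases = sum(rng.random() < 0.05 for _ in range(1000))
    return control_cases > vaccinated_cases  # "effect detected"

def run_parallel(n_simulations, n_workers=4):
    # Distribute independent simulations across worker processes;
    # with N workers, wall-clock time drops roughly N-fold.
    with mp.Pool(n_workers) as pool:
        results = pool.map(simulate_trial, range(n_simulations))
    return sum(results) / n_simulations  # detection rate

if __name__ == "__main__":
    print(f"Detection rate: {run_parallel(200):.2f}")
```

On a cluster like Stampede the same idea is scaled up, with a job scheduler spreading independent tasks across hundreds of nodes rather than one machine's worker pool, but the principle is identical.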

Within two weeks, the team delivered results to the CDC and a recommendation about which method to use in conducting the vaccine trials. High-performance computing, says Fox, was key in allowing them to respond to the rapidly changing conditions and ultimately make a difference to human health. "Otherwise, the epidemic would have been over before we were finished."

A clearer picture of addiction

As our ability to sequence genomes and analyze them improves, some researchers are able to fill in missing pieces of older puzzles, like figuring out how our genes influence whether we become addicted to alcohol.

"Before supercomputers, we were limited to looking at one gene at a time, even though we knew that thousands have some importance to alcohol addiction," says Sean Farris, a postdoctoral fellow at The University of Texas at Austin's Waggoner Center for Alcohol and Addiction Research.

To understand the genetic changes that happen in an alcoholic's brain, and how those in turn influence addiction, Waggoner Center scientists sequenced brain tissue from hundreds of alcoholics and non-alcoholics. This made it possible to detect important changes in genetic code occurring anywhere in the brain: not just for the 20,000 genes in the human genome, some of which are known to be involved in addiction, but also, more importantly, for the 60,000 non-coding regions that are only now beginning to surface as playing a role.
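Screening roughly 80,000 genomic regions amounts to repeating one simple comparison at very large scale, exactly the kind of workload that supercomputing storage and throughput absorb well. A minimal sketch on simulated data, not the Waggoner Center's actual measurements or methods, that ranks regions by how strongly their mean expression differs between two groups:

```python
import random
from statistics import mean

random.seed(42)

# Simulated expression levels for 80,000 genomic regions
# (roughly 20,000 genes plus 60,000 non-coding regions),
# measured in 10 subjects per group. Purely illustrative data.
N_REGIONS, N_SUBJECTS = 80_000, 10
alcoholic = [[random.gauss(0, 1) for _ in range(N_SUBJECTS)]
             for _ in range(N_REGIONS)]
control = [[random.gauss(0, 1) for _ in range(N_SUBJECTS)]
           for _ in range(N_REGIONS)]

# Plant a strong group difference in one region so the
# screen has something to recover.
alcoholic[123] = [x + 5.0 for x in alcoholic[123]]

def effect_size(a, b):
    """Absolute difference in group means for one region."""
    return abs(mean(a) - mean(b))

# Rank all regions by the size of the group difference.
ranked = sorted(range(N_REGIONS),
                key=lambda i: effect_size(alcoholic[i], control[i]),
                reverse=True)
print("Top region:", ranked[0])
```

A real analysis would use proper differential-expression statistics and multiple-testing correction, but the shape of the computation, one independent comparison per region repeated tens of thousands of times, is what makes this a supercomputing problem.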

"This is providing the most comprehensive picture to date of what is happening in the alcoholic's brain," Farris says. "We have not yet been able to find a cure for alcohol addiction. And that might be because we've never known what else was there, and changing and ultimately affecting the disease."

In the end, Farris says, that's the promise of supercomputers. "They are allowing us to delve deeper and ask bigger questions than we ever knew possible."

Wednesday, 20 September 2017