 ## News

From the College of Natural Sciences
Font size: +

# Randomized Sampling Could Help Solve Billions of Equations Simultaneously

Credit: Oden Institute for Computational Engineering and Sciences

Algebra. Mention the word in public and anyone in earshot is likely to run screaming as far from you as possible. Society's mental block when it comes to mathematics is frequently based on a misconception that the kinds of mathematical principles we learn at school - such as algebra – are of little use to us in the real world.

Nothing could be further from the truth. Many modern technologies we all rely upon - smartphones, laptops etc - are dependent on their capacity to solve multiple algebraic systems of equations at once.

Researchers from the University of Texas at Austin have secured funding from the National Science Foundation to advance current computational methods for solving what are known as "linear systems" of equations. Per-Gunnar Martinsson and Rachel Ward from the Oden Institute for Computational Engineering and Sciences and the Department of Mathematics at the University of Texas at Austin, will collaborate with researchers from Yale and Caltech to develop faster, more energy-efficient algorithms for solving large numbers of linear algebraic equations.

What is Randomized Numerical Linear Algebra?

We all learn techniques for solving simple linear algebraic "systems" in school. Think of a small system involving two equations for two variables x and y such as: 2x + 3y = 5 and 3x - 5y = -2.

Now think of a very large system involving "N" equations that define "N" unknown variables. If N is on the order of a few thousands, a computer can solve the system very quickly without any need for "fast" algorithms. But basic techniques do not work well for larger problems.

"The cost scales cubically with the problem size so that every time you double N, the cost goes up by a factor of 8," explains Martinsson. "If you can solve a system with 1000 variables in 1 second, then 8000 variables would require, 10 minutes (=8^3 seconds), 64000 variables would require 73h (=64^3 seconds) and so on. This is problematic, since people in applications want to solve problems with a million, or a billion, or a trillion, variables. So we are trying to develop faster algorithms for solving large linear systems," said Martinsson.

There already exist very efficient algorithms that allow us to solve huge systems using computers. Research on "fast solvers" is a well established part of scientific computing. At the Oden Institute, for example, George Biros' research group has amassed a considerable body of work in this area already.

"But most of the existing methods are designed for a specific application; for instance the linear systems that arise when you simulate physical phenomena such as electromagnetic scattering, deformations of solid bodies, semiconductor design, and so on," Martinsson said.

Randomized Sampling

The UT researchers are building upon the idea of randomized sampling – a probabilistic method supported by mathematical safety nets to ensure sample sets are accurate representations of the overall system.

Randomized sampling can be applied in a variety of ways. For example, one could jump between the equations randomly and solve them one at a time.

Random averaging is another approach. Instead of picking just a single equation to solve, you take a slight sample from each one to form a much smaller system.

In both cases, the smaller systems can be validated as representative of the larger group of equations by using mathematical tools.

"Randomized sampling is in some ways analogous to techniques used for testing COVID 19," said Ward. "When faced with 10,000 saliva samples, testing each one individually is expensive and time-consuming, especially when it is expected that only, say, one or two samples will test positive. Instead, 10,000 tests can be reduced by mixing the saliva of 100 samples and testing the saliva mixtures. This first round of testing will highlight one or two groups of 100 samples, and then individual testing can be done within these groups."

There are several other ways to apply randomized sampling to solve complex systems of equations. Doing so effectively when a system is in the order of millions or billions is still a challenge. Systems of this magnitude frequently show up in science and engineering and the speed at which a computer can solve them impacts everything, from the resolution attainable in medical imaging, to the cost of powering a large-scale data center.

"Our objective is very simple: to build faster algorithms for solving large linear systems," Martinsson said. "We want to both develop the mathematics to analyze the methods, and to build software that can in practice handle much larger problems than any existing methods."

The goal may be simple but its execution is anything but. So experts from Yale and Caltech are also playing key roles in this UT-led research group.

"Vladimir Rokhlin from Yale has developed some of the most ubiquitous fast solvers used in scientific computing today," Ward said. "Joel Tropp from Caltech is a graduate of the Oden institute's Computational Science, Engineering, and Mathematics (CSEM) program and is a thought leader in the "mathematics of large arrays of random numbers, and on analyzing the behavior of random algorithms."

This NSF awarded a \$1.4 million grant for the research. The project is expected to last three years.

By John Holden

This article has been cross-posted from the Oden Institute for Computational Engineering and Sciences. Guest
Wednesday, 18 May 2022 Read our publication, The Texas Scientist, a digest covering the people and groundbreaking discoveries that make the College of Natural Sciences one of the most amazing and significant places on Earth.