A Communication- and Memory-Aware Model for Load Balancing Tasks
Authors:
Jonathan Lifflander,
Philippe P. Pebay,
Nicole L. Slattengren,
Pierre L. Pebay,
Robert A. Pfeiffer,
Joseph D. Kotulski,
Sean T. McGovern
Abstract:
While load balancing in distributed-memory computing has been well studied, we present an innovative approach to this problem: a unified, reduced-order model that combines three key components to describe "work" in a distributed system: computation, communication, and memory. Our model enables an optimizer to explore complex tradeoffs in task placement, such as increased parallelism at the expense of data replication, which increases memory usage. We propose a fully distributed, heuristic-based load balancing optimization algorithm and demonstrate that it quickly finds close-to-optimal solutions. We also formalize the optimization problem as a mixed-integer linear program and compare it with our strategy. Finally, we show that, when applied to an electromagnetics code, our approach obtains speedups of up to 2.3x for the imbalanced execution.
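To make the tradeoff described in the abstract concrete, the following is a minimal sketch of a per-rank "work" estimate that combines computation, communication, and memory. It is not the authors' model: the function names (`rank_work`, `rank_memory`), the weights `alpha` and `beta`, the shared-block replication rule, and all numbers are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: a toy work model, NOT the formulation from the paper.
# All names, weights, and values below are hypothetical.

def rank_work(rank, placement, load, comm, alpha=1.0, beta=1.0):
    """Assumed work on one rank: weighted compute plus off-rank communication."""
    compute = sum(load[t] for t, r in placement.items() if r == rank)
    # Only edges whose endpoints sit on different ranks cost communication,
    # and they are charged to both ranks involved.
    traffic = sum(
        vol
        for (a, b), vol in comm.items()
        if placement[a] != placement[b] and rank in (placement[a], placement[b])
    )
    return alpha * compute + beta * traffic


def rank_memory(rank, placement, task_mem, block_of, block_mem):
    """Assumed memory on one rank: task footprints plus one copy of each
    shared data block that any local task needs (replication across ranks)."""
    local = [t for t, r in placement.items() if r == rank]
    needed_blocks = {block_of[t] for t in local}
    return sum(task_mem[t] for t in local) + sum(block_mem[b] for b in needed_blocks)


# Four tasks; a and b share block X, c and d share block Y.
load = {"a": 3.0, "b": 3.0, "c": 3.0, "d": 3.0}
task_mem = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
block_of = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
block_mem = {"X": 10.0, "Y": 10.0}
comm = {("a", "b"): 1.0, ("c", "d"): 1.0}

# Placement 1: tasks that share a block stay together (no replication).
packed = {"a": 0, "b": 0, "c": 1, "d": 1}
# Placement 2: sharing tasks are split, so both blocks are replicated on both ranks.
split = {"a": 0, "b": 1, "c": 0, "d": 1}

packed_work = rank_work(0, packed, load, comm)
split_work = rank_work(0, split, load, comm)
packed_mem = rank_memory(0, packed, task_mem, block_of, block_mem)
split_mem = rank_memory(0, split, task_mem, block_of, block_mem)

# A hypothetical per-rank memory limit turns memory into a feasibility constraint.
MEM_LIMIT = 15.0
packed_feasible = packed_mem <= MEM_LIMIT
split_feasible = split_mem <= MEM_LIMIT
```

In this toy setup both placements give each rank the same compute load, but the split placement adds cross-rank communication and replicates both data blocks, so it exceeds the assumed memory limit. An optimizer working with a combined model of this kind can weigh such effects together rather than balancing compute time alone.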
Submitted 25 April, 2024;
originally announced April 2024.