Doctoral Dissertations

ORCID iD

0000-0002-9141-9039

Date of Award

12-2021

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Computer Science

Major Professor

Michael R. Jantz

Committee Members

Michael R. Jantz, James S. Plank, George Bosilca, Kshitij Doshi

Abstract

Market forces and technological constraints have led to a gap between CPU and memory performance that has widened for decades. While processor scaling has plateaued in recent years, this gap persists and is not expected to diminish for the foreseeable future. This discrepancy presents a host of challenges for scaling application performance, which have only been exacerbated in recent years, as increasing demands for fast and effective data analytics are driving memory energy, bandwidth, and capacity requirements to new heights.

To address these trends, hardware architects have introduced a plethora of memory technologies. For example, most modern memory systems include enhanced power-saving features which transition DRAM devices to low-power states during periods of relative inactivity. While these features can significantly alleviate energy concerns, most modern systems do not attempt to take advantage of them; the default configurations are typically effective only while the entire memory system is mostly idle.

To continue scaling memory bandwidth and capacity, many systems have begun to incorporate alternative memory technologies that have only recently emerged. For example, so-called "on-package" or "die-stacked" RAMs sustain much higher bandwidth than conventional memory, while non-volatile RAM technologies, such as phase change memory, support higher capacities at a lower cost per bit. However, these novel technologies, each with a unique set of advantages and drawbacks, are often included alongside DDR SDRAM to form a multi-tier architecture. This introduces a new problem: without the traditional view of memory as a single, homogeneous block, how does software determine which tier of memory to use for each allocation? This set of decisions carries consequences for all software on the system, as suboptimal choices can hamper performance and unnecessarily increase power consumption.
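
On Linux, these memory tiers typically surface as distinct NUMA nodes, so software can at least enumerate what is available before deciding where data should live. The following is a minimal sketch (not taken from the dissertation) using the standard libnuma API; it assumes the kernel exposes each tier as a separate node and that libnuma is installed (link with -lnuma):

/* Enumerate the memory tiers that the kernel exposes as NUMA nodes. */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA support is not available\n");
        return 1;
    }

    int max_node = numa_max_node();
    for (int node = 0; node <= max_node; node++) {
        long long free_bytes;
        long long total = numa_node_size64(node, &free_bytes);
        if (total < 0)
            continue; /* node exists but has no memory attached */
        printf("node %d: %lld MiB total, %lld MiB free\n",
               node, total >> 20, free_bytes >> 20);
    }
    return 0;
}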

Data management strategies need to be altered to make the most of these low-power DRAM facilities and emerging memory technologies. One strategy is to modify the application directly, either by calling a library to specify each allocation's intent or by detecting and utilizing the different memory devices within the application itself. However, this process can be time-consuming and difficult for the programmer, and may require further modification whenever the application moves between platforms. Some multi-layer memory systems instead employ high-performance memories as a cache for data in lower memory tiers. For instance, Knights Landing processors can use their 16 GB of high-bandwidth MCDRAM as a cache for DRAM, while Cascade Lake systems can utilize DRAM as a cache for non-volatile Optane DC memory. While this approach is transparent to the application, certain pathological access patterns can cause poor performance, thrashing, or reduced energy efficiency.
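
To make the library-call strategy concrete, Intel's memkind library lets the programmer state each allocation's intent explicitly. The sketch below is illustrative rather than the dissertation's own code; it assumes memkind is installed (link with -lmemkind) and that the high-bandwidth tier is reachable through the MEMKIND_HBW kind:

#include <stdio.h>
#include <memkind.h>

int main(void)
{
    size_t n = 1u << 20;

    /* Ask for the bandwidth-critical buffer from the high-bandwidth tier
     * (e.g., MCDRAM on Knights Landing). */
    double *hot = memkind_malloc(MEMKIND_HBW, n * sizeof(*hot));
    if (!hot) {
        /* No high-bandwidth memory detected; fall back to regular DRAM. */
        hot = memkind_malloc(MEMKIND_DEFAULT, n * sizeof(*hot));
        if (!hot)
            return 1;
    }

    for (size_t i = 0; i < n; i++)
        hot[i] = (double)i;
    printf("touched %zu doubles\n", n);

    /* A NULL kind tells memkind to detect which kind owns the pointer. */
    memkind_free(NULL, hot);
    return 0;
}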

Alternatively, the responsibility for data placement could rest on the operating system alone. But without knowledge of each allocation's bandwidth and latency requirements, the operating system is forced to rely on generalized heuristics to make these critical tier selections. Our research addresses this gap with a cross-layer approach: the application layer gathers and communicates information about how it uses memory, while the kernel handles allocation and migration. We then leverage this collaboration to decrease power consumption in homogeneous memory systems or to boost performance in heterogeneous systems.
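
As a rough illustration of the kernel mechanisms such a collaboration can build upon, Linux's move_pages(2) system call lets a process ask the kernel to migrate specific pages to another NUMA node, for example after the application decides a region has become hot. This sketch is not the dissertation's system; it assumes node 1 backs the faster tier and requires libnuma (link with -lnuma):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <numaif.h>

enum { NPAGES = 16 };

int main(void)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);

    /* Allocate a page-aligned region and touch it so the pages fault in. */
    char *buf = aligned_alloc(page_size, NPAGES * page_size);
    if (!buf)
        return 1;
    memset(buf, 0, NPAGES * page_size);

    void *pages[NPAGES];
    int nodes[NPAGES], status[NPAGES];
    for (int i = 0; i < NPAGES; i++) {
        pages[i] = buf + (size_t)i * page_size;
        nodes[i] = 1; /* assumption: node 1 backs the faster tier */
    }

    /* pid 0 means the calling process; MPOL_MF_MOVE requests migration. */
    long ret = move_pages(0, NPAGES, pages, nodes, status, MPOL_MF_MOVE);
    if (ret < 0)
        perror("move_pages");
    else
        printf("first page now resides on node %d\n", status[0]);

    free(buf);
    return 0;
}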
