
Leveraging Risk Models to Improve Productivity for Effective Code Un-Freeze at Scale

Date Issued
January 1, 2025
Author(s)
Mockus, Audris  
Abreu, Rui
DOI
https://doi.org/10.1145/3722216
Permanent URI
https://trace.tennessee.edu/handle/20.500.14382/13209
Abstract

Changing software is essential to add needed functionality and to fix problems, but changes may introduce defects that lead to outages. This motivates one of the oldest software quality control techniques: the code freeze, a temporary prevention of non-critical changes to the codebase. Despite its widespread use in practice, the research literature on it is scant. Historically, code freezes have been used to improve software quality by preventing changes in the period before a software release, but they significantly slow down development. To address this shortcoming, we develop and evaluate a family of code un-freeze (permitting changes) strategies tailored to different occasions and products at Meta. They are designed to un-freeze the maximum amount of code without compromising quality. The three primary dimensions of un-freezing are a) the exact timing of the code freeze and the reasoning behind it, b) the parts of the organization or the codebase to which the freeze is applied, and c) the method of screening code diffs during the freeze, with the aim of allowing low-risk diffs and blocking only the riskiest ones.


To operationalize the drivers of outages, we consider the entire network of interdependencies among different parts of the source code, the engineers who modify the code, code complexity, coordination dependencies, and authors’ expertise. Since a code freeze is a balancing act between reducing outages and allowing software development to proceed unimpeded, we evaluate each code un-freeze approach on two measures: the fraction of changes that are flagged (gated), which measures overhead, and the fraction of all outage-causing changes that fall within the flagged set, which measures the ability of the approach to delay or prevent outages. We found that taking into account the risk posed by modifying individual files and the properties of the change allowed us to un-freeze 2 and 2.5 times more changes, respectively.
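
The two evaluation measures described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Change record, the risk_score field, the evaluate_gating function, the 0.6 threshold, and all data values are hypothetical and chosen only to show how the gated fraction (overhead) and the share of outage-causing changes captured by the gate would be computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical record for one code change (diff)."""
    change_id: str
    risk_score: float    # assumed model output in [0, 1]; not from the paper
    caused_outage: bool  # ground truth, known only after the fact


def evaluate_gating(changes, threshold):
    """Return (gated_fraction, outage_capture_fraction) for a risk threshold.

    gated_fraction: share of all changes flagged by the gate (overhead).
    outage_capture_fraction: share of outage-causing changes that the
    gate flagged (ability to delay or prevent outages).
    """
    if not changes:
        raise ValueError("need at least one change")
    gated = [c for c in changes if c.risk_score >= threshold]
    outage_total = sum(1 for c in changes if c.caused_outage)
    outage_gated = sum(1 for c in gated if c.caused_outage)
    gated_fraction = len(gated) / len(changes)
    # With no outage-causing changes there is nothing to capture.
    capture_fraction = outage_gated / outage_total if outage_total else 0.0
    return gated_fraction, capture_fraction


# Made-up illustrative data, not results from the paper.
changes = [
    Change("d1", 0.91, True),
    Change("d2", 0.15, False),
    Change("d3", 0.40, False),
    Change("d4", 0.78, False),
    Change("d5", 0.05, False),
    Change("d6", 0.62, True),
    Change("d7", 0.22, False),
    Change("d8", 0.10, False),
    Change("d9", 0.33, True),
    Change("d10", 0.08, False),
]

gated_fraction, capture_fraction = evaluate_gating(changes, threshold=0.6)
print(f"gated: {gated_fraction:.0%}, outages captured: {capture_fraction:.0%}")
```

Lowering the threshold in this sketch gates more changes and captures more outage-causing ones, which is the trade-off between overhead and outage prevention that the abstract describes.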

The change-level model is used in production at Meta. For example, during the winter 2023 code freeze, only 16% of changes were gated. Although 42% more changes landed (were integrated into the codebase) than in the prior year, outages decreased by 52%. This reduction meant less impact on users and less strain on engineers during the holiday period. The risk model has been enormously effective at allowing low-risk changes to proceed while gating high-risk changes and reducing outages.

Subjects

System outages

code freeze

defect prediction

Disciplines
Computer Sciences
Embargo Date
May 6, 2025
File(s)
Name

3722216.pdf

Size

1.7 MB

Format

Adobe PDF

Checksum (MD5)

26c965a4a2c7df3fa06dd1aaf8cbb9ef
