Doctoral Dissertations

Date of Award

12-2023

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Computer Science

Major Professor

Audris Mockus

Committee Members

Scott Ruoti, Jian Huang, Jeffrey Case

Abstract

Background: A key benefit of open source software is the ability to copy code for reuse in other projects. Code reuse provides benefits such as faster development, lower cost, and improved quality. There are several ways to reuse open source software in new projects, including copy-based reuse, library reuse, and the use of package managers. This work focuses specifically on copy-based code reuse.

Motivation: Code reuse has many benefits, but it also carries inherent risks, including security and legal risks. The reused code may contain security vulnerabilities, license violations, or other issues. Security vulnerabilities may persist in projects that copy vulnerable code, even after they are fixed in the project from which the code was copied. License terms may not be propagated with the copied code, potentially causing license violations of which users of the project are unaware. The extent to which these risks spread through copy-based code reuse, the potential impact of that spread, and avenues for mitigating those risks have not been studied in the context of a nearly complete collection of open source code.

Aim: We aim to find ways to detect security, legal, and other risks induced by copy-based code reuse, to determine how prevalent they are, and to explore how they may be addressed, in order to help developers safely and effectively reuse code from other projects.

Method: We rely on the World of Code infrastructure, which provides a curated and cross-referenced collection of nearly all open source software, to conduct a case study of a few known vulnerabilities, to conduct an empirical study of a large number of known vulnerabilities, and to produce a tool that helps mitigate security, legal, and other risks.
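As a hypothetical illustration of how a cross-referenced collection makes copies detectable: Git names every file version by the SHA-1 of the header `blob <size>\0` followed by the file's bytes, so identical content in unrelated repositories gets the same ID. The sketch below assumes an in-memory stand-in for such an index (the dict `projects` and the helper names are invented for this example, not World of Code's actual interface) and finds projects holding an exact copy of a known-vulnerable file version. It only catches byte-identical copies; modified copies need other techniques.

```python
import hashlib
from collections import defaultdict


def git_blob_sha1(content: bytes) -> str:
    """Return the object ID Git assigns to these file contents."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def build_blob_index(projects):
    """Map each blob ID to the set of projects that contain it.

    `projects` is a hypothetical dict: project name -> list of file contents (bytes).
    """
    index = defaultdict(set)
    for name, files in projects.items():
        for content in files:
            index[git_blob_sha1(content)].add(name)
    return index


def projects_with_copy(index, vulnerable_content: bytes):
    """Return, sorted, the projects holding an exact copy of the given file version."""
    return sorted(index.get(git_blob_sha1(vulnerable_content), set()))


if __name__ == "__main__":
    vulnerable = b"int parse(char *s) { /* unchecked */ }\n"
    projects = {
        "upstream/lib": [vulnerable],
        "fork/app": [vulnerable, b"main\n"],
        "other/tool": [b"int parse(char *s) { /* fixed */ }\n"],
    }
    index = build_blob_index(projects)
    print(projects_with_copy(index, vulnerable))  # ['fork/app', 'upstream/lib']
```

Hashing whole files keeps the lookup a single dictionary access per file version, which is what makes a scan across a very large corpus feasible.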

Results: We find numerous instances of security vulnerabilities and license violations caused by copy-based code reuse in currently active and highly popular projects. The often long delay in fixing orphan vulnerabilities (vulnerabilities that persist in copied code after being fixed in the original project), even in highly popular projects, increases the chance that they spread to new projects. We provided patches to a number of project maintainers and found that only a small percentage accepted and applied them. We present an approach for producing a universal version history, which links files across multiple repositories and multiple repository hosting platforms into a single history by tracing each version of a file across all repositories and revision histories where its parents or descendants reside. We then show how this approach can reduce the risks of copy-based code reuse.
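A universal version history of this kind can be pictured as one graph whose nodes are file versions (blob IDs) and whose edges come from commits in every repository. The sketch below is a simplified model under stated assumptions: the class `UniversalHistory`, its methods, and the version labels are hypothetical, and each commit is reduced to "in repository R, version `old` became version `new`". Walking parent edges gives a version's ancestors and walking child edges gives its descendants, regardless of which repository recorded each edge.

```python
from collections import defaultdict, deque


class UniversalHistory:
    """Hypothetical sketch: merge per-repository file histories into one graph."""

    def __init__(self):
        self.children = defaultdict(set)  # version -> versions derived from it
        self.parents = defaultdict(set)   # version -> versions it was derived from
        self.repos = defaultdict(set)     # version -> repositories containing it

    def add_change(self, repo, old, new):
        """Record that a commit in `repo` replaced version `old` with `new`."""
        self.children[old].add(new)
        self.parents[new].add(old)
        self.repos[old].add(repo)
        self.repos[new].add(repo)

    def add_file(self, repo, version):
        """Record that `repo` contains `version` (e.g., an unmodified copy)."""
        self.repos[version].add(repo)

    def _walk(self, start, edges):
        """Breadth-first traversal; `seen` also guards against cycles from reverts."""
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen - {start}

    def history(self, version):
        """Return (ancestors, descendants) of `version` across all repositories."""
        return self._walk(version, self.parents), self._walk(version, self.children)

    def repos_of(self, versions):
        return sorted({r for v in versions for r in self.repos.get(v, ())})


if __name__ == "__main__":
    h = UniversalHistory()
    h.add_change("upstream/lib", "v1", "v2")
    h.add_change("upstream/lib", "v2", "v3-fix")  # upstream later fixes v2
    h.add_file("fork/app", "v2")                  # fork copied v2, never updated
    ancestors, descendants = h.history("v2")
    print(sorted(ancestors), sorted(descendants))  # ['v1'] ['v3-fix']
    print(h.repos_of(["v2"]))                      # ['fork/app', 'upstream/lib']
```

In this toy run, `fork/app` still holds `v2` even though a descendant `v3-fix` exists elsewhere, which is the kind of signal that could flag a stale copy and point its maintainers to the newer version.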
