Improving the Reliability of Mass Spectrometry Based 'Omics Data Analysis by Developing and Applying Machine Learning Based Tools
Mass spectrometry-based omics methods enable the direct, untargeted measurement of proteins and metabolites in complex mixtures. The analysis of these data requires intricate computational pipelines whose quality is critical for accurate results. Accurate metabolite identification, interpretable modeling of stable isotope incorporation, and inference of protein function are all important tasks that challenge current software tools. In this dissertation I apply machine learning techniques to develop new tools for proteomics and lipidomics data analysis. These tools aim to improve multiple steps in this pipeline, from analyte identification and quantification to function inference. Machine learning's capacity for sophisticated, holistic integration of diverse sources of evidence enables substantial performance gains over preexisting algorithms. These performance gains were used to improve the accuracy of mass spectrometry-based proteomics and lipidomics data analysis pipelines.