Doctoral Dissertations

Date of Award

5-2025

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Business Analytics

Major Professor

Yuanyang Liu

Committee Members

Chuanren Liu, Tingliang Huang, Xiaojia Guo

Abstract

This dissertation investigates job market dynamics by applying machine learning and natural language processing to job posting data. The research aims to enhance labor market transparency by providing accurate salary predictions, insights into the monetary value of skills, and identification of high-demand skill combinations. The analysis uses a comprehensive dataset of Data Scientist job postings from across the United States, focusing on the technical labor market to deliver actionable insights.

The first essay develops a robust salary prediction model leveraging both unstructured and structured job posting data. Textual information from job descriptions is transformed using various embedding techniques (Word2Vec, Doc2Vec, BERT, and OpenAI embeddings), while structured variables are extracted directly from job attributes. These features are processed through H2O Automated Machine Learning (AutoML), which evaluates multiple model families, including linear models, tree-based algorithms, and multilayer perceptrons, to create an optimized ensemble model. Our best model achieves a Mean Absolute Percentage Error (MAPE) of 16%, demonstrating strong predictive performance.
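For readers unfamiliar with the metric, MAPE is the average of the absolute prediction errors, each expressed as a fraction of the true value. The following is a minimal sketch of that calculation; the function name `mape` and the salary figures are hypothetical illustrations, not data or code from the dissertation.

```python
def mape(actual, predicted):
    """Return the Mean Absolute Percentage Error as a percentage."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Average the absolute errors, each scaled by the true value.
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical annual salaries (USD) and model predictions.
actual_salaries = [100_000, 120_000, 80_000]
predicted_salaries = [110_000, 108_000, 92_000]

print(f"MAPE: {mape(actual_salaries, predicted_salaries):.2f}%")
```

In this toy example the three relative errors are 10%, 10%, and 15%, so the MAPE is about 11.67%.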

The second essay introduces a framework for estimating the monetary value of individual skills and skill combinations in Data Science. Using a quasi-experimental design, job postings are segmented into treatment and control groups based on the presence of specific skill terms, isolating each skill’s marginal impact on salary outcomes. The concept of skill complementarity captures synergistic effects where certain skill pairs yield higher salary premiums together than the sum of their individual contributions. These findings benefit multiple stakeholders: job seekers can target high-value skills, while employers can make informed decisions about recruitment requirements.
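The idea of skill complementarity can be illustrated with a simple difference in group means. The sketch below assumes a naive comparison of average salaries across postings that mention neither, one, or both of two skills; it is not the dissertation's quasi-experimental estimator, and the function name, skill names, and salaries are all hypothetical.

```python
from statistics import mean


def skill_complementarity(postings, skill_a, skill_b):
    """Return (premium_a, premium_b, synergy) from naive group means.

    synergy > 0 means the pair pays more together than the sum of the
    two individual premiums over postings that list neither skill.
    """
    def group_mean(has_a, has_b):
        salaries = [
            p["salary"]
            for p in postings
            if (skill_a in p["skills"]) == has_a
            and (skill_b in p["skills"]) == has_b
        ]
        if not salaries:
            raise ValueError("no postings in one of the comparison groups")
        return mean(salaries)

    baseline = group_mean(False, False)          # control: neither skill
    premium_a = group_mean(True, False) - baseline
    premium_b = group_mean(False, True) - baseline
    premium_ab = group_mean(True, True) - baseline
    return premium_a, premium_b, premium_ab - (premium_a + premium_b)


# Hypothetical postings; real data would need controls for confounders.
postings = [
    {"skills": {"excel"}, "salary": 100_000},
    {"skills": {"excel"}, "salary": 100_000},
    {"skills": {"python"}, "salary": 110_000},
    {"skills": {"sql"}, "salary": 105_000},
    {"skills": {"python", "sql"}, "salary": 125_000},
    {"skills": {"python", "sql"}, "salary": 135_000},
]

premium_a, premium_b, synergy = skill_complementarity(postings, "python", "sql")
print(premium_a, premium_b, synergy)
```

Here the individual premiums are 10,000 and 5,000, while postings with both skills pay 30,000 above the baseline, leaving a synergy of 15,000. A raw comparison like this ignores differences in seniority, location, and industry between groups, which is the gap a quasi-experimental design is meant to close.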

This dissertation advances labor market transparency by developing text-driven machine learning models for salary prediction and skill valuation. By integrating advanced analytics with large-scale labor market data, the research contributes to labor market analytics with data-driven insights for policymakers, employers, and job seekers, highlighting the importance of strategic skill development in our increasingly dynamic job market.
