Date of Award


Degree Type


Degree Name

Doctor of Philosophy


Industrial Engineering

Major Professor

Xueping Li

Committee Members

Rapinder Sawhney, Frank M. Guess, Xiaoyan Zhu


With the success of Revenue Management (RM) techniques over the past three decades in various segments of the service industry, many manufacturing firms have started exploring innovative RM technologies to improve their profits. This dissertation studies RM for make-to-order (MTO) and make-to-stock (MTS) systems.

We start with a problem faced by a MTO firm that has the ability to reject or accept the order and set prices and lead-times to influence demands. The firm is confronted with the problem to decide, which orders to accept or reject and trade-off the price, lead-time and potential for increased demand against capacity constraints, in order to maximize the total profits in a finite planning horizon with deterministic demands. We develop a mathematical model for this problem. Through numerical analysis, we present insights regarding the benefits of price customization and lead-time flexibilities in various demand scenarios.

However, the demands of MTO firms are always hard to be predicted in most situations. We further study the above problem under the stochastic demands, with the objective to maximize the long-run average profit. We model the problem as a Semi-Markov Decision Problem (SMDP) and develop a reinforcement learning (RL) algorithm-Q-learning algorithm (QLA), in which a decision agent is assigned to the machine and improves the accuracy of its action-selection decisions via a “learning" process. Numerical experiment shows the superior performance of the QLA.

Finally, we consider a problem in a MTS production system consists of a single machine in which the demands and the processing times for N types of products are random. The problem is to decide when, what, and how much to produce so that the long-run average profit. We develop a mathematical model and propose two RL algorithms for real-time decision-making. Specifically, one is a Q-learning algorithm for Semi-Markov decision process (QLS) and another is a Q-learning algorithm with a learning-improvement heuristic (QLIH) to further improve the performance of QLS. We compare the performance of QLS and QLIH with a benchmarking Brownian policy and the first-come-first-serve policy. The numerical results show that QLIH outperforms QLS and both benchmarking policies.

Files over 3MB may be slow to open. For best results, right-click and select "save as..."