Date of Award


Degree Type


Degree Name

Doctor of Philosophy


Civil Engineering

Major Professor

Lee Han

Committee Members

Chris Cherry, Hairong Qi, Hyun Kim


Massive data from different sources are becoming available in transportation field, and spurring new research on utilizing these data to nurture new intelligent transportation information systems. Clustering algorithms are among the methods that are being applied to the domain but facing challenges. Classical clustering algorithms work fine with \point" based data, which are homogeneous and have no extra constraints. Data in transportation are sometimes involved with specific geometric shapes, have underlying constraints, and can be heterogeneous. There has been no clustering algorithm dedicated to these situations.

In this dissertation, we re-examine the mathematical foundation and underlying philosophy of hierarchical, density based, centroid-based clustering algorithms, and reformulate them to incorporate physical information to solve a big variety of transportation problems. In particular, we first show an example that a density-based data-driven geohash method can gain 40 seconds accuracy in ETA prediction. We then design a network space density-based clustering algorithm, Dijk-DBSCAN, which expands density based clustering from n-dimensional space to a transportation network space. We will show that Dijk-DBSCAN makes accident and other types of hotspot detection more accurate. Further, we present an online step-wise regression based clustering strategy, to collect vehicles' movement trajectory at low storage cost while maintaining high accuracy. We also explore clustering over arbitrary geometric shapes, and develop a hierarchical clustering framework. This algorithm can be applied to allocate resources over large road networks with an energy-efficiency constraint. In the last section, we present a hierarchical clustering and greedy algorithm that solves general vehicle routing problems, with pickup and delivery, and with time windows. It handles mutually constrained location and time information by clustering orders (e.g. orders of package delivery, passengers' ride requests) and vehicles. The simple implementation and light computational cost make it superior to traditional optimization solvers, and enables real-time and large scale deployment in applications such as city-wide ride-sharing, time constrained package delivery, etc. Our work bridges the gap between classical clustering algorithms and specific data types and problem configurations in transportation domain. All our algorithms are designed to be highly computational efficient, and easily adaptable to other similar problems.

Files over 3MB may be slow to open. For best results, right-click and select "save as..."