Data mining over mismatched domains
Data mining is the discovery of non-obvious knowledge in a large set of related data. Pattern matching is one data mining technique often used to find non-obvious relationships and associations between data items. The problem addressed by this research is the reconciliation of two data sets of different origin that contain information about the same real-world entities. This research compares the effectiveness of the back-propagation neural network model and the least-squares multiple linear regression model by using each method to recognize when a record in one data set describes the same real-world entity as a record in the other data set. The results indicate that back-propagation can easily over-fit mismatched data but outperforms least-squares approximation when the number of hidden-layer neurons is carefully chosen.
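As a rough illustration of the least-squares side of this comparison, the sketch below fits a multiple linear regression to per-field similarity scores for pairs of records and treats a fitted score of 0.5 or more as "same entity". It is not the thesis's method: the string-similarity features, the 0.5 cut-off, the toy records, and every function name are assumptions made only for this example, and the back-propagation model is not shown.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1] between two field values (assumed feature)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(rec_a, rec_b):
    """Intercept term followed by one similarity score per field."""
    return [1.0] + [similarity(x, y) for x, y in zip(rec_a, rec_b)]


def fit_least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gaussian elimination."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (A[i][n] - tail) / A[i][i]
    return w


def score(w, features):
    """Fitted value of the linear model for one record pair."""
    return sum(wi * fi for wi, fi in zip(w, features))


# Hypothetical labelled pairs: (record from set A, record from set B, label),
# where label 1.0 means both records describe the same real-world entity.
training_pairs = [
    (("John Smith", "Boston"), ("Jon Smith", "Boston"), 1.0),
    (("Mary Jones", "Denver"), ("Mary Jones", "Denver"), 1.0),
    (("John Smith", "Boston"), ("Alice Wong", "Boston"), 0.0),
    (("Mary Jones", "Denver"), ("Peter Kim", "Austin"), 0.0),
]
X = [pair_features(a, b) for a, b, _ in training_pairs]
y = [label for _, _, label in training_pairs]
weights = fit_least_squares(X, y)

same = score(weights, pair_features(("Ann Lee", "Tulsa"), ("Ann Lee", "Tulsa")))
different = score(weights, pair_features(("Ann Lee", "Tulsa"), ("Robert Quigg", "Tulsa")))
print("match" if same >= 0.5 else "no match")
print("match" if different >= 0.5 else "no match")
```

A linear model of this kind has only one weight per feature, so it has little capacity to over-fit. A back-propagation network trained on the same features would add hidden-layer neurons, whose number the abstract identifies as the setting that must be chosen carefully.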