Doctor of Philosophy
Jens Gregor, Husheng Li, Russell L. Zaretzki
The exponential growth of modern media has created large amounts of multimodal and multidomain visual data, which usually reside in high-dimensional spaces. It is therefore crucial to understand such data not only effectively but also efficiently.

In this dissertation, we focus on learning binary representations of visual datasets. Their primary use has been as hash codes for retrieval, but they also serve as multifunctional features for various other computer vision tasks. Essentially, this is achieved through discriminative learning that preserves the supervision information in the binary representation.

By using deep networks such as convolutional neural networks (CNNs) as backbones, together with an effective binary embedding algorithm that is seamlessly integrated into the learning process, we achieve state-of-the-art performance in several settings. First, we study supervised binary representation learning that uses label information directly, rather than pairwise similarity or a triplet loss. Next, by considering images together with their associated textual information, we study cross-modal representation learning: CNNs are used to embed both images and text, which enables retrieval and prediction across these modalities. Furthermore, to exploit unlabeled images from a different domain, we propose using adversarial learning to connect the two domains. Finally, we consider progressive learning for more efficient training, and instance-level representation learning for finer-grained understanding. This dissertation demonstrates that binary representations are versatile and powerful across a range of settings and tasks.
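For readers unfamiliar with hashing-based retrieval, the following Python sketch shows the generic idea behind using binary codes as hash codes: real-valued embeddings (here, made-up toy vectors standing in for the output of a trained network) are binarized by sign, packed into integers, and database items are ranked by Hamming distance to the query. This is an assumed, textbook-style illustration, not the dissertation's learning method; sign thresholding as the binarization step and the helper names `binarize`, `hamming`, and `retrieve` are hypothetical.

```python
from typing import List, Sequence


def binarize(embedding: Sequence[float]) -> int:
    """Sign-threshold a real-valued embedding and pack it into an int.

    Bit i is set when embedding[i] > 0. Hypothetical helper: sign
    thresholding is an assumed binarization step, not the author's method.
    """
    code = 0
    for i, value in enumerate(embedding):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Count the differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def retrieve(query: Sequence[float], database: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k database items nearest the query in Hamming distance.

    Ties are broken by database index so the ranking is deterministic.
    """
    q = binarize(query)
    codes = [binarize(e) for e in database]
    ranked = sorted(range(len(codes)), key=lambda i: (hamming(q, codes[i]), i))
    return ranked[:k]


# Toy 4-dimensional embeddings (made-up values for illustration only).
database = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.1, 0.8],
    [0.8, -0.6, 0.2, 0.1],
]
query = [0.7, -0.1, 0.5, -0.3]
print(retrieve(query, database, k=2))  # [0, 2]
```

Comparing codes with an XOR and a bit count is what makes binary codes cheap to store and search compared with floating-point distances. In the learned setting the abstract describes, the network would be trained so that these bits preserve the supervision information, rather than relying on arbitrary embeddings as in this toy example.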
Liu, Liu, "Binary Representation Learning for Large Scale Visual Data." PhD diss., University of Tennessee, 2018.