PhD defence of Mohammadreza Sadeghi – Unsupervised Representation Learning for Data Clustering
Abstract
Clustering is crucial in pattern recognition and machine learning for extracting key information from unlabeled data. Deep learning-based clustering methods have proven effective in image segmentation, social network analysis, face recognition, and machine vision.
Traditional deep clustering methods seek a single global embedding for all data clusters. In Section 3.1, we introduce a deep multirepresentation learning (DML) framework in which each challenging data group is assigned its own optimized latent space, while easy-to-cluster groups share a common latent space. Autoencoders generate these latent spaces, and a novel loss function, combining weighted reconstruction and clustering losses, emphasizes samples that are likely to belong to their assigned clusters. DML is published in the IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
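To make the shape of this objective concrete, the following is a minimal PyTorch sketch of one per-group autoencoder and its weighted loss. The names (`ClusterAutoencoder`, `weighted_dml_loss`), the layer sizes, the weighting scheme, and the balance parameter `lam` are illustrative assumptions, not the thesis's exact formulation; the point is the per-sample weighted sum of a reconstruction term and a centroid-based clustering term.

```python
import torch
import torch.nn as nn

class ClusterAutoencoder(nn.Module):
    """One autoencoder per data group; layer sizes are illustrative."""
    def __init__(self, in_dim=784, latent_dim=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def weighted_dml_loss(x, z, x_hat, centroid, weights, lam=0.1):
    """Per-sample weighted sum of reconstruction and clustering terms.

    `weights` (shape (N,)) up-weights samples that are likely to belong to
    this autoencoder's cluster; `lam` balances the two terms. Both are
    assumptions standing in for the thesis's exact formulation.
    """
    rec = ((x_hat - x) ** 2).mean(dim=1)    # per-sample reconstruction error
    clu = ((z - centroid) ** 2).sum(dim=1)  # distance to the group centroid in latent space
    return (weights * (rec + lam * clu)).mean()
```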
In Section 3.2, we introduce a novel deep clustering framework with self-supervision using pairwise data similarities (DCSS). DCSS tackles two main shortcomings of DML: the computational expense of training multiple deep networks and the neglect of pairwise data similarities in its loss function. DCSS operates in two phases. In the first, a cluster-specific loss function trains a single autoencoder to form hypersphere-like groups of similar samples in its latent space. In the second, we use pairwise data similarities to create a $K$-dimensional space, where $K$ is the number of clusters, that accommodates more complex cluster distributions and improves clustering accuracy; the latent space learned in the first phase serves as the input to the second. Portions of the DCSS results were published at the International Joint Conference on Neural Networks (IJCNN).
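A minimal sketch of how pairwise similarities could drive the second phase is shown below, assuming that similarities are measured between phase-1 latent codes and that soft cluster assignments live in the $K$-dimensional space. The thresholds and the exact pair loss are illustrative assumptions, not DCSS's published objective.

```python
import torch
import torch.nn.functional as F

def dcss_phase2_loss(q, z, pos_thresh=0.8, neg_thresh=0.2):
    """Pairwise-similarity loss for the K-dimensional space (illustrative).

    q: (N, K) soft cluster assignments from the K-dimensional head,
       rows softmax-normalized.
    z: (N, D) latent codes from the phase-1 autoencoder, used to decide
       which pairs count as similar or dissimilar.
    `pos_thresh` and `neg_thresh` are assumed hyperparameters.
    """
    z_norm = F.normalize(z, dim=1)
    z_sim = z_norm @ z_norm.t()            # pairwise cosine similarity in phase-1 space
    q_sim = q @ q.t()                      # agreement of soft assignments, in (0, 1]
    n = z.shape[0]
    off_diag = ~torch.eye(n, dtype=torch.bool, device=z.device)
    pos = (z_sim > pos_thresh) & off_diag  # confidently similar pairs
    neg = (z_sim < neg_thresh) & off_diag  # confidently dissimilar pairs
    # pull similar pairs to agree in q, push dissimilar pairs apart
    loss_pos = -torch.log(q_sim[pos].clamp_min(1e-8)).mean() if pos.any() else 0.0
    loss_neg = -torch.log((1 - q_sim[neg]).clamp_min(1e-8)).mean() if neg.any() else 0.0
    return loss_pos + loss_neg
```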
In Section 3.3, we extend our DCSS framework to contrastive clustering (CC), again leveraging pairwise similarity. CC models create positive and negative pairs for each data instance via data augmentation and learn a feature space that groups instance-level and cluster-level representations. Existing algorithms, however, often overlook cross-instance patterns, which are crucial for improving clustering accuracy. We therefore introduce Cross-instance guided Contrastive Clustering (C3), a method that incorporates cross-sample relationships to increase the number of positive pairs and to mitigate the impact of false negatives, noise, and anomalies. Our new loss function identifies similar instances based on their instance-level representations and encourages their aggregation, and we propose a novel weighting method to select negative samples more efficiently. The C3 methodology is published in the proceedings of the 34th British Machine Vision Conference (BMVC).
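The following PyTorch sketch illustrates the cross-instance idea: beyond the usual augmented-view positive, pairs whose instance-level similarity exceeds a threshold are treated as extra positives and removed from the negatives. The function name, the temperature `tau`, and the threshold `zeta` are illustrative assumptions, and the negative-weighting scheme mentioned above is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def c3_style_loss(h, tau=0.5, zeta=0.8):
    """Cross-instance guided contrastive loss (illustrative sketch).

    h: (2N, D) instance-level embeddings, stacked as [view 1; view 2] of the
    same N images. Any off-diagonal pair whose cosine similarity exceeds
    `zeta` is treated as a positive, enlarging the positive set and reducing
    false negatives. `tau` and `zeta` are assumed hyperparameters.
    """
    h = F.normalize(h, dim=1)
    cos = h @ h.t()
    n = cos.shape[0]
    eye = torch.eye(n, dtype=torch.bool, device=h.device)
    pos_mask = (cos > zeta) & ~eye                 # cross-instance positives
    idx = torch.arange(n, device=h.device)
    pos_mask[idx, (idx + n // 2) % n] = True       # the other view is always a positive
    exp_sim = torch.exp(cos / tau).masked_fill(eye, 0.0)
    pos = (exp_sim * pos_mask).sum(dim=1)          # mass on positives
    return -torch.log(pos / exp_sim.sum(dim=1)).mean()
```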
In Section 3.4, we leverage our contrastive clustering expertise to develop a novel approach for streaming data, where samples arrive sequentially and previous data is inaccessible. Unsupervised Continual Learning (UCL) enables neural networks to learn tasks sequentially without labels. Catastrophic Forgetting (CF), where a model forgets previous tasks upon learning new ones, is a significant challenge, especially in UCL, where labeled data is unavailable. Common CF mitigation strategies, such as knowledge distillation and replay buffers, suffer from memory inefficiency and privacy issues. Current UCL research addresses CF but lacks algorithms for unsupervised clustering. To fill this gap, we introduce Unsupervised Continual Clustering (UCC) and propose Forward-Backward Knowledge Distillation for Continual Clustering (FBCC) to counteract CF. FBCC employs a single continual learner (the "teacher") with a cluster projector, together with multiple student models, and proceeds in two phases: Forward Knowledge Distillation, where the teacher learns new clusters while retaining previous knowledge under the guidance of specialized students, and Backward Knowledge Distillation, where a student model mimics the teacher to retain task-specific knowledge, aiding the teacher in subsequent tasks.
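A minimal sketch of the two distillation phases, assuming MSE-based feature distillation between teacher and student embeddings, is given below. The function names, the choice of MSE as the distillation criterion, and the weight `alpha` are illustrative assumptions rather than FBCC's exact losses; the sketch only conveys the direction of knowledge flow in each phase.

```python
import torch
import torch.nn.functional as F

def forward_kd_loss(teacher_feats, student_feats_list, new_task_loss, alpha=1.0):
    """Phase 1 (Forward Knowledge Distillation), illustrative.

    While the teacher optimizes `new_task_loss` on the current task's
    clusters, frozen per-task students regularize its embeddings so that
    knowledge of earlier tasks is retained. The MSE term and `alpha`
    are assumptions.
    """
    distill = sum(F.mse_loss(teacher_feats, s.detach())
                  for s in student_feats_list)
    return new_task_loss + alpha * distill

def backward_kd_loss(student_feats, teacher_feats):
    """Phase 2 (Backward Knowledge Distillation), illustrative.

    A lightweight student mimics the trained teacher's representations,
    preserving task-specific knowledge that guides the teacher on
    subsequent tasks.
    """
    return F.mse_loss(student_feats, teacher_feats.detach())
```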