Towards Dynamic Self-Training for Scalable Semi-Supervised Learning on Graphs

Abstract

In graph-based semi-supervised learning (GSSL), traditional methods often struggle to make effective use of labeled samples and to scale to large datasets. The self-training paradigm is often used to increase the supervision information available in semi-supervised learning, but mainly on datasets of moderate size. Anchors, on the other hand, have been adopted for large datasets. In this work, we propose a GSSL framework that combines a novel self-training principle tailored to very large datasets with an advanced method for automatic graph construction using anchors. Our approach generates labels for random batches of unlabeled samples and then incorporates these predictions into the training set to improve the model’s accuracy. Pseudo-labeling, a specific instance of self-training, assigns pseudo-labels to the most confidently predicted unlabeled examples and treats them as ground truth during training. By constructing anchor-to-anchor affinity graphs that incorporate both feature and label information, our method facilitates robust learning on large-scale datasets. Comprehensive experiments on diverse large datasets demonstrate that the approach achieves scalable and reliable semi-supervised learning. These findings represent a significant advance in GSSL, with implications for applications across different domains. Our method not only addresses the scalability issue but also ensures the effective integration of labeled and pseudo-labeled data, thereby enhancing the overall learning process.
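The generic pseudo-labeling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the method proposed in the paper: it omits anchors and graph construction entirely, and it uses a nearest-centroid classifier as a stand-in model. The function names (`centroid_proba`, `self_train`), the confidence threshold of 0.9, and the batch size are all hypothetical choices made for illustration.

```python
import numpy as np


def centroid_proba(X_train, y_train, X, n_classes):
    """Stand-in classifier: softmax over negative squared distances to class centroids.

    Assumes every class has at least one sample in X_train.
    """
    centroids = np.stack(
        [X_train[y_train == c].mean(axis=0) for c in range(n_classes)]
    )
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def self_train(X_lab, y_lab, X_unlab, n_classes,
               batch_size=50, threshold=0.9, seed=0):
    """Generic pseudo-labeling over random batches of unlabeled samples.

    Illustrative only: the threshold and batch size are assumed values.
    """
    rng = np.random.default_rng(seed)
    X_train, y_train = X_lab.copy(), y_lab.copy()
    order = rng.permutation(len(X_unlab))  # visit unlabeled data in random batches
    for start in range(0, len(order), batch_size):
        batch = X_unlab[order[start:start + batch_size]]
        proba = centroid_proba(X_train, y_train, batch, n_classes)
        keep = proba.max(axis=1) >= threshold  # most confident predictions only
        # Confident predictions join the training set as if they were ground truth.
        X_train = np.vstack([X_train, batch[keep]])
        y_train = np.concatenate([y_train, proba.argmax(axis=1)[keep]])
    return X_train, y_train
```

Each batch is predicted with the model fitted on the current training set, so pseudo-labels accepted in earlier batches influence later predictions. That feedback is the defining trait of self-training, and also the reason a confidence threshold is commonly used to limit error propagation.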

Publication
Neurocomputing
Fadi Dornaika
Ikerbasque Research Professor

Ikerbasque Research Professor with expertise in computer vision, machine learning, and pattern recognition.

Zoulfikar Ibrahim
Professor and Software Engineer

I develop scalable graph-based machine learning methods and teach cutting-edge technologies in web and software development. My focus spans semi-supervised learning, data analysis, and full-stack engineering.