In graph-based semi-supervised learning (GSSL), traditional methods often struggle to exploit scarce labeled samples and to scale to large datasets. To enrich the supervision available in semi-supervised learning, the self-training paradigm is commonly applied, mainly to datasets of moderate size, whereas anchor-based methods have been adopted for large datasets. In this work, we propose a GSSL framework that combines a self-training principle tailored to very large datasets with an automatic anchor-based graph construction method. Our approach generates labels for random batches of unlabeled samples and incorporates these predictions into the training set to improve the model's accuracy. Pseudo-labeling, a specific instance of self-training, assigns pseudo-labels to the most confidently predicted unlabeled examples and treats them as ground truth during training. By constructing anchor-to-anchor affinity graphs that incorporate both feature and label information, our method enables robust learning on large-scale datasets. Comprehensive experiments across diverse large datasets demonstrate that the approach achieves scalable and reliable semi-supervised learning, with implications for applications across many domains. Our method not only addresses the scalability issue but also integrates labeled and pseudo-labeled data effectively, thereby strengthening the overall learning process.
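
To make the pseudo-labeling step concrete, the following is a minimal sketch of confidence-thresholded self-training over random batches of unlabeled samples, as described above. It is not the paper's implementation: the base classifier, the 0.95 confidence threshold, the batch size, and all function names are illustrative assumptions.

```python
# Sketch of batch-wise pseudo-labeling (self-training).
# Classifier choice, threshold, and batch size are assumptions, not the paper's settings.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, n_rounds=5, batch_size=1024):
    """Iteratively pseudo-label random batches of unlabeled samples."""
    rng = np.random.default_rng(0)
    X_train, y_train = X_lab.copy(), y_lab.copy()
    remaining = X_unlab.copy()
    model = LogisticRegression(max_iter=1000)
    for _ in range(n_rounds):
        model.fit(X_train, y_train)
        if len(remaining) == 0:
            break
        # Draw a random batch of unlabeled samples, as in the abstract.
        idx = rng.choice(len(remaining), size=min(batch_size, len(remaining)),
                         replace=False)
        batch = remaining[idx]
        proba = model.predict_proba(batch)
        conf = proba.max(axis=1)
        keep = conf >= threshold  # keep only confidently predicted examples
        # Treat confident predictions as ground truth and add them to training.
        X_train = np.vstack([X_train, batch[keep]])
        y_train = np.concatenate([y_train, model.classes_[proba[keep].argmax(axis=1)]])
        remaining = np.delete(remaining, idx[keep], axis=0)
    return model
```

Only predictions above the threshold are promoted to the training set, which limits the propagation of erroneous pseudo-labels in later rounds.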
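The anchor graph construction can likewise be sketched. The version below uses k-means centers as anchors, an RBF kernel for feature affinity, and a weighted mix with soft-label agreement; these choices (including the mixing weight `alpha`) are assumptions for illustration, not the paper's exact construction.

```python
# Sketch of an anchor-to-anchor affinity graph combining feature and label
# information. KMeans anchors, the RBF kernel, and `alpha` are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def anchor_affinity_graph(X, soft_labels=None, n_anchors=500, gamma=1.0, alpha=0.5):
    """Return anchor points and an anchor-to-anchor affinity matrix."""
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(X)
    anchors = km.cluster_centers_
    W_feat = rbf_kernel(anchors, anchors, gamma=gamma)  # feature affinity
    if soft_labels is not None:
        # Average the soft labels of the samples assigned to each anchor,
        # then measure label agreement between anchor pairs.
        n_classes = soft_labels.shape[1]
        anchor_labels = np.zeros((n_anchors, n_classes))
        for a in range(n_anchors):
            members = soft_labels[km.labels_ == a]
            if len(members):
                anchor_labels[a] = members.mean(axis=0)
        W_lab = anchor_labels @ anchor_labels.T  # label affinity
        W = alpha * W_feat + (1 - alpha) * W_lab
    else:
        W = W_feat
    np.fill_diagonal(W, 0.0)  # no self-loops
    return anchors, W
```

Because the graph is built over a few hundred anchors rather than millions of raw samples, the affinity matrix stays small, which is what makes the construction practical at large scale.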