SCS-SupCon: Sigmoid-based common and style supervised contrastive learning with adaptive decision boundaries

Abstract

Image classification is inherently challenging due to subtle inter-class differences and substantial intra-class variations, which limit the effectiveness of existing contrastive learning approaches. In particular, supervised contrastive methods based on the InfoNCE loss often suffer from negative-sample dilution and lack explicit mechanisms for adaptive decision-boundary control, significantly weakening their discriminative capability on fine-grained image classification tasks. To address these challenges, we propose a novel supervised contrastive learning framework, termed Sigmoid-based Common and Style Supervised Contrastive Learning (SCS-SupCon). In this framework, we introduce a sigmoid-based pairwise contrastive loss with adaptive decision boundaries, explicitly parameterized by learnable temperature and bias terms. This design places greater emphasis on critical discriminative information from hard negatives, thereby alleviating negative-sample dilution while fully leveraging supervision signals in contrastive learning. Furthermore, we incorporate an explicit style-distance constraint to disentangle style and content representations, leading to more robust and discriminative feature learning. Comprehensive experiments on six benchmark datasets, including prominent fine-grained datasets such as CUB200-2011 and Stanford Dogs, consistently demonstrate that SCS-SupCon outperforms the most closely related InfoNCE-based supervised contrastive baselines (SupCon, SelfCon, CS-SupCon, and its overlapping variant) across diverse CNN and Transformer backbones. In particular, on CIFAR-100 with a ResNet-50 encoder, SCS-SupCon improves mean top-1 accuracy over SupCon by about 3.9 percentage points and over CS-SupCon by about 1.7 percentage points under a five-fold cross-validation protocol.
On challenging fine-grained datasets such as CUB200-2011 and Stanford Dogs, with both CNN and Transformer architectures, our method achieves absolute improvements of approximately 0.4–3.0 percentage points over CS-SupCon. Extensive ablation studies and paired statistical tests further confirm the robustness and effectiveness of our framework, and a Friedman test with Nemenyi post-hoc analysis shows that SCS-SupCon attains the best average rank among the evaluated methods, even though pairwise differences with other strong competitors are not always statistically significant at the 0.05 level.
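To make the core idea concrete, the following is a minimal sketch of a sigmoid-based pairwise supervised contrastive loss of the kind the abstract describes: each pair of embeddings is scored with a sigmoid on a scaled cosine similarity, with a temperature `t` and bias `b` that shift the decision boundary (here fixed scalars; in the actual framework they are learnable). Function and variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid_pairwise_loss(features, labels, t=10.0, b=-5.0):
    """Hedged sketch of a sigmoid-based pairwise supervised contrastive loss.

    features: (N, D) array of embeddings; labels: (N,) array of class ids.
    t (temperature) and b (bias) shift the pairwise decision boundary and
    would be learnable parameters in the actual framework.
    """
    # L2-normalize embeddings so the dot product is cosine similarity
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                                    # (N, N) cosine similarities

    # Pairwise targets: +1 for same-class pairs, -1 for different-class pairs
    z = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)

    # Exclude trivial self-pairs on the diagonal
    n = len(labels)
    mask = ~np.eye(n, dtype=bool)

    # Each pair contributes -log sigmoid(z_ij * (t * sim_ij + b)),
    # computed via the numerically stable identity -log σ(x) = log(1 + e^{-x})
    logits = z * (t * sim + b)
    return np.mean(np.logaddexp(0.0, -logits[mask]))
```

Because every pair is scored independently, hard negatives are not diluted by the softmax normalization over all negatives that InfoNCE-style losses use, and the bias term sets an explicit margin on the similarity scale.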

Publication
Expert Systems with Applications
Bin Wang
PhD Student

My research focuses on deep metric learning for computer vision.

Fadi Dornaika
Ikerbasque Research Professor

Ikerbasque Research Professor with expertise in computer vision, machine learning, and pattern recognition.