Integrating ConvNeXt and vision transformers for enhancing facial age estimation

Abstract

Age estimation from facial images is a complex and multifaceted challenge in computer vision. In this study, we present a novel hybrid architecture that combines ConvNeXt, a state-of-the-art evolution of convolutional neural networks (CNNs), with Vision Transformers (ViT). While each model independently delivers excellent performance on a variety of tasks, integrating them exploits the complementary strengths of CNNs' localized feature extraction and Transformers' global attention mechanisms. We thoroughly evaluated the proposed ConvNeXt-ViT hybrid on benchmark age estimation datasets, including MORPH II, CACD, and AFAD, where it achieved superior performance in terms of mean absolute error (MAE). To address computational constraints, we build on pre-trained models and systematically explore different configurations, using linear layers and advanced regularization techniques to optimize the architecture. Comprehensive ablation studies highlight the critical role of individual components and training strategies. In particular, they show the importance of adapted attention mechanisms within the CNN framework for improving the model's focus on age-relevant facial features. The results show that the ConvNeXt-ViT hybrid not only outperforms traditional methods but also provides a robust foundation for future advances in age estimation and related visual tasks. This work underscores the potential of hybrid architectures and points to a promising direction for integrating CNNs and transformers to address complex computer vision challenges.
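Because the abstract reports results as mean absolute error, the following is a minimal sketch of that metric as it is commonly defined for age estimation: MAE = (1/N) Σ |ŷ_i − y_i|, the average absolute gap in years between predicted and true ages. The function name `mean_absolute_error` and the sample ages are hypothetical and for illustration only. They are not taken from the paper or its evaluation code.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], true: Sequence[float]) -> float:
    """Average absolute difference between predicted and true ages (in years).

    Hypothetical helper illustrating the standard MAE definition.
    """
    if len(predicted) != len(true):
        raise ValueError("predicted and true must have the same length")
    if len(predicted) == 0:
        raise ValueError("inputs must be non-empty")
    # Sum |y_hat - y| over all samples, then divide by the sample count.
    return sum(abs(p - t) for p, t in zip(predicted, true)) / len(predicted)


# Illustrative (made-up) predictions for three face images.
preds = [25.0, 40.5, 31.0]
ages = [23, 42, 31]
print(mean_absolute_error(preds, ages))  # (2 + 1.5 + 0) / 3, approx. 1.167
```

A lower MAE means predictions are, on average, closer to the true age. Errors are weighted linearly, so a few large mistakes affect the score less than they would under squared-error metrics.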

Publication
Computer Vision and Image Understanding
Gaby Maroun
PhD Student

My research focuses on generalizable deep learning methods for computer vision, with applications in segmentation, classification, and visual understanding.

Salah Eddine Bekhouche
Former PhD Student

My research focuses on applied computer vision, pattern recognition, machine learning, and deep learning with a deep interest in biometrics, facial analysis, document understanding, and image/video generation.

Fadi Dornaika
Ikerbasque Research Professor

Ikerbasque Research Professor with expertise in computer vision, machine learning, and pattern recognition.