ConvNeXt Meets Vision Transformers: A Powerful Hybrid Framework for Facial Age Estimation

Abstract

Age estimation from facial images is challenging because facial aging is a complex, nonlinear process influenced by both genetic and environmental factors. To address this challenge, we propose a hybrid ConvNeXt-Transformer framework that combines convolutional local feature extraction with attention-based global contextual modeling within a unified age regression pipeline. The methodological contribution of this work lies in the sequential integration of these two complementary paradigms for facial age estimation, which allows the model to capture both fine-grained textural cues, such as wrinkles and skin spots, and long-range spatial dependencies. We evaluate the proposed framework on benchmark datasets including MORPH II, CACD, UTKFace, and AFAD. Under the adopted benchmark settings, our approach achieves state-of-the-art mean absolute error (MAE) on MORPH II (2.26), CACD (4.35), and AFAD (3.09) while remaining competitive on UTKFace. To address computational efficiency, we employ ImageNet pre-trained backbones and explore different architectural configurations, including fusion strategies and varying depths of the Transformer module, as well as regularization techniques such as stochastic depth and label smoothing. Ablation studies confirm the contribution of each component, particularly the role of attention mechanisms in enhancing the model's sensitivity to age-relevant features. Overall, the proposed hybrid framework provides a robust and accurate solution for facial age estimation that balances performance and computational cost.
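To make the sequential design described in the abstract more concrete, the following is a minimal PyTorch sketch, not the authors' implementation. It assumes a small hand-written ConvNeXt-style block in place of an ImageNet pre-trained ConvNeXt backbone, a standard `nn.TransformerEncoder` for the global stage, mean pooling over tokens, and a single linear regression head. The class names `ConvNeXtStyleBlock` and `HybridAgeRegressor`, the helper `mean_absolute_error`, and all sizes (`dim`, `depth`, `heads`, the 64x64 input) are hypothetical choices made for illustration. Positional embeddings, fusion strategies, stochastic depth, and label smoothing are omitted for brevity.

```python
import torch
import torch.nn as nn


class ConvNeXtStyleBlock(nn.Module):
    """ConvNeXt-style block: depthwise 7x7 conv, LayerNorm, pointwise MLP, residual."""

    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.pw1 = nn.Linear(dim, 4 * dim)   # pointwise expansion
        self.act = nn.GELU()
        self.pw2 = nn.Linear(4 * dim, dim)   # pointwise projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)            # (B, C, H, W) -> (B, H, W, C)
        x = self.pw2(self.act(self.pw1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)            # back to (B, C, H, W)
        return residual + x


class HybridAgeRegressor(nn.Module):
    """Assumed layout: convolutional local stage -> Transformer global stage -> age."""

    def __init__(self, dim: int = 64, depth: int = 2, heads: int = 4):
        super().__init__()
        # Stand-in for a pre-trained ConvNeXt backbone (assumption).
        self.stem = nn.Conv2d(3, dim, kernel_size=4, stride=4)
        self.local = nn.Sequential(ConvNeXtStyleBlock(dim), ConvNeXtStyleBlock(dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.global_ctx = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 1)        # scalar age regression

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.local(self.stem(images))        # (B, C, H/4, W/4)
        tokens = feats.flatten(2).transpose(1, 2)    # (B, N, C), one token per location
        tokens = self.global_ctx(tokens)             # self-attention across all locations
        return self.head(tokens.mean(dim=1)).squeeze(-1)  # (B,)


def mean_absolute_error(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """MAE, the metric reported in the abstract."""
    return (pred - target).abs().mean()


if __name__ == "__main__":
    model = HybridAgeRegressor().eval()
    with torch.no_grad():
        ages = model(torch.randn(2, 3, 64, 64))      # random stand-in for face crops
    print(ages.shape)                                # torch.Size([2])
```

In this sketch the convolutional stage produces a feature map whose spatial positions become tokens, so the attention layers operate on local texture features rather than raw pixels. That ordering is one plausible reading of "sequential integration"; the paper itself should be consulted for the actual backbone, token construction, and head.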

Publication
Applied Sciences
Gaby Maroun
PhD Student

My research focuses on generalizable deep learning methods for computer vision, with applications in segmentation, classification, and visual understanding.

Salah Eddine Bekhouche
Former PhD Student

My research focuses on applied computer vision, pattern recognition, machine learning, and deep learning with a deep interest in biometrics, facial analysis, document understanding, and image/video generation.

Fadi Dornaika
Ikerbasque Research Professor

Ikerbasque Research Professor with expertise in computer vision, machine learning, and pattern recognition.