Accurate white blood cell (WBC) classification in peripheral blood smears is essential for diagnosing and monitoring hematological disorders. Convolutional Neural Networks (CNNs) have traditionally been used for this task, but they often demand large datasets and substantial computational resources. More recently, Vision Transformers (ViTs) have shown promise by using self-attention mechanisms to capture global image dependencies. This study provides a comparative analysis of CNNs and ViTs for WBC classification on a standardized dataset of peripheral blood smear images. We evaluate accuracy, computational efficiency, and robustness to varying image quality. Our results indicate that while CNNs perform well in feature extraction, ViTs excel in managing complex patterns and achieve higher classification accuracy with fewer samples. Additionally, we explore the effects of data augmentation and of hybrid models that combine CNNs with ViTs. These approaches enhance model generalization and performance, making them promising for clinical applications that involve diverse data and require high accuracy. This research advances the understanding of deep learning in medical imaging, highlighting ViTs as a viable alternative to CNNs for WBC classification. Future research will focus on optimizing these models for real-time clinical use and exploring their application in other diagnostic fields.
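As a rough illustration of the self-attention mechanism that lets ViTs capture global image dependencies, the sketch below splits a toy grayscale image into patches and lets every patch attend to every other patch. It is a minimal, dependency-free sketch and not the models evaluated in this study: the function names `image_to_patches` and `self_attention` are hypothetical, the query, key, and value projections are taken to be the identity (a real ViT learns them), and the 4x4 image is made-up data.

```python
import math


def image_to_patches(image, patch):
    """Split a 2D grayscale image (list of rows) into flattened patch tokens."""
    height, width = len(image), len(image[0])
    tokens = []
    for r in range(0, height - patch + 1, patch):
        for c in range(0, width - patch + 1, patch):
            tokens.append(
                [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            )
    return tokens


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention.

    Assumption: Q, K, and V are the tokens themselves (identity projections),
    which keeps the example short; a trained ViT uses learned projections.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        attn = softmax(scores)
        weights.append(attn)
        # Each output token is an attention-weighted mix of all tokens.
        outputs.append(
            [sum(attn[j] * tokens[j][i] for j in range(len(tokens))) for i in range(dim)]
        )
    return outputs, weights


# Toy 4x4 "image": dark left half, bright right half (made-up values).
image = [
    [0.0, 0.1, 0.9, 1.0],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
    [0.1, 0.0, 1.0, 0.9],
]
tokens = image_to_patches(image, 2)
outputs, weights = self_attention(tokens)
```

Here each row of `weights` sums to 1 and describes how strongly one patch attends to every patch in the image, regardless of distance. A convolution, by contrast, only mixes information within its local kernel window.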