Multi-View CNN–Transformer Fusion with Self-Distillation for Robust Banana Leaf Disease Detection

Authors:
A. Aruna Devi, S. Venkatasubramanian, H. Hari Prasath

Addresses:
Department of Computer Science and Business Systems, Saranathan College of Engineering, Tiruchirappalli, Tamil Nadu, India. Department of Electronics and Communication Engineering, Saranathan College of Engineering, Tiruchirappalli, Tamil Nadu, India.

Abstract:

This paper introduces M3F-BananaNet, a deep learning framework for detecting banana leaf diseases under real-field imaging conditions. The method improves reliability by integrating three core ideas: (1) a hybrid CNN-Transformer encoder that captures both fine lesion textures and global streak-like patterns; (2) a multi-view feature construction that combines RGB appearance, spectral-texture proxy cues, and vein/structure maps; and (3) Disease-Aware Cross-Attention Fusion (DACAF) with self-distillation, which dynamically weights informative views and produces well-calibrated predictions for deployment. The framework was implemented in PyTorch, with OpenCV and Torchvision used for preprocessing, and evaluated using Accuracy, Precision, Recall, Macro-F1, AUROC, and AUPRC. On the held-out test set, M3F-BananaNet outperformed ResNet50, EfficientNetB0, and MobileNetV3-Large, achieving 97.1% Accuracy, 96.5% Macro-F1, and strong ranking performance (AUROC 0.993, AUPRC 0.990). Class-wise analysis shows consistent gains over the strongest baseline (EfficientNetB0); for example, the F1 scores for Healthy (0.975), Black Sigatoka (0.955), Cordana (0.945), and Fusarium/Other (0.935) all improve. Ablation results confirm that multi-view inputs and DACAF fusion contribute substantially to robustness, while self-distillation improves calibration and reduces confusion among visually similar diseases. These findings indicate that M3F-BananaNet offers a practical balance between accuracy and robustness for field-ready banana disease screening.
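
For readers unfamiliar with Macro-F1, one of the metrics reported in the abstract, it is the unweighted mean of the per-class F1 scores, so every disease class counts equally regardless of how many images it has. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `macro_f1` and the six toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return (macro-F1, per-class F1 dict) for two equal-length label lists."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted but wrong
            fn[t] += 1      # class t was missed
    per_class = {}
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        per_class[c] = 2 * tp[c] / denom if denom else 0.0
    return sum(per_class.values()) / len(classes), per_class


# Hypothetical toy labels (illustrative only, not data from the paper).
y_true = ["Healthy", "Healthy", "Black Sigatoka",
          "Black Sigatoka", "Cordana", "Cordana"]
y_pred = ["Healthy", "Healthy", "Black Sigatoka",
          "Cordana", "Cordana", "Cordana"]

score, per_class = macro_f1(y_true, y_pred)
print(f"Macro-F1: {score:.4f}")
for name, f1 in per_class.items():
    print(f"  {name}: {f1:.4f}")
```

Because the mean is unweighted, a weak score on a rare class lowers Macro-F1 as much as a weak score on a common one, which is why it is often reported alongside Accuracy on imbalanced disease datasets.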

Keywords: Convolutional Neural Networks (CNNs); Banana Leaf; Disease Detection; Lesion Textures; Global Streak-Like Patterns; Deep Learning (DL); Real-field Imaging.

Received on: 02/02/2025, Revised on: 24/05/2025, Accepted on: 01/09/2025, Published on: 05/01/2026

DOI: 10.64091/ATIHL.2026.000263

AVE Trends in Intelligent Health Letters, 2026 Vol. 3 No. 1, Pages: 27-39
