
Noise-Resilient Hybrid EfficientNet–Vision Transformer Framework with Adaptive Symmetric Cross-Entropy Loss for Robust Plant Disease Detection

Abstract

Human annotation errors and environmental noise, such as lighting changes, occlusions, and cluttered backgrounds, limit accurate plant disease detection under field conditions. This study presents a robust, interpretable deep learning model that withstands noise and achieves high plant disease classification accuracy in both controlled and noisy environments. The proposed hybrid EfficientNet–Vision Transformer (ViT) network combines an EfficientNet-B4 convolutional branch, which captures fine-grained lesion features, with a ViT branch that models global contextual information. A CycleGAN-based data augmentation pipeline introduces field-style distortions (e.g., lighting shifts, shadowing, debris, and partial occlusions) to improve robustness to environmental noise, and an Adaptive Symmetric Cross-Entropy (ASCE) loss identifies and down-weights uncertain samples using normalized prediction entropy. Training proceeds in two stages: Stage 1 pretrains on clean PlantVillage images, and Stage 2 fine-tunes on increasingly noisy samples. The framework is evaluated under two noise conditions: controlled synthetic label noise on PlantVillage and real environmental noise on PlantDoc. The proposed model attains 94.5% accuracy on the clean PlantVillage test set and 85.0% accuracy under the 20% synthetic label-noise protocol, outperforming ResNet-50V2 (76.5%), DenseNet-121 (78.9%), and Co-Teaching (79.5%). On the external PlantDoc field dataset, the model achieves macro-precision of 0.718, macro-recall of 0.681, macro-F1 of 0.681, and top-1 accuracy of 72.0%, demonstrating cross-domain generalization. Lesion-centric Grad-CAM visualizations show that the model emphasizes symptomatic leaf regions while suppressing responses to background soil, shadows, and clutter.
Overall, the proposed hybrid EfficientNet–ViT architecture offers a noise-resistant, robust, and explainable solution for precision agriculture and intelligent crop monitoring systems.
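The entropy-weighted loss described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes a per-sample weight of the form w = 1 − H(p)/log C (normalized prediction entropy) applied to a symmetric combination of forward and reverse cross-entropy; the weighting terms `alpha` and `beta` are hypothetical hyperparameters.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the class axis."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def asce_loss(logits, labels, alpha=1.0, beta=1.0, eps=1e-7):
    """Sketch of an adaptive symmetric cross-entropy:
    forward CE + reverse CE, each sample down-weighted by its
    normalized prediction entropy (uncertain samples count less)."""
    p = softmax(logits)                                   # (N, C) predictions
    n, c = p.shape
    q = np.eye(c)[labels]                                 # one-hot targets
    ce = -np.sum(q * np.log(p + eps), axis=1)             # forward CE
    rce = -np.sum(p * np.log(q + eps), axis=1)            # reverse CE
    entropy = -np.sum(p * np.log(p + eps), axis=1) / np.log(c)  # in [0, 1]
    w = 1.0 - entropy                                     # confident -> w ~ 1
    return float(np.mean(w * (alpha * ce + beta * rce)))
```

Under this weighting, a near-uniform (maximally uncertain) prediction contributes almost nothing to the loss, while a confident prediction keeps nearly its full symmetric cross-entropy, which is the down-weighting behavior the abstract attributes to ASCE.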

Keywords

Adaptive Symmetric Cross-Entropy, Deep Learning, Noise-Robust Learning, Vision Transformer


