AQUAPHISH: Leveraging Metaheuristics and Automated Machine Learning for Precision Phishing Detection

Abstract

Phishing is a persistent and evolving cybersecurity threat that exploits user trust to capture sensitive data through fraudulent websites. Conventional detection systems tend to rely on binary classification and static features, which makes them slow to adapt to new attack patterns. This paper presents a robust and interpretable phishing detection system that addresses the drawbacks of binary labeling with a regression-based risk scoring model. The aim is to improve accuracy, feature interpretability, and suitability for real-time deployment. The proposed method combines the Whale Optimization Algorithm (WOA) for feature selection with H2O AutoML for model building and evaluation. A filtered dataset of 10,000 phishing and legitimate websites, described by 48 features, is used; WOA reduces the feature set to 36. The final models are tuned with H2O AutoML, including ensemble learners, and evaluated on several regression metrics. Interpretability is provided through SHAP analysis. The best model achieved an R² of 0.9534, an RMSE of 0.1079, and an MSE of 0.0116, outperforming traditional classification-based phishing detectors. With only 36 features, training time decreased by 23.6% and inference latency by about 18%, with no loss in detection accuracy (98.3%). Regression-based scoring also supports adaptive threat ranking in real time. By framing phishing detection as a regression problem and integrating metaheuristic feature selection with AutoML, this work introduces a scalable and explainable framework intended for real-world deployment. The low-latency, high-accuracy model is well suited to integration into browser-level phishing filters and cloud-based threat intelligence platforms.
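
To make the regression framing concrete, the following is a minimal sketch of how MSE, RMSE, and R² can be computed for predicted risk scores, and how those scores can be used to rank sites by threat level. It is an illustration only, not the paper's pipeline: the function names `regression_metrics` and `rank_by_risk`, the example URLs, and the toy labels and scores are all hypothetical, and the labels are assumed to be 1 for phishing and 0 for legitimate.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (mse, rmse, r2) for paired true and predicted risk scores."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    mse = ss_res / n
    rmse = math.sqrt(mse)
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2


def rank_by_risk(urls, scores):
    """Sort (url, score) pairs from highest to lowest predicted risk."""
    return sorted(zip(urls, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical data: 1 = phishing, 0 = legitimate (assumed label encoding).
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]
urls = ["a.example", "b.example", "c.example",
        "d.example", "e.example", "f.example"]

mse, rmse, r2 = regression_metrics(y_true, y_pred)
ranking = rank_by_risk(urls, y_pred)

print(f"MSE={mse:.4f} RMSE={rmse:.4f} R2={r2:.4f}")
print("Highest-risk site:", ranking[0][0])
```

Because RMSE is the square root of MSE, the two reported values can be checked against each other: 0.1079² ≈ 0.0116, which matches the MSE stated in the abstract.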

Keywords

phishing attack, optimization algorithm, whale optimization algorithm, AutoML framework, H2O AutoML, regression analysis, random forest algorithm
