Text Classification for Clickbait Detection: A Model-Driven Approach Using CountVectorizer and ML Classifiers

Abstract

Clickbait headlines are rampant in both digital and mainstream media. Designed to maximize click-through rates, they undermine the credibility of content and help spread misinformation. In this paper we propose an efficient and interpretable machine learning framework for the binary classification of clickbait versus non-clickbait headlines using traditional models. Our pipeline consists of a preprocessing step, n-gram feature extraction with CountVectorizer, and classification with Multinomial Naive Bayes, Logistic Regression, and XGBoost. The models were trained on a publicly available dataset and evaluated on accuracy, precision, recall, F1 score, and ROC AUC. Naive Bayes and Logistic Regression performed best, with an accuracy of 95.88%, an F1 score of 95.88%, and an AUC of 0.99, outperforming the more complex XGBoost classifier. These results confirm that lightweight models are suitable for real-time clickbait detection, and we further show that traditional machine learning is interpretable and scalable.
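The n-gram counting and Naive Bayes steps named in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' implementation, which presumably relies on scikit-learn's CountVectorizer and MultinomialNB. The helper `ngrams`, the class `TinyMultinomialNB`, the unigram-plus-bigram range, the smoothing value, and the toy headlines are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Lowercase the text, tokenize it, and return all 1..n_max word n-grams."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class TinyMultinomialNB:
    """Hypothetical multinomial Naive Bayes over n-gram counts (Laplace smoothing)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed smoothing strength

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)   # per-class n-gram counts
        self.class_docs = Counter(labels)    # per-class document counts
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text))
        self.vocab = {f for c in self.counts.values() for f in c}
        return self

    def log_scores(self, text):
        n_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        # n-grams never seen during training are ignored
        feats = [f for f in ngrams(text) if f in self.vocab]
        scores = {}
        for label, feat_counts in self.counts.items():
            total = sum(feat_counts.values())
            score = math.log(self.class_docs[label] / n_docs)  # log prior
            for feat in feats:
                score += math.log(
                    (feat_counts[feat] + self.alpha) / (total + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy headlines: 1 = clickbait, 0 = non-clickbait.
train_texts = [
    "You won't believe what happened next",
    "10 things you need to know before you turn 30",
    "This one trick will change your life",
    "Senate passes budget bill after long debate",
    "Central bank raises interest rates by a quarter point",
    "Earthquake strikes northern region, dozens injured",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = TinyMultinomialNB().fit(train_texts, train_labels)
print(model.predict("You won't believe this trick"))
print(model.predict("Central bank passes budget"))
```

Because each class is scored by summing per-n-gram log probabilities, the n-grams that push a headline toward the clickbait class can be read directly from the count tables, which is one sense in which such lightweight models are interpretable.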

Keywords

Clickbait Detection, Text Classification, CountVectorizer, Machine Learning, Naive Bayes Classifier
