A Review of Text Classification Based on ML & Data Mining Algorithms

Authors

  • Ashraf Atam Mustafa, Akre University for Applied Science, Technical College of Informatics, Akre, Department of Information Technology, Akre, Kurdistan Region, Iraq
  • Adnan Mohsin Abdulazeez, Presidency of Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq

Abstract

In the digital era, text classification has grown rapidly through the application of Machine Learning (ML) and Data Mining (DM) algorithms. This review traces the evolution from traditional data mining methods to more sophisticated ML strategies that improve the analysis and categorization of textual data. We discuss pivotal technologies, including Bayesian classifiers and Support Vector Machines (SVMs), as well as more recent advances such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). We highlight the integration of Natural Language Processing (NLP) techniques for its critical role in enriching semantic analysis, which effective text classification requires. The paper also addresses challenges such as handling high-dimensional data, dealing with imbalanced datasets, and confronting ethical issues such as bias and privacy in automated systems. By synthesizing recent research, this review identifies current gaps, proposes practical solutions, and forecasts future trends in text classification to support ongoing research and application across various sectors.

Published

15-06-2024