Comparative Analysis of XGBoost Performance for Text Classification with CPU Parallel and Non-Parallel Processing

Authors

  • Omar Ahmed Al-Zakhali, Duhok Polytechnic University
  • Subhi Zeebaree, Duhok Polytechnic University
  • Shavan Askar, Erbil Polytechnic University

DOI:

https://doi.org/10.33022/ijcs.v13i2.3798

Abstract

This paper presents the findings of a study examining how CPU parallel processing affects the performance of Extreme Gradient Boosting (XGBoost) for text classification. The main goal of the study is to evaluate how quickly and how accurately XGBoost models can assign news articles to predefined categories with and without CPU parallelism. Text from the Keras news dataset is preprocessed and converted into TF-IDF (Term Frequency-Inverse Document Frequency) features, which are then used to train the XGBoost model. Two configurations of the XGBoost classifier are compared: one with CPU parallelism enabled and one without. Model performance is assessed in terms of prediction accuracy as well as the time required for training and prediction. The two configurations differ substantially in computation time but achieve very similar accuracy. CPU parallel processing makes the workload run considerably faster, and XGBoost exploits that speed-up effectively for the classification task. The study shows that parallel processing can accelerate XGBoost models without affecting their accuracy, which is useful for text classification.
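A minimal sketch of the experiment described above, not the authors' exact code. It assumes the "Keras dataset" is the Reuters newswire dataset shipped with keras.datasets, that TF-IDF features come from scikit-learn's TfidfVectorizer, and that CPU parallelism is toggled through XGBoost's n_jobs parameter; all hyperparameter values are illustrative.

```python
# Hedged sketch: compare XGBoost text classification with and without CPU parallelism.
import time

from keras.datasets import reuters
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Load the Reuters newswire dataset (documents arrive as lists of word indices).
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=10000)

# Decode the index sequences back to space-joined tokens so TF-IDF can be applied
# (indices 0-2 are reserved for padding/start/unknown, hence the +3 offset).
word_index = reuters.get_word_index()
index_to_word = {i + 3: w for w, i in word_index.items()}
decode = lambda seq: " ".join(index_to_word.get(i, "?") for i in seq)
train_texts = [decode(s) for s in x_train]
test_texts = [decode(s) for s in x_test]

# Extract TF-IDF features from the decoded text.
vectorizer = TfidfVectorizer(max_features=10000)
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Train the same XGBoost classifier single-threaded and with all CPU cores,
# timing training and prediction separately, as in the study's comparison.
for n_jobs in (1, -1):  # 1 = no parallelism, -1 = all available CPU cores
    model = XGBClassifier(n_estimators=200, max_depth=6, n_jobs=n_jobs)

    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    preds = model.predict(X_test)
    predict_time = time.perf_counter() - start

    print(f"n_jobs={n_jobs:>2}  train={train_time:.1f}s  "
          f"predict={predict_time:.2f}s  accuracy={accuracy_score(y_test, preds):.4f}")
```

Under this setup, the accuracy figures printed for both runs should be essentially identical, while the training time should drop noticeably when all cores are used, which is the behaviour the abstract reports.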

Published

01-04-2024