Evaluating LDA and LSA for Topic Modeling in the Indonesian Natural Disaster

Authors

  • Muhamad Gatot Supiadin, Universitas Amikom Yogyakarta
  • Arif Dwi Laksito, Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.33022/ijcs.v12i6.3478

Abstract

Topic modeling is a Natural Language Processing technique for discovering the topics that run through a collection of documents, such as news articles. Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA) are two algorithms widely used for topic modeling. This study applies both to Indonesian-language articles about natural disasters. The dataset was collected by scraping Google News, which aggregates articles from many online news sources. The research proceeds in several stages: dataset scraping, data preprocessing, topic modeling with LDA and LSA, model visualization, and model evaluation. The results show that both algorithms can extract topics relevant to natural disasters in Indonesia, such as floods, earthquakes, landslides, and tsunamis. Evaluation with coherence scores shows that LDA outperforms LSA in modeling topics related to natural disasters in Indonesia. The results are expected to help information system developers monitor natural disasters in Indonesia and to help researchers quickly identify which disaster a given document discusses.
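The abstract compares LDA and LSA by coherence score but does not name the coherence variant used. As a rough illustration of what such a score measures, here is a minimal sketch of the UMass coherence measure in plain Python. The choice of UMass, the function name `umass_coherence`, the averaging over word pairs, and the toy Indonesian token lists are all assumptions for illustration, not the authors' setup or data.

```python
import math


def umass_coherence(top_words, docs):
    """Average UMass-style coherence of one topic's top words.

    top_words: the topic's words ordered from most to least probable.
    docs: the corpus as a list of token lists (already preprocessed).
    For each pair (v_m, v_l) with l < m this scores
        log((D(v_m, v_l) + 1) / D(v_l)),
    where D counts documents containing all the given words, and
    returns the mean over all pairs. Higher means more coherent.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words to score a topic")
    doc_sets = [set(doc) for doc in docs]

    def doc_count(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    scores = []
    for m in range(1, len(top_words)):
        for l in range(m):
            v_m, v_l = top_words[m], top_words[l]
            d_l = doc_count(v_l)
            if d_l == 0:
                raise ValueError(f"top word {v_l!r} occurs in no document")
            # +1 smoothing keeps the log finite when the pair never co-occurs.
            scores.append(math.log((doc_count(v_m, v_l) + 1) / d_l))
    return sum(scores) / len(scores)


# Toy, hypothetical Indonesian token lists (not the study's data).
docs = [
    ["banjir", "hujan", "sungai"],   # flood, rain, river
    ["banjir", "hujan", "rumah"],    # flood, rain, house
    ["gempa", "tsunami", "pantai"],  # earthquake, tsunami, beach
    ["gempa", "rumah", "rusak"],     # earthquake, house, damaged
]

print(umass_coherence(["banjir", "hujan"], docs))    # log(3/2)
print(umass_coherence(["banjir", "tsunami"], docs))  # log(1/2)
```

A topic whose top words tend to appear in the same articles (here "banjir" and "hujan", flood and rain) scores higher than one pairing words that never co-occur. In practice a library implementation, such as the coherence measures in gensim's `CoherenceModel`, would more likely be used than a hand-rolled function like this one.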

Published

30-12-2023