Classification of Commit Messages in the 5 Most Popular Repositories in the NPM Ecosystem Using DistilBERT

Authors

  • Diva Pradani, Universitas Muhammadiyah Surakarta
  • Yusuf Nugroho, Universitas Muhammadiyah Surakarta

Keywords:

Classification, Commit Message, DistilBERT, GitHub Repository, NPM

Abstract

GitHub, a popular collaborative platform, had recorded more than 124 million users contributing to over 95 million repositories as of October 2023. Commit messages are a crucial space for developers to interact and monitor the progress of their projects. However, as the number of contributions and the diversity of repositories grow, the documentation within commit messages becomes increasingly complex, which makes it harder for developers to track and analyze changes during project development. This research aims to classify commit messages into three primary categories: "Corrective," "Perfective," and "Adaptive." The classification is conducted within the NPM ecosystem, the largest open-source ecosystem, using the pre-trained transformer model DistilBERT. The evaluation metrics show an accuracy of 76.79%, an F1-score of 81.98%, a precision of 85.57%, a recall of 78.75%, and a Hamming loss of 12.50%. These results indicate that the model classifies the data effectively and makes a positive contribution to information processing.
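The reported metrics can be illustrated with a small sketch. Because a Hamming loss is reported, the sketch assumes a multi-label setup in which each commit message carries a 0/1 indicator for each of the three categories; the abstract does not state the averaging scheme, so the micro-averaging used here, the function name `multilabel_metrics`, and the toy labels are all illustrative assumptions rather than the authors' evaluation code.

```python
from typing import Dict, List

# Category order assumed for the indicator vectors (illustrative only).
LABELS = ["Corrective", "Perfective", "Adaptive"]


def multilabel_metrics(y_true: List[List[int]], y_pred: List[List[int]]) -> Dict[str, float]:
    """Compute subset accuracy, micro precision/recall/F1, and Hamming loss."""
    tp = fp = fn = 0      # counts pooled over all samples and labels
    mismatches = 0        # label slots where prediction differs from truth
    exact = 0             # samples whose full label vector is predicted correctly

    for t_row, p_row in zip(y_true, y_pred):
        if t_row == p_row:
            exact += 1
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
            if t != p:
                mismatches += 1

    n_samples = len(y_true)
    n_slots = n_samples * len(LABELS)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

    return {
        "accuracy": exact / n_samples,          # exact-match (subset) accuracy
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hamming_loss": mismatches / n_slots,   # fraction of wrong label slots
    }


# Toy data: four hypothetical commits, not taken from the study.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [1, 0, 0]]

metrics = multilabel_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these assumptions, subset accuracy is the strictest measure because a sample counts as correct only when every label matches, whereas Hamming loss penalizes each wrong label slot separately. That difference is one plausible reason an accuracy figure can sit below the F1-score while the Hamming loss stays low.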

Published

07-05-2024