The Effect of the Number of Dataset Records on Classification Algorithms Based on Customer Churn Data
Abstract
Telecommunication is one of the fastest-growing industrial sectors, and the number of telecommunication companies keeps increasing. This competition poses threats to companies that do not apply an appropriate strategy. Customer churn, the rate at which customers leave a company, is one such threat because it reduces the company's revenue. Growing companies therefore need to evaluate churn in order to reduce its likelihood. The first step is to predict which customers are likely to leave the company, and data mining is one approach to doing so. Classification is a data mining technique that predicts the class of records in a dataset, and many classification algorithms are available. The purpose of this study is to identify the effect of the number of dataset records on several classification algorithms. The research followed the CRISP-DM method and applied three classification algorithms: Logistic Regression, Naïve Bayes, and Decision Tree C4.5. The results showed that the larger the number of records in the dataset, the higher the accuracy obtained. On dataset-1, logistic regression performed best on accuracy (80.09%), while naïve Bayes performed best on AUC (0.733) and on execution time (0.00798 seconds). On dataset-2, the decision tree was more suitable than logistic regression and naïve Bayes, with an accuracy of 91.9% and an AUC of 0.846, which falls in the good classification category. In execution time, however, naïve Bayes took only 0.00403 seconds.
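The three measures reported above (accuracy, AUC, and execution time) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the toy labels and scores are hypothetical, the helper names (`accuracy`, `auc`, `auc_category`, `timed`) are made up for illustration, and the AUC category thresholds are an assumed scale that is commonly used in the data mining literature. The abstract itself only confirms that 0.846 counts as "good".

```python
import time


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive record (churn)
    receives a higher score than a random negative record (no churn).
    Ties count as half a win. Quadratic in time, fine for a small demo."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_category(value):
    """Map an AUC value to a label. ASSUMED thresholds, not taken from
    the paper; they are merely consistent with 0.846 being 'good'."""
    if value >= 0.90:
        return "excellent"
    if value >= 0.80:
        return "good"
    if value >= 0.70:
        return "fair"
    if value >= 0.60:
        return "poor"
    return "failure"


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical toy data: 1 = churn, 0 = no churn.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]  # predicted churn probability
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # 0.5 decision threshold

acc = accuracy(y_true, y_pred)
auc_value, elapsed = timed(auc, y_true, scores)

print(f"accuracy = {acc:.4f}")
print(f"AUC      = {auc_value:.4f} ({auc_category(auc_value)})")
print(f"time     = {elapsed:.6f} s")
```

In a study like the one described, the same three measures would be computed once per algorithm and per dataset size, so that the effect of the record count can be read off by comparing the rows.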
Copyright (c) 2021 Jurnal Ilmiah Informatika
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.