A Coefficient-of-Variation Distance Weighting Method to Overcome the Weakness of Euclidean Distance in the k-Nearest Neighbor Algorithm

Agustiyar Agustiyar, Romi Satria Wahono, Catur Supriyanto


k-Nearest Neighbor (k-NN) is one of the top 10 classification algorithms in data mining. k-NN is simple and easy to apply, but its classification results are strongly influenced by the scale of the input data. Euclidean distance treats all attributes as equally important, which does not match the actual relevance of each attribute: some attributes are more or less relevant, or even irrelevant, in determining the class, so classification results degrade. To overcome this weakness of k-NN, Zolghadri, Parvinnia, and John proposed Weighted Distance Nearest Neighbor (WDNN), which performs better than k-NN; however, its performance decreases when k > 1. Gou proposed Dual Distance Weighted Voting k-Nearest Neighbor (DWKNN), which also performs better than k-NN, but DWKNN focuses on determining the class label through weighted voting and applies Euclidean distance without attribute weighting. As a result, all attributes are again treated as equally important regardless of their relevance, which degrades classification results. This research proposes Coefficient of Variation Weighting k-Nearest Neighbor (CVWKNN), which integrates MinMax normalization with a weighted Euclidean distance. Seven public datasets from the UCI Machine Learning Repository were used in this research. The Friedman test and the Nemenyi post hoc test on accuracy showed that CVWKNN performed better than, and significantly differently from, the k-NN algorithm.



k-NN; attribute weighting; weighted Euclidean distance; MinMax normalization
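The pipeline described in the abstract (MinMax normalization, attribute weights from the coefficient of variation, weighted Euclidean distance, majority voting) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the specific weighting rule used here (per-attribute standard deviation divided by the mean, normalized to sum to 1) and all function names are assumptions for the sketch.

```python
import numpy as np

def minmax_normalize(X):
    # MinMax normalization: scale each attribute to [0, 1].
    mn, mx = X.min(axis=0), X.max(axis=0)
    rng = np.where(mx - mn == 0, 1, mx - mn)  # guard constant columns
    return (X - mn) / rng

def cv_weights(X):
    # Assumed weighting rule: coefficient of variation (std / mean)
    # per attribute, normalized so the weights sum to 1.
    means = X.mean(axis=0)
    cv = X.std(axis=0) / np.where(means == 0, 1, means)
    return cv / cv.sum()

def cvwknn_predict(X_train, y_train, x, k=3):
    # Weighted Euclidean distance: sqrt(sum_j w_j * (x_j - t_j)^2),
    # then majority vote among the k nearest training samples.
    w = cv_weights(X_train)
    d = np.sqrt(((X_train - x) ** 2 * w).sum(axis=1))
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

A query point would be scaled with the training set's minima and maxima before calling `cvwknn_predict`, so that distances are computed in the same normalized space.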




Akhil, M, B L Deekshatulu, and Priti Chandra. 2013. “Classification of Heart Disease Using K-Nearest Neighbor and Genetic Algorithm.” In Procedia Technology, 10:85–94. Kalyani, Nadia, West Bengal, September 27-28: Elsevier B.V.

Bechar, Avital, and Gad Vitner. 2009. “A Weight Coefficient of Variation Based Mathematical Model to Support the Production of ‘Packages Labelled by Count’ in Agriculture.” Biosystems Engineering 104 (3). IAgrE: 362–68. doi:10.1016/j.biosystemseng.2009.08.003.

Cao, Qinghua, and Yu Liu. 2010. “A KNN Classifier with PSO Feature Weight Learning Ensemble.” In International Conference on Intelligent Control and Information Processing, 110–14. Dalian, Aug 13-15.

Nugroho, Christian Gratia, Didik Nugroho, and Sri Hariyati Fitriasih. 2015. “Sistem Pendukung Keputusan Untuk Pemilihan Metode Kontrasepsi Pada Pasangan Usia Subur Dengan Algoritma K-Nearest Neighbor (K-NN).” Jurnal Ilmiah SINUS 13 (1): 21–30.

Dağ, H, K E Sayın, I Yenidoğan, S Albayrak, and C Acar. 2012. “Comparison of Feature Selection Algorithms for Medical Data.” In 2012 International Symposium on Innovations in Intelligent Systems and Applications. Trabzon, July 2-4.

Demsar, Janez. 2006. “Statistical Comparisons of Classifiers over Multiple Data Sets.” Journal of Machine Learning Research 7: 1–30.

Dialameh, Maryam, and Mansoor Zolghadri Jahromi. 2016. “Proposing a General Feature Weighting Function.” Expert Systems With Applications. Elsevier Ltd. doi:10.1016/j.eswa.2016.12.016.

Dudani, Sahibsingh A. 1976. “The Distance-Weighted K-Nearest-Neighbor Rule.” IEEE Transactions on Systems, Man and Cybernetics SMC-6 (4): 325–27. doi:10.1109/TSMC.1976.5408784.

Gou, Jianping, Mingying Luo, and Taisong Xiong. 2011. “Improving K-Nearest Neighbor Rule with Dual Weighted Voting for Pattern Classification.” Communications in Computer and Information Science 159 CCIS (PART 2): 118–23. doi:10.1007/978-3-642-22691-5_21.

Han, Jiawei, and Micheline Kamber. 2011. Data Mining: Concepts and Techniques. 2nd ed. Elsevier Inc. doi:10.1007/978-3-642-19721-5.

Hocke, Jens, and Thomas Martinetz. 2013. “Feature Weighting by Maximum Distance Minimization.” In International Conference on Artificial Neural Networks, 420–25. Bulgaria, September 10-13.

———. 2015. “Maximum Distance Minimization for Feature Weighting.” Pattern Recognition Letters 52. Elsevier Ltd.: 48–52. doi:10.1016/j.patrec.2014.10.003.

Hu, Qinghua, Pengfei Zhu, Yongbin Yang, and Daren Yu. 2011. “Large-Margin Nearest Neighbor Classifiers via Sample Weight Learning.” Neurocomputing 74 (4). Elsevier: 656–60. doi:10.1016/j.neucom.2010.09.006.

Jain, Anil, Karthik Nandakumar, and Arun Ross. 2005. “Score Normalization in Multimodal Biometric Systems.” Pattern Recognition 38: 2270–85. doi:10.1016/j.patcog.2005.01.012.

Jiang, Liangxiao, Zhihua Cai, Dianhong Wang, and Siwei Jiang. 2007. “Survey of Improving K-Nearest-Neighbor for Classification.” In Proceedings - Fourth International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2007, 1:679–83. Haikou, Aug 24-27. doi:10.1109/FSKD.2007.552.

Jiao, Lianmeng, Quan Pan, and Xiaoxue Feng. 2015. “Multi-Hypothesis Nearest-Neighbor Classifier Based on Class-Conditional Weighted Distance Metric.” Neurocomputing 151 (P3). Elsevier: 1468–76. doi:10.1016/j.neucom.2014.10.039.

Han, Jiawei, Micheline Kamber, and Jian Pei. 2012. Data Mining: Concepts and Techniques. 3rd ed. San Francisco, CA: Morgan Kaufmann. doi:10.1016/B978-0-12-381479-1.00001-0.

Ke, N. 2018. “Region Based Segmentation of Social Images Using Soft KNN Algorithm.” In Procedia Computer Science, 125:93–98. Kurukshetra, December 7-8: Elsevier B.V. doi:10.1016/j.procs.2017.12.014.

Mateos-García, Daniel, Jorge García-Gutiérrez, and José C. Riquelme-Santos. 2017. “On the Evolutionary Weighting of Neighbours and Features in the K-Nearest Neighbour Rule.” Neurocomputing 0: 1–7. doi:10.1016/j.neucom.2016.08.159.

Nawi, Nazri Mohd, Walid Hasen Atomi, and M Z Rehman. 2013. “The Effect of Data Pre-Processing on Optimized Training of Artificial Neural Networks.” In Procedia Technology, 11:32–39. Selangor, Jun 24-25: Elsevier B.V. doi:10.1016/j.protcy.2013.12.159.

Neo, Toh Koon Charlie, and Dan Ventura. 2012. “A Direct Boosting Algorithm for the K-Nearest Neighbor Classifier via Local Warping of the Distance Metric.” Pattern Recognition Letters 33 (1). Elsevier B.V.: 92–102. doi:10.1016/j.patrec.2011.09.028.

Irawan, Raymundus Nandy, Wawan Laksito YS, and Sri Siswanti. 2013. “Sistem Pendukung Keputusan Untuk Menentukan Status Prestasi Siswa Menggunakan Metode K-Nearest Neighbor.” Jurnal Ilmiah SINUS 11 (2): 53–66.

Ren, Shuhua, and Alin Fan. 2011. “K -Means Clustering Algorithm Based On Coefficient Of Variation.” In 2011 4th International Congress on Image and Signal Processing, 2076–79. Shanghai, October 15-17.

Schenatto, Kelyn, Eduardo Godoy De Souza, Claudio Leones Bazzi, Alan Gavioli, Nelson Miguel, and Humberto Martins. 2017. “Normalization of Data for Delineating Management Zones.” Computers and Electronics in Agriculture 143 (November). Elsevier: 238–48. doi:10.1016/j.compag.2017.10.017.

Shabani, Ali, Keramat Allah, Ali Reza, and Ali Akbar Kamgar-haghighi. 2017. “Using the Artificial Neural Network to Estimate Leaf Area.” Scientia Horticulturae 216. Elsevier B.V.: 103–10. doi:10.1016/j.scienta.2016.12.032.

Siminski, Krzysztof. 2017. “Fuzzy Weighted C-Ordered Means Clustering Algorithm.” Fuzzy Sets and Systems 1. Elsevier B.V.: 1–33. doi:10.1016/j.fss.2017.01.001.

Song, Yunsheng, Jiye Liang, Jing Lu, and Xingwang Zhao. 2017. “An Efficient Instance Selection Algorithm for K Nearest Neighbor Regression.” Neurocomputing. Elsevier B.V. doi:10.1016/j.neucom.2017.04.018.

Tzortzis, Grigorios, and Aristidis Likas. 2014. “The MinMax K -Means Clustering Algorithm.” Pattern Recognition 47 (7). Elsevier: 2505–16. doi:10.1016/j.patcog.2014.01.015.

Venugopal, Vivek, and Suresh Sundaram. 2017. “An Online Writer Identification System Using Regression-Based Feature Normalization and Codebook Descriptors.” Expert Systems With Applications 72. Elsevier Ltd: 196–206. doi:10.1016/j.eswa.2016.11.038.

Wahono, Romi Satria, Nanna Suryana Herman, and Sabrina Ahmad. 2014. “A Comparison Framework of Classification Models for Software Defect Prediction.” Advanced Science Letters 20 (10–12): 1945–50. doi:10.1166/asl.2014.5640.

Wang, Jigang, Predrag Neskovic, and Leon N. Cooper. 2007. “Improving Nearest Neighbor Rule with a Simple Adaptive Distance Measure.” Pattern Recognition Letters 28 (2): 207–13. doi:10.1016/j.patrec.2006.07.002.

Weinberger, Kilian Q, and Lawrence K Saul. 2009. “Distance Metric Learning for Large Margin Nearest Neighbor Classification.” The Journal of Machine Learning Research 10: 207–44.

Witten, Ian, Eibe Frank, and Mark Hall. 2011. Data Mining: Practical Machine Learning Tools and Techniques. 3rd ed. Elsevier Inc.

Wu, Jia, Shirui Pan, Xingquan Zhu, Zhihua Cai, Peng Zhang, and Chengqi Zhang. 2015. “Self-Adaptive Attribute Weighting for Naive Bayes Classification.” Expert Systems with Applications 42 (3). Elsevier Ltd: 1487–1502. doi:10.1016/j.eswa.2014.09.019.

Gao, Yunlong, and Yixiao Liu. 2016. “An Improved Feature-Weight Method Based on K-NN.” In Proceedings of the 35th Chinese Control Conference, 6950–56. Chengdu, July 27-29.

Zolghadri, Mansoor, Elham Parvinnia, and Robert John. 2009. “A Method of Learning Weighted Similarity Function to Improve the Performance of Nearest Neighbor.” Information Sciences 179 (17). Elsevier Inc.: 2964–73. doi:10.1016/j.ins.2009.04.012.

STMIK Sinar Nusantara

KH Samanhudi 84 - 86 Street, Laweyan Surakarta, Central Java, Indonesia
Postal Code: 57142, Phone & Fax: +62 271 716 500 

ISSN: 1693-1173 (print) | 2548-4028 (online)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
