Analysis of Name Entities in Text Using Robust Disambiguation Method

Muthia Virliani, Moch. Arif Bijaksana, Arie Ardiyanti Suryani

Abstract


Named entities are proper nouns or objects mentioned in a text, such as a person's name, a country name, and others. Person names in a text are often ambiguous, which makes it difficult for ordinary readers to tell whether identical names refer to the same person or not. Such ambiguity also occurs in hadith: for example, the name Abdullah in hadith numbers 86 and 2411 might refer to the same person or to different people. This study therefore focuses on named entity disambiguation that takes into account the semantic and lexical relations between named entities. In the future, it is expected to help people understand ambiguous names and distinguish between them. The method used in this research is Robust Disambiguation, chosen because it takes the context of each named entity into account. The output is a set of named entities grouped according to whether they refer to the same person or to different persons, produced with Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Comparing actual values with predicted values, this research achieved an accuracy of 90%, a precision of 97%, and a recall of 89%.
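The abstract names the building blocks (context similarity via Jaccard, DBSCAN grouping, and accuracy/precision/recall against actual values) but not how they fit together. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each mention of an ambiguous name is represented as a set of context words, uses 1 - Jaccard similarity as the distance between mentions, clusters with a hand-written textbook DBSCAN, and scores the result over mention pairs ("same person" vs "different person"). The mention data, the labels `A`/`B`, `eps`, `min_pts`, and the pairwise evaluation scheme are all hypothetical; the full Robust Disambiguation method uses further components that are not shown here.

```python
from itertools import combinations
from typing import Callable, Sequence


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two context-word sets."""
    if not a and not b:
        return 1.0  # two empty contexts are treated as identical
    return len(a & b) / len(a | b)


def jaccard_distance(a: set, b: set) -> float:
    return 1.0 - jaccard(a, b)


def dbscan(points: Sequence, dist: Callable, eps: float, min_pts: int) -> list:
    """Textbook DBSCAN over a user-supplied distance; label -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited
    cluster = -1

    def region(i: int) -> list:
        # Indices within eps of point i (the point itself included).
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:
                queue.extend(neighbours)  # j is a core point: keep growing
    return labels


def pairwise_scores(actual: Sequence, predicted: Sequence) -> tuple:
    """Accuracy, precision, recall over all mention pairs (same person or not)."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(actual)), 2):
        same_true = actual[i] == actual[j]
        # Noise points (-1) are treated as singletons, never as a shared cluster.
        same_pred = predicted[i] == predicted[j] and predicted[i] != -1
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Made-up context-word sets for six mentions of the name "Abdullah".
mentions = [
    {"umar", "madinah", "ibn"},
    {"umar", "madinah", "nafi"},
    {"umar", "ibn", "nafi"},
    {"masud", "kufah", "ibn"},
    {"masud", "kufah", "alqamah"},
    {"masud", "ibn", "alqamah"},
]
actual = ["A", "A", "A", "B", "B", "B"]  # hypothetical gold identities
predicted = dbscan(mentions, jaccard_distance, eps=0.6, min_pts=2)
print(predicted)  # → [0, 0, 0, 1, 1, 1]
print(pairwise_scores(actual, predicted))  # → (1.0, 1.0, 1.0)
```

On real hadith data, `eps` and `min_pts` would need tuning, and the hand-written loop could be swapped for a library DBSCAN that accepts a precomputed distance matrix (for example scikit-learn's `DBSCAN(metric="precomputed")`).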

Keywords


Density-based Clustering; Disambiguation; Hadith Sahih Bukhari; Jaccard Similarity; Robust Disambiguation

DOI: http://dx.doi.org/10.30700/jst.v10i2.963


Copyright (c) 2020 SISFOTENIKA

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Managing Board of the Scientific Journal of Information Systems and Informatics Engineering (SISFOTENIKA), STMIK Pontianak.

ISSN Printed : 2087-7897

ISSN Online : 2460-5344


CERTIFICATE OF RECOGNITION:

Jurnal Ilmiah SISFOTENIKA is accredited at Rank Four.

Jurnal Ilmiah SISFOTENIKA: STMIK Pontianak Online Journal, ISSN Printed (2087-7897), ISSN Online (2460-5344), is licensed under a Creative Commons Attribution 4.0 International License.