UTILIZING ROOTS AND PATTERNS TO IDENTIFY ARABIC NAMED ENTITIES

Published: 2022-11-07

DOI: 10.56557/ajomcor/2022/v29i27922

Page: 33-42


ABDULMONEM AHMED *

Department of Material Science and Engineering, Graduate School of Natural and Applied Sciences, Kastamonu University, 37150 Kastamonu, Turkey.

AYBABA HANÇRLİOĞULLARI

Department of Physics, Faculty of Science and Letters, Kastamonu University, 37150 Kastamonu, Turkey.

ALİ RIZA TOSUN

Department of Philosophy, Faculty of Science and Letters, Kastamonu University, 37150 Kastamonu, Turkey.

*Author to whom correspondence should be addressed.


Abstract

Named Entity Recognition (NER) is a subtask of information extraction that seeks to recognize named entities in text and classify them into predefined categories, such as names of people, organizations, geographic locations, and so on. The task has recently attracted a lot of attention because it has the potential to boost the performance of a variety of NLP applications. Many problems in Question Answering and Summarization Systems, Information Retrieval and Extraction, Machine Translation, Video Annotation, Semantic Web Search, and Bioinformatics depend on named entity recognition. Arabic is an inflectional language whose morphology allows non-concatenative operations on the root. The purpose of this study is to extract and recognize entity names in Arabic articles. We propose an algorithm that identifies names from their roots using patterns. We implemented it in Python and used the "pyqt5" graphical package to view results immediately and to modify and add patterns easily. To test the algorithm, we used a random sample of 400 names and 45 different patterns. The algorithm correctly and quickly identified 370 of the 400 names, a success rate of about 93%. Any name that follows the same patterns as the recognized names is identified in the same way, with no changes to the code or design. The names that our algorithm did not recognize have no roots in the list of known Arabic roots. Our results show that the approach recognizes names that have roots with high speed and accuracy, but it cannot identify names that do not come from the Arabic language. We therefore recommend a hybrid method that combines multiple approaches.

Keywords: Named entity recognition, root, pattern, Arabic, information retrieval
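
The root-and-pattern idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the four sample patterns, and the one-entry root list are assumptions chosen for illustration, and the sketch ignores diacritics and affixes. It treats the letters ف ع ل in a pattern as slots for the root consonants, extracts the candidate root from a word, and accepts the word only if that root is in a list of known roots.

```python
import re

# The letters ف ع ل conventionally stand for the root consonants in an
# Arabic pattern; every other letter in the pattern is matched literally.
ROOT_SLOTS = "فعل"


def pattern_to_regex(pattern):
    """Turn a pattern such as 'مفعول' into a regex with one group per root slot."""
    parts = ["(.)" if ch in ROOT_SLOTS else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


def recognize(word, patterns, known_roots):
    """Return (root, pattern) if the word fits a pattern and its root is known, else None."""
    for pattern in patterns:
        match = pattern_to_regex(pattern).fullmatch(word)
        if match:
            root = "".join(match.groups())
            if root in known_roots:
                return root, pattern
    return None


# Hypothetical sample data (the study itself used 45 patterns and 400 names).
PATTERNS = ["مفعل", "أفعل", "فاعل", "مفعول"]
KNOWN_ROOTS = {"حمد"}
NAMES = ["محمد", "أحمد", "حامد", "محمود", "جورج"]

if __name__ == "__main__":
    for name in NAMES:
        print(name, recognize(name, PATTERNS, KNOWN_ROOTS))

    # Success rate reported in the abstract: 370 of 400 names recognized.
    print(f"{370 / 400:.1%}")
```

In this sketch the four names derived from the root حمد are matched by different patterns, while جورج is rejected because it fits no pattern and has no root in the list. That mirrors the limitation noted in the abstract: names without a known Arabic root cannot be identified by this method alone.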


How to Cite

AHMED, A., HANÇRLİOĞULLARI, A., & TOSUN, A. R. (2022). UTILIZING ROOTS AND PATTERNS TO IDENTIFY ARABIC NAMED ENTITIES. Asian Journal of Mathematics and Computer Research, 29(2), 33–42. https://doi.org/10.56557/ajomcor/2022/v29i27922
