Architecture of Parts of speech Tagger in Sindhi Language

Volume 16  Issue 2    2022

Author(s): Saira Baby Farooqui*, Noor Ahmed Shaikh, Samina Rajper
sairafarooqui@sau.edu.pk, noor.shaikh@salu.edu.pk, samina.rajper@sau.edu.pk
Abstract Sindhi is an intricate language and one of the oldest tongues spoken and written in several parts of the world. Word segmentation, short vowel restoration (SVR) and parts of speech (POS) tagging are generally challenging tasks for its natural language processing (NLP) applications. The features of the language add to the difficulty: for example, Sindhi has soft spaces within words, short vowels, and compound and complex words. Because of this complexity, POS tagging is a challenging task for machine learning, yet it can help to resolve these ambiguities. Sindhi has eight parts of speech, determined by the formation of the sentence. A word's POS can change with context, which human readers easily understand but a computer does not. To overcome these issues, the model defines some rules that may help a machine to recognize POS tags. The POS architecture has three distinct phases to resolve the tagging problems in this language and in other languages as well. Tokenization is an NLP tool for word segmentation (sentences are broken into words). After word segmentation, the SVR phase begins for proper vocalization, and tags are then applied, which can help in understanding the text appropriately. Tagging POS in the corpus removes the uncertainty. English has various rules, whereas Sindhi identifies a word's function from the corpus (vowels, short vowels, spaces and white spaces). The architecture helps society to recognize the indigenous structure of the language. It is also helpful for Sindhi translators, question answering, information extraction, machine text, summarization, translation, Sindhi dictionaries, information retrieval and web portals.
Keywords Architecture of parts of speech tagging, short vowel restoration (SVR), natural language processing (NLP).
Year 2022
Volume 16
Issue 2
Type Research paper, manuscript, article
Journal Name Journal of Information & Communication Technology
Publisher Name ILMA University
Jel Classification -
DOI -
ISSN no (E, Electronic) 2075-7239
ISSN no (P, Print) 2415-0169
Country Pakistan
City Karachi
Institution Type University
Journal Type Open Access
Manuscript Processing Blind Peer Reviewed
Format PDF
Paper Link https://jict.ilmauniversity.edu.pk/journal/jict/16.2/2.pdf
Page 42-47
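
The three phases named in the abstract (tokenization, short vowel restoration, tagging) can be pictured as a simple pipeline. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the lookup-based approach and the two small lexicons are hypothetical, and the entries are romanized placeholders rather than real Sindhi data or the rules defined in the paper.

```python
import re

# Hypothetical toy resources: romanized placeholders only, not real Sindhi
# data and not the lexicon or rules used in the paper.
VOWEL_LEXICON = {"ktb": "kitab", "lkh": "likh"}
TAG_LEXICON = {"kitab": "NOUN", "likh": "VERB"}


def tokenize(sentence):
    """Phase 1: break a sentence into word tokens on whitespace."""
    return re.findall(r"\S+", sentence)


def restore_short_vowels(tokens):
    """Phase 2: replace each token with its vocalized form if one is known."""
    return [VOWEL_LEXICON.get(tok, tok) for tok in tokens]


def tag(tokens):
    """Phase 3: attach a POS tag by lexicon lookup; unknown words get 'UNK'."""
    return [(tok, TAG_LEXICON.get(tok, "UNK")) for tok in tokens]


def pos_pipeline(sentence):
    """Run the three phases in order on one sentence."""
    return tag(restore_short_vowels(tokenize(sentence)))
```

For example, `pos_pipeline("ktb lkh xyz")` returns `[("kitab", "NOUN"), ("likh", "VERB"), ("xyz", "UNK")]`. A real tagger would replace the dictionary lookups with the rule set and corpus described in the paper.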