Architecture of Parts of speech Tagger in Sindhi Language
Volume 16 Issue 2 2022
Author(s): | Saira Baby Farooqui*, Noor Ahmed Shaikh, Samina Rajper sairafarooqui@sau.edu.pk, noor.shaikh@salu.edu.pk, samina.rajper@sau.edu.pk |
---|---|
Abstract | Sindhi is an intricate language and one of the oldest tongues, spoken and written in several parts of the world. Word segmentation, short vowel restoration (SVR) and parts of speech (POS) tagging are generally challenging tasks for its natural language processing (NLP) applications. The language is further complicated by its features: for example, soft spaces within words, short vowels, and compound and complex words are all found in Sindhi. Because of this complexity, POS tagging is a challenging job for machine learning, yet it can help to overcome these ambiguities. Sindhi has eight POS according to the formation of the sentence. The POS of a word can change its nature in a way that human beings easily understand but a computer does not. To overcome these issues, some rules are defined in this model which may help a machine to recognize POS tags. The POS architecture has three distinct phases to resolve the tagging problems in this language and in other languages as well. Tokenization is an NLP tool for word segmentation (sentences are broken into words). After word segmentation, the SVR phase starts for proper vocalization, and tags are then applied, which helps in understanding texts appropriately. Tagging POS in the corpus removes the uncertainty. The English language has various rules, but the Sindhi language identifies its functionality from the corpus (vowels, short vowels, spaces and white spaces). The architecture helps society to recognize the indigenous structure of the language. This architecture is also helpful for Sindhi translators, Question Answering, Information Extraction, Machine Translation, Text Summarization, Sindhi Dictionaries, Information Retrieval and Web Portals. |
Keywords | Architecture of Parts of speech tagging, short vowel restoration (SVR), Natural language processing (NLP). |
Year | 2022 |
Volume | 16 |
Issue | 2 |
Type | Research paper, manuscript, article |
Journal Name | Journal of Information & Communication Technology |
Publisher Name | ILMA University |
JEL Classification | - |
DOI | - |
ISSN no (E, Electronic) | 2075-7239 |
ISSN no (P, Print) | 2415-0169 |
Country | Pakistan |
City | Karachi |
Institution Type | University |
Journal Type | Open Access |
Manuscript Processing | Blind Peer Reviewed |
Format | Paper Link | https://jict.ilmauniversity.edu.pk/journal/jict/16.2/2.pdf |
Page | 42-47 |
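
The three phases named in the abstract (tokenization, short vowel restoration, tagging) can be pictured as a simple pipeline. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the two lookup tables, the romanized placeholder words and the tag labels are all hypothetical and are not taken from the paper. A real system would work on Sindhi script and draw its entries and rules from a corpus.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources (assumptions for illustration only).
# Romanized placeholders stand in for Sindhi script to keep the sketch readable.
VOWEL_LEXICON: Dict[str, str] = {"ktb": "kitab", "qlm": "qalam"}
TAG_LEXICON: Dict[str, str] = {"kitab": "NOUN", "qalam": "NOUN", "ain": "CONJ"}

# Latin marks plus the Arabic-script full stop, comma and question mark.
PUNCTUATION = ".,!?\u06D4\u060C\u061F"


def tokenize(sentence: str) -> List[str]:
    """Phase 1: split on whitespace and strip surrounding punctuation."""
    tokens = (tok.strip(PUNCTUATION) for tok in sentence.split())
    return [tok for tok in tokens if tok]


def restore_short_vowels(tokens: List[str]) -> List[str]:
    """Phase 2: replace an unvocalized form with its vocalized form when known."""
    return [VOWEL_LEXICON.get(tok, tok) for tok in tokens]


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Phase 3: look up one tag per token; unknown words are marked 'UNK'."""
    return [(tok, TAG_LEXICON.get(tok, "UNK")) for tok in tokens]


def pos_pipeline(sentence: str) -> List[Tuple[str, str]]:
    """Run the three phases in order."""
    return tag(restore_short_vowels(tokenize(sentence)))


if __name__ == "__main__":
    print(pos_pipeline("ktb ain qlm."))
```

Keeping each phase as a separate function mirrors the ordering described in the abstract, where tags are applied only after segmentation and vocalization. The dictionary lookups here are a stand-in; the rules the paper mentions would replace them.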