A Joint Rule-Based Framework for Vowel Restoration and Part-of-Speech Tagging in Low-Resource Sindhi Text
Volume 19, Issue 2, 2025
| Author(s) | Saira Baby Farooqui* (Khairpur College of Agriculture and Management Sciences, Sindh Agriculture University, Tandojam, Pakistan; sairamb.farooqui@gmail.com); Noor Ahmed Shaikh (Shah Abdul Latif University, Khairpur, Pakistan; noor.shaikh@salu.edu.pk); Aftab Ahmed Bhatti (Shah Abdul Latif University, Khairpur, Pakistan; dearbhatti117@gmail.com); Kazim Ali (Sukkur IBA University, Sukkur, Pakistan; kazimali.bssef21@iba-suk.edu.pk); Benish Zehra (Shah Abdul Latif University, Khairpur, Pakistan; bzkakepoto@gmail.com); Iqra Qamar Solangi (Shah Abdul Latif University, Khairpur, Pakistan; iqrasolangi550@gmail.com) |
|---|---|
| Abstract | Languages written in Perso-Arabic scripts commonly omit short vowels, which creates substantial lexical and grammatical ambiguity, a problem that is especially acute in low-resource settings. This paper addresses the problem for Sindhi by proposing a joint rule-based framework for vowel restoration and part-of-speech (POS) tagging. The study aims to improve the grammatical analysis of undiacritized Sindhi text by explicitly modeling the linguistic interdependence between vowel realization and grammatical category. The proposed method integrates phonological, morphological, and syntactic rules within a unified inference mechanism in which vowel restoration and POS tagging inform each other in both directions. Unlike conventional pipeline approaches, which fix the output of one task before running the next, the framework resolves both tasks simultaneously through mutual constraint satisfaction, which reduces ambiguity and limits error propagation. Experimental evaluation on manually annotated Sindhi text shows that joint processing consistently improves both vowel restoration and POS tagging compared with running each task independently. The findings highlight the importance of vowel information for grammatical disambiguation and demonstrate the effectiveness of linguistically informed joint modeling in low-resource contexts. The primary novelty of this work is the first joint framework for vowel restoration and POS tagging in Sindhi. The approach provides a robust foundation for downstream applications such as coreference resolution, information retrieval, and machine translation, and can be transferred to other low-resource languages with similar orthographic characteristics. |
| Keywords | Low-resource NLP; Sindhi language processing; Vowel restoration; POS tagging; Rule-based methods; Joint inference. |
| Year | 2025 |
| Volume | 19 |
| Issue | 2 |
| Type | Research paper, manuscript, article |
| Journal Name | Journal of Information & Communication Technology |
| Publisher Name | ILMA University |
| JEL Classification | - |
| DOI | - |
| ISSN (Electronic) | 2075-7239 |
| ISSN (Print) | 2415-0169 |
| Country | Pakistan |
| City | Karachi |
| Institution Type | University |
| Journal Type | Open Access |
| Manuscript Processing | Blind Peer Reviewed |
| Format | Paper Link: https://jict.ilmauniversity.edu.pk/journal/jict/19.2/5.pdf |
| Page | 77-89 |
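
The abstract contrasts a pipeline, which restores vowels first and tags afterwards, with joint inference, where vowel forms and POS tags are chosen together under shared constraints. The paper's actual rules and inference mechanism are not reproduced on this page, so the following is only a minimal sketch under loudly stated assumptions: the lexicon entries, the UD-style tags, the bigram constraints (`LEXICON`, `ALLOWED_BIGRAMS`), and the "one verb, clause-final" rule are invented placeholders, not Sindhi data or the authors' rule set, and brute-force enumeration with `itertools.product` stands in for whatever constraint-satisfaction procedure the paper uses.

```python
from itertools import product

# Toy lexicon: undiacritized skeleton -> candidate (vocalized form, POS tag).
# All entries are invented placeholders, NOT real Sindhi lexicon data.
# List order stands in for an assumed frequency ranking (used by the baseline).
LEXICON = {
    "bt": [("bat", "VERB"), ("bit", "NOUN")],
    "gl": [("gol", "NOUN")],
    "te": [("te", "ADP")],  # ADP = adposition (here a postposition), UD-style tag
    "mr": [("mar", "VERB"), ("mur", "NOUN")],
}

# Hypothetical syntactic constraint: licensed adjacent POS pairs.
ALLOWED_BIGRAMS = {
    ("NOUN", "NOUN"), ("NOUN", "ADP"), ("ADP", "NOUN"),
    ("NOUN", "VERB"), ("ADP", "VERB"),
}


def satisfies(tags):
    """Return True if a tag sequence meets every (hypothetical) rule."""
    if not tags:
        return False
    bigrams_ok = all((a, b) in ALLOWED_BIGRAMS for a, b in zip(tags, tags[1:]))
    # Assumed clause-level rules: exactly one verb, and it comes last.
    return bigrams_ok and tags.count("VERB") == 1 and tags[-1] == "VERB"


def joint_analyses(skeletons):
    """Choose vowel forms and tags together: keep every assignment
    that satisfies all constraints at once."""
    candidates = [LEXICON[s] for s in skeletons]
    consistent = []
    for choice in product(*candidates):
        if satisfies([tag for _, tag in choice]):
            consistent.append(list(choice))
    return consistent


def pipeline_analysis(skeletons):
    """Baseline: commit to the top-ranked vowelization of each token first,
    then check the tags that follow from those fixed forms."""
    chosen = [LEXICON[s][0] for s in skeletons]
    return chosen, satisfies([tag for _, tag in chosen])


if __name__ == "__main__":
    sentence = ["bt", "gl", "te", "mr"]
    print("joint:", joint_analyses(sentence))
    # one consistent analysis: bit/NOUN gol/NOUN te/ADP mar/VERB
    print("pipeline:", pipeline_analysis(sentence))
    # bat/VERB is chosen first, so the VERB-NOUN bigram violates a constraint
```

In this toy run the pipeline commits to the top-ranked vowelization of `bt` before any syntactic rule is consulted, so the resulting VERB-NOUN sequence breaks a constraint that no later step can repair; the joint search keeps both readings open until the constraints select one. Exhaustive enumeration grows exponentially with sentence length, so a practical system would prune candidates incrementally rather than enumerate them all.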