A Joint Rule-Based Framework for Vowel Restoration and Part-of-Speech Tagging in Low-Resource Sindhi Text

Volume 19, Issue 2, 2025

Author(s):

Saira baby Farooqui* Khairpur College of Agriculture and Management Sciences, Sindh Agriculture University, Tandojam, Pakistan, sairamb.farooqui@gmail.com

Noor Ahmed Shaikh Shah Abdul Latif University, Khairpur, Pakistan, noor.shaikh@salu.edu.pk

Aftab Ahmed Bhatti Shah Abdul Latif University, Khairpur, Pakistan, dearbhatti117@gmail.com

Kazim Ali Sukkur IBA University, Sukkur, Pakistan, kazimali.bssef21@iba-suk.edu.pk

Benish Zehra Shah Abdul Latif University, Khairpur, Pakistan, bzkakepoto@gmail.com

Iqra Qamar Solangi Shah Abdul Latif University, Khairpur, Pakistan, iqrasolangi550@gmail.com

Abstract Languages written in Perso-Arabic scripts commonly omit short vowels, which creates substantial lexical and grammatical ambiguity, particularly in low-resource settings. This paper addresses the problem for Sindhi by proposing a joint rule-based framework for vowel restoration and part-of-speech (POS) tagging. The study aims to improve grammatical analysis of undiacritized Sindhi text by explicitly modeling the linguistic interdependence between vowel realization and grammatical category. The proposed method integrates phonological, morphological, and syntactic rules within a unified inference mechanism that enables bidirectional interaction between vowel restoration and POS tagging. Unlike conventional pipeline approaches, the framework resolves both tasks simultaneously through mutual constraint satisfaction, reducing ambiguity and limiting error propagation. Experimental evaluation on manually annotated Sindhi text shows that joint processing consistently improves both vowel restoration and POS tagging compared with running each task independently. The findings highlight the importance of vowel information for grammatical disambiguation and demonstrate the effectiveness of linguistically informed joint modeling in low-resource contexts. The primary novelty of this work is that it introduces the first joint framework for vowel restoration and POS tagging in Sindhi. The proposed approach provides a robust foundation for downstream applications such as coreference resolution, information retrieval, and machine translation, and is transferable to other low-resource languages with similar orthographic characteristics.
Keywords Low-resource NLP; Sindhi language processing; Vowel restoration; POS tagging; Rule-based methods; Joint inference.
Year 2025
Volume 19
Issue 2
Type Research paper, manuscript, article
Journal Name Journal of Information & Communication Technology
Publisher Name ILMA University
JEL Classification -
DOI -
ISSN no (E, Electronic) 2075-7239
ISSN no (P, Print) 2415-0169
Country Pakistan
City Karachi
Institution Type University
Journal Type Open Access
Manuscript Processing Blind Peer Reviewed
Format PDF
Paper Link https://jict.ilmauniversity.edu.pk/journal/jict/19.2/5.pdf
Page 77-89