Automated Lip Reading to Predict Visemes using Multimodal Convolutional Neural Network with Audio-Visual Features
Volume 17 Issue 2 2023
Author(s): | Khalid Mahboob, Umm-e-Laila*, Sana Alam, Muhammad Abbas, Muhammad Asghar Khan, Sidra Fatima |
---|---|
Abstract | Lip reading is the process of interpreting sentences from the movements of a speaker's lips. Conventional methods approach this task in two stages: first generating or learning audio-visual features, and then making predictions. While contemporary deep lip-reading techniques benefit from end-to-end training, much of the existing research on these models concentrates on word classification rather than sentence-level sequence prediction. Studies have shown, however, that humans can lip-read long sentences. This study emphasizes the value of temporal context by highlighting the components that capture it when the communication channel is unclear. The paper presents a lip-reading system for viseme prediction that combines a Convolutional Neural Network (CNN) with spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification (CTC) loss. An end-to-end training procedure maps a variable-length sequence of video frames to text. Both visual and auditory features are evaluated using the CNN architecture. The CNN model outperforms trained human lip readers and achieves accuracies of 72.8% CER and 80.8% WER for unseen speakers with audio, and 46.2% CER and 56.6% WER for unseen speakers without audio, which are reasonable results on the GRID corpus when the test split is made at the sentence level. |
Keywords | lip, reading, model, visemes, accuracy, convolutional neural network |
Year | 2023 |
Volume | 17 |
Issue | 2 |
Type | Research paper, manuscript, article |
Journal Name | Journal of Information & Communication Technology |
Publisher Name | ILMA University |
JEL Classification | - |
DOI | - |
ISSN no (E, Electronic) | 2075-7239 |
ISSN no (P, Print) | 2415-0169 |
Country | Pakistan |
City | Karachi |
Institution Type | University |
Journal Type | Open Access |
Manuscript Processing | Blind Peer Reviewed |
Format | |
Paper Link | https://jict.ilmauniversity.edu.pk/journal/jict/17.2/1.pdf |
Page | |
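
The abstract reports results as CER and WER. As conventionally defined, the character error rate and word error rate are the edit (Levenshtein) distance between a reference transcript and a predicted transcript, divided by the reference length in characters or words. The following is a minimal sketch of that conventional definition, not the authors' evaluation code. The function names (`edit_distance`, `error_rate`, `wer`, `cer`) are hypothetical, and the example sentence is only an illustrative command-style phrase, not data from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: tokens are single characters, spaces included."""
    return error_rate(list(reference), list(hypothesis))


if __name__ == "__main__":
    # Illustrative sentences only (assumed, not taken from the paper).
    reference = "bin blue at f two now"
    hypothesis = "bin blue at s two now"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

With one wrong letter in a six-word, 21-character reference, this sketch gives a WER of 1/6 and a CER of 1/21. Under this definition both quantities are error rates, so lower values indicate better predictions.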