Automated Lip Reading to Predict Visemes using Multimodal Convolutional Neural Network with Audio-Visual Features

Volume 17, Issue 2, 2023


Author(s): Khalid Mahboob, Umm-e-Laila*, Sana Alam, Muhammad Abbas, Muhammad Asghar Khan, Sidra Fatima
Abstract Lip reading is the task of interpreting sentences from the movements of a speaker's lips. Conventional methods have traditionally approached it in two stages: first generating or learning audio-visual features, and then making predictions. Although contemporary deep lip-reading techniques can be trained end to end, most existing research on these models concentrates solely on word classification rather than on sentence-level sequence prediction. Studies have shown that humans can lip-read long sentences. This study emphasizes the value of temporal considerations by highlighting the components that are important for capturing temporal context when communication channels are unclear. This paper presents a lip-reading system for viseme prediction. The system combines a Convolutional Neural Network (CNN) with a recurrent network, spatiotemporal convolutions, and the connectionist temporal classification (CTC) loss, and it is trained end to end to map a variable-length sequence of video frames to text. The CNN architecture is evaluated with both visual and auditory features. The CNN model outperforms trained human lip readers, reporting accuracies of 72.8% (CER) and 80.8% (WER) for unseen speakers with audio, and 46.2% (CER) and 56.6% (WER) for unseen speakers without audio. These are reasonable accuracies on the GRID corpus with the test set split at the sentence level.
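As a rough illustration of how a CTC-trained model's frame-level outputs become text, and of how character error rate (CER) and word error rate (WER) are scored, the sketch below shows best-path (greedy) CTC decoding and the standard edit-distance definitions of CER and WER. It is not the authors' code: the function names, the use of index 0 as the CTC blank symbol, and greedy decoding instead of beam search are assumptions made for illustration. In their standard definition, CER and WER are error rates (lower is better), computed as edit distance divided by reference length.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: label index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding over per-frame argmax labels.

    Collapses consecutive repeated labels and drops blanks in a single pass,
    so a blank between two identical labels keeps both of them.
    """
    decoded: List[int] = []
    previous: Optional[int] = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if characters match)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces count as characters here."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))
    ref, hyp = "bin blue at f two now", "bin blue at f too now"
    print(f"CER={cer(ref, hyp):.4f} WER={wer(ref, hyp):.4f}")
```

For the GRID-style sentence "bin blue at f two now" recognized as "bin blue at f too now", one substituted word out of six gives a WER of 1/6, and one substituted character out of 21 (counting spaces) gives a CER of 1/21. A beam-search decoder, optionally with a language model, is a common alternative to the greedy decoder shown here.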
Keywords lip, reading, model, visemes, accuracy, convolutional neural network
Year 2023
Volume 17
Issue 2
Type Research paper, manuscript, article
Journal Name Journal of Information & Communication Technology
Publisher Name ILMA University
JEL Classification -
DOI -
ISSN (Electronic) 2075-7239
ISSN (Print) 2415-0169
Country Pakistan
City Karachi
Institution Type University
Journal Type Open Access
Manuscript Processing Blind Peer Reviewed
Format PDF
Paper Link https://jict.ilmauniversity.edu.pk/journal/jict/17.2/1.pdf