Enhanced Model Compression for Lipreading Recognition based on Knowledge Distillation Algorithm
Keywords: Lipreading, Deep learning, Model compression, Knowledge distillation

Abstract
Lipreading is the task of understanding what a speaker is saying by observing the movements of the speaker's mouth. The lipreading recognition model proposed in this paper, LipPC-Net, is built on a large Chinese lipreading data set informed by Chinese phonetic rules and grammatical features, and consists of two main parts: the P2P sub-model and the P2C sub-model. The P2P sub-model recognizes pinyin sequences from images, while the P2C sub-model converts pinyin sequences into Chinese character sequences. However, Chinese linguistic features are rich and ambiguous, and training and optimizing the lipreading model demands substantial GPU computation and storage, which makes large-scale deployment difficult. This paper therefore proposes three knowledge distillation compression algorithms: an offline model compression algorithm based on multi-feature transfer (MTOF), an online model compression algorithm based on adversarial learning (ALON), and an online model compression algorithm based on consistency regularization (CRON), which together compress the model that outputs Chinese character sequences. The three compression algorithms fit and learn the transformations between different features, so that portable mobile devices with limited hardware resources can run the model, thereby realizing its practical value in assisting the communication of deaf-mute people.
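The abstract does not detail how MTOF, ALON, and CRON are formulated, but as a point of reference, offline distillation schemes of this kind typically build on the classic soft-label loss of Hinton et al. (2015), in which a compact student model mimics the temperature-scaled output distribution of a large teacher. Below is a minimal sketch in PyTorch; the function name, the temperature T, and the weight alpha are illustrative assumptions and are not taken from the paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
    # Soft loss: KL divergence between the student's and the teacher's
    # temperature-scaled distributions, multiplied by T^2 so gradient
    # magnitudes stay comparable across temperatures (standard practice;
    # the specific values here are illustrative, not from the paper).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard loss: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    # Weighted combination of teacher imitation and label supervision.
    return alpha * soft + (1.0 - alpha) * hard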