On the evening of December 9, the NCMMSC-CNVSRC 2023 academic seminar was held in Suzhou during the 18th National Conference on Man-Machine Speech Communication (NCMMSC 2023). At the seminar, the winners of the CNVSRC 2023 visual speech recognition challenge were announced, and representatives of Tsinghua University and the winning teams presented their work on site.
The competition was initiated by the NCMMSC 2023 Organizing Committee and co-organized by Tsinghua University, Beijing University of Posts and Telecommunications, Haitian AAC, and Voice Home. Its core goal was to evaluate how well current visual speech recognition (lip reading) technology performs on large-vocabulary continuous recognition.
The competition attracted 85 teams from China and abroad. After nearly three months of competition, teams from the ASLP-Li Auto team of Northwestern Polytechnical University, Inner Mongolia University, Red Watermelon Semiconductor, Chengzhi Technology, Beijing University of Posts and Telecommunications, Straight Flush, and other organizations achieved strong results. Detailed results and technical reports will be published on the competition's official website.
T1 Single-Speaker VSR: Fixed Track
1. T237: Northwestern Polytechnical University ASLP-Li Auto (NPU-ASLP-LIAUTO)
2. T266: Red Watermelon Semiconductor (gua speech)
3. T290: CZUR
4. T238: Beijing University of Posts and Telecommunications (vii)
5. T267: Straight Flush Voice Group (RoyalFlush)

T1 Single-Speaker VSR: Open Track
1. T237: Northwestern Polytechnical University ASLP-Li Auto (NPU-ASLP-LIAUTO)

T2 Multi-Speaker VSR: Fixed Track
1. T244: Inner Mongolia University (Daydayup)
2. T267: Straight Flush Voice Group (RoyalFlush)

T2 Multi-Speaker VSR: Open Track
1. T237: Northwestern Polytechnical University ASLP-Li Auto (NPU-ASLP-LIAUTO)
2. T244: Inner Mongolia University (Daydayup)
During the seminar, Professor Wang Dong of Tsinghua University chaired the technical exchange session. Li Ke, deputy general manager and COO of Haitian AAC, delivered the opening speech and, together with Bu Hui, founder and CEO of Voice Home, presented awards to the winning teams. Chen Chen, a student at Tsinghua University, presented the baseline system and technical report.
Professor Wang Dong of Tsinghua University chaired the technical exchange session.
Li Ke, deputy general manager and COO of Haitian AAC, delivered the opening speech and presented awards.
Bu Hui, founder and CEO of Voice Home, presented awards.
Chen Chen, a student at Tsinghua University, presented the baseline system and technical report.
Representatives of the ASLP-Li Auto team of Northwestern Polytechnical University, Inner Mongolia University, Red Watermelon Semiconductor, and Beijing University of Posts and Telecommunications received their awards.
A representative of the ASLP-Li Auto team of Northwestern Polytechnical University presented the team's solution.
Representatives of the Red Watermelon Semiconductor team presented their solution.
Representatives of the Beijing University of Posts and Telecommunications team presented their solution.
Representatives of the Straight Flush team presented their solution online.
Group photo of the participants.
CNVSRC 2023 Organizing Committee members and other staff.
Visual speech recognition, also known as lip reading, is a technology that infers spoken content from lip movements. It has important applications in public safety, assistance for the elderly and people with disabilities, and authenticity verification. Research on lip reading is currently flourishing, and although great progress has been made in recognizing isolated words and phrases, large-vocabulary continuous recognition remains highly challenging. For Chinese in particular, progress has been limited by the lack of suitable data resources. To address this, Tsinghua University released the CN-CVS dataset [1] in 2023, the first large-scale Chinese visual speech recognition database, which makes it possible to further advance large-vocabulary continuous visual speech recognition (LVCVSR). For more information about the CN-CVS dataset, please visit the database's official website.
The read-speech data in the CNVSRC-Multi dataset used in this competition comes from the [Chinese Mandarin Pronunciation Recognition Database (Mobile Phone)], which Haitian AAC donated to Tsinghua University to support scientific research.
In this competition, many teams achieved significant improvements on the lip-reading tasks, with the best systems achieving a relative improvement of more than 20% over the baseline. The participants proposed novel solutions for every component of the lip-reading pipeline, offering new ideas and methods for advancing research on large-vocabulary continuous visual speech recognition for Chinese.
[1] C. Chen, D. Wang, T. F. Zheng, "CN-CVS: A Mandarin Audio-Visual Dataset for Large Vocabulary Continuous Visual to Speech Synthesis," ICASSP, 2023.