Language, content, prosody, timbre, emotion... On January 30, iFLYTEK held the upgrade conference for its Spark Cognitive Model V3.5. The event showcased new progress from the large model's underlying capabilities through to its application scenarios, and also released the "Spark Voice Large Model", bringing a new change to human-machine dialogue in the era of the Internet of Everything.
Since its founding, iFLYTEK's dream and mission has been to realize barrier-free communication between humans and machines. True to that original intention, the company has worked on "intelligent voice" for 25 years and has stayed at the forefront of the field worldwide. Over the years, iFLYTEK has kept iterating on core technology in speech recognition, speech synthesis and other areas, and has won a large number of authoritative international competitions. For example, in speech recognition, iFLYTEK has won the international multi-channel speech separation and recognition competition CHiME for many consecutive years; in speech synthesis, it has won the international speech synthesis competition Blizzard Challenge for 14 consecutive years.
In the view of Liu Qingfeng, chairman of iFLYTEK, today's general cognitive models have brought new opportunities for intelligent speech technology, allowing speech recognition to make further progress on the classic "cocktail party" problem, with its difficulties of high noise, far-field pickup, and multiple simultaneous speakers. "To put it simply, with the help of large models, we give a piece of speech richer attributes, including language, content, prosody, timbre, and emotion," Liu Qingfeng explained.
According to figures disclosed at the press conference, the Spark voice model's speech recognition across its first batch of 37 mainstream languages surpasses that of Whisper V3, launched by OpenAI. In multilingual speech synthesis, the average MOS (mean opinion score) across the first batch of 40 languages has improved by 0.25 in absolute terms, and the degree of human-likeness exceeds 83%, keeping iFLYTEK at an internationally leading level in intelligent voice technology.
The release of the Spark voice model once again demonstrates iFLYTEK's top-tier technical strength in intelligent voice, and shows that large models have brought new opportunities for the development of voice technology.
The Spark voice model is now fully open to developers, and it has made its debut on the iFLYTEK translator, turning the device from a plain text-to-text translation tool into a practical tool that offers much richer help.
The translator not only supports more than 80 languages, but also gains two important new functions: multilingual automatic recognition and enhanced translation. These greatly extend the scenarios in which it can be used, whether at tourist attractions, over food, or in cultural and art exhibition halls. Drawing on an on-site demonstration by Liu Cong, Dean of the iFLYTEK Research Institute, Liu Qingfeng explained that multilingual automatic recognition supports 35 languages, improving the quality and efficiency of cross-language communication, while enhanced translation provides bilingual service in Chinese and English, turning the translator into an AI translation assistant and making cross-language communication more worry-free. The two functions, multilingual automatic recognition and enhanced translation, will be upgraded by the end of January and in mid-March this year, respectively.
Beyond helping international communication, the Spark voice model can also show its versatility in more scenarios and empower practical applications. Liu Qingfeng said that in scenarios such as automobiles, customer service, the home, and companion robots, the Spark voice model has more room to play a role, bringing changes to human-computer interaction. In automobiles, for example, the interactive experience of the intelligent cockpit, intelligent navigation, and first-class control will be further optimized. Industries such as companion robots, shopping-guide robots, auxiliary-diagnosis robots, smart homes, and wearable devices are also expected to take off further with the support of voice models.
At the press conference, Liu Qingfeng used an AI customer service demonstration to show the Spark voice model's highly human-like dialogue and deeper understanding, which can greatly improve the capabilities of back-office customer service. "I believe that in the era of the Internet of Everything, driven by new technologies, the new voice model will empower the entire industry and greatly promote our industrial upgrading," Liu Qingfeng said.