Language: the next frontier of artificial intelligence
Editor’s note: News anchors now seem to be the next target on the ‘threat list’ of intelligent machines. Sogou, China’s fourth-largest internet company, has launched AI-powered virtual news anchors that rely on lip-movement recognition and translation, among a host of other functions.
In the near future, news anchors may be replaced by virtual anchors without viewers even noticing, thanks to artificial intelligence techniques including lip-motion synthesis, speech synthesis, joint modeling of audio and video, and deep learning.
The virtual anchor was launched by Sogou — China’s fourth-largest internet company by users after Baidu, Alibaba and Tencent. Wang Xiaochuan, chief executive officer of Sogou, showed how the world’s first virtual anchor works during the RISE technology conference in Hong Kong earlier this month.
During Wang’s speech at the conference, entitled “The next frontier of artificial intelligence”, a synthetic news video was played. It was produced after the machine had been trained on the news anchor’s video and audio material and had spent more than an hour processing it.
“The lip-reading recognition used in the virtual anchor can be widely applied in other scenarios too,” said Wang. The machine can tell what a person is saying by recognizing their lip movements, without recording their voice.
Wang gave another demonstration, combining personalized speech synthesis with emotional transference. Using 14 minutes of training data of Wang’s voice and the musical composition of the recent popular song My Skateboard Shoes, the system produced a synthetic version of the song in Wang’s voice, keeping the original melody while carrying his tone and language style.
“Language is the future of AI,” Wang said.
Beijing-based Sogou, which went public on the New York Stock Exchange in November last year, is China’s second-largest search engine after Baidu. It also operates Sogou Input Method and Sogou browser. “As a search company, we’re good at AI because we are clear about application scenarios and the input and output of information,” said Wang.
According to Sogou, its mobile keyboard processes an average of 400 million voice requests per day. “We’ve already achieved a 98 percent recognition rate for Chinese speech. Meanwhile, as China’s largest voice input engine, Sogou Input Method helps us collect a huge amount of corpus data and user behavior,” Wang said.
In his speech, Wang explained that language in AI has two aspects: natural interaction, which allows free communication between people and machines through images and voice, and knowledge calculation, which covers conversation, question answering and translation.
He believes that natural interaction and knowledge calculation could make communication without language barriers possible, although challenges remain.
“How intelligent the machine could be is what we are thinking about,” Wang said. Sogou is also developing assistant dialogues, which allow machines to tailor their responses to different people.