"Adding a Killer Feature to Snatch the Market? Is Ali's Smart Speaker Moving Too Early?
(Original Title: Add a Killer Function to Seize the Market? Ali’s Speaker Seems Afraid to Move Too Soon)
[Image: https://i.bosscdn.com/blog/ee/5d/6e/295acc4864bd324e6ca5d0c7a820170717145207.jpeg]
Alibaba has finally launched its smart speaker, the Tmall Genie X1. The move was both a surprise and, in hindsight, a logical step, and it intensifies the competition in the smart speaker arena.
Echo has already sold nearly 20 million units cumulatively. Google, Microsoft, and Apple quickly followed suit, and domestic players in China are now scrambling to join the fray.
Last week, Alibaba officially announced the Tmall Genie X1, a significant step in its smart home strategy. This development has made the race for market dominance more intriguing.
In fact, the day before the release of the 499-yuan Tmall Genie, Lei Feng Wang published an article titled "Why China's Echo isn't Here Yet, and Tomorrow's AI's New Product Could Bring Surprises."
So, what surprises does Alibaba’s smart speaker offer compared to other competitors?
Bright 'Surprises'
Before the official release, media reports suggested that Alibaba had invested billions of dollars in the Pepper robot project, redirecting staff to the Artificial Intelligence Lab. Despite this investment, the functionalities offered by the Tmall Genie X1 appear quite similar to those of Amazon's Echo—functions like playing music, ordering takeout, checking the weather, setting alarms, and controlling smart home appliances.
According to the promotional materials, one standout feature of the Tmall Genie is voiceprint recognition, something Echo lacks. Alibaba claims that this feature allows the speaker to differentiate between household members and deliver personalized content based on each person's preferences. Currently, it can recognize up to six individuals. Users can also verify purchases and complete transactions through voice commands.
This seems like a pretty cool feature, but why hasn't Amazon implemented it in Echo yet?
It turns out that Amazon has long considered the technology. According to Amazon employees, however, feedback from hardware and software companies in the voiceprint recognition field suggests that telling different users' voices apart is much harder than anticipated.
"The equipment needed to remove noise, echoes, and reverberation makes it challenging to accurately identify a person's voice," said Vineet Ganju, vice president of Conexant's voice division.
So, does the Tmall Genie truly support this key feature?
To answer that, let’s look at the challenges involved in voiceprint recognition.
First, on the algorithm side, Dr. Chen Xiaoliang, founder of Shengzhi Technology, explained in an interview with Lei Feng Wang that voiceprint recognition remains a relatively niche field with limited applications. Most current research focuses on dynamic, real-time detection. These dynamic methods build on the principles of static detection and add further algorithms such as noise reduction and de-reverberation.
Dr. Chen noted that voiceprint recognition still faces the problems common to data-driven pattern recognition, including unresolved physical and computational ones. While the uniqueness of each person's voiceprint is promising, existing equipment and technology still struggle to make precise distinctions. A person's voice varies with physical condition, age, and emotion, and environmental noise adds further interference, so voiceprint features are difficult to extract and model.
Dr. Chen Xiaoliang believes that deep learning has significantly improved pattern recognition, with open-source algorithms available. However, progress in voiceprint recognition remains slow, constrained by the acquisition of voiceprints and the establishment of features.
Dr. Chen Dongpeng, a senior scientist at voiceprint recognition provider SpeakIn, echoed this sentiment. He noted that voiceprint recognition is susceptible to various real-world influences, including noise, multiple speakers, physical conditions, and emotional states. Despite efforts by companies like SpeakIn to mitigate these issues through both software and hardware, challenges remain.
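To make the experts' point concrete, speaker identification is often framed as comparing a feature vector extracted from a voice sample against enrolled profiles. The following is a minimal sketch under stated assumptions, not a description of how the Tmall Genie works: the three-number vectors, the names `enroll` and `identify`, and the 0.9 threshold are all hypothetical, and real systems use far longer feature vectors produced by trained models. Only the six-profile limit comes from the article.

```python
import math

MAX_PROFILES = 6   # the article says the speaker recognizes up to six people
THRESHOLD = 0.9    # hypothetical acceptance threshold, chosen for illustration


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(profiles, name, vector):
    """Store a household member's reference vector, up to MAX_PROFILES."""
    if name not in profiles and len(profiles) >= MAX_PROFILES:
        raise ValueError("profile limit reached")
    profiles[name] = vector


def identify(sample, profiles, threshold=THRESHOLD):
    """Return the best-matching enrolled name, or None if no score reaches the threshold."""
    best_name, best_score = None, threshold
    for name, vector in profiles.items():
        score = cosine(sample, vector)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    profiles = {}
    enroll(profiles, "parent", [0.9, 0.1, 0.2])  # toy vectors, not real voice data
    enroll(profiles, "child", [0.1, 0.8, 0.3])

    clean = [0.85, 0.15, 0.25]    # close to the "parent" profile
    degraded = [0.5, 0.5, 0.5]    # stands in for a noisy or altered voice
    print(identify(clean, profiles))
    print(identify(degraded, profiles))
```

In this toy setup the clean sample matches the "parent" profile, while the degraded sample matches no one. That is the failure mode the experts describe: once noise, illness, or emotion shifts the extracted features, the match can fall below any threshold strict enough to be trusted for purchases.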
Li Haibo, vice president of Himalaya, shared his thoughts on the application of voiceprint recognition. He stated that while the company has worked on this issue for a long time, it cannot achieve complete accuracy. Currently, it remains in an experimental stage with limited effectiveness.
When discussing the Tmall Genie, Li Haibo pointed out that far-field speech recognition typically works effectively within three to five meters, with noise reduction around 70 dB. Ambient noise and room acoustics make it harder to wake the device under these conditions, and far-field voiceprint recognition is even less stable at the same distance. Common scenarios for smart speakers include the living room, TV area, kitchen, and bedside. Except for the bedside, the distances in these scenarios are generally over three meters. Thus, the practicality of voiceprint recognition in the Tmall Genie remains uncertain.
As for why Amazon hasn’t implemented this feature in Echo, Li Haibo believes the technology is not yet mature, despite its potential appeal.
Todd Mozer, CEO of Sensory, agreed, stating that identifying who is talking to far-field voice devices like Echo is challenging. As the signal-to-noise ratio decreases, device performance suffers.
"The process of noise reduction and separating speech from noise significantly impacts user identification. So far, there is no product on the market that simultaneously handles user identification, far-field speech, and noise processing," Mozer explained.
From a practical application perspective, Dr. Liu Bin, a senior expert in intelligent voice algorithms at the Institute of Automation, Chinese Academy of Sciences, shared his insights. He noted that far-field speech recognition is disrupted by noise, echoes, and reverberation, making both speech and voiceprint recognition challenging.
Currently, reliable far-field speech recognition covers about 3-5 meters; voiceprint recognition is even more challenging. Speech recognition aims to understand the content of the speech signal, which is closely related to formants (resonance peaks) concentrated in the low-frequency band. Low-frequency bands carry high energy and are less easily affected by external interference. Voiceprint recognition, by contrast, relies more on the high-frequency bands, where speech energy is relatively lower and more susceptible to various disturbances, making far-field voiceprint identification particularly challenging.
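The effect of distance on the signal-to-noise ratio can be illustrated with a short calculation. This is a rough sketch under a simplifying free-field assumption: speech power falls with the square of the distance while ambient noise stays constant. Real rooms add echoes and reverberation, so actual behavior differs. The function names and the 30 dB starting value are hypothetical.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(signal_power / noise_power)


def snr_at_distance(snr_at_1m_db, distance_m):
    """Estimated SNR at a given distance from the talker.

    Assumes free-field propagation (speech power proportional to 1 / distance^2)
    and a constant noise floor, so the SNR drops by 20 * log10(distance) dB
    relative to its value at 1 meter.
    """
    return snr_at_1m_db - 20.0 * math.log10(distance_m)


if __name__ == "__main__":
    # Hypothetical starting point: 30 dB SNR when speaking 1 m from the device.
    for distance in (1, 3, 5):
        print(f"{distance} m: {snr_at_distance(30.0, distance):.1f} dB")
```

Under these assumptions, each doubling of distance costs about 6 dB, which is one way to see why the three-to-five-meter range is treated as a practical limit.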
Dr. Liu added that people’s speaking characteristics change with different factors, such as when they’re sick. Thus, near-field voiceprint recognition is still not fully mature, and far-field conditions make it even more difficult.
Overall, Alibaba's integration of voiceprint recognition into its smart speaker is a strategic move. By offering a feature not found in Echo or in JD.com's speaker, it makes the product more competitive in a market flooded with homogenized products.
However, given the current immaturity of both technology and market, this move may be premature.
[Image: https://i.bosscdn.com/blog/0e/20/52/6ba9b68156e5d9333ddeb79c80.jpeg]
So, why is Alibaba incorporating far-field voiceprint recognition, a less mature technology, into its smart speaker?
Aside from using this technology to differentiate itself and capture market share, Dr. Liu also highlighted Alibaba's strengths in e-commerce: applying voiceprints to e-commerce identity authentication is a key direction for the company.
Integrating voice-based shopping into the Tmall Genie, backed by the vast resources of Taobao and Tmall, is a logical step for Alibaba. Judging from Amazon's earlier attempts with Echo, however, user adoption of voice shopping has been low and the experience poor.
Hu Yu, executive president of iFlytek, noted in an interview with Lei Feng Wang that, from a market perspective, shopping scenarios are still underdeveloped on smart speakers. Real demand should align with users' immediate needs. Although Echo is selling well, surveys reveal that users mainly rely on simple functions such as setting reminders and checking the weather.
This explains why many companies emphasize combining voice interaction with visual presentation: without sufficient visual information, completing complex tasks by voice alone is difficult. It also suggests that some features and scenarios were designed by developers who did not fully understand user needs.
In summary, if user habits for e-commerce functions haven’t developed and voiceprint technology remains problematic, adding voiceprint recognition to smart speakers risks failing the market test.
Despite this, Alibaba’s initiative is commendable. By introducing cutting-edge technology in a crowded market, it aims to differentiate itself. However, taking this step too early might prove risky.
[Image: https://i.bosscdn.com/blog/4b/2a/47/8eb3c44e27af6184cedc91969d20170412160138.jpeg]"