Zhidongxi
Compiled by | Xu Shan
Edited by | Yun Peng
The battle over large AI models swept through 2023, and now the major tech giants appear to be setting their sights on AI wearables, especially smart glasses!
Zhidongxi reported on December 18 that, according to The Information, technology giants such as Meta, Google, Microsoft, and OpenAI are preparing to bring large AI models to camera-equipped wearable devices such as smart glasses. They believe hardware like smart glasses will be a natural carrier for large AI models, because multimodal models can process many types of information, including sound and images.
Recently, the major tech giants have been experimenting with building AI capabilities into different mobile devices. According to people familiar with the matter, OpenAI has recently been working to embed its "GPT-4 with Vision" object-recognition software into products from the social company Snap, which could bring new functionality to Snap's smart glasses, Spectacles.
Meta likewise demonstrated last Tuesday how its own AI performs when integrated into its Ray-Ban smart glasses. Through an AI voice assistant, the glasses can describe what the user is seeing, suggest which shirt goes with which pants, and offer a series of new features such as translating a Spanish newspaper into English.
Within Amazon, a group on the Alexa AI assistant team is working on a new AI device with sensory capabilities. In addition, Google, like most phone makers, has begun experimenting with AI features on mobile phones.
Apple's Vision Pro headset was officially unveiled in June this year and is planned for release early next year. However, according to The Information, the device may not initially have multimodal AI capabilities.
As a new revolution in mobile devices begins, how will technology giants such as Apple, Microsoft, OpenAI, and Meta position themselves on this new battlefield? How will they showcase their AI advantages in major hardware products? Which new AI hardware could become the best carrier for large AI models? The latest revelations show that a war over AI hardware innovation is getting underway.
A demo of Gemini, the large AI model Google released just last week, shows the AI guessing the name of a movie from a person's pantomime, along with other details such as guessing locations on a map and helping with hands-on craft problems.
Although the footage may have been edited, it reveals the basic idea Google wants to convey: to build an always-present AI that can give users direct feedback or help based on what they are seeing and hearing. According to a person with direct knowledge of Google's consumer hardware strategy, it may be years before Google delivers that experience, because this kind of ambient computing would be power-intensive.
Google Glass
For now, Google is redesigning the operating system of its Pixel phones, hoping to embed a smaller Gemini model and upgrade the experience of its mobile AI assistant, Pixie, so that it can, for example, tell users where to buy a product they have just photographed.
Given Google's long-standing strength in search technology, The Information believes that an AI device that learns from the surrounding environment and anticipates what people need or want seems like a natural fit for Google. Although Google Glass failed a decade ago, Google has since pushed Android phone makers to let users scan their surroundings with the phone camera, send the images to Google, and have them analyzed in the cloud, which became the "Google Lens" image search application.
People familiar with the strategy say the company recently canceled development of a glasses-style device of its own but is still building software for this type of hardware. Google plans to license image search software powered by its AI models to hardware makers, similar to how it supplies the Android mobile operating system to phone makers such as Samsung, the people said.
With the boom in multimodal AI models, Microsoft's researchers and product teams have also begun trying to upgrade their voice assistants and to run AI features on smaller devices.
According to a patent application and people familiar with the matter, such models could power affordable smart glasses or other hardware. Microsoft is also planning to run AI software on its AR headset HoloLens: the user points the headset's front-facing camera at an object, takes a picture or video, and sends it to an OpenAI-powered chatbot, which can identify the object directly. Users can then get more information from the chatbot through follow-up conversation.
HoloLens
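To make this capture-and-ask flow concrete, here is a minimal sketch of how a companion app might send a photo captured by a headset to a vision-capable OpenAI chat model and ask it to identify the object. The model name, the prompt, and the identify_object helper are illustrative assumptions, not details from the report or Microsoft's actual implementation.

```python
# Minimal sketch of a "point the camera, snap, ask the chatbot" flow.
# Assumes the OpenAI Python SDK (pip install openai) and an image file
# already captured by the device; none of this is Microsoft's actual code.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def identify_object(image_path: str) -> str:
    # Encode the captured frame as base64 so it can be embedded in the request.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    # Ask a vision-capable chat model what the camera is looking at.
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed model name (late-2023 vision model)
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What object is in this photo? Answer briefly."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
        max_tokens=100,
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # Hypothetical snapshot saved by the headset's camera pipeline.
    print(identify_object("headset_snapshot.jpg"))
```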
Apple's Vision Pro offers many new multimodal interaction features, but Apple's progress on large AI models trails slightly behind its rivals. For now, there is no indication that the Vision Pro will have sophisticated object recognition or other multimodal AI capabilities at launch.
Still, Apple has spent years refining the Vision Pro's computer vision capabilities so that the device can quickly recognize its surroundings, including identifying furniture and knowing whether the wearer is sitting in the living room, kitchen, or bedroom. Apple may also be developing a multimodal large model that can recognize images and video.
Vision Pro
But compared with the glasses other companies are developing, the Vision Pro is large and heavy, and not well suited to everyday outdoor use.
On the other hand, Apple reportedly paused the development of its own AR glasses earlier this year to focus on the sale of its headsets. It is unclear when the development of AR glasses will resume.
Meta CTO Andrew Bosworth said in an Instagram post on Tuesday that some Ray-Ban glasses users will be able to access AI models directly on the smart glasses.
Ray-Ban
Some of Meta's leaders see the Ray-Ban glasses as a "forerunner" of AR glasses, devices that can blend digital images with the real world around the wearer. Meta originally planned to launch AR glasses within the next few years, but the plan has run into a series of difficulties: the smart glasses have reportedly struggled to attract users, and development of next-generation displays has hit obstacles.
But the arrival of multimodal AI models seems to have reinvigorated Bosworth and his team, who now see that the glasses could bring a range of new AI capabilities to customers in the short term.
This summer, during Amazon's biannual product planning, engineers from the Alexa team proposed a new device capable of running multimodal AI.
According to people with direct knowledge of the project, the team is particularly focused on reducing the on-device computing and memory needed to process AI workloads such as images, video, and voice. It is unclear whether the project has been funded or what problem the device is meant to solve for customers, but it is separate from the company's Echo line of voice assistant devices.
Previously, the Alexa team also developed smart audio glasses called Echo Frames, which have no display or camera. It is unclear whether Amazon will develop smart glasses with visual recognition.
This is not the first time Silicon Valley giants have designed camera-equipped wearables of this kind. Google, Microsoft, and other tech giants previously developed AR headsets, hoping that digital imagery overlaid on the headset's translucent display could provide step-by-step guidance to help users complete tasks. However, because of the complexity of the optical design, most of these products were not well received.
The multimodal large model launched by OpenAI can use visual recognition to understand what people are looking at and what they are doing, and can provide further information about those things and actions. As large language models become lighter, small devices will also be able to run them and give instant feedback on user requests. Given how much people value privacy and security, however, it may take a while for smart glasses and other AI devices with built-in cameras to gain acceptance.
The Information believes that smart glasses with AI assistants could become as transformative a product as the smartphone. They could not only act as a tutor guiding students through math or other homework problems, but also provide information about the surroundings at any moment, such as translating billboards or telling users how to deal with a car breakdown.
Pablo Mendes, a former engineering manager at Apple and CEO of AI search company Objective, said: "Large AI models are essential to everything, and they will play a role in the underlying architecture of computers, phones, and other devices."
In this third wave of the AI boom, set off by ChatGPT, the clear answers so far are that multimodal large models are the underlying infrastructure and ChatGPT-style chatbots are the direct application. But on which devices can these applications realize their full potential, and which devices are the best carriers for large language models? These are the questions that technology giants such as OpenAI, Microsoft, and Google are now beginning to explore.
Judging from The Information's latest revelations, camera-equipped smart glasses have become an important direction for many of these giants: some companies have begun developing new wearable AI devices, while others are trying to adapt various AI models to run on phones.
In fact, it is not just the tech giants who think this way. In China, many AR glasses manufacturers also believe this is where the opportunity lies. "Robots and AR glasses may be the biggest beneficiaries of this wave of large AI models," said an industry insider who has followed the AI industry for more than a decade.
But with everyone pursuing the same basic design idea, who will ultimately tune the best lightweight AI model, and who will build the most practical smart glasses? We will keep tracking the tech giants' progress to find out.
Source: The Information