Edge AI semiconductor company Ambarella showcased at CES a multimodal large language model (LLM) running on its new N1 SoC family, which performs inference at a fraction of the power of a GPU-based solution.
Ambarella's goal is to bring generative artificial intelligence (GenAI) to edge endpoint devices and on-premise hardware for a wide range of applications, including security analytics, robotics, and numerous industrial applications.
Ambarella will initially offer optimized GenAI processing on its mid- to high-end SoCs, from the existing CV72 for sub-5W on-device performance to the new N1 series for server-class performance at under 50W. Compared with GPUs and other AI accelerators, Ambarella offers a complete SoC solution with up to 3x better power efficiency per generated token, while enabling immediate and cost-effective deployment in products.
Les Kohn, CTO and co-founder of Ambarella, said: "GenAI networks enable new capabilities that were previously impossible in our target application markets. All edge devices are going to get smarter, and our N1 Series SoCs enable world-class multimodal LLM processing at very attractive power consumption."
Alexander Harrowell, Principal Analyst for Advanced Computing at Omdia, said: "Over the next 18 months, almost all edge applications will be enhanced with GenAI. When GenAI workloads move to the edge, the game becomes performance-per-watt and integration with the rest of the edge ecosystem, not just raw throughput."
All of Ambarella's AI SoCs are supported by the company's new Cooper developer platform. In addition, to reduce customers' time to market, Ambarella has pre-ported and optimized popular LLMs such as Llama-2, as well as LLaVA (Large Language and Vision Assistant) models running on the N1 for multimodal visual analysis of up to 32 camera sources. These pre-trained and fine-tuned models will be available to partners from the Cooper Model Library.
For many real-world applications, visual input is a key modality alongside language, and Ambarella's SoC architecture is natively well-suited to processing video and AI simultaneously at very low power. Unlike stand-alone AI accelerators, a full-featured SoC can efficiently handle multimodal LLMs while still performing all of the system's functions.
GenAI will be a step-function advance in computer vision processing, bringing context and scene understanding to a wide range of devices, from security systems and autonomous robots to industrial applications. Examples of the on-device LLM and multimodal processing enabled by Ambarella include: smart contextual searches of security footage; robots that can be controlled with natural language commands; and AI assistants that can perform anything from code generation to text and image generation.
Most of these systems rely heavily on cameras and natural language understanding, and will benefit from on-device GenAI processing through improved speed and privacy, as well as lower total cost of ownership. The local processing enabled by Ambarella's solution is also well-suited to application-specific LLMs that are fine-tuned for each individual scenario, in contrast to the traditional server approach of using larger, more power-hungry LLMs to cover every use case.
Based on Ambarella's powerful CV3-HD architecture, originally developed for autonomous driving applications, the N1 series SoC repurposes all of that performance to run multimodal LLMs at very low power. For example, the N1 SoC runs Llama2-13B in single-stream mode at up to 25 tokens per second while drawing less than 50W. Combined with the easy-to-integrate pre-ported models, this new solution can help OEMs quickly deploy generative AI into any power-sensitive application, from on-premise AI boxes to delivery robots.
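As a back-of-envelope check on the figures quoted above, energy per generated token is simply power divided by throughput. The sketch below uses the article's round numbers (50W, 25 tokens/s) plus a hypothetical GPU baseline derived from the claimed 3x efficiency gap; the actual measurement conditions are not specified in the article.

```python
def joules_per_token(power_watts: float, tokens_per_second: float) -> float:
    """Energy consumed per generated token, in joules (J = W * s)."""
    return power_watts / tokens_per_second

# Article's figures: N1 runs Llama2-13B at up to 25 tokens/s under 50W.
n1_energy = joules_per_token(50.0, 25.0)   # 2.0 J/token

# Hypothetical GPU baseline, assuming the article's "up to 3x" efficiency
# claim holds (i.e., the GPU spends 3x the energy per token).
gpu_energy = 3.0 * n1_energy               # 6.0 J/token

print(f"N1: {n1_energy:.1f} J/token, GPU baseline: {gpu_energy:.1f} J/token")
```

At 2 J/token, generating a 500-token response would cost roughly 1 kJ of energy on the N1, which is the kind of budget that matters for battery-powered or thermally constrained edge devices.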
This week, a demonstration of the N1 SoC and its multimodal LLM capabilities will be on display at the Ambarella booth during CES.