The hottest concept in tech circles in 2023 is undoubtedly the large AI model. Not only Alibaba and Tencent in China but also Microsoft, Meta, Google, and Amazon overseas are involved, and even phone makers that focus on hardware now seem eager to tell a story of AI empowerment. vivo's Blue Heart model (BlueLM) already runs on the S18 and X100 series, and Samsung's Gauss model is about to appear on the Galaxy S24 series.
Just as the Android camp began working on on-device models, Apple was not far behind. A few days ago, Apple's artificial intelligence researchers said they had made a key breakthrough in deploying large models on iPhones and other Apple devices with limited memory: a technique that stores a large model's data in flash memory to get around the memory limit.
In a paper titled "LLM in a Flash: Efficient Large Language Model Inference with Limited Memory", Apple describes a way to run large models that exceed a device's available DRAM capacity. The paper builds an inference cost model that accounts for the characteristics of flash memory and uses two key techniques, windowing and row-column bundling, to minimize data transfer and maximize flash throughput.
Windowing means the model does not load new data for every token but reuses part of the data it has already processed. Row-column bundling groups data more efficiently so that the model can read it from flash memory faster. In a sense, this new technique looks like an extension of the MLX framework Apple has just released. MLX is a new machine learning framework designed to run machine learning models more efficiently on Apple's chips, and its most notable difference from other frameworks is its unified memory model.
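The two techniques described above can be illustrated with a minimal Python sketch. It is a toy model of the idea, not Apple's implementation: the class and function names (`WindowedNeuronCache`, `bundle_rows_columns`, `read_bundle`) are hypothetical, "flash" is just a Python list, and the assumption that bundling pairs column i of the up-projection with row i of the down-projection follows my reading of the paper rather than anything stated in this article.

```python
from collections import deque


class WindowedNeuronCache:
    """Toy windowing: keep neurons used by the last `window` tokens in DRAM."""

    def __init__(self, window):
        self.window = window
        self.history = deque()   # active-neuron sets of recent tokens
        self.resident = {}       # neuron id -> number of window tokens using it
        self.flash_reads = 0     # how many neurons had to be fetched from flash

    def step(self, active_neurons):
        """Process one token; return the neurons that must be read from flash."""
        active = set(active_neurons)
        # Only neurons that are not already resident need a flash read.
        to_load = sorted(n for n in active if n not in self.resident)
        self.flash_reads += len(to_load)
        for n in active:
            self.resident[n] = self.resident.get(n, 0) + 1
        self.history.append(active)
        # Evict neurons that were needed only by the token leaving the window.
        if len(self.history) > self.window:
            for n in self.history.popleft():
                self.resident[n] -= 1
                if self.resident[n] == 0:
                    del self.resident[n]
        return to_load


def bundle_rows_columns(up_columns, down_rows):
    """Lay out column i and row i next to each other in one flat 'flash' list."""
    flash = []
    for col, row in zip(up_columns, down_rows):
        flash.extend(col)
        flash.extend(row)
    return flash


def read_bundle(flash, i, d):
    """Fetch neuron i's column and row with one contiguous read of 2*d values."""
    chunk = flash[i * 2 * d:(i + 1) * 2 * d]
    return chunk[:d], chunk[d:]


if __name__ == "__main__":
    cache = WindowedNeuronCache(window=2)
    for token_neurons in ([1, 2, 3], [2, 3, 4], [4, 5]):
        print("read from flash:", cache.step(token_neurons))
    flash = bundle_rows_columns([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(read_bundle(flash, 1, 2))
```

In this toy run, consecutive tokens share most of their active neurons, so only the newly needed ones are read, and each neuron's weights come back in a single larger contiguous read instead of two small scattered ones.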
In other words, Apple has not been indifferent to the large-model craze over the past year; it has been quietly building large models suited to the characteristics of its own products. Apple's answer is to run a large model on the device with relatively little memory rather than to expand the memory of future devices.
So far, almost all mainstream mobile phone manufacturers have joined the race to deploy on-device large models on phones.
Why are these mobile phone manufacturers interested in on-device large models? Wang Bin, director of the AI laboratory of Xiaomi Group and chief scientist of natural language processing (NLP), previously said in an interview with **, "By around the Spring Festival, some people felt that we at least had to do it. The storm was coming, and we could not stay on the sidelines of this technology. If we did not enter the game, we would be at a disadvantage in the competition."
Everyone is well aware of the state of the mobile phone industry: the downturn has lasted a long time, so the major manufacturers are hoping this new concept will reignite the market the way full-screen displays once did.
In addition, on-device models carry phone makers' hopes that new technology will rekindle consumers' enthusiasm for upgrading, since they believe artificial intelligence will let phones do more for their users. Compared with large models running in the cloud, on-device large models also greatly reduce the risk of privacy leaks and data security problems, and they can be personalized and customized to solve problems in specific scenarios.
Better still, an on-device model means the phone will understand the user's needs far more deeply than today's intelligent assistants, which are often mocked as more "artificial" than "intelligent". And if the on-device large model can control and invoke other applications, as Google's AI Core does, the relationship between phone makers and third-party applications may change dramatically. Phone makers could gain real influence over third-party applications, and the benefits behind that are hard to estimate.
Compared with other phone makers, however, Apple actually has a harder time deploying an on-device large model on the iPhone. Several on-device large models are already available for users to try on phones, but in practice, even setting aside the memory the Android system itself occupies, an 8GB phone can do little else once the on-device model is running. Memory plays a crucial role in large-model performance. The MI300 series of AI chips that AMD has just released, for example, focuses on large memory capacity and high bandwidth.
Tests show that unified memory lets a chip run larger models, but inference speed suffers because memory bandwidth is low. Even if Apple had not come up with new techniques, the iPhone could still run an on-device large model, but inference might be too slow for users to tolerate. Minimizing data transfer and maximizing flash throughput addresses this problem.
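The link between data transfer and inference speed can be made concrete with a back-of-the-envelope estimate: if every generated token requires reading a given number of bytes of weights, then throughput cannot exceed bandwidth divided by bytes read per token. The sketch below assumes this simple bound and ignores compute time and caching; the function name `tokens_per_second` and all numbers in the example are illustrative assumptions, not measurements of any real device.

```python
def tokens_per_second(params_billion, bytes_per_param, bandwidth_gb_s,
                      fraction_read=1.0):
    """Upper bound on tokens/s when weight reads are the bottleneck.

    params_billion  -- model size in billions of parameters
    bytes_per_param -- storage per parameter (e.g. 2 for 16-bit weights)
    bandwidth_gb_s  -- sustained read bandwidth in GB/s (illustrative)
    fraction_read   -- share of the weights actually read for each token
    """
    bytes_per_token = params_billion * 1e9 * bytes_per_param * fraction_read
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Illustrative only: a 7B-parameter model with 16-bit weights.
    full = tokens_per_second(7, 2, bandwidth_gb_s=1.0)
    partial = tokens_per_second(7, 2, bandwidth_gb_s=1.0, fraction_read=0.05)
    print(f"reading all weights per token: {full:.3f} tokens/s")
    print(f"reading 5% of weights per token: {partial:.3f} tokens/s")
```

Under this bound, reading a smaller fraction of the weights per token (less data transfer) or raising the effective read bandwidth (higher flash throughput) improves speed in direct proportion, which is why the article's two goals go together.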
Currently, the iPhone 15 and iPhone 15 Plus both come with 6GB of RAM, while the iPhone 15 Pro and iPhone 15 Pro Max go up to 8GB. There is speculation that, to run on-device large models on the iPhone, Apple is likely to increase the memory of the upcoming iPhone 16 series. But Apple's memory upgrades are famously expensive, and continuing to add memory to the iPhone would probably push its price even higher.
Keep in mind that the last major iPhone price increase cost Apple market share, so with Android flagships steadily catching up, Apple is unlikely to raise prices. Meanwhile, the Transformer architecture that underlies large models performs inference layer by layer, and layer-by-layer load scheduling is currently the main way to optimize memory use. Combining the two is the solution Apple has arrived at.
Seen this way, what Apple may be able to unlock without adding memory is genuinely impressive.