A new research paper shows that Apple has developed practical solutions to technical AI problems that other companies appear to be ignoring, in particular how to run large language models on low-memory devices such as iPhones.
Despite popular claims that Apple is lagging behind the industry when it comes to generative AI, the company has now twice revealed that it is pursuing its long-term plans rather than rushing to release a ChatGPT clone. The first sign was a research paper proposing an AI system called HUGS, which can generate digital avatars of humans.
As noted by VentureBeat, the second paper proposes a solution for deploying large language models (LLMs) on memory-constrained devices such as iPhones.
The new paper, titled "LLM in a Flash: Efficient Large Language Model Inference with Limited Memory," says that it "tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory, but bringing them on demand to DRAM."
As a result, the entire LLM still has to be stored on the device, but flash storage effectively serves as a kind of virtual memory backing the RAM, similar to how macOS swaps out memory-intensive tasks to disk.
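To make that idea concrete, here is a minimal, hypothetical sketch of on-demand parameter loading. It is not Apple's implementation; it simply uses NumPy memory-mapping to stand in for weights kept in flash storage, copying only the rows needed for a given step into RAM. The file name, matrix shape, and the `project` helper are all invented for illustration.

```python
import numpy as np

ROWS, COLS = 16384, 4096  # hypothetical size of one weight matrix

# Create a dummy weight file standing in for model parameters kept in flash.
np.lib.format.open_memmap(
    "layer_weights.npy", mode="w+", dtype=np.float16, shape=(ROWS, COLS)
)

# Memory-map the file: nothing is copied into DRAM until a slice is touched.
weights = np.load("layer_weights.npy", mmap_mode="r")

def project(x, needed_rows):
    """Load only the rows required for this step from flash, then compute."""
    active = np.asarray(weights[needed_rows])  # copies just these rows to RAM
    return active @ x

x = np.random.rand(COLS).astype(np.float16)
y = project(x, needed_rows=[0, 5, 42])  # e.g. rows for currently active neurons
print(y.shape)  # (3,)
```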
"In this flash-memory-based framework, we have introduced two main technologies," the study said. First, reuse previously activated neurons for "windowing" by strategically reducing data transmission. Second, the size of the data blocks read from the flash memory is increased through "row-column binding" that is compatible with the sequential data access advantages of flash memory. ”
Ultimately, this means that LLMs far larger than a device's available memory can be deployed on hardware with limited RAM. It also means Apple could bring AI features to more of its devices, and in more ways.
"The practical results of our study are noteworthy," the study said. We have demonstrated the ability to run LLMS up to twice the size of the available DRAM, inferring 4-5x faster than traditional CPU loading methods, and 20-25x faster on GPUs. ”
"This breakthrough is particularly crucial for deploying advanced LLMs in resource-constrained environments, thereby expanding their applicability and accessibility," it continues.
Apple has made this research public, just as it did with HUGS. So rather than lagging behind, it is actively improving AI capabilities for the industry as a whole.
That is in line with analysts who believe Apple stands to benefit the most as AI becomes more widely available, given the size of its user base.