Baijiao, from Aofei Temple | QbitAI
Achieving the fastest large-model inference in history will cost you $11.71 million (about 84.1 million yuan)?
For the same workload, NVIDIA GPUs would cost only about $300,000...
So as to whether the "strongest AI chip" crown has changed hands, Groq may need to let the bullet fly for a while, as the saying goes; in other words, wait and see.
Over the past two days, Groq has made a stunning debut. With a chip billed as "100x more cost-effective than NVIDIA," it generates large-model output at 500 tokens per second, with no perceptible latency. Add the buff of a founding team drawn from Google's TPU group, and many people began shouting: NVIDIA is about to be crushed...
Once the uproar subsided, more rational discussion began, centering on Groq's cost-effectiveness.
By one netizen's rough calculation, running the current demo requires 568 chips, at a cost of $11.71 million.
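The netizen's estimate can be reproduced with back-of-the-envelope arithmetic. This is just a sketch: the unit price below is an assumption back-derived from the quoted total (roughly the ~$20,000-per-card figure cited later in the article), not an official number.

```python
# Back-of-the-envelope cost of the demo deployment (illustrative figures).
cards = 568                # chips the demo reportedly requires
price_per_card = 20_625    # assumed unit price in USD, ~$20k per card

total_cost = cards * price_per_card
print(f"${total_cost:,}")  # roughly the $11.71 million quoted above
```

The exact unit price doesn't matter much; anything near $20,000 per chip lands in the same $11-12 million range.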
As a result, people inside and outside the industry all started doing the math at once.
One analyst even turned up with calculations of his own...
and sighed: "OK, everybody is doing public math this week."
Groq, for its part, also responded promptly on social media.
Those weighing in on Groq's costs include computer-science students, cloud vendors that offer inference services, and even former Groq employees sparring with current ones... a lively scene indeed.
Let's take a look at what everyone thinks.
First, a rough estimate: each card costs about $20,000 and carries only 0.23 GB of memory.
To serve a single LLaMA 70B model, you would need to buy roughly 320 cards (in practice more), which, including servers, comes to about $10 million...
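The card count can be sanity-checked from memory capacity alone. A minimal sketch, assuming int8-quantized weights and ~230 MB of on-chip SRAM per card; the article's ~320 figure presumably adds headroom for activations and KV cache:

```python
import math

params_billion = 70        # LLaMA 70B parameter count, in billions
bytes_per_param = 1        # assuming int8-quantized weights
sram_per_card_gb = 0.23    # ~230 MB of memory per card, per the estimate above

weights_gb = params_billion * bytes_per_param         # 70 GB of weights
cards_needed = math.ceil(weights_gb / sram_per_card_gb)
print(cards_needed)        # ~305 cards just to hold the weights
```

At fp16 (2 bytes per parameter) the same arithmetic would demand roughly twice as many cards, which is why quantization assumptions dominate these estimates.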
And how does that compare with NVIDIA's H100?
Lepton's Jia Yangqing also got involved and ran the numbers. Beyond the basics, he analyzed energy consumption, performance, operating costs, and more.
In the end, it boils down to these core points:
For the LLaMA 70B model, assuming 572 cards, the annual electricity bill comes to about $254,000. Four H100 cards can deliver half of Groq's performance, so an 8-card H100 box, which today costs about $300,000, roughly matches it. Over three years of operation, Groq's hardware procurement would run $11.44 million and its operating costs about $762,000; the H100, by comparison, is cheaper both to buy and to operate. It is also worth noting that in the benchmarks Groq published, which include Lepton, Groq's inference speed comes out about three times Lepton's.
Jia Yangqing also revealed that he and the founder of Groq are old acquaintances:
We got to know each other back at Google.
That said, the discussions also featured other ways of running the numbers.
For example, some netizens asked: what does the comparison look like on a per-token basis?
No matter; more professional analysts were happy to take over.
By his calculation, however, per 1 million tokens Groq actually comes out more cost-effective.
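One way to frame a per-token comparison is to amortize hardware cost over its service life. The helper below is hypothetical, not the analyst's actual method, and the throughput plugged in is the demo's per-user 500 tokens/s rather than a measured system throughput, so the output is purely illustrative.

```python
def hardware_cost_per_million_tokens(hardware_usd: float,
                                     tokens_per_sec: float,
                                     years: int = 3) -> float:
    """Amortized hardware cost per 1M generated tokens.

    Ignores power, utilization, and batching; a deliberate simplification.
    """
    seconds = years * 365 * 24 * 3600
    total_tokens = tokens_per_sec * seconds
    return hardware_usd / total_tokens * 1_000_000

# Illustrative only: $11.44M of hardware sustaining 500 tokens/s for 3 years.
print(round(hardware_cost_per_million_tokens(11_440_000, 500), 1))  # -> 241.8
```

The real dispute hinges on the throughput denominator: if the system serves many concurrent streams, total tokens/s is far higher than 500 and the per-token cost drops accordingly.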
Beyond that, there were other threads of discussion, such as: does it support accelerating arbitrary transformers?
Groq's own Q&A.
With so much attention pouring in, Groq couldn't help answering for itself.
It's time for another FAQ post to clarify.
The main points are as follows:
Take an open-source model, adapt it to our compiler, and run it; that's all. Our per-token pricing is affordable and efficient because we build everything ourselves, from chip to system, with no middlemen. We do not sell individual chips except through third-party vendors, so published card prices are not representative; our target customers are not single-card buyers. Beyond that, the Q&A is still ongoing...
So whether Groq can really shake NVIDIA's position, we will likely have to wait a while longer to find out.
That said, NVIDIA's stock price did wobble a bit yesterday...
Reference link: [1].