At the start of 2024, the technology stocks that surged last year have pulled back sharply, but Nvidia, the standard-bearer of the AI wave, shows no sign of losing momentum.
Every chip company covets Nvidia's position, and as the AI pie grows ever larger, the hardware race is heating up in plain sight. A wave of startups is trying to carve off a piece of the budgets now flowing to Nvidia GPUs.
This article rounds up 12 companies at the forefront of that competition. On average these startups are only about five years old, the best funded has raised more than $700 million, and every one of them is a serious challenger to Nvidia.
Founded: 2015
Application: Training
Cerebras is known for manufacturing giant chips. It was co-founded by Gary Lauterbach and Andrew Feldman, who had earlier co-founded SeaMicro, a maker of ultra-high-density servers that AMD acquired in 2012 for $334 million.
Cerebras' main products are wafer-scale chips and systems for AI training, built for supercomputing workloads; each chip is roughly 56 times the size of an ordinary GPU.
Cerebras' customers are concentrated in defense, academic laboratories, and similar institutions. Its flagship CS-2 supercomputing system has been deployed at the U.S. Department of Energy's Argonne National Laboratory, the Pittsburgh Supercomputing Center, the University of Edinburgh's supercomputing center, and elsewhere.
However, even with roughly $700 million in financing already raised, Cerebras faces a daunting challenge in winning commercial customers, given the dominance of Nvidia's GPUs and the CUDA ecosystem.
In January, the company announced that it would partner with Mayo Clinic, a leading U.S. medical institution, to develop a proprietary AI model based on decades of anonymized medical records and data, using Cerebras' computing chips and software.
Reportedly, some of the models will read and write text, for example summarizing the most important parts of a medical record for a new patient. Others will analyze complex medical data or genomic data.
Cerebras chief executive officer Andrew Feldman said it was a multi-year, "multi-million-dollar" deal.
Founded: 2019
Application: Inference
Founded in 2019, d-Matrix is developing dedicated chips and software for running machine learning models. Its chips combine processing and memory, functions that usually live in separate, distinct components.
Because they fuse the two, d-Matrix's chips generate less heat and need less cooling, making them more cost-effective than mainstream GPUs and CPUs. According to the company's CEO, many businesses want to build AI applications on large models, and cost matters enormously to them.
d-Matrix has chosen to focus on inference, that is, running AI models, rather than training, reasoning that models will keep getting bigger and ever more expensive to run. Customers are already testing its chips and software, with commercial availability planned for the first half of 2024.
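To see why fusing memory and compute saves heat, consider the energy cost of moving data. The sketch below is a back-of-envelope model using commonly cited per-operation energy estimates (on the order of Horowitz's ISSCC 2014 figures); the numbers are illustrative assumptions, not d-Matrix specifications.

```python
# Back-of-envelope: why shuttling weights in from off-chip DRAM dominates
# inference energy, and what keeping them next to the compute units saves.
# Per-operation energies are commonly cited ~45nm-class estimates
# (Horowitz, ISSCC 2014); illustrative only, not d-Matrix specifications.

PJ = 1e-12  # one picojoule, in joules

ENERGY_J = {
    "mac":       3.2 * PJ,    # one 32-bit multiply-accumulate
    "sram_read": 5.0 * PJ,    # 32-bit read from small on-chip SRAM
    "dram_read": 640.0 * PJ,  # 32-bit read from off-chip DRAM
}

def forward_pass_energy(n_params: float, weight_source: str) -> float:
    """Energy (J) for one token's forward pass, assuming each parameter
    is read once and used in one multiply-accumulate."""
    return n_params * (ENERGY_J[weight_source] + ENERGY_J["mac"])

# A 7B-parameter model. (No real chip holds 7B 32-bit weights entirely
# in SRAM; the point is only the scale of the data-movement gap.)
params = 7e9
off_chip = forward_pass_energy(params, "dram_read")
near_mem = forward_pass_energy(params, "sram_read")

print(f"weights in DRAM:   {off_chip:.2f} J per token")
print(f"weights near-mem:  {near_mem:.2f} J per token")
print(f"~{off_chip / near_mem:.0f}x less energy (and heat) to dissipate")
```

On these rough numbers the data movement, not the arithmetic, is where almost all the energy goes, which is the bet behind combining memory and processing on one die.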
Founded: 2023
Application: Inference
Founded in June last year by two Harvard dropouts, Gavin Uberti and Chris Zhu, Etched plans to produce an AI inference accelerator called Sohu that it claims will deliver 10 times the inference performance of the H100. The company was valued at $34 million shortly after its founding.
According to reports, Sohu takes the radical approach of etching the transformer architecture directly into the silicon. The payoff is performance: in simulations, Sohu runs large models up to 140 times faster than conventional GPUs. Sohu also promises better code generation through tree search, the ability to compare hundreds of responses in parallel, and multicast speculative decoding to generate new content in real time.
According to Etched's blog, this architecture lets trillion-parameter models run with unmatched efficiency: with only one core, the system supports a fully open-source software stack that scales to 100-trillion-parameter models.
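Multicast speculative decoding is Etched's own variant and its details are not public, but the base technique is well established: a cheap draft model proposes several tokens, and the expensive target model verifies them all in one batched pass. A toy sketch with random stand-ins for both models:

```python
# Toy sketch of plain speculative decoding (the base of "multicast
# speculative decoding"; Etched's variant is not public). A cheap draft
# model proposes k tokens; the expensive target model verifies them in a
# single batched pass and keeps the longest matching prefix. Both models
# are random stand-ins here; real systems accept/reject probabilistically.
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def draft_model(context: list[str], k: int) -> list[str]:
    """Fast, cheap model: guess the next k tokens."""
    return [random.choice(VOCAB) for _ in range(k)]

def target_model(context: list[str], proposal: list[str]) -> list[str]:
    """Slow, accurate model: in ONE pass, return the token it would
    emit at each of the k proposed positions."""
    return [random.choice(VOCAB) for _ in proposal]

def speculative_step(context: list[str], k: int = 4) -> list[str]:
    proposal = draft_model(context, k)
    verified = target_model(context, proposal)
    accepted = []
    for drafted, correct in zip(proposal, verified):
        if drafted == correct:        # draft agreed with target: free token
            accepted.append(drafted)
        else:                         # first disagreement: take the
            accepted.append(correct)  # target's token and stop this round
            break
    return accepted  # 1..k tokens for a single target-model pass

context = ["the"]
for _ in range(5):
    context += speculative_step(context)
print(" ".join(context))
```

When the draft model guesses well, each expensive target-model pass yields several tokens instead of one, which is where the speedup comes from.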
Founded: 2022
Applications: Inference & Training
Extropic is the most mysterious startup on this list. Its founders came from "X", Google's "moonshot factory" division for exploring frontier technologies. According to reports, Extropic draws on quantum-computing expertise and plans to build a chip designed specifically for running large models, though no product details have been disclosed.
At the end of last year, the company closed a $14.1 million seed round.
According to the company's press release, the rise of generative AI has dramatically increased the world's need for scalable, cost-effective, and efficient computing; Extropic hopes to let computers harness entropy as an asset, learn through programming, and operate with unprecedented efficiency.
Extropic's computing paradigm is built on the principles of thermodynamics and aims to seamlessly integrate generative AI with the fundamental physics of the world. The company's stated goal is to eventually embed generative AI into physical processes and push against the limits of efficiency that the laws of physics impose in space, time, and energy.
Founded: 2016
Application: Inference
Groq was founded in 2016 and is headquartered in Mountain View, California. The company's main product is the language processing unit (LPU), and it focuses on large-model inference.
The standout feature of Groq's products is sheer generation speed, which keeps the end-user experience smooth. Speed matters in consumer AIGC applications, and running Meta's open-source Llama 2 70B model, the Groq LPU can generate 300 tokens per second: enough to produce as many words as Shakespeare's Hamlet in about 7 minutes, some 75 times faster than the average person types.
Jonathan Ross, Groq's co-founder and CEO, believes inference cost is becoming a real problem for companies that build AI into their products: as the number of customers grows, so does the bill for running the models. Compared with Nvidia GPUs, he says, Groq LPU clusters deliver higher throughput, lower latency, and lower cost for large-model inference.
In addition, tight HBM3 supply and limited CoWoS packaging capacity mean Nvidia's GPU output cannot fully meet customer demand. The Groq LPU sidesteps both constraints: it relies on neither Samsung's or SK Hynix's HBM nor TSMC's CoWoS packaging, so it does not face the same capacity bottlenecks as Nvidia.
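Ross's point about inference cost is simple arithmetic: the cost of a generated token is the hardware's hourly cost divided by its hourly token throughput, so the bill scales linearly with traffic and inversely with throughput. A minimal sketch, with every price and rate a made-up placeholder rather than a Groq or Nvidia figure:

```python
# Inference economics in two lines of arithmetic: cost per token is
# hourly hardware cost divided by hourly throughput, and the monthly
# bill grows linearly with traffic. All numbers are hypothetical
# placeholders, not Groq or Nvidia pricing.

def usd_per_million_tokens(hourly_cost_usd: float, tokens_per_sec: float) -> float:
    return hourly_cost_usd / (tokens_per_sec * 3600) * 1e6

def monthly_bill_usd(requests_per_day: float, tokens_per_request: float,
                     rate_usd_per_million: float) -> float:
    monthly_tokens = requests_per_day * tokens_per_request * 30
    return monthly_tokens / 1e6 * rate_usd_per_million

# The same hypothetical $8/hour box, at 100 vs. 300 tokens/sec.
for label, tps in [("100 tok/s", 100.0), ("300 tok/s", 300.0)]:
    rate = usd_per_million_tokens(8.0, tps)
    bill = monthly_bill_usd(1e6, 500, rate)  # 1M requests/day, 500 tokens each
    print(f"{label}: ${rate:.2f}/1M tokens -> ${bill:,.0f}/month")
```

Tripling throughput at the same hourly cost cuts the bill to a third, which is why inference vendors compete on tokens per second per dollar.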
Founded: 2017
Applications: Training & Inference
Lightmatter, which uses laser light to move data between chips and across server farms, was founded by MIT students building on the university's patented technology.
According to co-founder and CEO Nicholas Harris, compared with chipmakers such as Nvidia, AMD, and Intel, whose products move data over electrical wiring, Lightmatter's technology can cut data center energy costs by about 80%.
Founded: 2022
Applications: Unannounced
MatX was founded by former Google employees: CEO Reiner Pope helped develop Google's Pathways model, and CTO Mike Gunter helped design Google's TPU.
MatX is developing chips specialized for large language models in text applications. The company says its chips run faster and cost less than Nvidia's GPU hardware, and can also support other AI applications, including image generation.
MatX says it is backed by several venture capital firms, without disclosing amounts, and claims "strong support from well-known large-model developers," without naming them.
Founded: 2022
Application: Inference (expanding into training this year)
Modular focuses on building a development platform and programming language for training and running large models, on which users can work with a variety of AI tools, including Google's open-source TensorFlow and Meta's open-source PyTorch.
The company believes AI development is hampered by overly complex and fragmented technical infrastructure, and Modular's mission is to remove the complexity of building and maintaining AI systems at scale.
Building and running AI applications takes enormous computing power, and to control costs a company may mix different types of AI chips, but the software stacks for those chips are often mutually incompatible. Nvidia's CUDA software for writing machine learning applications, in particular, runs only on Nvidia's own chips, effectively locking developers into its GPUs. CUDA's grip on users is strong: one computer vision startup reportedly needed two years to migrate to non-Nvidia chips.
Modular hopes to change this by developing an alternative to CUDA that closes the software-compatibility gap between chips and makes non-Nvidia hardware easier to adopt.
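For a concrete picture of the lock-in problem, here is a minimal sketch in plain PyTorch (not Modular's own stack, which centers on its Mojo language): code hardwired to "cuda" breaks off Nvidia hardware, while device-agnostic code can at least fall back to other backends.

```python
# The lock-in problem in miniature, using plain PyTorch (not Modular's
# stack). Code hardwired to "cuda" only runs on Nvidia hardware;
# device-agnostic code can at least fall back to other backends.
import torch

def best_device() -> torch.device:
    if torch.cuda.is_available():          # Nvidia (or ROCm builds)
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple silicon
        return torch.device("mps")
    return torch.device("cpu")             # always works, just slower

device = best_device()
model = torch.nn.Linear(512, 512).to(device)
x = torch.randn(8, 512, device=device)
print(model(x).shape, "on", device)

# The brittle alternative this replaces:
# model = torch.nn.Linear(512, 512).to("cuda")  # crashes off Nvidia
```

Device strings only solve the surface of the problem; the kernels underneath still differ per vendor, which is the deeper layer Modular is targeting.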
Founded: 2017
Applications: Inference & Fine-tuning
Training and inference on traditional GPUs is expensive, partly because of the heat generated as the chips shuttle data between memory and processing components; the GPUs must be continuously cooled, driving up data center power costs.
Rain AI's NPU chips mimic the biological brain, combining memory and processing so that they not only excel in speed and energy efficiency but can also customize or fine-tune AI models in real time based on their surroundings. The company has yet to ship a finished product, however.
According to media reports, a letter of intent signed in 2019 shows that OpenAI planned to spend $51 million on Rain AI's NPU chips for training and deploying GPT models.
Founded: 2018
Application: Inference
SiMa.ai develops hardware and software for edge computing devices used in aircraft, drones, automobiles, medical devices, and similar settings, rather than for data centers.
Founder Krishna Rangasayee spent nearly two decades at chipmaker Xilinx. In an earlier interview, he said that many industries cannot use cloud-based AI services for various reasons, and that SiMa.ai would focus on serving those distributed edge devices.
For example, self-driving cars need to make decisions on the fly, and only built-in AI can meet their demanding latency requirements. And in industries such as healthcare, companies may not want to send sensitive data to the cloud, but rather keep it on their devices.
In June 2023, SiMa.ai said it had begun mass production of its first-generation edge AI chips, and that it is working with more than 50 customers across manufacturing, automotive, aviation, and other industries.
Founded: 2016
Applications: Training & Inference
Tenstorrent was founded by three former AMD employees and is headquartered in Toronto, Canada.
Tenstorrent develops RISC-V and AI chips using heterogeneous, chiplet-based designs. It has so far produced two chips on a 12nm process, Grayskull and Wormhole, with FP8 compute reaching 328 TFLOPS. The company's goal is to push prices down to one-fifth to one-tenth of comparably performing GPUs.
In 2021, Tenstorrent also launched DevCloud, which allows AI developers to run large models without having to buy hardware.
However, in recent years, perhaps under pressure from hardware makers such as Nvidia, Tenstorrent has shifted its focus to technology licensing and services.
Founded: 2022
Applications: Training & Inference
Founded by George Hotz, founder and former CEO of self-driving startup Comma.ai, Tiny Corp is building its products on an open-source deep learning framework called tinygrad, which is said to help developers train and run large language models faster.
Hotz believes tinygrad can be a "strong contender" to PyTorch, the deep learning framework from Meta, though he has not yet revealed specific details about the product.
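For flavor, tinygrad deliberately mirrors PyTorch's autograd API. The snippet below is adapted from tinygrad's public README; exact import paths may vary across versions.

```python
# A taste of tinygrad's PyTorch-like autograd API, adapted from the
# project's public README (import paths may vary across versions).
from tinygrad import Tensor

x = Tensor.eye(3, requires_grad=True)
y = Tensor([[2.0, 0, -2.0]], requires_grad=True)
z = y.matmul(x).sum()  # forward pass is built lazily
z.backward()           # reverse-mode autodiff, as in torch

print(x.grad.numpy())  # dz/dx
print(y.grad.numpy())  # dz/dy
```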