The rest of 2024 is shaping up to be busy for the data center chip market, as rival chipmakers prepare to launch new processors.
Analysts say AMD and Intel are expected to release competitive new data center CPUs, while other chipmakers, including hyperscale vendors and startups, plan to introduce new AI chips to meet soaring demand for AI workloads. Intel, for instance, confirmed on Tuesday that its new Gaudi 3 artificial intelligence accelerator, designed for AI training and inference, is expected to be fully available in the third quarter of 2024, while Meta announced on Wednesday that its next-generation AI inference processor is now in production and already in use in its data centers.
Omdia's Chief Analyst for Data Center IT, Manoj Sukumaran, stated that while server shipments are expected to grow by 6%, increasing from 10.8 million servers shipped in 2023 to 11.5 million in 2024, server revenue in 2024 is projected to grow by 59% year-over-year, indicating that processors remain a hot and growing market. In fact, server revenue is expected to more than double over the next five years, reaching $270 billion by 2028.
"Although unit shipments are not growing significantly, revenue growth is quite rapid because these servers are packed with a large number of chips, hence the server prices are increasing substantially," Sukumaran told Data Center Knowledge. "This presents a huge opportunity for chip suppliers."
Co-processors are hot commodities
Data center operators have a significant interest in "co-processors," microprocessors designed to complement and enhance the capabilities of the main processor. Sukumaran said the data center server market has traditionally been CPU-centric, with the CPU the most expensive component in general-purpose servers. In 2020, only slightly over 11% of servers had co-processors; by 2028, more than 60% of servers are projected to be equipped with them, enhancing not only computing power but also efficiency.
Co-processors such as Nvidia H100 and AMD MI300 GPUs, Google Cloud Tensor Processing Units (TPUs), and other custom Application-Specific Integrated Circuits (ASICs) are gaining popularity, as they enable AI training, AI inference, database acceleration, network offloading, and security functions, as well as video transcoding, according to Sukumaran.
The analyst pointed out that video transcoding is a process that allows Netflix, YouTube, and other streaming services to optimize video quality for different user devices ranging from televisions to smartphones.
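To make that concrete, here is a minimal sketch of the kind of batch transcoding job streaming services run at enormous scale, which is why they increasingly offload it to dedicated co-processors rather than CPUs. It assumes ffmpeg is installed and on the PATH; the file names and rendition table are illustrative placeholders, not anything from the article.

```python
# Minimal transcoding sketch: re-encode one source video into renditions
# sized for different devices (TV vs. smartphone). Assumes ffmpeg is on
# PATH; "source.mp4" and the rendition table are placeholders.
import subprocess

RENDITIONS = {
    "tv_1080p": ("1920x1080", "5M"),   # resolution, target video bitrate
    "phone_480p": ("854x480", "1M"),
}

for name, (size, bitrate) in RENDITIONS.items():
    subprocess.run(
        ["ffmpeg", "-y", "-i", "source.mp4", "-s", size, "-b:v", bitrate,
         f"{name}.mp4"],
        check=True,  # raise if ffmpeg exits with an error
    )
```

Repeated across every title in a catalog and every device class, jobs like this add up to a workload where dedicated transcoding silicon pays off.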
The data center CPU market remains highly profitable for AMD, Intel, and Arm CPU vendors. Intel continues to lead in market share, but AMD and Arm-based CPUs from startups like Ampere and from cloud service providers have chipped away at Intel's dominance in recent years.
Data from Omdia shows that while Intel holds 61% of the CPU market share, AMD has gained significant traction, increasing from less than 10% of server shipments in 2020 to 27% in 2023. Arm CPUs captured 9% of the market share last year.
Sukumaran said, "Over the past few years, the Arm software ecosystem has matured considerably, and the low power consumption and high core density of Arm CPUs are very attractive to cloud service providers."
Indeed, Google Cloud announced on Tuesday that its first Arm-based CPU (called the Google Axion processor) will be made available to customers later this year.
Intel aims to regain its footing in the CPU market this year by launching its next-generation server processors. The new Intel Xeon 6 processor with E-cores (previously codenamed "Sierra Forest") is expected to hit the market in the second quarter of 2024, designed specifically for hyperscale enterprises and cloud service providers that require efficiency and performance.
Following that will be the new Intel Xeon 6 processor with P-cores, previously codenamed Granite Rapids, which focuses on high performance. AMD, however, is not sitting idle: it plans to release Turin, its fifth-generation EPYC CPU.
"AMD is undoubtedly the performance leader and has done an excellent job of taking market share from Intel," said Matt Kimball, vice president and principal analyst at Moor Insights & Strategy. Nearly all of those gains have come in the cloud with hyperscalers, and AMD hopes to extend them to on-premises enterprises. "In 2024, from a performance standpoint, you will see Intel being competitive again in the server-side CPU space."

Chip manufacturers are turning their attention to AI inference
Companies across various verticals are racing to build artificial intelligence models, so the scale of AI training remains substantial. However, Jim McGregor, founder and chief analyst at Tirias Research, says that in 2024 the market for AI inference chips will begin to emerge.
"The shift is towards inference processing," he says. "We're seeing a surge in AI workloads and generative AI workloads. They've trained the models. Now, they need to run them over and over again and want to do so as efficiently as possible. So, expect to see vendors rolling out new products."
McGregor says that Nvidia dominates the AI space with its GPUs, but AMD has introduced a viable competitor with the Instinct MI300 series GPUs for AI training and inference, released in December.
While GPUs and even CPUs are used for both training and inference, an increasing number of companies (including Qualcomm, hyperscalers like Amazon Web Services (AWS) and Meta, and AI chip startups like Groq, Tenstorrent, and Untether AI) have built or are developing chips specifically for AI inference. Analysts also say these chips are more energy-efficient.
Kimball says that when organizations deploy Nvidia H100 or AMD MI300, these GPUs are well-suited for training because they are large, have a multitude of cores, and feature high-bandwidth memory, offering high performance.
"Inference is a lighter task. They don't need the heavy-duty capabilities of the H100 or MI300," he says.
Top data center chips for 2024
Below is a list of processors expected to be released in 2024.
AMD
AMD CEO Dr. Lisa Su stated during the fourth-quarter 2023 earnings call that AMD plans to launch the next-generation server processor, Turin, in the second half of 2024. Turin is based on the company's new Zen 5 core architecture.
"Turin is a direct replacement for the existing fourth-generation EPYC platform, extending our leadership in performance, efficiency, and TCO by adding the next-generation Zen 5 cores, new memory expansion features, and a higher core count," she said during the earnings call.
There are no specific details about the product yet. However, Patrick Moorhead, an analyst at Moor Insights & Strategy, said the product will be significant: "AMD will seek to further differentiate itself from Intel in terms of performance and performance per watt." AMD has also seen huge demand for its Instinct MI300 accelerators, including the MI300X GPU, since their launch last December. Su said on the earnings call that the company plans to aggressively ramp MI300 production for cloud, enterprise, and supercomputing customers this year.
Intel
Intel plans to release several major chips this year: the Gaudi 3 AI accelerator and the next-generation Xeon server processors.
The Gaudi 3 is designed for AI training and inference, targeting the enterprise market. It is intended to compete with GPUs from Nvidia and AMD. Intel states that this AI chip will offer four times the artificial intelligence computing power and 1.5 times the memory bandwidth compared to its predecessor, Gaudi 2.
Intel executives added that, compared to Nvidia's H100 GPU, the Gaudi 3 is expected to deliver 50% faster training and inference times, along with a 40% improvement in inference energy efficiency.
"It will have significant power-saving benefits and a lower price," said analyst Karl Grobe.

As for the next-generation Intel Xeon 6 processors, Sierra Forest will include a version with 288 cores, the largest core count in the industry. It is also the company's first "E-core" server processor, designed to balance performance with energy efficiency.
Granite Rapids is a "P-core" server processor, designed to achieve optimal performance. The company states that it will provide two to three times the performance for AI workloads compared to Sapphire Rapids.
An Intel spokesperson indicated that Gaudi 3 will be supplied to OEMs in the second quarter of 2024, with full availability expected in the third quarter. Sierra Forest (now referred to as the Intel Xeon 6 processor with E-cores) is expected to hit the market in the second quarter of 2024. The Intel spokesperson stated that Granite Rapids (now called the Intel Xeon 6 processor with P-cores) is expected to be launched "soon."
This news comes after Intel launched its fifth-generation Xeon CPUs last year.
NVIDIA
In mid-March, NVIDIA announced that it will begin shipping the next-generation Blackwell GPU later this year, with analysts stating that this will allow the chip giant to continue to dominate the AI chip market.
The new Blackwell GPU series is designed for cloud providers and enterprises, offering 20 petaflops of AI performance on a single GPU. Executives claim it enables organizations to train AI models at four times the speed, boost AI inference performance by up to 30 times, and cut cost and energy consumption by up to 25 times compared with Nvidia's previous Hopper architecture chips.
Nvidia will also ship the Hopper-based H200 in the second quarter of 2024. The company recently announced new benchmark results indicating the H200 is the most powerful platform for running generative AI workloads, saying it performs inference on the 70-billion-parameter Llama 2 model 45% faster than the H100.
Ampere
In May last year, the startup led by former Intel president Renee James announced AmpereOne, a new family of custom-designed, Arm-compatible server processors. With up to 192 cores, the processor is designed specifically for cloud service providers, offering both high performance and high energy efficiency, company executives said.

AWS
AWS is one of the hyperscale providers that work with major chip manufacturers such as Nvidia, AMD, and Intel, using their processors to deliver cloud services to customers. However, AWS has also found it advantageous and cost-effective to build its own custom chips to power its data centers and cloud services.
This year, AWS will introduce Graviton4, an Arm-based CPU designed for general-purpose workloads, as well as Trainium2 for artificial intelligence training. Gadi Hutt, senior director of product and business development at AWS Annapurna Labs, said the company also launched its second-generation AI inference chip, Inferentia2, last year.
"Our goal is to give customers the freedom to choose and provide them with high performance at significantly reduced costs," said Hutt.
Trainium2 boasts four times the computing power of the first-generation Trainium processor, with three times the memory. Hutt said AWS deployed the original Trainium in clusters of 60,000 chips, and Trainium2 will be used in clusters of 100,000 chips.
Microsoft Azure
Microsoft recently unveiled the Microsoft Azure Maia 100 AI accelerator for artificial intelligence and generative AI tasks, as well as the Cobalt 100 CPU, an Arm-based processor for general computing workloads.
The company announced in November last year that it would begin rolling out these two processors in early 2024, initially to support Microsoft services such as Microsoft Copilot and Azure OpenAI Service.
The Maia AI accelerator is specifically designed for AI training and inference, while the Cobalt CPU is an energy-efficient chip aimed at delivering excellent performance per watt.
Google Cloud
Google Cloud is a pioneer among the hyperscalers, having first introduced custom Tensor Processing Units (TPUs) in 2013. These TPUs are designed specifically for artificial intelligence training and inference and are available to customers on Google Cloud. They also power Google services such as Search, YouTube, Gmail, and Google Maps.
The company launched its fifth-generation TPU at the end of last year. It stated that the Cloud TPU v5p trains models at a speed 2.8 times faster than its predecessor.
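For customers, targeting those TPUs typically means running a framework with a TPU backend. Below is a minimal sketch, assuming a Cloud TPU VM with JAX's TPU support installed, of how a workload discovers and uses TPU cores; nothing here is from the article.

```python
# Minimal TPU sketch: enumerate TPU cores and run a jit-compiled matmul.
# Assumes a Cloud TPU VM with JAX's TPU backend installed.
import jax
import jax.numpy as jnp

devices = jax.devices("tpu")  # raises RuntimeError if no TPU backend exists
print(f"{len(devices)} TPU cores visible")

# jax.jit compiles the function with XLA; it executes on the TPU by default.
x = jnp.ones((1024, 1024))
y = jax.jit(lambda a: a @ a)(x)
print(y.shape)  # (1024, 1024)
```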
Google Cloud announced on Tuesday the development of its first Arm-based CPU, named the Google Axion processor. The new CPU, built using the Arm Neoverse V2 CPU, will be made available to Google Cloud customers later this year.
The company stated that customers will be able to utilize Axion across many Google Cloud services, including Google Compute Engine, Google Kubernetes Engine, Dataproc, Dataflow, and Cloud Batch.
Analyst Kimball anticipates that as Google Cloud begins to deploy its own CPUs for its customers, the revenue of AMD and Intel will be impacted.
Meta
Meta announced that it has deployed its next-generation custom chip for artificial intelligence inference in its data centers this year.
The next-generation AI inference chip, previously code-named Artemis, is part of the Meta Training and Inference Accelerator (MTIA) series of custom chips designed for Meta's AI workloads.
Meta launched the first generation of AI inference chips, MTIA v1, last year. The company stated that the new next-generation chip offers three times the performance and 1.5 times the performance per watt compared to the first-generation chip.
Cerebras Systems

Artificial intelligence hardware startup Cerebras Systems launched its third-generation AI processor, the WSE-3, in mid-March. The wafer-scale chip boasts twice the performance of its predecessor and competes with NVIDIA in the high-end AI training market.
In mid-March, the company also partnered with Qualcomm to provide AI inference services for its customers. Models trained on Cerebras hardware are optimized to run inference on the Qualcomm Cloud AI 100 Ultra accelerator.
Groq
Groq, an AI chip startup located in Mountain View, California, has built the LPU inference engine to run large language models, generative AI applications, and other AI workloads.
Groq released its first AI inference chip in 2020, targeting customers such as hyperscalers, the public sector, AI startups, and developers. A company spokesperson said Groq is set to release its next-generation chip in 2025.
Tenstorrent
Tenstorrent, an AI inference startup based in Toronto, has a storied pedigree: its CEO, Jim Keller, is a chip architect who has worked at Apple, AMD, Tesla, and Intel, contributing to the design of AMD's Zen architecture and early chips for Apple's iPad and iPhone.
Bob Grim, Vice President of Strategy and Corporate Communications at Tenstorrent, stated that the company has begun taking orders for its Wormhole AI inference chip this year, with an official launch expected later in the year.
He said that Tenstorrent is selling servers powered by 32 Wormhole chips to enterprises, labs, and any organizations requiring high-performance computing. Grim indicated that Tenstorrent is currently focused on AI inference, but its chips can also support AI training, so the company plans to support AI training in the future as well.
Untether AI
Untether AI is an artificial intelligence chip startup based in Toronto, dedicated to creating energy-efficient AI inference chips.
A company spokesperson said Untether AI is led by president Chris Walker, a former Intel vice president and general manager. The company launched its first product in 2021 and plans to release its second-generation SpeedAI240 chip this year.
The spokesperson indicated that Untether AI's chips are designed for a variety of form factors, from single-chip devices for embedded applications to 4-chip PCI-Express accelerator cards, making their processors suitable for use in everything from edge to data center environments.