AMD Advances Rack-Scale AI Infrastructure with Celestica
Open standards-based platform integrates high-speed networking and GPU interconnects to support scalable AI clusters across cloud, enterprise, and research environments.
www.amd.com

As demand for AI infrastructure grows across cloud, enterprise, and research environments, the ability to scale compute and networking efficiently at the rack level has become a key design requirement. AMD and Celestica address this need through a strategic collaboration to deliver a rack-scale AI platform based on open standards.
The “Helios” platform combines AMD’s high-performance computing capabilities with Celestica’s expertise in advanced networking switch design and manufacturing. Designed for large-scale AI clusters, the system integrates compute, interconnect, and networking into a unified rack-level architecture that simplifies deployment at scale.
How rack-scale design improves AI deployment efficiency
Traditional AI infrastructure often requires integrating multiple subsystems (compute nodes, GPUs, and networking) at the data center level. Rack-scale architectures such as “Helios” shift this integration upstream, delivering pre-integrated, validated systems that reduce deployment complexity and improve consistency.
Celestica is responsible for the research, design, and manufacturing of scale-up networking switches within the platform. These switches are engineered to support high-speed interconnects between next-generation AMD Instinct MI450 Series GPUs, enabling low-latency communication critical for distributed AI workloads such as model training and inference at scale.
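As a rough illustration of the traffic pattern these switches carry, the sketch below shows the gradient all-reduce that dominates communication in data-parallel training. It is a generic PyTorch example, not AMD’s or Celestica’s software stack; on ROCm builds of PyTorch the "nccl" backend maps to RCCL, and the layer sizes and script name are arbitrary.

import torch
import torch.distributed as dist

def synchronize_gradients(model: torch.nn.Module, world_size: int) -> None:
    # Sum each gradient tensor across all ranks, then average. Every
    # all-reduce crosses the scale-up fabric, so fabric latency and
    # bandwidth directly bound the training step time.
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

def main() -> None:
    # torchrun supplies RANK, WORLD_SIZE, and MASTER_ADDR via environment.
    dist.init_process_group(backend="nccl")  # maps to RCCL on ROCm builds
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())
    model = torch.nn.Linear(4096, 4096).cuda()
    model(torch.randn(8, 4096, device="cuda")).sum().backward()
    synchronize_gradients(model, dist.get_world_size())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launched with, for example, torchrun --nproc_per_node=8 allreduce_sketch.py, every rank issues the same collectives in lockstep, which is why low-latency switch hops between GPUs translate directly into shorter training steps.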
The platform is based on Open Compute Project (OCP) standards and uses the Open Rack Wide (ORW) form factor, supporting interoperability and standardization across data center environments.
High-speed interconnects for GPU scaling
A key technical component of the “Helios” platform is its use of Ultra Accelerator Link over Ethernet (UALoE) architecture for scale-up connectivity. This approach enables high-bandwidth, low-latency communication between GPUs across the rack, which is essential for synchronizing large AI models.
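A back-of-envelope estimate shows why. In a ring all-reduce, each GPU moves roughly 2(n-1)/n times the gradient payload across its own link, so link rate sets a floor on synchronization time. All figures in the sketch below are illustrative assumptions, not “Helios” specifications.

def ring_allreduce_seconds(grad_bytes: float, n_gpus: int, link_gbps: float) -> float:
    # Classic ring all-reduce: each GPU sends and receives about
    # 2 * (n - 1) / n of the payload over its own link.
    bytes_on_wire = 2 * (n_gpus - 1) / n_gpus * grad_bytes
    return bytes_on_wire / (link_gbps * 1e9 / 8)  # Gbit/s -> bytes/s

params = 70e9            # assumed 70B-parameter model
grad_bytes = params * 2  # bf16 gradients, 2 bytes each
for link in (400, 800):  # assumed per-GPU link rates in Gbit/s
    t = ring_allreduce_seconds(grad_bytes, n_gpus=72, link_gbps=link)
    print(f"{link} Gbit/s link: ~{t:.1f} s per full-gradient all-reduce")

Even at these rates a full-gradient exchange costs seconds, which is why training frameworks overlap communication with compute, and why per-link bandwidth and switch latency dominate rack-scale design.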
By leveraging Ethernet-based interconnects aligned with open standards, the platform aims to balance performance with flexibility. This design allows data center operators to integrate AI systems into existing infrastructure more easily than with proprietary interconnect solutions.
The networking switches incorporate advanced silicon designed to handle the bandwidth and data movement requirements of next-generation AI workloads, supporting efficient scaling as cluster sizes increase.
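A simple capacity calculation illustrates the constraint such silicon addresses; the radix and rates below are assumptions for illustration, not product specifications.

def gpus_per_switch(radix: int, port_gbps: int, per_gpu_gbps: int) -> int:
    # Reserve half the ports as uplinks into the rest of the fabric so
    # the topology keeps full bisection bandwidth; the remaining ports
    # face GPUs directly.
    downlinks = radix // 2
    return downlinks * port_gbps // per_gpu_gbps

# A hypothetical 64-port, 800 Gbit/s switch serving GPUs that each need
# 800 Gbit/s of scale-up bandwidth supports 32 GPUs per switch tier.
print(gpus_per_switch(radix=64, port_gbps=800, per_gpu_gbps=800))

Raising per-port rates or radix multiplies the GPUs a single switch tier can serve, which is the lever switch silicon provides as cluster sizes increase.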
Applications across cloud, enterprise, and research
The rack-scale AI platform is intended for environments where large-scale model training, simulation, and data processing are critical. This includes hyperscale cloud providers deploying AI services, enterprises building internal AI capabilities, and research institutions running compute-intensive simulations.
For these users, the main advantage lies in reduced time-to-deployment. By delivering a pre-integrated rack-level solution, the platform minimizes system integration effort and helps ensure consistent performance across deployments.
Additionally, the use of open standards supports supply chain flexibility and reduces dependency on proprietary ecosystems, which can be a limiting factor in large-scale infrastructure rollouts.
Positioning within the AI infrastructure ecosystem
Rack-scale AI systems are increasingly being developed by major industry players. Comparable approaches include NVIDIA’s rack-scale AI systems (such as DGX-based infrastructure) and other OCP-aligned designs that emphasize integrated compute and networking.
The AMD–Celestica platform differentiates itself through its adherence to open standards such as OCP and ORW, combined with Ethernet-based GPU interconnects. This contrasts with more tightly integrated proprietary ecosystems, offering data center operators greater flexibility in system design and deployment.
Availability and deployment outlook
The “Helios” rack-scale AI platform is expected to be available in late 2026, with deployments planned across multiple sectors requiring scalable AI compute infrastructure.
By integrating compute, networking, and interconnect technologies into a standardized rack-level system, the collaboration addresses a core challenge in AI deployment: delivering scalable performance while maintaining flexibility and reducing integration complexity.