AI is at the forefront of people’s minds, and innovations are happening at lightning speed. But to sustain that pace of AI innovation, companies need the right infrastructure for the compute-intensive AI workloads they are trying to run. This is what we call ‘purpose-built infrastructure’ for AI, and it’s a commitment Microsoft has made to its customers. That commitment doesn’t just mean taking hardware developed by partners and placing it in its datacenters; Microsoft is dedicated to working with partners, and occasionally on its own, to develop the latest and greatest technology to power scientific breakthroughs and AI solutions.
One of the technologies highlighted at Microsoft Ignite in November was hollow core fiber (HCF), an innovative optical fiber that is set to optimize Microsoft Azure’s global cloud infrastructure, offering superior network quality, lower latency, and more secure data transmission.
Transmission by air
HCF technology was developed to meet the heavy demands of workloads like AI and to improve global latency and connectivity. It uses a proprietary design in which light propagates through an air core, which has significant advantages over traditional fiber built around a solid glass core. Notably, the HCF structure contains nested tubes that help reduce unwanted light leakage and keep the light travelling in a straight path through the core.
Because light travels faster through air than through glass, HCF is 47% faster than standard silica glass fiber, delivering higher overall speed and lower latency. It also offers higher bandwidth per fiber. But what is the difference between speed, latency, and bandwidth? Speed is how quickly data travels over the fiber medium; latency is the amount of time it takes for data to travel between two end points across the network (the lower the latency, the faster the response time); and bandwidth is the amount of data that can be sent and received over the network. Imagine two vehicles travelling from point A to point B, setting off at the same time. The first is a car (representing single mode fiber, or SMF) and the second is a van (HCF). Both vehicles carry passengers (the data): the car can take four passengers, whereas the van can take 16 (higher bandwidth). The van also travels faster than the car, so it takes less time to reach point B and arrives at its destination first (lower latency).
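The latency gap can be sketched with a few lines of arithmetic. This is a minimal illustration, not a figure from the article: the effective refractive indices below (~1.468 for silica fiber, ~1.0003 for air) are typical textbook assumptions, but they reproduce the quoted ~47% speed advantage.

```python
# Compare one-way propagation latency in solid-core silica fiber vs hollow core fiber.
# Assumed effective refractive indices (illustrative): ~1.468 for silica SMF, ~1.0003 for air.
C = 299_792_458  # speed of light in vacuum, m/s

def one_way_latency_ms(distance_km: float, refractive_index: float) -> float:
    """Time for light to traverse the fiber, in milliseconds."""
    velocity = C / refractive_index              # propagation speed in the medium
    return distance_km * 1_000 / velocity * 1_000

N_SILICA, N_AIR = 1.468, 1.0003
smf = one_way_latency_ms(1_000, N_SILICA)        # ~4.90 ms over 1,000 km
hcf = one_way_latency_ms(1_000, N_AIR)           # ~3.34 ms over 1,000 km
speedup = smf / hcf - 1                          # ~0.47, matching the quoted ~47%
print(f"SMF: {smf:.2f} ms, HCF: {hcf:.2f} ms, HCF faster by {speedup:.0%}")
```

Over a single 1,000 km span the saving is about 1.5 ms each way, and it compounds over every hop and round trip in a network path.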
For over half a century, the industry has made steady yet small advancements in silica fiber technology, with gains constrained by the fundamental loss limit of silica. A significant milestone for HCF technology was reached in early 2024: the lowest optical fiber loss (attenuation) ever recorded at a 1550 nm wavelength, lower even than pure silica core SMF. Along with low attenuation, HCF offers higher launch power handling, broader spectral bandwidth, and improved signal integrity and data security compared to SMF.
The need for speed
Imagine you’re playing an online video game. The game requires quick reactions and split-second decisions. If you have a high-speed connection with low latency, your actions are transmitted quickly to the game server and to your friends, allowing you to react in real time and enjoy a smooth gaming experience. If instead you have a slow connection with high latency, there will be a delay between your actions and what happens in the game, making it difficult to keep up with the fast-paced gameplay. Whether you’re missing key moments or falling behind other players, lag is frustrating and can seriously disrupt gameplay. Similarly, for AI models, lower latency and high-speed connections help the models process data and make decisions faster, improving their performance.
Reducing latency for AI workloads
So how can HCF help the performance of AI infrastructure? AI workloads are tasks that process large amounts of data using machine learning algorithms and neural networks, ranging from image recognition and natural language processing to computer vision and speech synthesis. These workloads require fast networking and low latency because they typically involve multiple stages of data processing (data ingestion, preprocessing, training, inference, and evaluation), and each stage can involve sending and receiving data from different sources, such as cloud servers, edge devices, or other nodes in a distributed system.

The speed and quality of the network connection determine how quickly and accurately that data can be transferred and processed. A slow or unreliable network can cause delays, errors, or failures in the AI workflow, resulting in poor performance, wasted resources, or inaccurate outcomes. These models often need huge amounts of processing power and ultra-fast networking and storage to handle increasingly sophisticated workloads with billions of parameters, so low latency and high-speed networking help speed up model training and inference, improve performance and accuracy, and foster AI innovation.
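To see why shaving milliseconds matters at training scale, here is a toy model of a distributed training loop where each iteration pays for compute plus one gradient exchange. All of the numbers are hypothetical, chosen only to show the shape of the calculation, not measured Azure figures.

```python
# Toy model: per-iteration time = compute + gradient exchange,
# where the exchange costs payload / bandwidth (transfer) plus network latency.
# All figures are illustrative assumptions, not measured values.

def step_time_ms(compute_ms: float, payload_gb: float,
                 bandwidth_gbps: float, latency_ms: float) -> float:
    transfer_ms = payload_gb * 8 / bandwidth_gbps * 1_000  # serialization time
    return compute_ms + transfer_ms + latency_ms

# Same compute and payload; only the network latency between nodes differs.
slow = step_time_ms(compute_ms=50, payload_gb=1, bandwidth_gbps=400, latency_ms=5.0)
fast = step_time_ms(compute_ms=50, payload_gb=1, bandwidth_gbps=400, latency_ms=3.4)

steps = 100_000  # iterations in a hypothetical training run
saved_minutes = (slow - fast) * steps / 1_000 / 60
print(f"{saved_minutes:.1f} minutes saved over {steps:,} steps")
```

A 1.6 ms saving per step looks negligible in isolation, but repeated over every iteration of a long run it adds up, and real training jobs exchange data far more often than once per step.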
Helping AI workloads everywhere
Fast networking and low latency are especially important for AI workloads that require real-time or near-real-time responses, such as autonomous vehicles, video streaming, online gaming, or smart devices. These workloads need to process data and make decisions in milliseconds or seconds, which means they cannot afford any lag or interruption in the network. Low latency and high-speed connections help ensure that the data is delivered and processed in time, allowing the AI models to provide timely and accurate results. Autonomous vehicles exemplify AI’s real-world application, relying on AI models to swiftly identify objects, predict movements, and plan routes amid unpredictable surroundings. Rapid data processing and transmission, facilitated by low latency and high-speed connections, enable near real-time decision-making, enhancing safety and performance. HCF technology can accelerate AI performance, providing faster, more reliable, and more secure networking for AI models and applications.
Regional implications
Beyond the hardware that directly runs your AI models, there are broader implications. Datacenter regions are expensive to build, and the distances between regions, and between a region and its customers, matter greatly both to customers and to Azure when deciding where to build. When a region is located too far from a customer, latency rises because the model is waiting for data to travel to and from a datacenter that is farther away.
Returning to the car versus van example: with the combination of higher bandwidth and 47% faster transmission, more data can be moved between two points in a network in roughly two thirds of the time. Alternatively, HCF offers longer reach, extending the transmission distance of an existing network by up to 1.5x with no impact on network performance. Ultimately, you can cover a greater distance within the same latency envelope as traditional SMF, and carry more data. This has huge implications for Azure customers, reducing the need for datacenter proximity without increasing latency or degrading performance.
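Both the "two thirds of the time" and the "up to 1.5x reach" figures fall out of the same ratio. A minimal sketch, assuming typical effective refractive indices of ~1.468 for silica SMF and ~1.0003 for an air core (illustrative values, not from the article):

```python
# At a fixed latency budget, the distance light can cover scales with its speed
# in the medium, so HCF's reach advantage is roughly the ratio of the indices.
N_SILICA, N_AIR = 1.468, 1.0003   # assumed effective refractive indices

reach_ratio = N_SILICA / N_AIR    # ~1.47x farther at the same latency
time_ratio = N_AIR / N_SILICA     # ~0.68, i.e. about two thirds of the time
print(f"reach: {reach_ratio:.2f}x, transit time: {time_ratio:.0%} of SMF's")
```

The ~1.47x reach is consistent with the article’s "up to 1.5x" claim, and its reciprocal is the "two thirds of the time" figure.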
The infrastructure for the era of AI
HCF technology was developed to improve Azure’s global connectivity and meet the demands of AI and future workloads. It offers several benefits to end users, including higher bandwidth, improved signal integrity, and increased security. In the context of AI infrastructure, HCF technology can enable fast, reliable, and secure networking, helping to improve the performance of AI workloads.
As AI continues to evolve, infrastructure technology remains a critical piece of the puzzle, ensuring efficient and secure connectivity for the digital era. As AI advancements place additional strain on existing infrastructure, AI users are increasingly looking to benefit from new technologies like HCF, virtual machines like the recently announced ND H100 v5, and silicon like Azure’s own first-party AI accelerator, Azure Maia 100. Together, these advancements enable more efficient processing, faster data transfer, and ultimately more powerful and responsive AI applications.
Keep up with our “Infrastructure for the Era of AI” series to better understand these new technologies, why we are investing where we are, what these advancements mean for you, and how they enable AI workloads.
Source: microsoft.com