Rich Miller provides in-depth coverage of the exciting project and the vital role that GRC will fill in keeping the supercomputer cooled. From the article:
Immersion GPU System Provides AI Horsepower for Frontera
What might the artificial intelligence revolution look like in the data center? If one new system is any indication, it could look like GPUs immersed in dielectric coolant fluid, supporting water-cooled x86 servers.
That’s the vision put forward by the creators of Frontera, a new $60 million supercomputer to be built at the Texas Advanced Computing Center (TACC) in Austin. It is expected to be the most powerful supercomputer at any U.S. university, and to continue TACC’s history of deploying new systems ranking among the top 10 on the Top500 list of the world’s leading supercomputers.
The vast majority of data centers continue to cool IT equipment using air, while liquid cooling has been used primarily in high-performance computing (HPC). With the growing use of artificial intelligence, more companies are facing data-crunching challenges that resemble those faced by the HPC sector, which could make liquid cooling relevant for a larger pool of data center operators.
The design for Frontera reflects the leading edge of HPC efficiency. Frontera is Spanish for “frontier,” and the new supercomputer will help advance the frontiers of liquid cooling with a hybrid design: Dell EMC servers with x86 Intel processors and water-cooling systems from CoolIT, plus a smaller system of NVIDIA GPUs (graphics processing units) immersed in a tank of liquid coolant from GRC (previously Green Revolution Cooling). DataDirect Networks will contribute the primary storage system, and Mellanox will provide the high-performance interconnect for Frontera.
Applying Immersion Benefits to GPUs
Anticipated early projects for Frontera include analyses of particle collisions from the Large Hadron Collider, global climate modeling, and improved hurricane forecasting and “multi-messenger” astronomy research using gravitational waves and electromagnetic radiation.
“Many of the frontiers of research today can be advanced only by computing, and Frontera will be an important tool to solve grand challenges that will improve our nation’s health, well-being, competitiveness and security,” said Dan Stanzione, TACC executive director.
TACC has been a leader in the use of immersion cooling, which submerges servers in liquid to cool their components, and began working with Austin-based neighbor GRC in 2009. In 2017 this collaboration was expanded to immersion cooling for NVIDIA GPUs, test-driving a system created by server vendor Supermicro. Using immersion cooling with GPUs is a fairly recent phenomenon, but it may attract interest as more companies adopt GPUs for AI and other parallel processing challenges.
“The cost savings that immersion cooling enables (on the hardware side) are extremely impressive,” TACC’s Stanzione said of the 2017 project. “Being early adopters of GRC’s immersion cooling system we have seen the technology mature rapidly over the years. And with the growing power and computing needs of AI and machine learning applications, especially with hotter and hotter GPUs, cooling is even more important for reliability.”
AI Data Crunching Boosts Density
New hardware for AI workloads is packing more computing power into each piece of equipment, boosting the power density – the amount of electricity used by servers and storage in a rack or cabinet – and the accompanying heat. The trend is challenging traditional practices in data center cooling, and prompting data center operators to adopt new strategies and designs.
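To make the power-density point concrete, here is a minimal back-of-the-envelope sketch. The figures below are illustrative assumptions, not Frontera specifications: nearly all electrical power drawn by IT equipment is released as heat the cooling system must remove.

```python
# Illustrative rack power-density arithmetic.
# All numbers here are assumptions for the sketch, not real system specs.
servers_per_rack = 40
watts_per_server = 750           # a dense GPU node for AI can draw far more

rack_kw = servers_per_rack * watts_per_server / 1000
print(f"Rack power density: {rack_kw:.1f} kW")   # -> Rack power density: 30.0 kW

# Essentially all of that electrical load becomes heat, so the cooling
# system must remove roughly the same 30 kW per rack.
heat_load_kw = rack_kw
print(f"Heat to remove per rack: {heat_load_kw:.1f} kW")
```

Traditional air-cooled rooms were often designed for loads in the single-digit kilowatts per rack, which is why densities like the one above push operators toward liquid approaches.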
The alternative to air cooling is to bring liquid closer to the heat source. Some vendors integrate water cooling into the rear door of a rack or cabinet. Cooling can also move inside the server itself, either by immersing servers in tanks of coolant or through enclosed systems of pipes and cold plates that carry liquid directly to the processor.