
AI DC – Renaissance and New Thinking Required. Article 3 of 5

Writer: datacenterprimerja

The AI DC Is Not a Data Center with GPUs

This article speaks most directly to design and construction specialists. The implications run through to operations teams.


I want to say something directly to the design and construction professionals reading this. You are not doing anything wrong. Your methodology is sound. Your experience is real. The engineering discipline you have built over years of delivering precision facilities (power distribution, MEP design, structural engineering, commissioning) is genuinely valuable.


But an AI data center demands more. It requires a rethink and a willingness to look deeper into the rack and the servers that occupy it. This is not a minor calibration of existing practice. It is a fundamental mismatch between what the facility must do and what the design process currently produces. That mismatch has consequences for the owner, for the operator, for the AI workloads the facility exists to serve, and ultimately for the professional credibility of the teams that delivered it. This is a professional accountability issue and it needs to be named as one.


How the D&C Community Arrived Here

The design and construction industry adapted well to the data center boom of the 2000s and 2010s. Data centers became a recognised sub-sector of industrial construction. Specialist contractors, specialist engineers, and specialist consultants developed genuine expertise. Standards emerged. Best practices were codified. Delivery timelines compressed. Costs became more predictable.


That expertise was built around a specific type of occupant: commodity x86 servers in standard racks drawing 5 to 15 kilowatts per rack, in air-cooled data halls with predictable load profiles and well-understood MEP requirements. The facility shell was the product and it was designed accordingly – shell inward. The design sequence (establish the shell, define the power and cooling envelope, specify the white space) worked because the occupant was generic. Any standard server workload could be accommodated. Design once, replicate at scale. The AI data center breaks that sequence at every step.


The Rack (and Outward, the Whole Data Hall) Is the Computer

This is the single most important thing the D&C community needs to internalise.

In a traditional data center, the rack is furniture. A physical structure that holds servers. Servers are the unit of compute. Racks are interchangeable.

In an AI data center, the rack is the computer. NVIDIA's Vera Rubin NVL72 is a rack-scale system integrating 36 Vera CPUs and 72 Rubin GPUs interconnected with NVLink 6 fabric. The rack operates as one machine. Individual components are not independently meaningful. The rack is the unit of integration and it must be treated as such from the first moment of design.


This changes what the design brief must contain. The physical layout of racks is not a tenant fit-out detail. NVLink 6 switch fabric topology constrains how racks can be physically arranged relative to each other. InfiniBand or Ethernet scale-out cabling has specific routing requirements that must be accommodated in cable management, pathway design, and floor layout before structural decisions are finalised. If the D&C team does not understand the compute topology when they begin design, they will produce a facility that compromises the performance of the system it exists to serve.


What this means in practice is concrete. On a live AI DC project, the D&C team must lock the rack and fabric topology early enough to inform slab design, column spacing, and penetrations. Cable pathways and overhead or underfloor zones must be laid out around high‑bandwidth interconnect routes, not the other way round. Structural loading, crane access, and maintenance clearances must be checked against the real integrated rack system, not a generic 42U placeholder.
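
To make that concrete, here is a minimal sketch of the kind of early structural check this implies. Every number in it (rack mass, footprint, slab rating) is an illustrative assumption for a hypothetical project, not a vendor or reference-design figure:

    # Sketch: checking slab loading against the real integrated rack system
    # rather than a generic 42U placeholder. All figures are assumptions.

    RACK_MASS_KG = 1800            # assumed fully integrated rack, coolant included
    RACK_FOOTPRINT_M2 = 0.6 * 1.2  # assumed 600 mm x 1200 mm footprint
    SLAB_RATING_KG_M2 = 1500       # assumed slab design rating

    def slab_utilisation(mass_kg: float, footprint_m2: float,
                         rating_kg_m2: float) -> float:
        """Slab utilisation under one rack (ignores load spreading)."""
        return (mass_kg / footprint_m2) / rating_kg_m2

    util = slab_utilisation(RACK_MASS_KG, RACK_FOOTPRINT_M2, SLAB_RATING_KG_M2)
    print(f"Slab utilisation under one rack: {util:.0%}")
    # Anything over 100% means the slab, spreader plates, or rack placement
    # must change before the structural design is locked.

With these assumed numbers the check fails by a wide margin. That is exactly the kind of finding that must surface before the slab is poured, not after.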


Liquid Cooling Is the Physics

This conversation should no longer need to happen. And yet it does.

Liquid cooling in an AI data center is not a premium option. It is not a feature upgrade. For practical purposes at current AI rack densities, it is the only physically viable thermal solution. A Vera Rubin NVL72 rack draws on the order of 220 to 230 kilowatts at maximum performance, depending on power profile. NVIDIA's published specifications are preliminary and subject to revision, but the order of magnitude is not in dispute. Air cooling at that density is not an engineering challenge to be overcome with clever design. It is a physics problem with a known solution. The solution is liquid.
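
The arithmetic is worth showing. The sketch below uses the article's ~230 kW figure, standard fluid properties, and typical (assumed) temperature rises to compare the flow rates air and liquid would each need in order to carry the heat away:

    # Sketch: flow needed to remove ~230 kW from one rack.
    # P = rho * V_dot * cp * dT  ->  V_dot = P / (rho * cp * dT)
    # Fluid properties are standard; the delta-T values are typical assumptions.

    RACK_POWER_W = 230_000

    def volumetric_flow_m3s(power_w: float, density_kg_m3: float,
                            cp_j_per_kg_k: float, delta_t_k: float) -> float:
        return power_w / (density_kg_m3 * cp_j_per_kg_k * delta_t_k)

    air = volumetric_flow_m3s(RACK_POWER_W, 1.2, 1005, 15)     # 15 K air rise
    liquid = volumetric_flow_m3s(RACK_POWER_W, 1000, 4186, 10) # 10 K water rise

    print(f"Air:    {air:.1f} m^3/s (~{air * 2119:.0f} CFM) through one rack")  # 2119 CFM per m^3/s
    print(f"Liquid: {liquid * 60_000:.0f} L/min through one rack")

Roughly 27,000 CFM through a single rack, against a few hundred litres per minute of liquid. That is not a design challenge to out-engineer. It is the physics.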


Every major OEM showcasing flagship AI infrastructure at GTC 2026 (for example, Supermicro, ASUS, Wiwynn, GIGABYTE) presented 100 percent liquid-cooled systems as standard. Not as options. As the baseline architecture. The liquid cooling circuit entering that rack is not a facilities system that happens to connect to the server. It is life support for the compute. The CDU, the piping, the manifolds, the leak detection, the fluid management system – these are as critical to AI DC performance as the power distribution is. They must be designed with the same rigour, the same redundancy logic, and the same understanding of what failure means for the workload.


A CDU fault in an AI data center is not a maintenance event. It is a compute outage. The D&C team that designs the liquid cooling system without understanding what it is cooling, and what happens when it fails, is designing in the dark.
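
A rough ride-through estimate shows why. In the sketch below, the loop volume and allowable temperature rise are loud assumptions; real ride-through depends on residual flow, loop design, and silicon throttle limits:

    # Sketch: how long a rack survives a total loss of CDU flow, treating the
    # secondary loop as a stagnant thermal buffer. All inputs are assumptions.

    RACK_POWER_W = 230_000
    LOOP_VOLUME_L = 200      # assumed coolant volume in the secondary loop (~kg of water)
    COOLANT_CP = 4186        # J/(kg*K) for water; glycol mixes are lower
    ALLOWED_RISE_K = 10      # assumed margin before GPUs throttle or trip

    ride_through_s = (LOOP_VOLUME_L * COOLANT_CP * ALLOWED_RISE_K) / RACK_POWER_W
    print(f"Ride-through on loop thermal mass alone: ~{ride_through_s:.0f} s")

Tens of seconds, not the minutes or hours a conventional maintenance plan assumes. That is why the CDU demands the same redundancy logic as the power distribution.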

Designing from the GPU Outward


NVIDIA has already published the answer to the question of how an AI data center should be designed. The Vera Rubin DSX AI Factory reference design is a publicly available blueprint for codesigned AI infrastructure. It specifies how compute, networking, storage, power, and cooling must be integrated to maximise performance and efficiency under continuous high-intensity workloads. The operative word is codesigned. Not designed in sequence. Not designed in separate workstreams that are integrated at commissioning. Designed together, from the GPU outward, as one system.


This inverts the traditional design sequence completely. The facility is not the starting point. The compute is. The rack layout, the liquid cooling topology, the power distribution architecture, the structural loading: all of these flow from what the GPU cluster requires, not from a generic data hall specification.
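
Expressed as a calculation order rather than a diagram, "GPU outward" means the facility quantities are outputs of the cluster definition. A minimal sketch, with assumed rack counts and overheads that are illustrative rather than the DSX reference design itself:

    # Sketch: facility requirements derived from the cluster spec, not fixed
    # first. Rack count, rack power, and overhead fraction are assumptions.

    from dataclasses import dataclass

    @dataclass
    class ClusterSpec:
        racks: int
        rack_power_kw: float
        cooling_overhead: float  # CDUs, pumps, heat rejection as fraction of IT load

    def facility_requirements(c: ClusterSpec) -> dict:
        it_load_mw = c.racks * c.rack_power_kw / 1000
        return {
            "it_load_mw": round(it_load_mw, 2),
            "total_load_mw": round(it_load_mw * (1 + c.cooling_overhead), 2),
            "heat_to_reject_mw": round(it_load_mw, 2),  # nearly all IT power becomes heat
        }

    print(facility_requirements(ClusterSpec(racks=64, rack_power_kw=230,
                                            cooling_overhead=0.15)))

The point is the direction of dependency: change the cluster and the power, cooling, and structural numbers change with it. That is why the compute specification must be in hand before the shell is drawn.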


There is an additional dimension the D&C community must account for. AI workloads are not static. Agentic AI inference has a fundamentally different infrastructure profile from training workloads. Training runs are long, sustained, and predictable in their power draw. Agentic inference generates sharp, dynamic demand spikes. The facility must be designed with headroom for workload evolution, not just for the known load at commissioning. Designing to today's specified workload is already designing to the past.


In practical terms, this means revisiting diversity factors, protection settings, and energy storage assumptions with the AI workload profile in mind. Step‑load behaviour, ramp rates, and simultaneous operation of training and inference clusters must feed directly into UPS topology, generator sizing philosophy, and controls strategies. The design model needs to be built around the GPU cluster behaviour, not mapped onto it at the end.
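
A toy load profile makes the difference visible. The numbers below are synthetic assumptions standing in for real cluster telemetry, which is the actual design input:

    # Sketch: sustained vs peak load when agentic inference spikes ride on top
    # of a steady training baseline. Synthetic figures, assumptions throughout.

    TRAINING_MW = 12.0        # assumed sustained training load
    INFERENCE_BASE_MW = 3.0   # assumed inference baseline
    INFERENCE_SPIKE_MW = 6.0  # assumed short-duration agentic spike on top

    sustained = TRAINING_MW + INFERENCE_BASE_MW
    peak = sustained + INFERENCE_SPIKE_MW
    step = INFERENCE_SPIKE_MW / sustained

    print(f"Sustained load: {sustained:.1f} MW")
    print(f"Peak load:      {peak:.1f} MW (a +{step:.0%} step in seconds)")
    # It is the step and its ramp rate, not the average, that UPS topology,
    # generator acceptance, and protection settings must be checked against.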


Prefab and Off-Site Manufacturing (OSM)

The xAI Colossus example from Article 2 is instructive here for a specific reason. The construction speed was not achieved by cutting corners on engineering. It was achieved by doing more engineering earlier, under controlled factory conditions.


Prefabricated power modules, pre-assembled cooling distribution units, modular data hall structures – these are not cost-reduction tools. They are quality and precision tools. AI DC infrastructure components must be engineered to tolerances that field assembly under construction site conditions cannot reliably achieve. Factory manufacture and factory testing before delivery produces better outcomes for AI infrastructure than traditional site-built approaches.


This requires the D&C team to engage with the compute requirements earlier in the programme than traditional timelines allow. The design process must compress. The decisions must be made sooner. The collaboration between the facility team and the compute team must begin at concept stage, not at fit-out stage.


The Professional Mindset Change Required

The data center D&C community has the capability to learn this. The engineering disciplines involved (thermal management, power distribution, precision construction) are not foreign. The knowledge gap is in understanding what the facility is being built to serve, not in the ability to execute once that understanding exists.


The AI DC owner and operator cannot bridge this gap alone. Telling a D&C team what the compute requires after the design brief is fixed is too late. The professional accountability runs both ways. Owners must bring the compute requirements into the design process from day one. D&C professionals must develop sufficient knowledge of the AI infrastructure they are building to ask the right questions before they are told.


Deep knowledge of the DGX system, the liquid cooling circuit, the NVLink 6 fabric topology, and the continuous full-load operating reality must now be a prerequisite for the D&C leadership team on an AI DC project. Not optional background reading. A prerequisite. This is also the professional opportunity Article 2 points to from the C-level side. When C-level leadership looks for project directors and heads of development who understand the AI infrastructure they are building, the D&C professional who has invested in that knowledge is the one who gets the role. The knowledge boundary that the industry must recover is also the career boundary that individual professionals can choose to cross.


The facility that gets built without that knowledge will underserve the AI system it houses. In a market where compute performance is the product and token delivery is the revenue, underserving the AI system is a business failure, not just an engineering one.


The Compute Wall Should Be Bridged

I want to close this article with something personal.

I came from computer science. Before I ever set foot in a data center, I worked in IT. I understood compute before I understood facilities. When I moved into computer room management and then into the data center industry, I crossed the knowledge boundary in one direction: from the compute world into the facility world.


Recently I crossed it again, in the other direction. I sat the NVIDIA NCA-AIIO certification exam as a deliberate act of professional curiosity. After more than 30 years in this industry, I wanted to understand what is actually happening inside those racks. Not at a marketing level. Not at the level of product names and specification sheets. At the level of how GPU architecture works, how NVLink fabric operates, how workload scheduling behaves under inference load, and how the thermal and power characteristics of a continuous full-load GPU cluster translate into facility requirements.


What I found was the minicomputer. The same integrated discipline I had known at the start of my career, rebuilt at a scale that would have been incomprehensible then. Article 1 of this series was born from that moment of recognition. The NCA-AIIO is a structured starting point. The NVIDIA documentation (the DGX system guides, the Vera Rubin DSX reference design, the NVLink architecture papers) is publicly available. The knowledge boundary is not a wall. It is a threshold. The D&C professional who chooses to cross it does not need to become a GPU engineer. They need to understand the machine well enough to ask the right questions before the slab is poured.


If I could do it after 30 years in this industry, there is no career stage at which it is too late to start. The question is whether you will cross that threshold before the next AI DC project lands on your desk, or after.


Next: Article 4 – The Operations Workforce the AI DC Needs


