
AI DC – Renaissance and New Thinking Required. Article 1 of 5

  • Writer: datacenterprimerja
  • Mar 31
  • 5 min read

Article 1: The Machine Was the Building

This article speaks to all three audiences: C-level leadership, design and construction specialists, and the operations workforce.

Before there were data centers, there was the machine.


In the 1960s and 1970s, you did not build a facility and then decide what to put in it. You acquired the machine and built everything around it. The IBM System/360, the DEC VAX, the Cray supercomputer. The room existed to serve the computer. The power, the cooling, the raised floor, the controlled access — all of it was subordinate to one purpose. Making the machine run.


I know this not from reading about it. I started my career as a minicomputer systems administrator on a DEC VAX cluster. My job was to know the machine. Not just the operating system. The scheduler. The batch queues. The memory architecture. The interconnects between nodes. The way the system behaved when a long batch job competed with interactive user sessions for the same resources. You managed the whole thing as one integrated system because it was one integrated system.


That discipline shaped how I thought about computing for the rest of my career.


The Cray Was Not Wrong

In 1976, the Cray-1 used Freon cooling. In 1985, the Cray-2 used Fluorinert liquid immersion. Seymour Cray was not making an exotic engineering choice. He was solving a physics problem. At the power densities those machines reached, air cooling was simply not adequate. Liquid was the only answer.


Nobody questioned it. There was no alternative.


The machines of that era ran hot and ran hard. Workloads were managed deliberately. Batch jobs queued for long compute runs. Interactive sessions served users who needed near-immediate response. The scheduler was not a background utility. It was a first-class component of the system. Wasting cycles on expensive iron was a professional failure. Every idle CPU was money left on the table and every operator knew it.
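
To make that duality concrete, here is a minimal sketch, in modern Python rather than anything VMS-era, of the policy such a scheduler enforced: interactive sessions preempt the queue, and batch jobs absorb every remaining cycle. The class names, priorities, and quantum are illustrative assumptions of mine, not any real scheduler's API.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Illustrative two-class policy: interactive work preempts batch,
# batch work exists to keep the machine saturated.
INTERACTIVE, BATCH = 0, 1

@dataclass(order=True)
class Job:
    priority: int
    seq: int                            # FIFO tie-breaker within a priority
    name: str = field(compare=False)
    cycles: int = field(compare=False)  # remaining work, arbitrary units

class Scheduler:
    """Toy scheduler: drain interactive jobs first, then give
    every idle cycle to the batch queue."""
    def __init__(self):
        self._queue = []
        self._seq = count()

    def submit(self, name, cycles, interactive=False):
        prio = INTERACTIVE if interactive else BATCH
        heapq.heappush(self._queue, Job(prio, next(self._seq), name, cycles))

    def run(self, quantum=10):
        while self._queue:
            job = heapq.heappop(self._queue)
            job.cycles -= quantum
            if job.cycles > 0:
                heapq.heappush(self._queue, job)  # requeue unfinished work
            print(f"ran {job.name:<14} remaining={max(job.cycles, 0)}")

sched = Scheduler()
sched.submit("nightly-batch", 30)
sched.submit("user-session", 15, interactive=True)
sched.run()
```

The point of the design is where the value sits: interactive latency is protected, but the batch queue is what guarantees no cycle is wasted.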


Every era of computing has a basic unit of work. In the AI era, we call it the token. Every prompt, every reasoning step, every inference call generates tokens. The token is to the AI era what the batch job was to the minicomputer era. The unit of intelligence. The thing the machine exists to process.


The Great Disaggregation

Then everything changed.


In the late 1980s and through the 1990s, cheap commodity x86 processors networked together began to displace the minicomputer. Not because they were better machines. Because they were good enough machines at a fraction of the cost. The VAX cluster gave way to Novell NetWare, then Windows NT, then Linux on commodity servers from Dell and HP.

The data center became a room full of identical boxes. Compute became generic. Power densities dropped low enough that liquid cooling became unnecessary. Air was sufficient. The specialist operator who knew the machine became a generalist sysadmin who managed a fleet.


This was not a failure. It was a rational and productive evolution. The disaggregation era democratised computing, built the internet, and created the software industry as we know it. It was one of the most consequential technology shifts in history.


But it came with a structural consequence that took decades to fully manifest.

It separated two things that had always been one. The computer and the building that housed it became two separate professional domains, two separate industries, two separate communities with different vocabularies, different priorities, and different career paths. The people who built and operated facilities no longer needed to understand the compute. The people who ran the compute no longer needed to understand the facility. Each community optimised for its own domain.


That separation made complete sense in the commodity era. It became a liability in the era that followed.


The Cloud Completes the Abstraction

Virtualisation in the 2000s took the disaggregation one step further. The physical server became irrelevant to the workload. AWS, Azure, and GCP completed the transformation. Hyperscale data centers became compute factories producing undifferentiated virtual machines. The data center industry industrialised around this model. Tier classifications, PUE metrics, white space measured in square metres, power capacity measured in megawatts.


The tenant was assumed to be running commodity x86 workloads. Design was standardised. The occupant of the data hall became an abstraction. A number in a capacity plan. A kilowatt figure in a lease agreement.


For two decades this worked. The assumptions held. The model was profitable. An entire generation of professionals built careers on it.


The Pendulum

Then came the GPU cluster.


NVIDIA's DGX systems, the NVLink and NVSwitch fabric, the Blackwell architecture, and now Vera Rubin. A single Vera Rubin NVL72 rack draws up to 227 kilowatts and is 100 percent liquid cooled. Not as an option. As the only viable thermal solution at that power density.
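
A back-of-envelope calculation shows why. The sketch below estimates the airflow a single rack would need to shed 227 kilowatts on air alone. The rack figure comes from the paragraph above; the air properties and the 15-degree inlet-to-outlet rise are assumptions of mine, chosen generously.

```python
# Back-of-envelope: airflow needed to remove 227 kW from one rack.
# Assumptions (not from the article): air density 1.2 kg/m^3,
# specific heat 1005 J/(kg*K), and a 15 K inlet-to-outlet rise.
P_WATTS = 227_000   # rack heat load, from the text
RHO     = 1.2       # kg/m^3, air near sea level
CP      = 1005      # J/(kg*K)
DELTA_T = 15        # K, generous temperature rise

mass_flow = P_WATTS / (CP * DELTA_T)   # kg/s
vol_flow  = mass_flow / RHO            # m^3/s
cfm       = vol_flow * 2118.88         # cubic feet per minute

print(f"{mass_flow:.1f} kg/s ≈ {vol_flow:.1f} m^3/s ≈ {cfm:,.0f} CFM")
# ≈ 15.1 kg/s ≈ 12.5 m^3/s ≈ 26,600 CFM through a single rack
```

Roughly 26,000 cubic feet per minute through one rack is far beyond what any practical air-cooled enclosure moves. The physics forces the decision to liquid.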


Seymour Cray was not wrong. The industry simply spent 30 years at power densities low enough to forget why he made the choices he did.


The GPU cluster is a computer again. Not a collection of servers. A singular, tightly integrated system where individual GPUs are meaningless in isolation. The rack is the unit of integration. NVLink and NVSwitch create a fabric that makes the whole system behave as one machine. The workloads are familiar to anyone who ran a VAX cluster. Long batch training jobs. Near-real-time inference serving users. The same batch and interactive duality, at a scale that would have been incomprehensible in 1985.


The AI data center is not a new thing. It is the original thing, rebuilt with 40 years of semiconductor progress underneath it.


And on the horizon, the stakes are climbing further still. Agentic AI systems that spawn sub-agents and execute complex multi-step workflows autonomously. Physical AI in robotics and industrial automation acting in the real world in real time. The compute demand these represent makes today's training workloads look modest.


What This Series Is About

I have spent more than 20 years in the data center industry on the developer and operator side across Asia-Pacific and the Middle East. I recently began studying for the NVIDIA NCA-AIIO certification. Sitting with that material, I kept finding the minicomputer. The scheduler logic. The batch and interactive duality. The integrated stack. The specialist operator. The machine as the centre of everything.


The industry that grew up during the disaggregation era — the C-level leaders, the design and construction professionals, the operations workforce — built its knowledge, its assumptions, and its business models on a world where compute was generic and the facility was the product.


That world no longer exists.


This series is for the people who need to understand why, and what they must do differently. It addresses three audiences directly: C-level leadership, design and construction specialists, and operational leaders and the operations workforce. Each article will flag which audience it speaks to most directly, but the argument runs through all three.


The machine is the building again. The question is whether the people who run this industry understand the machine.


Next: Article 2 — The Blind Spot the Industry Built Into Itself
