AI-ERA DATA CENTERS IN SOUTHEAST ASIA - Part 2 of 6: Tech, Ops and Talent
- datacenterprimerja
- Feb 26
- 10 min read
Updated: Feb 27
Building AI‑Era Operations Teams
James Soh
If you already work in data center operations, AI is not just changing the equipment in your halls. It is changing what your job looks like, where you work, and how fast your skills can grow. For HR and recruiters, the same AI wave is forcing a shift from “fill headcount” to “build a critical‑infrastructure profession” in places that are often far from city centres. For senior leaders, this is no longer a back‑of‑house topic: whether these teams exist, where they sit, and how fast they can grow now determines which AI projects your organization can credibly win and run.
In Part 1, we looked at the speculative build dilemma: every new AI‑era facility in Southeast Asia is already making structural, battery, and fire‑safety commitments, whether or not the board has named them. In this Part 2, the question is simpler and closer to the ground: who will run these high‑density, battery‑heavy AI sites day‑to‑day, and how do we build that capability in time?
1. What really changes in the work
From an operations point of view, AI‑era data centers feel different in three main ways.
More intense power behaviour. AI training and large inference loads make power behaviour much sharper than in traditional enterprise IT. Racks that used to sit at 6–12 kW now run at 40–70 kW or more, and power can swing quickly as jobs ramp up or down. In tightly synchronised training runs, clusters can spike well above their normal levels. That extra stress lands on row‑level Uninterruptible Power Supply (UPS) modules, branch circuits, and protection settings, and it makes maintenance windows and change control more fragile.
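To make that arithmetic concrete, here is a minimal sketch of a headroom check for a synchronised spike across one row. The function name, the 1.25× spike factor, and the kW figures are illustrative assumptions, not vendor data:

```python
# Hypothetical sketch: check whether a worst-case synchronised training spike
# stays within a row-level UPS module's rating. The 1.25x spike factor and
# the kW figures are illustrative assumptions, not vendor data.

def ups_headroom_ok(rack_kw, racks_per_row, ups_rating_kw, spike_factor=1.25):
    """Return (peak_kw, ok) for a worst-case synchronised spike across the row."""
    steady_kw = rack_kw * racks_per_row
    peak_kw = steady_kw * spike_factor
    return peak_kw, peak_kw <= ups_rating_kw

# A legacy enterprise row: 10 racks at 8 kW against a 150 kW module.
print(ups_headroom_ok(8, 10, 150))    # comfortable headroom
# An AI row: 10 racks at 50 kW against a 600 kW module; the spike exceeds it.
print(ups_headroom_ok(50, 10, 600))
```

The point of a check like this is that a row sized comfortably for steady‑state AI load can still trip protection during a synchronised spike, which is exactly why maintenance windows become more fragile.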
Different cooling expectations. Cooling moves from “keep the air moving and separated” to “manage heat at the chip or rack.” Air‑only approaches hit their limits; liquid‑assisted cooling and other liquid systems are becoming normal in AI halls. For operations teams, that means learning to live with pumps, manifolds, leak detection, fluid quality checks, and closer day‑to‑day coordination with IT.
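As one small example of what "fluid quality checks" can mean day‑to‑day, here is a hedged sketch of a routine sample check. The parameter names and limits are assumptions for illustration; real limits come from the cooling vendor's fluid specification:

```python
# Hypothetical sketch of a routine fluid-quality check in a liquid-cooled
# hall. Parameter names and limits are illustrative assumptions; real limits
# come from the cooling vendor's fluid specification.

FLUID_LIMITS = {
    "ph": (7.0, 9.5),                # acceptable pH band
    "conductivity_us_cm": (0, 500),  # microsiemens per cm
    "particulate_ppm": (0, 25),
}

def fluid_check(sample: dict) -> list:
    """Return the out-of-spec (or missing) readings; an empty list means pass."""
    failures = []
    for key, (lo, hi) in FLUID_LIMITS.items():
        value = sample.get(key)
        if value is None or not (lo <= value <= hi):
            failures.append(key)
    return failures

print(fluid_check({"ph": 8.1, "conductivity_us_cm": 320, "particulate_ppm": 12}))  # []
print(fluid_check({"ph": 6.2, "conductivity_us_cm": 620, "particulate_ppm": 12}))  # ['ph', 'conductivity_us_cm']
```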
New battery and safety profiles. Instead of only centralized Valve‑Regulated Lead‑Acid (VRLA) battery rooms tucked away from white space, sites start to see Lithium Iron Phosphate (LFP) and other chemistries installed closer to the load, sometimes in‑hall, as part of Energy Storage Systems (ESS). These bring different off‑gas and fire‑behaviour risks, stricter placement rules, and more direct interaction between operations, safety officers, and local authorities.
Operations teams need to be confident running facilities where power swings, cooling technology, and battery choices are tightly coupled, and where AI tenants are watching closely how all three are handled.
2. The AI operations stack: tools, playbooks, people
To keep things simple, think of AI‑era operations as three layers stacked on top of the facility:
Tools. Monitoring and control systems that pull together data from the Building Management System (BMS), Data Center Infrastructure Management (DCIM) platforms, GPU racks, servers, and battery systems into alarms and dashboards. In AI halls, having “ten separate tools” that do not talk to each other is a liability. Operations teams need integrated views and clear workflows.
Playbooks. Standard Operating Procedures (SOPs), Emergency Operating Procedures (EOPs), and Method of Procedure (MOP) documents that tell people what to do, when to escalate, and where the red lines are, specifically for high‑density AI halls, liquid cooling, and battery deployments. Generic “one SOP for all data halls” is no longer enough.
People. The technicians, engineers, supervisors, and managers who carry the pager, respond to alarms, and explain constraints to clients. Their skills, fatigue level, and authority to say “no” under pressure now matter as much as the hardware specification.
This series uses the same Infrastructure Stack model introduced in Part 1 and in Data Center Primer: utilities and facility MEP at the base, client IT and applications at the top. Operations teams live mainly in the facility layer but have to understand enough about the IT and application layers above to keep them stable.
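As a sketch of what the "tools" layer means in practice, here is what normalising alarms from separate BMS, DCIM, and ESS feeds into one shared record might look like, so a single dashboard and workflow can consume them. The field names and severity scales are illustrative assumptions, not any specific vendor's format:

```python
# Hypothetical sketch: fold alarms from separate BMS, DCIM, and ESS feeds
# into one common record shape. Field names and severity scales are
# illustrative assumptions, not any specific vendor's format.

def normalise(source: str, raw: dict) -> dict:
    """Map each tool's native alarm format onto one common record."""
    if source == "bms":
        # Assume the BMS already uses a 1-3 priority scale.
        return {"source": "bms", "asset": raw["point"], "severity": raw["prio"]}
    if source == "dcim":
        # Assume the DCIM uses 0-5; fold it onto the same 1-3 scale.
        return {"source": "dcim", "asset": raw["device"],
                "severity": min(3, max(1, raw["level"] // 2 + 1))}
    if source == "ess":
        # Battery alarms: critical flags map to the top severity.
        return {"source": "ess", "asset": raw["bank"],
                "severity": 3 if raw["critical"] else 2}
    raise ValueError(f"unknown source: {source}")

print(normalise("ess", {"bank": "ESS-B2", "critical": True}))
```

The design point is not the mapping itself but that someone owns it: once every alarm lands in one shape, escalation rules and dashboards stop depending on which of the "ten separate tools" happened to fire.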
3. New role shapes: how existing teams stretch for AI‑era ops
Most sites in Southeast Asia still run with familiar functions: facilities, IT/network, security. AI‑era, battery‑intensive campuses do not need a completely new org chart, but they do need clearer ownership for some responsibilities inside the facilities team.
AI hall owner (or critical environment lead – AI). This is usually a senior facilities engineer who takes explicit ownership of one or more AI halls, on top of their normal scope. They keep track of power density limits, cooling modes allowed, and any ESS in or around those halls, and they translate maintenance and change plans into clear, safe boundaries for GPU workloads. In practice, they are the person who tells an AI tenant, “we can safely do this by this date” or “we cannot do that without breaking fire or safety rules.”
Data center–side Site Reliability Engineering (SRE) and automation lead. Instead of a brand‑new team, this is often an engineer or manager who already understands the plant and is given time and mandate to own tooling and automation. Their job is to tie together alarms, telemetry, and workflows from BMS, DCIM, IT, and ESS so that incidents follow a tested pattern instead of turning into ad‑hoc chat groups, and to remove manual, error‑prone steps where automation can safely take over.
Safety and compliance partner in the room. Here the change is not headcount, but presence and authority. A safety or compliance specialist is embedded in change control and major design decisions to keep fuel loads, battery placement, and suppression choices inside what regulators and insurers will accept, and to make sure operations teams can realistically implement those choices on shift.
For current operations professionals, these are not exotic new titles. A strong facilities engineer can grow into an AI hall owner or automation lead through specialised upskilling, and many sites will treat the safety/compliance partner as an extended hat for people who already handle EHS and regulatory contact. General data center certifications and newer AI‑focused or vendor‑specific credentials are useful markers of “AI‑bilingual” talent, but they still need to be backed by live‑site experience.
In the Data Center Primer book, I emphasize that human life comes first and that operations staff must have the training and authority to act when safety is at stake. In AI halls and battery rooms, that authority needs to be explicit: someone must be able to say, “we will not add these racks here” or “we will not move this battery bank there,” even if it conflicts with a client’s timeline.
4. Where AI campuses land, and what that means for staff
Many of Southeast Asia’s biggest AI campuses will not be in city centres. They are landing where power, land, and sometimes water are available at scale: in industrial parks outside capital cities, cross‑border zones serving nearby hubs, and new corridors near strong grid nodes.
For operations staff, that changes daily life:
Commutes are longer, and public transport may be limited.
There are fewer city‑centre amenities; a lot of your time is spent on campus.
On the positive side, these sites are often larger, more complex, and more central to the business, so the scope of work and promotion paths can be stronger.
For HR and site leaders, this is a design problem, not just a hiring headache. Practical levers include:
Transport: shuttle buses from key hubs, parking support, and shift patterns that respect travel time.
Accommodation: housing support or partnerships near the site, especially for round‑the‑clock roles or out‑of‑town hires.
On‑campus life: proper rest areas, decent food, training rooms, and quiet spaces so a 24×7 workforce can stay sharp.
Workplace environment: high‑density AI halls can be louder and more thermally intense than conventional environments, so industrial‑grade noise, thermal comfort, and fatigue management standards matter.
If these pieces are not thought through up front, AI campuses will struggle to attract and keep the very operators they need most, no matter how impressive the hardware or architecture looks.
5. Local talent vs importing from the capital
Where your operators come from affects whether they stay.
In Malaysia, early projects in Johor and nearby districts often drew heavily on fresh graduates and early‑career engineers from Kuala Lumpur. That can work in the short term, but it increases the risk of churn if people feel “posted out” and disconnected from their home base. Over time, building structured pipelines with local universities and polytechnics in Johor and the surrounding region gives operators who grew up there a reason to build long careers close to home.
In Thailand’s Eastern Economic Corridor and similar zones, data centers can recruit from Bangkok, but those who build direct relationships with institutes and universities in provinces like Chonburi and Rayong are more likely to see better retention and deeper local knowledge.
For HR and site leaders, “AI‑era talent strategy” in these locations is not just about competing in capital‑city job fairs. It is about drawing on local institutes so that the people who run your remote AI campuses want to build a long career there. For operators from those regions, these campuses are not exile postings; they are where some of the most interesting and consequential work in the region will happen.
6. Phasing talent: import the nucleus, blend regional and local to kick‑start operations
A practical staffing model for new AI campuses is to treat people development in phases. “Relocal” is my play on the buzzword “glocal”: regional + local. Please credit the Relocal coinage if you use it too ;)
Phase 1 – Relocal: Import the experienced nucleus from regional DCs. In the first 12–24 months, it makes sense to second a small group of experienced leads and supervisors from established hubs to stabilise operations. They help with commissioning, write local SOPs and EOPs, and mentor the first cohort.
Phase 2 – Build the local core. In parallel, hire junior and mid‑level staff from local institutes and universities, and put them on a clear training path: orientation, shadowing, supervised tasks, certifications, then rotation into critical roles.
Phase 3 – Go full local. As the site matures, the goal is for the local team to own day‑to‑day operations. Imported leaders shift into regional roles, project work, or mentoring. Over time, each campus becomes a training and export base for future AI sites, not a permanent importer of talent.
This phased approach mirrors the project and commissioning logic in Data Center Primer: use experienced leadership to de‑risk early phases, but design for local teams to own steady‑state operations.
7. Operations and HR on the same side
For AI‑era facilities, operations and HR cannot stay in separate lanes.
Operations leaders know what the site really demands: skill mix, shift coverage, fatigue risks, and the reality of the location. HR and talent teams know where candidates are, what they expect, and which combinations of pay, benefits, commute support, and growth will make roles attractive and sustainable.
On a successful AI campus launch:
Operations leaders help write job descriptions, criteria, and interview questions so hires match real work, not a generic “data center operator” template.
HR is involved in roster design, shuttle and housing decisions, on‑site amenities, and the phased “import nucleus → build local core → flip the model” approach, instead of treating those as purely technical choices.
If operations and HR do not sit at the same table from planning to steady‑state, an AI campus will struggle, not because the hardware is wrong, but because the people strategy never caught up.
8. Helping new hires see the work, and how automation fits
Recruitment for AI‑era operations roles should do more than list responsibilities and shift patterns.
When hiring fresh graduates or people crossing over from IT, operations leaders can:
Join campus talks and interviews personally, so candidates see and hear from the people actually doing the work.
Show short videos of a real shift: the control room, a walk through an AI hall, or a technician handling a planned change.
Explain, in plain terms, how AI and automation are changing tools and workflows, and where people still matter.
Many potential hires worry that “AI will take my job.” Being honest helps. In most AI‑era facilities, the goal is to push as much as possible into full automation: instant fault responses, fast load‑shedding, and automatic cooling adjustments are things software can usually do faster and more consistently than a tired engineer. That is the right instinct.
The real risk is over‑trusting automation, assuming that because the logic worked in one scenario, it will handle every edge case. In practice, the best‑run facilities let automation act first for speed but keep people in charge of the rules. Operations teams decide which actions are allowed, test them under realistic conditions, and know when to pause or override an automated response if something does not look right.
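The “automation acts first, people own the rules” pattern above can be sketched as an operator‑approved allow‑list plus a manual hold. The action names and return strings here are illustrative assumptions, not a real control system's API:

```python
# Hypothetical sketch of "automation acts first, people own the rules":
# software may execute only actions operators have pre-approved and tested,
# and a manual hold pauses everything. Action names are illustrative.

ALLOWED_ACTIONS = {"shed_noncritical_load", "raise_pump_speed", "isolate_feed"}

def execute(action: str, manual_hold: bool = False) -> str:
    """Run an automated response only if operators have pre-approved it."""
    if manual_hold:
        # An operator has paused automation; nothing runs until it is lifted.
        return f"held: {action} (operator override active)"
    if action not in ALLOWED_ACTIONS:
        # Untested actions escalate to a human instead of running blind.
        return f"escalated: {action} (not on the tested allow-list)"
    return f"executed: {action}"

print(execute("shed_noncritical_load"))               # runs automatically
print(execute("power_cycle_hall"))                    # escalates to a human
print(execute("raise_pump_speed", manual_hold=True))  # paused by override
```

The guardrail is deliberately dumb: speed comes from automation executing instantly inside the allow‑list, while judgment stays with the people who decide what goes on that list and when to hold it.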
For those who want to prepare, giving them pointers to structured resources such as Data Center Primer can help them arrive with a basic understanding of the facility and the work, instead of starting cold.
The side effect of this transparency is that AI clients and senior leaders also gain confidence. When they see a site investing this much in explaining and growing its operations staff, they know they are not just renting megawatts; they are partnering with a team that will still be there, and still be competent, five to ten years from now.
9. How AI tenants and senior leaders see all this
For AI data center clients, operations quality is no longer a “back‑of‑house” topic. When they evaluate a site, they ask:
Computational fidelity: can this team keep my GPUs from thermally throttling because of cooling mismanagement or unsafe changes?
Safety and competence: has this team actually run high‑density, battery‑heavy halls before, and are they trained on liquid cooling, ESS, and local fire‑safety rules?
Authority: what evidence is there that they can say “no” to unsafe requests, even mine?
For CEOs, COOs, and other senior leaders, operations capability has become part of the product, not just a cost line. A remote AI campus marketed as “AI‑ready” but staffed and trained as if it were a 10 kW enterprise site will be treated, and priced, as legacy capacity over time.
For current operations professionals, the message is more positive. The facilities and teams that master AI‑era operations will be the ones AI tenants ask for by name. Operators, supervisors, and managers who build those skills will have more, not fewer, career options in the years ahead.
What comes next
AI‑era facilities do not just change the work on shift. They also change what boards, project teams, and clients expect from a site. Later parts in this series will look at how AI tenants’ needs affect your facility design, and how investment decisions about power, cooling, and batteries shape which workloads you can credibly host.