Why UPS Systems Require Annual Thermal Inspections

The system designed to prevent downtime is itself one of the most common causes of it. Here is why uninterruptible power supplies fail silently, what thermal imaging reveals, and why annual UPS thermography is non-negotiable for any mission-critical facility.

There is a particular kind of failure that haunts every facilities manager running a mission-critical site. It is not the failure of production equipment, a server, or process plant. It is the failure of the system specifically designed to prevent failure.

When the mains supply drops, the UPS is supposed to take over instantly. The protected load never notices. The generator starts. Continuity is preserved. That is the entire purpose of the uninterruptible power supply.

Except, sometimes, it does not. The UPS that has sat quietly in the corner for five years, batteries showing full charge and status LEDs green, fails at exactly the moment it is called on. The load drops. The data centre goes dark. The hospital theatres switch to handheld torches while the generator starts. The financial trading floor disconnects mid-transaction.

Industry data is uncomfortable. According to recent UPS maintenance research, battery-related failures account for over 98% of UPS failures during the equipment wear-out phase. And UPS battery failures, like most electrical degradation, generate heat long before they cause functional failure. That heat is detectable with infrared thermography, often weeks or months before the UPS would fail under load.

This article explains why UPS systems require annual thermal inspections, what thermography detects that other maintenance methods miss, and how a structured thermal inspection programme protects the system that protects everything else.

Why UPS Systems Are Particularly Vulnerable to Thermal Faults

A UPS is not a single component. It is a system that combines high-current electrical connections, power conversion electronics, energy storage, and active cooling, often running 24/7 for years between major service events. Every one of these subsystems is vulnerable to thermal degradation, and several of them are extremely difficult to inspect by any method other than thermography.

Battery Banks: The 98% Failure Mode

Valve-regulated lead-acid (VRLA) batteries are the dominant chemistry in mid-range UPS installations. They are typically rated for a service life of 10 years at an ambient temperature of 25°C. The problem is the temperature relationship: for every 8 to 10°C above 25°C, battery life is approximately halved. A battery cabinet running at 35°C has half the expected service life. At 45°C, it has a quarter.
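
The halving rule can be written as L(T) = L_rated × 2^(-(T - 25)/k), where k is the 8 to 10°C step. Below is a minimal Python sketch of that rule of thumb. It is not a manufacturer's ageing model. The function name is hypothetical, k defaults to 10°C, and temperatures below 25°C are simply treated as giving the rated life.

```python
def estimated_vrla_life(rated_life_years: float, cell_temp_c: float,
                        halving_step_c: float = 10.0,
                        reference_temp_c: float = 25.0) -> float:
    """Rule-of-thumb VRLA service life at a sustained temperature.

    Life halves for every `halving_step_c` degrees above the reference
    temperature. Running below the reference is not credited with extra
    life here; the rated figure is treated as a ceiling (an assumption).
    """
    excess = max(cell_temp_c - reference_temp_c, 0.0)
    return rated_life_years * 2 ** (-excess / halving_step_c)


# Hypothetical 10-year battery held at three cabinet temperatures
for temp in (25, 35, 45):
    print(f"{temp} C: {estimated_vrla_life(10, temp):.1f} years")
```

With the default 10°C step this reproduces the figures above: 10, 5 and 2.5 years at 25, 35 and 45°C. Passing `halving_step_c=8` gives about 4.2 years at 35°C, so the 10°C figures sit at the optimistic end of the range.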

Even within a healthy cabinet, individual cells can develop thermal anomalies long before they affect runtime. Internal cell shorts, rising internal resistance, and sulfation all generate heat that is detectable externally, and case bulging often accompanies them. Thermal imaging across an entire battery bank reveals which cells are running hotter than their neighbours, identifying the failing units before they trigger a cascade through the string.

Battery Terminal Connections

Battery terminals carry high DC current and are subject to vibration, thermal cycling, and corrosion. The same mechanisms that loosen busbar connections on AC distribution apply to battery posts. A loose battery terminal increases resistance, generates heat at the joint, accelerates corrosion of the lead post, and progressively reduces the effective current capacity of the entire string.

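A loose battery terminal is not detectable by visual inspection until late-stage corrosion appears. It is not detectable by simple voltage measurement under no-load conditions. It does show up under thermal imaging, most clearly when the string is carrying significant current, for example during a discharge or load test. Under normal float charge the current, and therefore the heating at the joint, is far smaller.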

Power Conversion Components

The rectifier, inverter, and bypass circuits inside a UPS are populated with high-power semiconductors (IGBTs, diodes, thyristors), large capacitors, and inductors. Each of these components has a thermal failure signature. DC Group's UPS troubleshooting guide identifies capacitor failure as a leading cause of UPS instability, with the failure mode preceded by elevated internal temperatures. Thermal imaging through ventilation openings or with the cabinet open during scheduled maintenance reveals which capacitors are running above expected temperatures.

Cooling Fans and Airflow

UPS systems rely on forced-air cooling to remove the heat generated by power conversion losses. Every cooling fan inside the UPS has a finite service life. As fans degrade, airflow drops, internal temperatures rise, and the rate of degradation of other components accelerates. Thermal imaging of the UPS exhaust pattern reveals which fans are working correctly and which are not. A fan that has stopped is one of the easiest faults to detect with infrared, and one of the most consequential to ignore.

Bypass Switches and Static Transfer Switches

The switching paths that connect the UPS output, or the bypass supply, to the load are critical. Static transfer switches in particular can carry the entire load current during normal operation. A degrading STS develops elevated temperature long before it fails functionally. Thermography is one of the few practical ways to detect this degradation without disconnecting the UPS from the load.

The pattern: UPS subsystems fail through thermal mechanisms. The system is designed for continuous operation, which means every degradation mechanism that responds to heat (battery ageing, capacitor failure, fan wear, connection loosening) is constantly active. Annual thermal inspections do not replace electronic diagnostic testing or load testing. They add the dimension that those methods cannot provide: visibility of heat-driven degradation in real time, under real operating conditions.

Try SnapCor Free for 14 days

What a UPS Thermal Inspection Actually Detects

A comprehensive UPS thermographic inspection covers six distinct categories of finding, each of which represents a failure mechanism that other inspection methods may miss.

1. Individual Cell Thermal Anomalies

Scanning across a battery bank under float charge reveals temperature variations between cells. A cell running 5 to 10°C above its neighbours is likely to be showing internal resistance change, electrolyte loss, or early thermal runaway. The thermographer can identify the specific cell, document the temperature differential, and trigger a remedial action (load test, replacement, or in extreme cases, immediate isolation).
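
One way to turn "hotter than its neighbours" into a repeatable check is to compare each cell with the median of the cells on either side of it in the same string. The sketch below assumes the readings are already listed in physical cell order. The function name, the two-cell window, the 5°C threshold (the low end of the range above) and the sample readings are all illustrative assumptions.

```python
from statistics import median


def flag_hot_cells(cell_temps_c, window=2, threshold_c=5.0):
    """Return (index, temp, delta) for cells hotter than their local neighbours.

    Each cell is compared with the median of up to `window` cells on each
    side of it in the same string, excluding the cell itself.
    """
    flagged = []
    for i, temp in enumerate(cell_temps_c):
        start = max(0, i - window)
        neighbours = cell_temps_c[start:i] + cell_temps_c[i + 1:i + 1 + window]
        if not neighbours:  # a single-cell list has nothing to compare against
            continue
        delta = temp - median(neighbours)
        if delta >= threshold_c:
            flagged.append((i, temp, round(delta, 1)))
    return flagged


# Hypothetical float-charge readings (deg C), one per cell, in string order
string_a = [24.1, 24.3, 24.0, 31.2, 24.4, 24.2, 24.5, 24.1]
print(flag_hot_cells(string_a))
```

The median is used instead of the mean so that one hot cell does not drag up the baseline of the cells next to it. Only cell index 3 is flagged in this sample.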

2. String-Level Imbalance

In multi-string installations, comparing thermal signatures across strings reveals imbalanced charging or load distribution. A string running consistently hotter than its parallel strings is often a sign of failing cells dragging the others down.
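
Because "consistently" implies more than one survey, a string-level check needs the inspection history as well as the current readings. This hypothetical sketch flags a string only if its mean cell temperature sat at least 2°C above the median of all the strings' means in every survey supplied. The threshold, names and data are assumptions for illustration.

```python
from statistics import mean, median


def consistently_hot_strings(surveys, threshold_c=2.0):
    """Return strings that ran hot relative to their parallel strings in every survey.

    `surveys` is a list of dicts, one per inspection, each mapping a string
    name to a list of cell temperatures (deg C).
    """
    hot_sets = []
    for survey in surveys:
        means = {name: mean(temps) for name, temps in survey.items()}
        baseline = median(means.values())
        hot_sets.append({name for name, m in means.items()
                         if m - baseline >= threshold_c})
    # Keep only strings that were hot in all surveys
    return set.intersection(*hot_sets) if hot_sets else set()


# Two hypothetical annual surveys of three parallel strings
surveys = [
    {"A": [24.0, 24.2], "B": [24.1, 24.3], "C": [26.5, 26.9]},
    {"A": [24.2, 24.1], "B": [24.0, 24.4], "C": [27.1, 27.4]},
]
print(consistently_hot_strings(surveys))  # {'C'}
```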

3. Battery Terminal and Inter-Cell Connection Heating

Loose, corroded, or undersized connections between cells, between rows, and at the string terminations generate localised heating that is invisible to visual inspection. Even a single loose inter-cell link raises the resistance of the entire string and accelerates the degradation of the cells on either side of it.
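
The heating at a resistive joint follows P = I²R. It therefore rises in direct proportion to joint resistance and with the square of current. The sketch below works that through for a healthy and a degraded inter-cell link. The current and resistance values are hypothetical round numbers, not measured data.

```python
def joint_heating_w(current_a: float, resistance_ohm: float) -> float:
    """Power dissipated at a resistive joint: P = I^2 * R (watts)."""
    return current_a ** 2 * resistance_ohm


current = 100.0     # hypothetical string current during a discharge, amps
healthy = 0.0002    # hypothetical healthy inter-cell link, 0.2 milliohm
degraded = 0.002    # hypothetical loose link, 2 milliohm

print(f"healthy link:  {joint_heating_w(current, healthy):.1f} W")
print(f"degraded link: {joint_heating_w(current, degraded):.1f} W")
```

At the same current, ten times the resistance means ten times the heat at the joint. At one tenth of the current, the same degraded joint dissipates one hundredth as much. That square-law dependence is why the load at the time of inspection matters.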

4. Power Module Heating

Rectifier and inverter modules generate predictable heat patterns under normal load. Deviation from those patterns indicates component degradation, cooling issues, or load imbalance. For modular UPS systems, comparing thermal signatures across identical modules reveals which one is approaching the end of its service life.
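
For modular systems, the comparison across identical modules can be sketched as below. It assumes the modules share the load evenly, so their raw temperatures are comparable. If they do not, the readings should be load-corrected first. Module names, temperatures and the 5°C threshold are hypothetical.

```python
from statistics import median


def flag_outlier_modules(module_temps_c, threshold_c=5.0):
    """Return names of modules running at least threshold_c above the median module.

    Assumes identical modules sharing the load evenly (hypothetical check).
    """
    baseline = median(module_temps_c.values())
    return sorted(name for name, temp in module_temps_c.items()
                  if temp - baseline >= threshold_c)


# Hypothetical hottest-spot readings (deg C) from four parallel power modules
modules = {"PM1": 41.5, "PM2": 42.0, "PM3": 49.8, "PM4": 41.2}
print(flag_outlier_modules(modules))  # ['PM3']
```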

5. Cooling System Performance

Fan operation, vent obstruction, and overall cabinet thermal balance are all directly visible through thermography. An exhaust pattern that has changed since the previous inspection often indicates a fan running below speed or partial blockage of the cooling path.

6. Output and Bypass Connections

The same load correction and BS7671 fault grading that apply to general electrical thermography apply here. UPS output busbars, bypass connections, and downstream distribution are all inspected and graded against reference temperatures.
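
Thermographers commonly correct for load by scaling the measured temperature rise to rated current: ΔT_rated = ΔT_measured × (I_rated / I_measured)^n. The exponent n is usually taken between about 1.6 and 2. The sketch below uses n = 2 and then applies placeholder grading bands. It is not SnapCor's implementation. The bands are hypothetical, not BS7671 values or any published severity table, so substitute the criteria your own procedure specifies.

```python
def load_corrected_rise(measured_rise_c, measured_current_a, rated_current_a,
                        exponent=2.0):
    """Scale a measured temperature rise to the rise expected at rated current."""
    if measured_current_a <= 0:
        raise ValueError("measured current must be positive")
    return measured_rise_c * (rated_current_a / measured_current_a) ** exponent


# Hypothetical grading bands on corrected rise (deg C), checked highest first.
GRADES = [(40.0, "Critical"), (20.0, "Serious"), (10.0, "Minor")]


def grade(corrected_rise_c):
    """Return the first band whose lower limit the corrected rise reaches."""
    for limit, label in GRADES:
        if corrected_rise_c >= limit:
            return label
    return "Normal"


# A 6 deg C rise measured at half of rated current
rise = load_corrected_rise(measured_rise_c=6.0, measured_current_a=200,
                           rated_current_a=400)
print(rise, grade(rise))  # 24.0 Serious
```

The correction factor grows with the square of the current ratio, so small measurement errors at light load are magnified. Readings taken well below normal load are best treated with caution.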

The Cost of UPS Failure

The financial and operational consequences of UPS failure are well documented across multiple industries.

  • Data centre outage cost: Average unplanned outage cost exceeds $500,000 per incident according to Ponemon Institute research. For larger facilities, single incidents have exceeded $1 million in direct costs.
  • Outage causation: The Uptime Institute's 2025 Annual Outage Analysis attributed 54% of major impact data centre outages in 2024 to energy and power failures.
  • SLA exposure: Most colocation and managed service contracts include uptime guarantees backed by financial penalties. A UPS failure that drops the protected load can trigger significant SLA payouts to multiple downstream clients simultaneously.
  • Equipment damage: Beyond the immediate downtime, sudden power loss can corrupt storage arrays, damage spinning hard drives, and cause hard shutdowns that require lengthy recovery procedures.
  • Reputational cost: For colocation operators, hospitals, and financial services, a single high-profile UPS failure can damage client relationships and competitive positioning for years.

Against these costs, the cost of an annual UPS thermal inspection is negligible. The economic case is overwhelming.

The Compliance Case for Annual UPS Thermography

Beyond the engineering and economic cases, there is now a regulatory case. NFPA 70B 2023, which turned the document from a recommended practice into a standard, requires annual infrared inspection of the electrical equipment within its scope, with UPS systems explicitly included. ISO 18436-7 sets the requirements for qualifying and assessing thermography personnel. UK insurers increasingly require evidence of regular thermographic inspection across critical electrical infrastructure as a condition of business interruption cover.

For data centre operators in particular, Uptime Institute tier certifications, ISO 22301 business continuity certification, and major hyperscale colocation contracts all expect evidence of documented maintenance and risk controls. A documented thermal inspection programme covering UPS systems provides that evidence directly. The cost of compliance is small. The cost of being unable to demonstrate it after an incident is significant. See our article on why annual thermal trending prevents catastrophic failure for the wider compliance framework.

How a UPS Thermal Inspection Should Be Conducted

Effective UPS thermography follows the same principles as general electrical thermography, with some UPS-specific considerations.

1. Inspect Under Representative Load

The UPS should be carrying its normal protected load during the inspection. A UPS sitting on bypass or running at idle does not reveal the full thermal picture. Scheduling inspections during normal operating hours, rather than during quiet maintenance windows, gives the most accurate results.

2. Include the Full Battery Bank

Every cell, every inter-cell connection, every string termination, and every rack-level connection should be imaged. For installations with many strings or many cells per string, this is time-consuming, but it is the only way to identify cell-level anomalies before they affect string performance.

3. Capture the Power Module Pattern

Through ventilation openings or with the cabinet open during scheduled maintenance, image the rectifier, inverter, and bypass sections. Document the thermal pattern in a way that can be compared against prior inspections.

4. Image the Exhaust

Front and rear thermal images of the UPS cabinet exhaust reveal fan performance and overall cooling balance. Changes in exhaust pattern between inspections are an early indicator of cooling system degradation.

5. Apply Load Correction and Trending

Apply BS7671 load correction to all current-carrying connections. Compare current findings against prior inspections for the same UPS using SnapCor's trending engine. A cell that has risen 8°C since last year deserves attention regardless of whether it is currently within absolute limits.
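
The trending rule in this step can be sketched as follows: flag a meaningful rise even when the absolute reading is still acceptable. This is an illustration only, not SnapCor's trending engine. The function, the item names, the 8°C rise threshold (taken from the example above) and the optional absolute limit are assumptions, and the readings are assumed to be load-corrected already.

```python
def trend_flags(current, previous, rise_threshold_c=8.0, absolute_limit_c=None):
    """Flag items that rose by at least rise_threshold_c since the previous
    inspection, independently of whether they breach an absolute limit.

    `current` and `previous` map item names to corrected temperatures (deg C).
    """
    flags = {}
    for item, temp in current.items():
        reasons = []
        prior = previous.get(item)
        if prior is not None and temp - prior >= rise_threshold_c:
            reasons.append(f"rose {temp - prior:.1f} C since last inspection")
        if absolute_limit_c is not None and temp >= absolute_limit_c:
            reasons.append(f"at or above absolute limit {absolute_limit_c} C")
        if reasons:
            flags[item] = reasons
    return flags


# Hypothetical readings from two annual inspections of the same UPS
last_year = {"cell_17": 24.5, "cell_18": 24.7, "link_17_18": 26.0}
this_year = {"cell_17": 32.9, "cell_18": 25.0, "link_17_18": 27.1}
print(trend_flags(this_year, last_year, absolute_limit_c=45.0))
```

Here `cell_17` is flagged for its 8.4°C rise even though it is well below the 45°C limit used in the example.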

6. Generate the Report On Site

UPS findings are time-sensitive. Generate the inspection report immediately, communicate Critical findings verbally to the site team before leaving, and deliver the full PDF the same day. SnapCor generates the report on site in under 60 seconds so the maintenance team can begin planning remedial work before you have packed the camera away.

Frequently Asked Questions

How often should a UPS be thermally inspected?

Annually at minimum, in line with NFPA 70B 2023. For Tier III and Tier IV data centres, healthcare critical care environments, and high-value process facilities, we recommend six-monthly inspections.

Can a UPS thermal inspection be conducted without taking the UPS offline?

Yes. UPS thermography is conducted while the UPS is running and carrying its protected load. The inspection is non-contact and non-invasive. See our article on live thermal inspections without shutdown for the full methodology.

How long does a UPS thermal inspection take?

It depends on the size. A single small UPS with a 32-cell battery bank typically takes 30 to 45 minutes including report generation. A large multi-module UPS with several hundred cells across multiple strings can take several hours. Modular UPS systems with parallel modules require additional time to image each module individually.

What is the most common UPS thermography finding?

Loose battery terminal connections and individual cell thermal anomalies are the most common findings, followed by cooling fan degradation and capacitor heating in the power conversion stages. None of these is reliably detectable by visual inspection or by simple voltage testing under no-load conditions.

Should the UPS service contract include thermal inspection?

Most factory UPS service contracts include functional and load testing but do not include comprehensive thermal inspection of the battery bank and connections. Treat thermography as a complementary activity, conducted alongside the OEM service visit or by a specialist thermographic inspection company. The two should be coordinated rather than duplicated.

Does SnapCor support UPS inspections specifically?

Yes. SnapCor's electrical template covers UPS systems including battery bank inspections, with load correction, fault grading, AI-assisted remedial recommendations, and trending across periodic inspections. See the first inspection walkthrough for the full process and the SnapCor YouTube channel for demo content.

Protect the System That Protects Everything Else

The UPS is the line between business continuity and business interruption. It is also one of the most vulnerable single points of failure in a mission-critical electrical infrastructure. Battery banks degrade in heat. Connections loosen under thermal cycling. Cooling fans wear out. Capacitors fail. Every one of these failure modes generates heat before it causes downtime.

Annual UPS thermography is the cheapest, fastest, and most reliable way to catch every one of these failure modes in time to act.

Try SnapCor Free for 14 Days

For UK enterprise UPS thermography services across data centres, hospitals, and commercial buildings, contact the TI Thermal Imaging team. For UAE and GCC enquiries, see Thermal Imaging UAE. New to SnapCor? Start with the installation guide.

SnapCor generates thermal inspection reports on site in 60 seconds

Thermal inspection reports. Done on site. In under 60 seconds.
ISO 18436-7 aligned, BS7671 load correction built in, auto fault grading, works with any thermal camera.
Try it free for 14 days → https://snapcor.app/pages/pricing-plans

SnapCor is a thermographic inspection reporting platform built by TI Thermal Imaging. Reports are aligned to ISO 18436-7 and informed by BS7671 reference temperatures. Statistics cited in this article are sourced from the Uptime Institute Annual Outage Analysis (2025), Ponemon Institute research, and industry UPS maintenance research. UPS thermographic inspections must be conducted by qualified thermographers under appropriate safe systems of work, in line with NFPA 70E and site-specific electrical safety procedures.
