Intermittent Faults
The hardest faults to find: the system works while you're there and fails after you leave. This lesson covers thermal, vibration, humidity, and voltage-triggered failures, and how to capture data during the failure.
What you'll take away
- Recognize the patterns of thermal, vibration, humidity, and voltage-dependent intermittent faults
- Use data-capture techniques to catch a fault you can't reproduce on demand
- Employ controlled tests to accelerate appearance of thermal or vibration-related faults
- Know when to stop troubleshooting and recommend a planned replacement vs. chasing ghosts
Intermittent faults are the callbacks from hell. The system works perfectly when you’re on site, then fails an hour after you leave. The homeowner is frustrated because “the last guy couldn’t fix it either.” You can’t diagnose what you can’t observe, and the obvious approach — replace something plausible and hope — is exactly how multiple visits to the same house accumulate.
There’s a discipline for intermittent fault diagnosis. It involves understanding the mechanisms that cause intermittent behavior, using data capture to observe the fault when you’re not there, and accepting that sometimes the honest answer is “I can’t definitively diagnose this from a working system — here are the most likely candidates, here’s what I’ll do if it recurs.”
What makes a fault intermittent
Most intermittent faults have a physical trigger — a condition that has to be present for the fault to appear. Learning to identify the trigger is half the battle.
Thermal expansion and contraction. A connection that is tight when cold can open up as parts expand. Likewise, a solder joint that is sound at ambient temperature can fail once the equipment reaches operating temperature. Furnace internal connections that fail after 30 minutes of operation are often thermal.
Vibration. Compressors, blowers, and inducers vibrate. A wire that’s intermittently touching or separating due to vibration presents as a fault that clears when the equipment is stopped. Loose terminal screws and marginal push-on connections are the classic culprits.
Humidity. Moisture-ingress faults tend to appear in summer when humidity is high, or after an extended shutdown in a damp basement. Circuit boards with damaged conformal coating, corroded connectors, and grounding paths affected by moisture all show this pattern.
Voltage variation. Brown-outs in summer, surges during storms, or chronic low voltage can cause failures in equipment that runs fine on clean, stable power. A 120V/24V transformer running on a 108V primary produces about 21.6V on the secondary instead of 24V, which is just enough of a drop to make marginal relays misbehave.
Loading. A contactor that holds fine on short runs can fail once the compressor has run for 20 minutes and contact temperature has risen. This is heat dependency in a contact that was already marginal.
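The voltage-variation arithmetic above can be checked with a quick calculation. This is a minimal sketch that assumes an ideal 120V/24V control transformer with a fixed turns ratio and no load; the function name and default values are illustrative, and a real transformer under load will typically sag somewhat further.

```python
def secondary_voltage(primary_v, nominal_primary_v=120.0, nominal_secondary_v=24.0):
    """Estimate secondary voltage for an ideal transformer (assumed: fixed ratio, no load)."""
    ratio = nominal_secondary_v / nominal_primary_v
    return primary_v * ratio


# Compare nominal line voltage against two low-voltage conditions.
for primary in (120.0, 114.0, 108.0):
    print(f"{primary:.0f} V primary -> {secondary_voltage(primary):.1f} V secondary")
```

Comparing the result against the voltage you actually measure at the secondary under load gives a rough sense of how much margin the control circuit has left.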
Intermittent fault patterns
| Pattern | Likely cause | Notes |
| --- | --- | --- |
| Fails after N minutes of running | Likely thermal | Wait for failure, measure while hot |
| Fails during windy weather | Possibly flue-related | Wind affects draft; pressure switch chattering |
| Fails after hot summer days | Likely thermal + humidity | Summer conditions stress aging electronics |
| Fails shortly after power interruption | Possibly capacitor or board | Inrush on restart stresses aging components |
| Fails 'at 3am' or consistently at night | Possibly voltage-related | Grid loading changes overnight; brown-outs |
| Fails after unit has been running then idling | Thermal contraction on cool-down | Connections that separate when cooling |
Data capture techniques
If you can’t reproduce the fault during your visit, capture data while you’re not there.
Data-logging DMM. Some meters (Fluke 289, others) can log voltage, current, or resistance over time. Clip onto a suspect test point and leave the meter running for a few days. When you return, examine the log for anomalies correlated with when the homeowner reported a failure.
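Clamp ammeter with peak-hold. Leave a clamp ammeter on the compressor line. Peak-hold stores the highest reading seen, so it catches unusual current spikes. It does not tell you when the spike happened, and it cannot separate a dropout from a normal off-cycle, so use a logging meter when timing matters.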
Event log on the control board. Many modern furnaces and heat pumps have internal event logs with timestamps. Pull the log using the service mode; it often shows the moment of failure with an error code you didn’t witness.
Homeowner observation. Ask the homeowner to take specific notes when the failure occurs: exactly what time, what had the system been doing in the preceding hour, any audible sounds, any smells, what the thermostat display said. A homeowner who actively observes and documents can tell you an enormous amount.
Strip thermometer stickers. Available at electronics suppliers. Stick them on suspect components; they record peak temperature between visits. If a capacitor’s peak temperature is hitting 180°F during your absence, that’s data you wouldn’t have otherwise.
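Once you have a log and the homeowner’s notes, lining them up is mostly bookkeeping. The sketch below shows one way to do it, assuming the meter’s log has already been read into (timestamp, voltage) pairs. The sample data, the 21V to 28V acceptable band, and the 15-minute matching window are all illustrative assumptions, not values from any meter or manufacturer.

```python
from datetime import datetime, timedelta


def find_excursions(samples, low=21.0, high=28.0):
    """Return (timestamp, value) pairs outside the assumed acceptable band."""
    return [(ts, v) for ts, v in samples if v < low or v > high]


def near_reports(excursions, reports, window=timedelta(minutes=15)):
    """Keep excursions that fall within `window` of any reported failure time."""
    return [(ts, v) for ts, v in excursions
            if any(abs(ts - r) <= window for r in reports)]


# Hypothetical 24V control-circuit readings pulled from a logging meter.
samples = [
    (datetime(2024, 1, 10, 2, 0), 24.1),
    (datetime(2024, 1, 10, 2, 30), 23.9),
    (datetime(2024, 1, 10, 3, 0), 19.8),
    (datetime(2024, 1, 10, 3, 30), 24.0),
    (datetime(2024, 1, 10, 14, 0), 20.5),
]
# Hypothetical failure time from the homeowner's notes.
reports = [datetime(2024, 1, 10, 3, 10)]

excursions = find_excursions(samples)
matched = near_reports(excursions, reports)
for ts, v in matched:
    print(ts.isoformat(), v)
```

Excursions that line up with a reported failure are the ones worth chasing first. Excursions with no matching report may still matter, but they are weaker evidence.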
Accelerated testing
Sometimes you can provoke the failure during your visit by stressing the suspect condition.
Thermal. Use a heat gun (carefully) to warm a suspect connection or component. If the fault is thermal, applying heat should trigger the symptom. Conversely, freeze spray chills a component quickly, which can expose a fault that appears only when cold or make a heat-related fault clear while you watch.
Vibration. Gently tap and flex suspect wire harnesses, connector bodies, and component mounts while the system runs. An intermittent connection that’s sensitive to movement will often reveal itself immediately.
Voltage. A variac (variable autotransformer) lets you intentionally deliver low or high voltage to a component. If the fault appears at 108V but not 120V, you’ve found a voltage-dependent condition and probably a marginal component.
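If you step the variac down in increments and note whether the fault appears at each step, the threshold falls out of the notes. A minimal sketch, assuming you record each step as a (voltage, fault seen) pair; the sweep values below are made up for illustration.

```python
def fault_threshold(observations):
    """Return the highest tested voltage at which the fault appeared, or None."""
    faulted = [volts for volts, fault_seen in observations if fault_seen]
    return max(faulted) if faulted else None


# Hypothetical sweep notes: (supply voltage, did the fault appear?)
sweep = [(120, False), (116, False), (112, False), (108, True), (104, True)]
print(fault_threshold(sweep))
```

The real threshold lies somewhere between the highest voltage that faulted and the next step above it, so smaller steps near that point narrow it down.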
When to stop chasing
Intermittent faults can eat unlimited time. At some point, you have to decide whether continuing to chase the fault is the right use of everyone’s money, or whether a strategic replacement is warranted.
Signals that strategic replacement may be appropriate:
- Multiple visits without resolution
- Component is 15+ years old and has other signs of aging
- The fault pattern (e.g., thermal, summer-only) points strongly at an aging-related failure even if you can’t definitively localize it
- Continued no-heat or no-cool episodes are creating significant hardship for the homeowner
When recommending strategic replacement, be honest: “I haven’t definitively identified the fault, but the most likely candidate is the control board based on the pattern. A new board is $400 and resolves it in most cases with this symptom. If it doesn’t, we’ll refund the board and keep investigating.” This is more professional than pretending you diagnosed something you didn’t.
From the field
Homeowner reported heat pump “randomly going to aux heat” in winter — sometimes days of normal operation, then a stretch where aux heat ran continuously for hours. Three prior visits by different techs had not identified the fault; each visit had the system running normally.
I pulled the defrost board’s event log on my visit. It showed the aux heat relay energizing without an accompanying defrost cycle, which is outside the normal sequence of operation (SOO). The relay was being commanded directly from the W2 thermostat input, which was weird because the thermostat should only be calling W2 when the heat pump couldn’t keep up.
Left a data-logging DMM recording 24V on the W2 line at the thermostat. Came back three days later. Log showed W2 randomly asserting at unexpected times — sometimes for minutes, sometimes for hours, with no obvious trigger.
The thermostat was a Nest, two years old, connected via power stealing (no C wire). The Nest’s power-stealing strategy was occasionally bleeding enough current through the W2 circuit to falsely activate the aux heat relay. Installed a C-wire adapter at the air handler, problem gone immediately. Had I not captured the pattern at the thermostat end, I would’ve chased heat pump system issues for weeks.
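Turning a raw W2 voltage log like the one in this story into a list of "asserted from X to Y" intervals makes the pattern much easier to read. The sketch below is one way to do that, assuming the log is a list of (timestamp, volts) samples. The 12V threshold for treating W2 as asserted and the sample readings are illustrative assumptions, not data from the actual job.

```python
from datetime import datetime


def asserted_intervals(samples, threshold=12.0):
    """Group consecutive samples at or above `threshold` into (start, end) intervals.

    `end` is the first sample back below threshold, or the last sample
    if the log ends while the signal is still asserted.
    """
    intervals = []
    start = None
    for ts, volts in samples:
        if volts >= threshold and start is None:
            start = ts
        elif volts < threshold and start is not None:
            intervals.append((start, ts))
            start = None
    if start is not None:
        intervals.append((start, samples[-1][0]))
    return intervals


# Hypothetical W2 readings: one short assertion, one lasting about two hours.
samples = [
    (datetime(2024, 1, 10, 1, 0), 0.2),
    (datetime(2024, 1, 10, 1, 5), 24.3),
    (datetime(2024, 1, 10, 1, 10), 24.1),
    (datetime(2024, 1, 10, 1, 15), 0.3),
    (datetime(2024, 1, 10, 2, 0), 24.2),
    (datetime(2024, 1, 10, 4, 0), 24.0),
    (datetime(2024, 1, 10, 4, 5), 0.1),
]

intervals = asserted_intervals(samples)
for start, end in intervals:
    minutes = (end - start).total_seconds() / 60
    print(f"{start:%H:%M} to {end:%H:%M}: {minutes:.0f} min")
```

Interval lengths are only as precise as the logging interval, so a coarse log can overstate or understate how long the signal was really present.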
Check your understanding
1. A furnace locks out every 2-3 days during cold snaps but runs fine most of the time. The homeowner has tolerated this for a month. What's your approach?
2. You suspect a thermal-dependent intermittent fault at a terminal on the ignition module. What's a controlled test you can run during your visit?
3. When is 'strategic replacement' an honest response to an intermittent fault?
Intermittent faults are where diagnostic discipline shows its value. Pattern recognition, data capture, controlled provocation, and honest communication with customers all become important. When you can’t reproduce the fault, you have to outsmart it — and that starts with the mindset that the fault is a predictable physical phenomenon waiting to be observed, not a ghost.