Root cause, not the requested fix: the most underrated engineering skill

“We need X” is almost always a solution wearing the costume of a problem. By the time a request reaches an engineering team, someone has already done a silent diagnosis and handed over their conclusion. That conclusion is usually partly right and partly just the first idea that felt actionable in a stressful meeting. The skill that separates senior engineers, and defines Forward Deployed Engineers, is the discipline to hear the request, respect it, and still go find the problem underneath.

Reframe everything

A request is a hypothesis, not a spec. Treat it as the start of the conversation, not the end of it. The faster you can restate what someone actually wants — in terms of an outcome, not a feature — the less likely you are to build the wrong thing beautifully.

Example: the “let’s move to microservices” trap

The requested fix vs. the real cause underneath.

A classic. The monolith is “too slow,” so the org commits to breaking it into microservices. Microservices are a legitimate architecture, and sometimes they are the right call, but notice that “too slow” is a symptom and “microservices” is a solution, with no diagnosis in between.

Bad: build the brief

The team spends nine months splitting the monolith into services. Deploys get more complex, on-call gets worse, and latency is unchanged, because the actual cause (one unindexed query and a chatty N+1 access pattern) was never even measured.

Good: find the cause first

A root-causing engineer profiles the system in an afternoon, finds the query, adds an index and a small cache, and ships a 6× speedup in a week. Now the microservices conversation can happen for the right reasons — scaling teams, not chasing a latency number that had a one-line fix.
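To make the diagnosis concrete, here is a minimal sketch of what "measure first" can look like, using Python's built-in sqlite3 module. Everything in it is an assumption made for illustration: the customers and orders tables, the row counts, and helper names such as CountingConn and query_plan are hypothetical and do not come from any real system. It counts the queries issued by an N+1 loop versus a single aggregate query, then asks the database whether a lookup scans the whole table or uses an index.

```python
import sqlite3


def build_db(n_customers=50, orders_per_customer=20):
    """Create a small in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(n_customers)],
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(c, 10.0) for c in range(n_customers) for _ in range(orders_per_customer)],
    )
    conn.commit()
    return conn


class CountingConn:
    """Thin wrapper that counts how many queries pass through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def totals_n_plus_one(db):
    """The chatty pattern: one query for the list, then one more per customer."""
    totals = {}
    for (customer_id,) in db.query("SELECT id FROM customers"):
        rows = db.query(
            "SELECT total FROM orders WHERE customer_id = ?", (customer_id,)
        )
        totals[customer_id] = sum(total for (total,) in rows)
    return totals


def totals_single_query(db):
    """The same result from one aggregate query."""
    rows = db.query(
        "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"
    )
    return dict(rows)


def query_plan(conn, sql, params=()):
    """Return SQLite's plan text for a query (the 'detail' column of each row)."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in plan_rows)


if __name__ == "__main__":
    conn = build_db()

    chatty = CountingConn(conn)
    totals_n_plus_one(chatty)
    batched = CountingConn(conn)
    totals_single_query(batched)
    print("queries, N+1 loop:", chatty.queries)
    print("queries, single aggregate:", batched.queries)

    lookup = "SELECT total FROM orders WHERE customer_id = ?"
    print("plan before index:", query_plan(conn, lookup, (1,)))
    conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
    print("plan after index: ", query_plan(conn, lookup, (1,)))
```

With the default sizes above, the loop issues 51 queries where the aggregate issues one, and the plan for the lookup changes from a full table scan to an index search once the index exists. The exact plan wording varies by SQLite version, and a production database would need its own profiler and its own EXPLAIN output, but the habit is the same: count the queries and read the plan before proposing an architecture.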

A week of measurement often beats nine months of rebuilding.

The field is where the real problem hides

You cannot root-cause a business problem from a conference room. The truth is on the floor, in the hands of the people doing the actual work — and especially in their workarounds. The spreadsheet someone keeps on the side, the manual step nobody documented, the “we just always do it this way” — those are a map straight to the real problem.

Good: go to the floor

The team reports the order system is “broken.” On site, the engineer watches a clerk re-key every order into a second system because of a missing integration. The “bug” was a process gap. A small integration removed an hour of double-entry per person, per day — and the “bugs” disappeared with it.

Bad: solve it from the ticket

From the ticket alone, the team “fixes” a validation error, closes it, and ships. They never learn that the validation error only existed because of the manual double-entry nobody mentioned — so it comes back next week under a new ticket number.

How to actually do it

Root-causing is not a personality trait you either have or you don’t; it is a repeatable habit. The engineers who are great at it tend to run roughly the same play:

Ask “what happens if we do nothing?” — it reveals the real stakes, and sometimes that there were none.
Trace the problem to a number the business already watches, so success is defined before you build.
Watch one real user do the real task, end to end, without coaching them.
Hunt for the workaround — the side spreadsheet, the manual step, the shadow process. The workaround is the diagnosis.
Separate the symptom from the cause out loud, before proposing a single solution.

Why this is so hard to hire for

Root-causing in the field requires three things at once: the confidence to push back on a paying stakeholder, the technical depth to actually run the diagnosis, and the people skills to do it without making anyone feel stupid. That is a senior instinct, and it is the same rare profile companies are chasing when they post for a Forward Deployed Engineer: someone who has led teams, kept their technical edge, and learned the hard way that the brief is just the beginning.

Junior engineers solve the problem they’re given. Senior engineers solve the problem you actually have.

It is also the cheapest insurance you can buy. The most expensive failure in software is not a bug — it is building the wrong thing well, on time and on budget. Root-causing is how you avoid it.
