Here is a conversation that happens in every Regional MD's office at least once a quarter.
RMD: Indonesia missed target again. What's going on?
Commercial Director: It's a distributor problem. Our main distributor in East Java has been underperforming for months.
Sales Manager: I wouldn't say it's the distributor exactly. It's coverage. We're not hitting enough outlets in the high-growth corridors.
Marketing lead: I think we have a pricing problem. Our ASP is too high for the mid-market segment where the growth is.
Operations lead: The issue is DSR productivity. Our field team metrics have been deteriorating since we restructured the routes.
Everyone has a theory. Each theory is plausible. Each theory implies a different intervention. The RMD has no way to adjudicate between them, because the commercial stack - the CRM, the BI dashboards, the consulting reports - measures outcomes, not causes. These tools can show that Indonesia missed target. They cannot tell you why.
This is the fundamental gap that defines the commercial execution category. No CRM currently closes it. No BI tool currently closes it. No consulting engagement closes it durably - the engagement might produce a point-in-time diagnosis, but the diagnosis decays within months as the market moves. What's missing is a system that continuously attributes commercial outcomes to their specific operational causes, at a granularity that would let the RMD distinguish between the four competing theories above.
I call this capability fault attribution, borrowing the term from systems engineering where it has a precise meaning. In complex systems - power grids, distributed software, mechanical assemblies - fault attribution is the engineering discipline of routing an observed symptom to the specific component or event that caused it. When a grid experiences a voltage drop, fault attribution tells the operator which transformer failed, what time it failed, and what cascade of events followed. This is how complex systems are managed at scale.
Commercial systems are complex in the same way. A missed target in Indonesia is the symptom. The fault - the specific failure that produced the symptom - could be in distributor capability, territory design, pricing, channel coverage, or field execution. Fault attribution for commercial systems means routing the symptom to the specific operational cause, with evidence sufficient to support the attribution.
This is technically achievable and almost nobody does it. Here is why it's hard, and what the architecture for doing it looks like.
Fault attribution requires three things working in concert.
First, an immutable event log. Every significant occurrence in the commercial system has to write a permanent, timestamped, actor-attributed row. A sell-in upload arrives. A DSR visit is completed. A distributor review meeting is held. A promotion goes live. A territory boundary is changed. Every event, append-only, never edited, attributable to a role and a time. This is the system's memory. Without it, attribution is impossible because the causal history doesn't exist in retrievable form.
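As a concrete illustration, here is a minimal sketch of what an append-only event log can look like, using SQLite purely for compactness. The table, column, and event-type names are illustrative rather than a prescribed schema; the essential property is that the only write path is an insert.

```python
# A minimal sketch of an append-only event log (SQLite used only for brevity).
# Table, column, and event-type names are illustrative, not a prescribed schema.
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("commercial.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS events (
        event_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        occurred_at TEXT NOT NULL,   -- ISO-8601 timestamp
        actor_role  TEXT NOT NULL,   -- e.g. 'dsr', 'distributor', 'commercial_director'
        event_type  TEXT NOT NULL,   -- e.g. 'sell_in_upload', 'dsr_visit', 'territory_change'
        entity_id   TEXT NOT NULL,   -- the distributor, outlet, or territory the event concerns
        payload     TEXT NOT NULL    -- event details as JSON
    )
""")

def append_event(actor_role: str, event_type: str, entity_id: str, payload: dict) -> None:
    """Write one immutable row. There is deliberately no update or delete path."""
    conn.execute(
        "INSERT INTO events (occurred_at, actor_role, event_type, entity_id, payload) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), actor_role, event_type,
         entity_id, json.dumps(payload)),
    )
    conn.commit()

# A completed field visit becomes a permanent, timestamped, attributable row:
append_event("dsr", "dsr_visit", "outlet-4471", {"territory": "east_java", "duration_min": 12})
```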
Second, domain state tables. The current state of every operational entity - distributor, outlet, DSR, territory, promotion, joint business plan - has to be queryable at any moment. The state is updated by events, but the current state itself has to be available without reconstructing it from the full event history each time. This is the system's present. Without it, attribution is slow - every question requires a full replay of history.
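To make the distinction between events and state concrete, here is a minimal sketch of a state projection, assuming the illustrative event shapes from the previous sketch. The fields tracked per distributor are placeholders; the point is that "what is true now" is a direct lookup, not a replay of history.

```python
# A minimal sketch of a domain state projection: events fold into a current-state
# record per distributor, so the present is a lookup rather than a replay.
# The tracked fields are illustrative.
from collections import defaultdict

distributor_state: dict[str, dict] = defaultdict(dict)

def apply_event(event_type: str, entity_id: str, payload: dict) -> None:
    """Update the current-state record for the entity the event concerns."""
    state = distributor_state[entity_id]
    if event_type == "coverage_report":
        state["coverage_pct"] = payload["coverage_pct"]
    elif event_type == "sell_out_upload":
        state["last_sellout_upload"] = payload["received_at"]
    elif event_type == "review_meeting_logged":
        state["last_review_meeting"] = payload["held_on"]

# Current state is available immediately, without touching the event history:
apply_event("coverage_report", "distributor-x", {"coverage_pct": 58.0})
print(distributor_state["distributor-x"])   # {'coverage_pct': 58.0}
```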
Third, an expectation and gap engine. Client-configured rules defining what should happen, when, and at what threshold. Monthly distributor review meetings are expected. If March passes without one being logged, the gap engine flags it. A sell-out upload is expected weekly from each distributor. If it doesn't arrive, the gap engine flags it. A DSR route is expected to have at least 10 visits per day. If actual falls below threshold, the gap engine flags it. This is the system's active comparator between intention and reality.
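A minimal sketch of what client-configured expectations and the gap check might look like, reusing the illustrative event types above; the rule names and thresholds are placeholders, and only time-gap rules are shown.

```python
# A minimal sketch of an expectation-and-gap check. The rules below stand in for
# client-configured expectations; names and thresholds are placeholders.
from datetime import date

EXPECTATIONS = [
    {"name": "monthly_distributor_review", "event_type": "review_meeting_logged", "max_gap_days": 31},
    {"name": "weekly_sellout_upload",      "event_type": "sell_out_upload",       "max_gap_days": 7},
]

def find_gaps(last_seen: dict[str, date], today: date) -> list[str]:
    """Flag every expectation whose event has not been observed within its allowed gap."""
    gaps = []
    for rule in EXPECTATIONS:
        last = last_seen.get(rule["event_type"])
        if last is None or (today - last).days > rule["max_gap_days"]:
            gaps.append(rule["name"])
    return gaps

# A March that passes without a logged review meeting gets flagged:
print(find_gaps({"sell_out_upload": date(2024, 3, 28),
                 "review_meeting_logged": date(2024, 2, 10)},
                today=date(2024, 3, 31)))
# ['monthly_distributor_review']
```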
When a symptom appears - Indonesia missed target - the attribution engine queries these three layers together. It reconstructs the operational state of the territory at the time the target was missed. It identifies gaps between expected and actual activity during the relevant period. It traces the specific events that fired during that period and the expected events that never did. The output is not "Indonesia missed target because of distributor capability" - it's "Indonesia missed target during weeks 32 through 41 of the year. During that period: DSR route coverage in East Java dropped from an average of 85% to 58%. Distributor X's working capital submission for week 30 reported a shortfall that was not escalated. The expected reorder cycle on SKUs 4, 7, and 12 broke down starting week 33. Sell-out data was submitted late or incomplete for weeks 34 through 38. The downstream effect is visible in the territory's monthly volume from week 35 onward."
That's attribution at operational granularity. The RMD reading that output doesn't have four competing theories to adjudicate. They have a single causal thread with specific events attached. They can intervene at the specific points in the thread - unblock Distributor X's working capital, escalate DSR coverage in East Java, fix the sell-out submission compliance - rather than making an organization-wide bet on which theory is right.
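One way to picture the attribution query itself is the sketch below: given a symptom window, it pulls the events and flagged gaps for the territory and returns them as a single time-ordered thread. The function name and data shapes are assumptions carried over from the earlier sketches, and the output is a sequence of findings to be read as a hypothesis, not a proof.

```python
# A minimal sketch of the attribution query: given a symptom window, gather the
# events and flagged gaps for the territory and return one time-ordered thread.
# Data shapes are illustrative and carried over from the sketches above.
from dataclasses import dataclass

@dataclass
class Finding:
    week: int
    description: str

def attribute(territory: str, week_from: int, week_to: int,
              events: list[dict], gaps: list[dict]) -> list[Finding]:
    findings = []
    for e in events:
        if e["territory"] == territory and week_from <= e["week"] <= week_to:
            findings.append(Finding(e["week"], e["summary"]))
    for g in gaps:
        if g["territory"] == territory and week_from <= g["week"] <= week_to:
            findings.append(Finding(g["week"], f"expected but missing: {g['expectation']}"))
    # Week order turns scattered observations into a readable causal thread,
    # which is what lets an intervention target a specific point in it.
    return sorted(findings, key=lambda f: f.week)

print(attribute("east_java", 32, 41,
                events=[{"territory": "east_java", "week": 33,
                         "summary": "reorder cycle broke on SKUs 4, 7, 12"}],
                gaps=[{"territory": "east_java", "week": 34,
                       "expectation": "weekly_sellout_upload"}]))
```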
Fault attribution has three downstream effects on how the commercial organization operates, and they're each worth naming.
First, accountability changes. Before fault attribution, "the distributor is underperforming" is an opinion. After fault attribution, "the distributor's coverage in East Java dropped below 60% in weeks 32 through 41" is a fact. The conversation with the distributor becomes different - it's about the specific gap, not about character assassination. The distributor can respond to a specific gap. They can't respond to a general accusation.
Second, the conversation between roles changes. Before fault attribution, the Commercial Director, Sales Manager, Marketing lead, and Operations lead each have their own theory of what went wrong, with no shared basis of evidence. After fault attribution, they share a causal thread. Their disagreements - which still exist - become disagreements about weights and interventions, not about which theory is right. That's a much more productive form of disagreement.
Third, intervention precision improves. Before fault attribution, interventions are broad-spectrum: restructure the distributor network, reorganize the field team, launch a new pricing program. These are expensive and often hit the wrong target. After fault attribution, interventions are narrow: unblock this specific distributor's working capital, restore DSR coverage in this specific territory, fix the sell-out compliance failure at these specific distributors. Narrow interventions are cheaper and - because they target the actual cause - more effective.
The commercial category is full of tools that measure outcomes. Revenue, volume, market share, margin, coverage, productivity - measured and dashboarded and reported up the chain. None of those tools attribute the outcomes to causes. They tell you what happened. They don't tell you why.
Fault attribution - the capability of routing outcomes to causes at operational granularity - is the category-defining capability for the next generation of commercial execution platforms. Most current tools cannot do it because they don't have the architecture for it. The immutable event log, the domain state tables, the expectation and gap engine - these are architectural commitments that have to be made from the start. Retrofitting them onto existing tools is nearly impossible.
But the tools that are built with this architecture from the start change the conversation between operators and their commercial organizations. The RMD who has fault attribution stops being at the mercy of their team's competing theories. They have evidence. The commercial team stops being defensive about outcomes they don't control - they know which interventions would help and can make the case for them. The distributors stop feeling blamed for systemic failures - they see the specific gaps and can respond.
Everyone moves up a level in the discourse. That's what the capability unlocks. It's worth building toward, even if it takes years.
Two honest caveats about fault attribution that most advocates of the capability don't address.
Fault attribution is not causality. The system can reliably show what events occurred in what sequence. It can show that DSR coverage dropped at a specific time and sell-out followed three weeks later. It cannot, strictly speaking, prove that the first caused the second. Correlation and sequence are not causation, and a fault attribution system produces very detailed correlation-and-sequence narratives that can be mistaken for causal claims.
The practical implication is that fault attribution output requires interpretation. A skilled operator reads the attribution narrative and forms a causal hypothesis - "coverage drop caused the volume drop" - that accounts for the evidence but isn't mechanically derived from it. The hypothesis is testable (if we restore coverage, does volume recover?), but the initial attribution is inferential. A less skilled operator or an automated interpreter can read the same narrative and produce wrong causal claims, particularly when multiple events happened in the same window and any of them could plausibly explain the outcome.
This is not a reason to avoid building fault attribution capabilities - the alternative, no attribution at all, is dramatically worse - but it's a reason to build them with appropriate humility and to train users to treat attribution outputs as hypotheses rather than verdicts.
Fault attribution creates accountability that organizations may not be ready for. This is the more significant caveat, and it's cultural rather than technical. When outcomes can be routed to specific causes with specific events and specific actors, the question of who bears responsibility becomes much clearer than it was. In organizations where responsibility has traditionally been diffuse - where everyone nominally owns everything and no one specifically owns anything - fault attribution creates uncomfortable clarity.
The discomfort manifests in predictable ways. Managers discover that the coverage drop in East Java happened under their oversight and they didn't know. Distributors discover that the working capital shortfall they experienced was visible in the data months before anyone acted on it. Field teams discover that their route compliance metrics were being tracked even when they thought the data wasn't being looked at. Each of these discoveries is an accountability moment. Some of them land badly.
Organizations deploying fault attribution capability should expect this and plan for it. The technical capability can be deployed in weeks. The organizational capability to act on it productively takes months or years. Rolling out the technology without rolling out the organizational readiness produces a predictable failure mode: the data is available, nobody acts on it, the accountability cases accumulate, and eventually someone above the commercial organization decides to use the data in a way that sours everyone on the whole system.
The productive deployment pattern is to roll out the capability gradually, with an explicit agreement about how the data will be used. In the first phase, attribution outputs are diagnostic only - used to understand what happened, not to assign blame. In the second phase, attribution outputs drive intervention decisions - specific, forward-looking actions to address the causes identified. In the third phase, attribution outputs become part of the performance management cadence - but by this point the organization has developed norms for how attribution data is used, and the system supports rather than destabilizes the performance conversation.
Skipping straight to phase three destabilizes the organization. Organizations that do this discover, often after investing significantly in the technology, that their people stop entering data honestly, start gaming the events being tracked, and eventually drive the system's attribution outputs toward uselessness as a defense mechanism.
The deeper question is what the commercial organization is for.
In organizations that see the commercial function as executing against targets - top-down, measured by outcomes, with people at the bottom of the org responsible for performance - fault attribution is easily weaponized. The capability gets used to identify scapegoats rather than to diagnose systems. People learn to fear the data, which leads them to shape the data, which makes the data worthless.
In organizations that see the commercial function as a learning system - with targets as orientation rather than judgment, and performance as emergent from structural conditions rather than individual effort - fault attribution becomes a genuinely productive capability. The data identifies structural issues that can be addressed systematically. People use the data to make their own work better. The system gets more honest over time because honesty is rewarded.
The same technology produces wildly different outcomes depending on which organizational model is using it. This is worth taking seriously for anyone considering investing in the capability. The technology is necessary but not sufficient. The organizational stance toward the technology is what determines whether the investment pays off.
A final thought on what "done" looks like for fault attribution.
The capability is never fully complete. There are always events that aren't captured, actors whose actions aren't traced, causes that aren't visible to the system. New commercial dynamics appear that the existing event taxonomy doesn't cover. The work of extending and refining the attribution capability is continuous.
What "done enough to be useful" looks like is more specific. The capability is useful when it can answer, for any significant commercial outcome, a question of the form: "what happened in the period leading up to this outcome, in the operational system under examination, and which events plausibly contributed?" If the answer is substantive and specific, the capability is useful. If the answer is thin or generic, it's not yet there.
Most organizations starting to build this capability find that their first useful outputs come twelve to eighteen months after the architectural foundation is in place. The event log needs time to accumulate meaningful history. The gap engine needs time to learn which patterns matter. The users need time to develop the skill of reading attribution narratives productively. Organizations that expect immediate value and don't see it in month three often pull back - which is exactly wrong. The capability is slow-starting and long-compounding. Invest through the slow start or don't invest at all.