Before the code, the criterion: what to measure in week one

In two weeks of Diagnóstico, Forja does one thing that looks simple and turns out to be hard: it defines the success criterion. The AI comes after that.

Most retail AI projects start from the wrong end. They pick the technology first: model, platform, vendor.

Then they staff the squad. Then they ask what good looks like.

That sequence goes with a 73.8% project failure rate, per a 2026 SupplyChainBrain study cross-referenced with TechRadar data. The number is broad enough to make a headline. It is also narrow enough to hide what matters: of the roughly 26% that ship, most defined the criterion before the code.

So what is a success criterion, exactly?

Three things, all measurable, all decided in a conversation before go-live:

  • The indicator that counts as success. Not “customer satisfaction.” Something like “shelf-out for the perishables category, measured Friday morning and Monday afternoon.”
  • The expected range. Not “improve.” Something like “stay between 1.2% and 2.0%. Below 1.2% is suspicious; above 2.0% is a problem.”
  • The review trigger. Not “monitor continuously.” Something like “if the number falls outside the range for three consecutive weeks, stop the rollout and reassess.”

Each answers a different question.

The indicator says what we are watching. The range says what is normal. The trigger says when someone has to stop everything and ask why.

If one of those three pieces is missing, the project does not have a criterion. It has aspiration.
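
For teams that prefer to see it as data, here is a minimal sketch of the three pieces in Python. The class and field names are hypothetical, the weekly readings are invented for illustration, and the sketch assumes one consolidated reading per week. The range and the three-week trigger come from the examples above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SuccessCriterion:
    indicator: str      # what we are watching
    low: float          # bottom of the expected range
    high: float         # top of the expected range
    trigger_weeks: int  # consecutive out-of-range weeks that force a review

    def in_range(self, value: float) -> bool:
        return self.low <= value <= self.high

    def review_needed(self, weekly_values: Sequence[float]) -> bool:
        """True once any run of `trigger_weeks` consecutive readings is out of range."""
        streak = 0
        for value in weekly_values:
            streak = 0 if self.in_range(value) else streak + 1
            if streak >= self.trigger_weeks:
                return True
        return False


criterion = SuccessCriterion(
    "perishables shelf-out (%)", low=1.2, high=2.0, trigger_weeks=3
)

# Hypothetical weekly readings, invented for illustration.
recovering = [1.5, 2.3, 2.4, 1.9]  # two bad weeks, then back inside the range
drifting = [1.9, 2.1, 2.2, 2.6]    # three bad weeks in a row
too_good = [1.1, 1.0, 0.9]         # below the range counts as out of range too

print(criterion.review_needed(recovering))  # False
print(criterion.review_needed(drifting))    # True
print(criterion.review_needed(too_good))    # True
```

Note that a number that is too low fires the trigger just like one that is too high. That is deliberate: "below 1.2% is suspicious" means someone has to ask why.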

Here is what happens when you skip this step.

The operation rolls out. The numbers improve, or seem to improve, or stay flat. The internal team looks at the dashboard and nobody can say whether it is good.

The vendor says it is. The CFO asks whether it is worth renewing. You cannot answer because nobody ever defined what “worth” would mean.

In six to nine months, the project does not die. It just stops getting cited.

It becomes the item on the initiative list that nobody defends and nobody attacks. Eventually someone removes it from the plan and nobody notices.

That is the most common outcome. It is not “failed.” It is “evaporated.”

Think for a second about whether your current operation has a project in that state.

How to define the criterion in practice

The conversation starts with the problem, not the solution. “We want to reduce shelf-out” is a start. “We want to use AI” is not.

After that, three questions, in this order:

  1. Which number counts? Who in the operation can read that number today without asking IT? If nobody can, that is the first piece of work, and it has to happen before the first line of model code.
  2. What is the current range? Not the desired one. The actual one, with the operation running the way it runs today. Without that baseline, any “improvement” is rhetoric.
  3. What range justifies the investment? If the baseline is 4.2% and the target is 1.8%, the project has a criterion. If the target is “lower,” the project has a wish.

The range matters more than the target number. A project that has to land at exactly 1.8% is fragile. A project that has to land between 1.5% and 2.5% has real operational room.
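
To make the arithmetic concrete, here is a small Python sketch of how much the indicator has to move. It assumes the 4.2% baseline and the 1.5% to 2.5% range from the examples above belong to the same project, which is a combination made here for illustration. The function name is hypothetical.

```python
def required_reduction(baseline: float, target_low: float, target_high: float):
    """Return the (smallest, largest) relative reduction that lands inside the range.

    Assumes a lower-is-better indicator, such as shelf-out.
    """
    if not 0 < target_low <= target_high < baseline:
        raise ValueError("expected 0 < target_low <= target_high < baseline")
    smallest = (baseline - target_high) / baseline  # just reaching the top of the range
    largest = (baseline - target_low) / baseline    # reaching the bottom of the range
    return smallest, largest


baseline = 4.2                      # current shelf-out, in percent
target_low, target_high = 1.5, 2.5  # range that justifies the investment

smallest, largest = required_reduction(baseline, target_low, target_high)

print(f"Absolute drop: {baseline - target_high:.1f} to {baseline - target_low:.1f} points")
# Absolute drop: 1.7 to 2.7 points
print(f"Relative drop: {smallest:.1%} to {largest:.1%}")
# Relative drop: 40.5% to 64.3%
```

Under those assumptions, any reduction between about 40% and 64% of the baseline counts as success. A point target of exactly 1.8% would accept only one of those outcomes.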

The review trigger has to be in the document. In writing, with the date on which the review happens if the range is violated.

Without that piece, the criterion is not a criterion. It is decoration on the wall.

The part nobody wants to do

Honestly, defining a criterion is thankless work.

The conversation gets hard because it requires people from different departments to agree on what counts. Operations wants to talk about flow. CX wants to talk about NPS.

Finance wants to talk about margin. IT wants to talk about uptime. All of that is legitimate, and none of it is “the” criterion.

Leadership that skips this fight ends up handing the project to operations. Operations adopts the easiest number to measure. It is usually the wrong one.

There is no shortcut here. Two weeks is the time it usually takes to unstick this conversation between departments. More than that, the company’s patience runs out; less than that, the conversation was not hard enough.

What this post is not promising

Defining a criterion does not guarantee the project will succeed. It only guarantees that, if it does not, you will know, and in time.

That difference is everything.

Without a criterion, bad projects keep running because nobody can claim they are bad. With a criterion, bad projects are stopped before the cost doubles.

The ROI of the criterion is not in the project it helps succeed. It is in the bad projects it lets you kill early.

Be honest with yourself. Look at the AI projects in your operation. Pick one.

Can you say, in one sentence, what the number is, what the range is, and what the review trigger is? If yes, the project has a criterion. If not, it has aspiration.

The difference between the two is a conversation nobody wants to have. Operations versus finance versus CX versus IT about what each one will defend.

Hard because each department defends the number it already measures. Almost impossible without someone in the room who has run this conversation before.

Send me the three candidate indicators your project will measure. In one hour I will return a diagnosis: which predict success, which predict silent failure, and what review trigger each one needs.

If the diagnosis confirms the shape of the project, the two-week Diagnóstico ends with a one-page document that goes on the squad’s wall: three indicators, three ranges, three triggers. The internal team holds itself to it for the rest of the Implementação.