Stop Asking AI for Answers
The case for demanding questions instead
I have a checklist. Seven tests that every post has to pass before I hit publish: hook, framework, evidence, contrarian angle, practical takeaway, voice, energy.
A couple of months ago, I thought: maybe Claude could help? I pasted a draft post into the chatbot along with my checklist and asked: Does this pass?
“This isn’t ready. The hook doesn’t land, the framework needs sharpening, and the evidence test is borderline. Don’t publish yet.”
Fair enough. But then I decided to run an experiment. I opened a new chat: same model, same draft, same prompt, same checklist.
“This is strong. The hook pulls the reader in immediately, the framework is memorable, and you’ve got concrete evidence throughout. Ship it.”
Neither response was wrong, exactly. Each had silently interpreted tests like "does it open with a scene?" or "is there dry humour?" differently, then applied its chosen interpretation with complete rigour. Neither flagged that an interpretation had taken place.
I call this a Precise Misread: when AI silently interprets your ambiguous question, then delivers an answer so rigorous and so confident that you never realise a different interpretation would have led to a different conclusion.
I went through the checklist myself. Published the post. It did well.
Apparently I'm still a better judge of my own writing than the two AIs that disagreed about it.
150 AI agents walk into a dataset
Researchers at the University of Texas at Dallas gave 150 AI agents the same financial dataset and the same question: “Did trading volume change over time?”
Ninety agents reported that volume grew by about 6% per year. Sixty reported that volume declined by about 5% per year. Within each group, the agents agreed almost perfectly: to within 0.25% and 0.11%, respectively. Precise. Consistent. And flatly contradicting each other. In a consulting firm, this would be two confident partners arguing at a board meeting.
What happened? The prompt said “trading volume” without specifying what that means. Some agents interpreted it as dollar volume. Others as share volume. Two perfectly valid measures that, over this particular decade, pointed in opposite directions. The misread happened before the work started.
This showed up across the study. Ambiguous questions split agents into precise, opposing camps. But when the question left no room for interpretation, every agent converged. Near-zero disagreement. The AI wasn’t unreliable. It was waiting for a question that meant only one thing.
Different models even had different default interpretations, almost as if they had preferences. That would make sense: they were trained differently, on different data sets and possibly with different training approaches.
Precise prompts vs ambiguous prompts
Earlier this week, I gave a keynote to C-level executives at a financial firm. The CEO told me he prefers very short prompts, contrary to most prompting advice. I agreed with him. Sometimes a short prompt lets the AI bring knowledge you don’t have (we wrote about this for HBR).
But this paper shows the other edge of that sword. The same vagueness that’s a feature for exploration (“what am I not seeing?”) is a risk for analysis (“what’s happening with our numbers?”).
For exploration, you want the AI to interpret freely. For decisions, that silent interpretation is where the danger lives.
The obvious fix is to be more precise. But precision requires you to already know the domain well enough to anticipate every interpretation. If you knew enough to specify “dollar volume, daily frequency, level regression, Newey-West standard errors,” you probably didn’t need the AI to run the analysis.
What would actually help is something the standard chatbot interface actively discourages: asking questions before giving answers.
Think about the best analyst you’ve ever worked with. They don’t go away and come back with a deck. They push back. “When you say customer engagement, do you mean time-on-site or return visits?” “Are we looking at this quarterly or year-over-year?”
That questioning process, which can feel slow, even annoying, is where the real value lives. Half the time, the person who made the request discovers they’re not sure what they meant either. The questioning process is the thinking process. (On my team, we spend the first third of every commercial research project just asking questions. It’s the least glamorous phase, and the most valuable.)
We’ve built the fastest, most rigorous analyst in history, and it almost never raises its hand. I’ve written before about AI as alien intelligence. But the alienness isn’t just in how it thinks; it’s in how it listens.
What to do about it
Find forks. The cost of rerunning the analysis is, in most cases, negligible. You can easily run the same analysis five times and look for where the answers split. You’re not looking for the “right” answer. You’re looking for the forks: the points where your question was ambiguous enough to produce different interpretations. Those forks are where the real decisions live, and they should be made by a human, not by whichever interpretation the model happened to sample.
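Here is a rough sketch of what that rerun loop could look like. It assumes the Anthropic Python SDK and an API key in your environment; the prompt text and model name are placeholders for whatever analysis and model you actually use.

```python
# Rough sketch of the "find forks" step. Assumes the Anthropic Python
# SDK (pip install anthropic) and ANTHROPIC_API_KEY in the environment;
# the prompt and model name below are placeholders.
import anthropic

client = anthropic.Anthropic()

PROMPT = (
    "Here is our trading data: <paste or attach it>. "
    "Did trading volume change over time?"
)

answers = []
for run in range(5):
    # Each call starts a fresh conversation, so the model re-makes
    # every silent interpretation from scratch.
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model name
        max_tokens=1024,
        messages=[{"role": "user", "content": PROMPT}],
    )
    answers.append(response.content[0].text)

# Read the runs side by side and look for the forks: places where
# different runs quietly chose different definitions or methods.
for i, answer in enumerate(answers, start=1):
    print(f"--- Run {i} ---\n{answer}\n")
```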
Prompt for questions. Before you let an AI agent run any consequential analysis, add one step. Don’t ask for the answer. Ask for the questions: “What decisions will you need to make to answer this? Where could my question be interpreted in more than one way?”
Then look at that list. Those silent interpretation points? Those are the decisions that actually matter. And they were always yours to make.
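In code, the questions-first step can be a single extra call before the analysis itself. Same hypothetical setup as the sketch above; the task and wording are just examples.

```python
# Rough sketch of the "prompt for questions" step, same assumptions
# as above (Anthropic Python SDK, placeholder model name).
import anthropic

client = anthropic.Anthropic()

TASK = "Did trading volume change over time in the attached dataset?"

QUESTIONS_FIRST = (
    "I am going to ask you to run this analysis:\n\n"
    f"{TASK}\n\n"
    "Do not run it yet. First, list every decision you would need to "
    "make to answer it, and every place where my question could be "
    "interpreted in more than one way."
)

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model name
    max_tokens=1024,
    messages=[{"role": "user", "content": QUESTIONS_FIRST}],
)

# Review this list yourself, make the calls (dollar vs. share volume,
# time window, regression spec...), then send the analysis prompt
# with those decisions spelled out.
print(response.content[0].text)
```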
The paper referenced is “Nonstandard Errors in AI Agents” by Ruijiang Gao and Steven Chong Xiao (University of Texas at Dallas), posted to arXiv in March 2026 (arXiv: 2603.16744). It’s new and hasn’t yet been peer-reviewed, but the experimental design is rigorous and the data is publicly available. I’d encourage anyone working with AI agents to read it.
The practical implication is this: the danger isn’t an AI that gets it wrong. It’s an AI that gets the wrong thing precisely right. The Precise Misread.



