Ask ten developers how they talk to AI and you'll get ten different answers.
Some write structured prompts with labeled sections and explicit constraints. Some just describe what they need in plain sentences. Some have settled on a format that works for them and stopped thinking about it. Most of them are getting decent results either way.
So the question is worth asking: does the structure actually matter, or have we been optimizing something that was already fine?
The answer is yes, it matters. But only for one of the two things people use AI for, and most of the advice online doesn't make that distinction.
The Research Says One Thing. The Reality Is More Specific.
A recent meta-analysis of more than 1,500 academic papers on prompt engineering found that the companies generating real revenue from AI aren't following the advice circulating on social media. They're doing something more systematic, and more boring.
But buried inside that finding is something more interesting than "conventional wisdom is wrong."
The real distinction isn't between good prompts and bad prompts. It's between two completely different use cases that most people treat as the same thing.
Two Ways People Actually Use AI
The first is personal use. You need to draft something, debug a function, think through a decision. You open a chat window and describe what you want. Maybe you mention your role or give some context. You read the output, ask a follow-up, and move on.
The second is system use. An application calls an AI endpoint. It does this hundreds or thousands of times. Every call needs to return output in a specific format so the rest of the system can parse it. The response isn't being read by a human who can interpret variations. It's being consumed by code that expects consistency.
These are not the same problem. And the prompt engineering advice that applies to one doesn't necessarily apply to the other.
For personal use, a clearly scoped natural language prompt works fine. You don't need XML tags. You don't need a <role> block. You need to be clear about what you want, and modern models are good enough to handle the rest.
For system use, structure isn't a stylistic choice. It's a reliability requirement.
I Ran the Same Prompt Three Times
To make this concrete, I tested it.
I asked AI to write a JavaScript function that analyzes a user object and returns a structured report. The function needed to calculate days since last login, determine whether someone was a high-value customer, and return a specific status message based on that determination.
I ran the same request three times in natural language, then three times using a structured prompt. Fresh conversation each time.
The natural language prompt:
I have a JavaScript object representing a user with fields like name, email, age,
subscription (which can be "free", "pro", or "enterprise"), lastLoginDate, and
totalPurchases. Write a function called generateUserReport that takes this user
object and returns a report object. The report should include the user's full name
and email, whether they're a high value customer (pro or enterprise subscribers
who have made more than 5 purchases), how many days since they last logged in,
and a status message that says something different depending on whether they're
high value or not. Return just the code.
The structured prompt:
<role>Senior JavaScript Developer</role>
<task>
Write a function called generateUserReport(user) that analyzes a user object
and returns a structured report.
</task>
<input_shape>
{
  name: string,
  email: string,
  age: number,
  subscription: "free" | "pro" | "enterprise",
  lastLoginDate: ISO date string,
  totalPurchases: number
}
</input_shape>
<output_shape>
{
  fullName: string,
  email: string,
  isHighValue: boolean,
  daysSinceLogin: number,
  statusMessage: string
}
</output_shape>
<requirements>
- isHighValue: true only if subscription is "pro" or "enterprise" AND totalPurchases > 5
- daysSinceLogin: calculated from lastLoginDate to today
- statusMessage: "Priority customer - schedule check-in" if isHighValue, "Standard account" if not
</requirements>
<constraints>
- Output ONLY valid executable JavaScript
- NO markdown formatting or backticks
- NO explanations or comments
</constraints>
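For reference, here is a minimal sketch of one function that would satisfy the structured prompt above. It is not the output of any of the runs. The optional `now` parameter and the `MS_PER_DAY` constant are my own additions so the date math can be checked, the choice to round down to whole days is an assumption (the prompt doesn't say how to round), and the comments are there for the reader; a response that obeyed the constraints would omit them.

```javascript
// Milliseconds in one day, used to convert a time difference into days.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Sketch of a function matching <input_shape>, <output_shape>, and <requirements>.
// `now` is a hypothetical extra parameter (defaults to the current time).
function generateUserReport(user, now = new Date()) {
  const isPaidTier =
    user.subscription === "pro" || user.subscription === "enterprise";
  const isHighValue = isPaidTier && user.totalPurchases > 5;

  // Whole days elapsed between lastLoginDate and `now`, rounded down (assumption).
  const lastLogin = new Date(user.lastLoginDate);
  const daysSinceLogin = Math.floor(
    (now.getTime() - lastLogin.getTime()) / MS_PER_DAY
  );

  return {
    fullName: user.name,
    email: user.email,
    isHighValue,
    daysSinceLogin,
    statusMessage: isHighValue
      ? "Priority customer - schedule check-in"
      : "Standard account",
  };
}
```

Note that both status strings and every key name come straight from the prompt. There is nothing left for the model to decide.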
The natural language results all produced correct, working code. But look at what happened to statusMessage across three runs:
- Run 1: "Thank you for being a valued premium partner!"
- Run 2: "Check out our latest offers to upgrade your experience."
- Run 3: "Priority account: This user is a key contributor to revenue."
Three completely different strings. Different tone, different phrasing, different intent. The key name for days since login also drifted, appearing as daysSinceLastLogin in some runs. Some runs added JSDoc comments. Others didn't.
The structured prompt told a different story. Across all three runs, statusMessage was identical every time: "Priority customer - schedule check-in" or "Standard account". Key names matched the <output_shape> exactly. No comments appeared, as the constraints required.
The code quality wasn't meaningfully better. What changed was the predictability.
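If you want to measure that kind of drift rather than eyeball it, a rough sketch follows. It assumes you have already executed each generated function against the same user and collected the returned report objects. `findDrift` is a hypothetical helper name, and the sample reports are illustrative stand-ins built from the strings and key names quoted above, not the full outputs of my runs.

```javascript
// Compare report objects produced by different runs and flag two kinds of drift:
// differing key sets and differing statusMessage strings.
function findDrift(reports) {
  // A sorted, comma-joined key list acts as a signature for each report's shape.
  const keySignatures = new Set(
    reports.map((report) => Object.keys(report).sort().join(","))
  );
  const statusMessages = new Set(reports.map((report) => report.statusMessage));

  return {
    keysDrifted: keySignatures.size > 1,
    statusDrifted: statusMessages.size > 1,
  };
}

// Illustrative data only: same user, two runs, different key name and message.
const naturalLanguageRuns = [
  {
    fullName: "Ada Lovelace",
    email: "ada@example.com",
    isHighValue: true,
    daysSinceLogin: 3,
    statusMessage: "Thank you for being a valued premium partner!",
  },
  {
    fullName: "Ada Lovelace",
    email: "ada@example.com",
    isHighValue: true,
    daysSinceLastLogin: 3,
    statusMessage: "Priority account: This user is a key contributor to revenue.",
  },
];

console.log(findDrift(naturalLanguageRuns));
// { keysDrifted: true, statusDrifted: true }
```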
Predictability Is the Point
This is the thing that most prompt engineering content misses.
When you're using AI yourself, variation is often fine. If the output is slightly different each time, you read it, evaluate it, and decide if it works. You're in the loop.
When AI is part of a system, variation becomes a bug. If a status message changes between calls, something downstream breaks. If a key name shifts, a parser fails. No one is reading the output before it gets used. Consistency isn't a nice-to-have. It's the requirement.
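To see why, picture the code on the receiving end. The sketch below is a hypothetical downstream check, assuming the consumer expects exactly the report shape and the two status strings from the structured prompt in my test. `validateReport`, `EXPECTED_TYPES`, and `ALLOWED_STATUS` are names I made up for illustration.

```javascript
// The shape the downstream code relies on (taken from the structured prompt).
const EXPECTED_TYPES = {
  fullName: "string",
  email: "string",
  isHighValue: "boolean",
  daysSinceLogin: "number",
  statusMessage: "string",
};

// The only status strings the rest of the system knows how to handle.
const ALLOWED_STATUS = new Set([
  "Priority customer - schedule check-in",
  "Standard account",
]);

// Returns a list of problems; an empty list means the report is safe to consume.
function validateReport(report) {
  const errors = [];

  for (const [key, type] of Object.entries(EXPECTED_TYPES)) {
    if (!Object.hasOwn(report, key)) {
      errors.push(`missing key: ${key}`);
    } else if (typeof report[key] !== type) {
      errors.push(`wrong type for ${key}: expected ${type}`);
    }
  }

  for (const key of Object.keys(report)) {
    if (!Object.hasOwn(EXPECTED_TYPES, key)) {
      errors.push(`unexpected key: ${key}`);
    }
  }

  if (
    typeof report.statusMessage === "string" &&
    !ALLOWED_STATUS.has(report.statusMessage)
  ) {
    errors.push(`unexpected statusMessage: ${report.statusMessage}`);
  }

  return errors;
}
```

A report with daysSinceLastLogin instead of daysSinceLogin fails twice here: once for the missing key and once for the unexpected one. A human reader would never notice the difference. The validator has no choice but to.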
Structured prompts with explicit output definitions, clear constraints, and specific key names lock the model into a narrower range of responses. That's what makes them useful in production. They don't produce higher-quality output. They produce consistent output from one call to the next.
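In a system, that structure usually lives in code rather than in someone's clipboard. As a minimal sketch, assuming you keep each section as a plain string, a small helper can assemble the tagged prompt so every call sends the same layout. `buildStructuredPrompt` is a hypothetical name, and unlike the prompt in my test it puts each tag on its own line.

```javascript
// Assemble a tagged prompt from named sections, in the order they are given.
// Each key becomes a tag name; each value becomes that tag's body.
function buildStructuredPrompt(sections) {
  return Object.entries(sections)
    .map(([tag, body]) => `<${tag}>\n${body.trim()}\n</${tag}>`)
    .join("\n");
}

const prompt = buildStructuredPrompt({
  role: "Senior JavaScript Developer",
  task: "Write a function called generateUserReport(user) that analyzes a user object and returns a structured report.",
  constraints: [
    "- Output ONLY valid executable JavaScript",
    "- NO markdown formatting or backticks",
    "- NO explanations or comments",
  ].join("\n"),
});

console.log(prompt);
```

The helper itself does nothing clever. Its value is that the section names, order, and wording stop varying between calls, which removes one more source of drift.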
For everything else, the overhead usually isn't worth it. Describing what you need in plain language, with clear scope and context, is enough. The model understands you. It doesn't need to be spoken to in a specific dialect.
So Is It Real?
Prompt engineering is real, but it's a specific tool for a specific problem.
If you're building a system where AI output feeds into code, where consistency across calls matters, and where no human is reviewing each response before it gets used, then the structure is worth it. The tags, the explicit schemas, and the defined constraints exist to solve a reliability problem, not a quality problem.
If you're using AI to do your own work, spending an hour on prompt structure is probably not the best use of that hour. Be clear. Give context. Describe what you want. That's most of what matters.
The people selling magic templates have it backwards. The goal isn't to find the right words. The goal is to understand what you're actually trying to solve, and then use the right level of structure for that problem.
For most things, that level is lower than the internet would have you believe.

