Why Entity Recognition Beats Prompt Tracking for AI Search Measurement
At Brighton SEO last week, I counted more than a handful of prompt-tracking vendors on the floor. Each one promises to show you when ChatGPT, Perplexity, or Google AI Overviews mentions your brand. Each one solves the same problem with slightly different branding.
The conference floor is a subset of the wider market. Back in January, an audit by Rankability had already documented twenty-two distinct tools across the segment, with more than $31 million in funding behind them. New ones launch every month.
Then I sat down with Dixon Jones (thirty-seven years in SEO, founder of Majestic, founder of Inlinks, and now running Waikay), and he told me the entire category is starting in the wrong place.
Take the canonical example these tools track: somebody asks ChatGPT for the best hotel in Brighton, and the prompt-tracker logs whether your hotel surfaces in the answer.
Dixon’s reframe stops that workflow at step zero.
At some point you want to find out whether AI pops up your brand when you ask for the best hotel in Brighton. Of course you want that at some point. But you don’t start there. You start by knowing whether the AI knows that you’ve got a hotel in Brighton. If it doesn’t know that, the rest is gonna be wrong. - Dixon Jones
That single observation is the cleanest critique of the AI-search measurement industry I have heard this year. It changes which tool you should buy first.
Dixon Jones was at Brighton SEO last week, mostly to support his daughter, Genie Jones, who was speaking on how to extract actionable data from AI. Their company sits in deliberate contrast to the prompt-tracking tools surrounding it on the conference floor.
Where most vendors built rank-checkers for AI, Waikay built an entity-mapper. Genie’s path into entity SEO is unusually well-aligned: she studied language, culture, and communications at university (specifically how the brain builds associations between concepts) and joined the family business when Google’s algorithm shifted from keyword matching toward authority and association. The timing turned out to be exact.
The argument, simplified: an LLM cannot rank you in a prompt response if it doesn’t know your brand exists in the first place. Until it does, prompt tracking is measuring noise.
What does prompt tracking actually measure, and what does it miss?
Prompt tracking is the AI-search analogue of rank tracking. You feed the tool a list of buyer-intent prompts (“best CRM for SMB”, “what hotels should I stay at in Brighton”), and it logs how often your brand appears, in what position, with what sentiment, across ChatGPT, Perplexity, Claude, Gemini, and Google AI Overviews.
The mechanics are clean. The output is unstable.
SE Ranking research found only 9.2% URL consistency in Google AI Mode across repeated queries. SparkToro independently observed significant variability in AI-generated brand recommendations even with identical prompts. Search Influence’s March 2026 audit puts it directly: “point-in-time AI visibility measurements may reflect volatility rather than durable performance signals.”
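To make that volatility concrete, here is a minimal sketch of measuring URL consistency across repeated runs of the same prompt. The metric definition (share of distinct URLs that appear in every repetition) is my assumption; SE Ranking’s exact formula isn’t published in this article, and the domains are hypothetical.

```python
def url_consistency(runs: list[set[str]]) -> float:
    """Share of distinct URLs that appear in every repeated run.
    One plausible reading of a 'URL consistency' metric; the exact
    definition used in the cited research is an assumption here.
    Assumes at least one run."""
    all_urls = set().union(*runs)
    stable = set.intersection(*runs)
    return len(stable) / len(all_urls) if all_urls else 1.0

# Three repetitions of the same prompt, with hypothetical citation sets:
runs = [
    {"a.com", "b.com", "c.com"},
    {"a.com", "c.com", "d.com"},
    {"a.com", "c.com", "e.com"},
]
print(round(url_consistency(runs), 2))  # 2 stable URLs of 5 seen -> 0.4
```

Run the same prompt ten or twenty times before trusting any single snapshot: a low score here is the instability the research describes, not a ranking change.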
Dixon’s deeper objection is a category error about how people actually use LLMs. Prompt-tracking tools assume customers are searching the model the way they used to search Google.
But Dixon’s actual usage looks different. He summarizes content (often produced by some other AI), replies to emails through prompts, generates content rather than retrieving it. The ways people use LLMs are not “I’m going to search for something.” They are “I’m going to get the LLM to do something.”
So even when prompt tracking works perfectly, it is measuring a behavior that represents a shrinking slice of how LLMs actually get used. The tools were optimized for the search interface; the users moved to the agent interface.
Why entity recognition has to come first
Genie’s framework starts before you ever open a prompt-tracking dashboard. The first move is to understand your brand’s entities away from AI: what topics you cover, how well you cover them, where the gaps are.
Then go to the LLM and ask, with live search switched off, what it knows about you at the foundational training-data level.
That distinction is doing a lot of work.
There are two layers of AI-search measurement, not one. The training-data layer is slow, foundational, and stable across queries.
The live-search layer is fast, citation-driven, and unstable in the way the SE Ranking data confirms. Most teams only buy tools that show them the unstable layer.
LLMs aren’t ranking you. They’re figuring out how closely associated you are with the topics it needs to fulfill the user request. - Genie Jones
That mental swap (from ranking to topical association) changes what you measure. A knowledge graph is not a leaderboard. The model is not slotting you into position three. It is checking whether your brand has dense enough association with the topics the query touches to be worth mentioning at all.
Discovered Labs corroborates the mechanism: AI systems retrieve facts from knowledge graphs rather than ranking pages. Companies that restructure for entity verifiability, using formats like the EAV-E framework (Entity-Attribute-Value-Evidence), reportedly improve from 5% to 42% citation rates within ninety days. The investable layer is upstream of where the prompt-tracking tools are looking.
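The article names the EAV-E framework but not its concrete shape, so here is one hypothetical way to structure a claim as Entity-Attribute-Value-Evidence so it can be lifted and cited verbatim. The schema, field names, and rendering are my illustration, not Discovered Labs’ specification; the hotel and rating are invented.

```python
from dataclasses import dataclass

@dataclass
class EAVE:
    entity: str      # the thing the claim is about
    attribute: str   # the property being asserted
    value: str       # the asserted value
    evidence: str    # a verifiable source for the claim

    def as_sentence(self) -> str:
        # One hypothetical rendering: a single self-contained statement
        # an LLM can quote without paraphrase drift.
        return f"{self.entity} {self.attribute} {self.value} (source: {self.evidence})."

claim = EAVE(
    entity="The Grand Brighton",
    attribute="holds a guest rating of",
    value="4.5/5",
    evidence="tripadvisor.com, retrieved 2026-03",
)
print(claim.as_sentence())
```

The design point is that each claim binds its evidence to itself, so a retrieval system never has to reassemble the attribution from surrounding prose.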
Six-week feedback on the live layer matters. Training-cycle horizon on the foundational layer matters more.
How hallucinated citations reveal your content gaps
The most counter-intuitive part of the framework is what you do with hallucinations. Most measurement teams treat fabricated citations as noise to discard. The Joneses treat them as content-gap intelligence.
When the LLM cites a page on your site that does not exist, or attributes content to a section you have never written, it is showing you the shape of the answer it expected to find there. That shape is a content brief.
Lakera’s 2026 hallucination guide locates the cause in training incentives (models are rewarded for confident generation over honest refusal), but the practical implication for brands is the same.
The hallucinated structure is not random. It is the model’s best guess at what your site should contain, derived from the patterns it has seen across competitors and adjacent topics.
The application matched a story I’d heard last year from the PromptWatch team: when log files showed LLMs hallucinating types of content on a brand’s website, the team built that content for real, and citations followed.
That reveals a lot about the structure [the LLM] expects from your website. - Genie Jones, on what hallucinated citations show you
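The gap-detection step can be sketched in a few lines: diff the URLs an LLM cites on your domain against the pages that actually exist, and treat the remainder as content briefs. The function name and all URLs are hypothetical; in practice you would feed in your sitemap and the citation sets pulled from model responses.

```python
def content_gaps(cited_urls: set[str], sitemap_urls: set[str]) -> set[str]:
    """URLs an LLM cited on your domain that don't actually exist.
    Each one is the model's guess at a page it expected to find --
    i.e. a candidate content brief."""
    return cited_urls - sitemap_urls

# Hypothetical citation set pulled from model responses:
cited = {
    "example-hotel.com/rooms",
    "example-hotel.com/brighton-events-guide",   # hallucinated
    "example-hotel.com/pet-policy",              # hallucinated
}
# Pages that really exist, e.g. from your sitemap:
real = {"example-hotel.com/rooms", "example-hotel.com/contact"}

for gap in sorted(content_gaps(cited, real)):
    print(gap)
```

Each printed URL is the model telling you what it expected your site to contain; the PromptWatch anecdote above is this loop run manually against log files.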
The discipline that supports this (and the one most measurement teams skip) is to pull the full citation set when you analyse a topic. Not just the citations that mention your brand.
Don’t predetermine your competitor list before you look. Let the model surface the actual source layer.
The Reddit post you’ve never seen, the review from three weeks ago you didn’t know existed: that’s where the model is reading from. The Joneses’ own write-up on this is the cleanest extended version of the methodology.
Prompt tracking is the last layer, not the first
The order matters more than the tooling. Prompt tracking is not wrong; it is misordered. It belongs at the end of the entity-recognition workflow, not the beginning.
The work that earns the citation happens upstream: in whether the LLM knows your brand exists, what topics it associates with you, whether your knowledge graph aligns with the entities the model has at training-data level.
Once that gate is cleared, prompt tracking shows you the daily fluctuations on top of a stable foundation. Before the gate is cleared, prompt tracking shows you nothing actionable.
There is a related layer worth flagging separately: the descriptor sentence. When the LLM lists ten options in response to a buyer-intent prompt, it doesn’t just rank them. It writes a single sentence about each one explaining the differentiation.
Is that sentence really what’s gonna sell you to your customer? - Dixon Jones
That sentence is the actual battleground. Waikay is competing with Semrush’s $745/month AI Toolkit by getting the LLM to write a narrowly differentiated descriptor about Waikay’s entity-first positioning. The narrow focus is the strategy. Generalist coverage produces generic descriptor sentences, and you cannot win the descriptor battle with a generic descriptor.
This is where IDX’s Authority Flywheel framing lands: brands that align consistent entity signals across digital touchpoints win the citation, because the model can map and cite them with confidence. The descriptor sentence is the visible output of that consistency.
What to do after reading this article
The actionable sequence:
Map your entities away from AI first. Build the knowledge graph manually. Know what topics you cover, where the gaps are, what your canonical descriptions look like across LinkedIn, Crunchbase, G2, Wikipedia, and your own site. Use the EAV-E format (Entity-Attribute-Value-Evidence) for any claim you want the LLM to be able to cite.
Audit what the LLM knows about you with live search switched off. Ask each major model what it knows about your brand, your category, your closest competitors. The answers reveal the foundational entity profile.
Pull citations for every brand the LLM mentions on your topic, not just yours. Don’t predetermine the competitor set. Let the model surface the actual source layer.
Use hallucinated citations as content briefs. Where the LLM expects content that doesn’t exist on your site, build it.
Then, and only then, start prompt tracking. Once the entity layer is stable, the live layer becomes a meaningful daily signal rather than noise.
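The foundational audit (step two of the sequence) can be sketched as a loop over models with live search disabled. The wiring below is an assumption: `ask` stands in for whatever thin wrapper you write over each vendor’s API with browsing and tools switched off, and the brand and model names are placeholders.

```python
def entity_audit(ask, brand: str, models: list[str]) -> dict[str, str]:
    """Ask each model, with live search off, what it knows about the
    brand at training-data level. `ask(model, prompt)` is a
    caller-supplied function -- e.g. a wrapper over each vendor's API
    with browsing/tools disabled."""
    prompt = (
        f"Without searching the web, what do you know about {brand}? "
        "If you don't recognise it, say so plainly."
    )
    return {model: ask(model, prompt) for model in models}

# Usage with a stand-in `ask`; a real wrapper would call the vendor APIs:
fake = lambda model, prompt: f"[{model}] I have no record of that brand."
report = entity_audit(fake, "Example Hotel Brighton", ["gpt", "claude", "gemini"])
print(report["gpt"])
```

If every model returns some variant of “no record”, you are below the gate the article describes, and prompt tracking cannot yet tell you anything actionable.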
The shift in 2026 is not that prompt-tracking tools are bad. The twenty-two trackers Rankability identified will keep multiplying, and most of them are competently built.
The shift is that they are downstream tools that most teams are buying upstream of the work that actually moves the needle. Buy the tool that maps the entity first. Buy the tool that tracks the prompt second.
Resources
How to Turn LLM Noise into Brand Strategy Using Entities and Citations: Genie Jones’s full framework on Waikay’s blog
AIs are highly inconsistent when recommending brands or products: SparkToro’s primary research with Rand Fishkin and Patrick O’Donnell, 2,961 prompts across ChatGPT, Claude, and Google AI
AI Mode Research: Sources, Volatility, & Differences between AIO and Organic Search: SE Ranking’s 10,000-keyword study on AI Mode URL consistency
Entity Recognition & Knowledge Graphs: Discovered Labs on the EAV-E formula and citation source distribution
22 Best AI Search Rank Tracking & Visibility Tools (2026): the proliferation, audited, with funding figures
The Authority Flywheel: IDX on entity SEO and the two-gate LLM evaluation model
LLM Hallucinations in 2026: Lakera on why hallucinations reveal training incentives, not random errors
Listen to the full conversation
The full sit-down with Dixon and Genie Jones at Brighton SEO last week is on Spotify. It also covers Dixon’s biggest SEO regret in thirty-seven years, why Genie tried not to be an SEO at first, and the discipline of working twelve time zones apart from your team.