The Citation Rubric: how beknown grades AI responses about a business


Duncan Hotston

The Citation Rubric is beknown's scoring system for evaluating how AI tools describe a business when someone asks about it. It grades responses across four dimensions: factual accuracy, completeness, sentiment, and source attribution. Each dimension is scored independently. Together they produce a composite picture of how well, or how poorly, a business is represented in AI-generated answers.

This is not a measure of search rankings. It is a measure of what gets said about you when a machine decides to speak on your behalf.


Why grading AI responses requires its own framework

Think of a restaurant critic. When they write about a place, the review does several things at once. It states facts, gives an impression, notes what is missing, and signals whether the publication itself is a credible source. A review that gets the address wrong, omits the specials, sounds unenthusiastic, and appears in a blog nobody reads is not helping the restaurant. It might be doing the opposite.

AI tools function like critics at enormous scale. When someone asks ChatGPT or Perplexity to recommend a conveyancing solicitor, an accountancy firm, or a logistics provider, the AI produces a response that does the same four things: it states facts, conveys a tone, includes or omits information, and draws on sources of varying authority. The Citation Rubric scores each of those dimensions so businesses can see exactly where the problem is, not just that one exists.

Standard analytics tell you how many people visited your website. The Citation Rubric tells you what the AI said about you before they decided whether to bother.


The four dimensions

Dimension 1: Factual accuracy

This measures whether the AI's response about your business is correct. Name, category, service offering, contact details, geographic coverage: all of it. Factual errors in AI responses are more common than most businesses expect, and more damaging than a simple wrong number. An AI that describes a business as doing something it does not do, or has not done for three years, is sending enquiries in the wrong direction.

Factual accuracy is scored on a 0 to 25 scale. A score of 25 means the response contains no verifiable errors. A score below 10 means the AI is working from outdated, incomplete, or contradictory data about the business.

The primary fix is structured data and entity-level corrections across directories and knowledge sources. If the inputs are wrong, the output will be wrong.
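
To make that concrete, here is a minimal sketch of the kind of structured data that feeds the accuracy dimension: a schema.org record covering name, category, contact details, and geographic coverage. The business details are invented for illustration; the schema.org types and properties are real.

```python
import json

# Minimal schema.org record covering the facts the accuracy dimension
# checks: name, category, service offering, contact details, and
# geographic coverage. The business itself is hypothetical.
business = {
    "@context": "https://schema.org",
    "@type": "LegalService",
    "name": "Example Conveyancing Ltd",
    "description": "Commercial property conveyancing solicitors.",
    "telephone": "+44 20 0000 0000",
    "url": "https://www.example-conveyancing.co.uk",
    "areaServed": "Greater London",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 Example Street",
        "addressLocality": "London",
        "postalCode": "EC1A 1AA",
        "addressCountry": "GB",
    },
}

# Embed the output in the page head inside
# <script type="application/ld+json">...</script>.
print(json.dumps(business, indent=2))
```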

Dimension 2: Completeness

This measures how much relevant information the AI includes, relative to what it should include. A response can be entirely accurate and still miss the things that would make a buyer choose you.

A solicitor whose AI description says only that they practise law, omitting that they handle commercial property exclusively, is not being served well. A consultancy described generically, when it has a defined specialism and a measurable track record, is leaving its most persuasive material on the floor.

Completeness is scored on a 0 to 25 scale. It is graded against what a well-informed AI response about a business in that category would typically contain, not against a subjective standard.

Completeness improves when the signals that carry that information are present: rich schema, thorough entity signals, and structured content that AI systems can index and retrieve.
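
As a rough sketch of how a completeness grade can be derived, the snippet below checks an AI response against a checklist of facts a well-informed answer in the category would contain, then scales coverage to the 0 to 25 range. The checklist entries and the keyword matching are simplified illustrations, not beknown's actual grader.

```python
# Illustrative completeness check: score an AI response against the
# facts a well-informed answer in this category would typically mention.
# Checklist and matching are deliberately simplified.
EXPECTED_FACTS = {
    "core service":  ["conveyancing"],
    "specialism":    ["commercial property"],
    "location":      ["london"],
    "contact route": ["example-conveyancing.co.uk"],
    "track record":  ["since 2009", "15 years"],
}

def completeness_score(response: str, max_score: int = 25) -> int:
    """Scale checklist coverage to the rubric's 0-25 range."""
    text = response.lower()
    covered = sum(
        any(term in text for term in terms)
        for terms in EXPECTED_FACTS.values()
    )
    return round(max_score * covered / len(EXPECTED_FACTS))

answer = ("Example Conveyancing is a London firm handling "
          "commercial property conveyancing.")
print(completeness_score(answer))  # 3 of 5 facts covered -> 15
```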

Dimension 3: Sentiment

This measures the tone of the AI's response. Not whether it is glowing or damning, but whether it is neutral-positive, neutral, or neutral-negative. AI tools rarely say terrible things about businesses. What they do instead is produce descriptions that are flat, generic, and devoid of any signal that one business is preferable to another.

Flat sentiment is a problem because it leaves the reader with no reason to act.

Sentiment is scored on a 0 to 25 scale. It is informed by the sources the AI draws on: review platforms, press mentions, third-party write-ups, and how those sources characterise the business. A business with a strong, consistent review profile and credible third-party coverage will tend to produce warmer AI responses than one with thin or mixed signals.

This is also why sentiment is the hardest dimension to game. It is a downstream consequence of how a business is actually perceived in sources the AI trusts.
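
The tone buckets described above can be sketched as a simple score mapping. The band values below are assumptions for illustration, and the hard part, deriving the bucket from review and coverage signals, is deliberately left abstract.

```python
# Illustrative mapping from tone buckets to the 0-25 sentiment range.
# Band values are assumptions; classifying a response into a bucket
# (from review profiles and third-party coverage) is the real work
# this sketch leaves out.
SENTIMENT_BANDS = {
    "neutral-negative": 5,
    "neutral": 12,
    "neutral-positive": 20,
    "warm": 25,  # strong reviews reinforced by third-party coverage
}

def sentiment_score(bucket: str) -> int:
    return SENTIMENT_BANDS.get(bucket, 0)

print(sentiment_score("neutral"))  # 12
```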

Dimension 4: Source attribution

This measures where the AI's information is coming from. A response can be accurate, complete, and well-toned, and still be drawn from a source the AI will eventually deprioritise or that the business has no control over.

Source attribution is scored on a 0 to 25 scale. High scores indicate that the AI is drawing from authoritative, durable, business-controlled or business-confirmed sources: structured registry submissions, a well-formed llms.txt file, verified third-party profiles, and schema-marked content on the business's own site.

Low scores indicate that the AI is drawing on whatever it happened to find, which may be a press release from four years ago, a directory listing with outdated information, or a scrape of something the business never intended to be its primary description.

Source attribution matters because the AI's sources determine not just what it says now, but how reliably it will be able to update what it says later.
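
Of the business-controlled sources named above, llms.txt is the simplest to stand up: a plain-text file served from the site root. Below is a minimal sketch, assuming the emerging llms.txt convention of a title, a one-line summary, and annotated links; the business and URLs are hypothetical.

```python
from pathlib import Path
from textwrap import dedent

# A minimal llms.txt following the emerging convention: an H1 title,
# a one-line summary, then annotated links to canonical pages.
# Business and URLs are hypothetical.
LLMS_TXT = dedent("""\
    # Example Conveyancing Ltd

    > Commercial property conveyancing solicitors covering Greater London.

    ## Key pages

    - [Services](https://www.example-conveyancing.co.uk/services): what we handle
    - [About](https://www.example-conveyancing.co.uk/about): team and track record
    - [Contact](https://www.example-conveyancing.co.uk/contact): phone, email, address
""")

# Serve this at https://<domain>/llms.txt so AI systems can find it.
Path("llms.txt").write_text(LLMS_TXT, encoding="utf-8")
```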


How the rubric is applied

A Citation Rubric assessment runs a set of structured prompts across multiple AI tools, asking each to describe the business in various ways: by name, by category, by the type of problem it solves. The responses are captured, compared against verified ground truth, and scored dimension by dimension.

The result is a scorecard. A composite score from 0 to 100, four dimension scores, and a set of specific observations about what is missing, what is wrong, and what is working.
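
In code, the scorecard shape is straightforward: four dimension scores from 0 to 25 and a composite out of 100. The sketch below assumes the composite is an unweighted sum of the four dimensions; the actual weighting is not published.

```python
from dataclasses import dataclass, field

# Sketch of the scorecard described above: four dimensions scored 0-25,
# a 0-100 composite, and free-text observations. Assumes an unweighted
# sum for the composite.
@dataclass
class CitationScorecard:
    factual_accuracy: int     # 0-25
    completeness: int         # 0-25
    sentiment: int            # 0-25
    source_attribution: int   # 0-25
    observations: list[str] = field(default_factory=list)

    @property
    def composite(self) -> int:
        return (self.factual_accuracy + self.completeness
                + self.sentiment + self.source_attribution)

card = CitationScorecard(22, 11, 12, 8,
                         ["Single-directory sourcing; fragile."])
print(card.composite)  # 53
```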

This is one of the Four Numbers that appear at the start of every beknown case study. The others are the isitagentready level, the 16-Probe Scan score, and the AI referral traffic delta. Together they give a before-and-after picture of AI visibility that is specific enough to be actionable and consistent enough to be comparable across businesses and over time.


Common patterns in low-scoring assessments

After running the Citation Rubric across a range of businesses, certain patterns repeat.

High accuracy, low completeness. The AI knows the business exists and has the basics right, but produces a description that would fit fifty competitors equally well. This usually indicates that the business has a verified presence but has not built the richer signals that would give the AI something more specific to say.

High completeness, low source attribution. The AI produces a detailed and accurate description, but draws it entirely from a single directory or an old press mention. This is fragile. When that source is updated or deprioritised, the description degrades.

Low accuracy across the board. The AI is working from contradictory data. Multiple sources give different answers about what the business does or where it operates. The AI averages them, and the average is wrong. This is common in businesses that have changed focus, rebranded, or moved without updating their structured presence.

Sentiment stuck at neutral despite good reviews. The AI has access to positive review data but is not drawing on it in a way that affects tone. This usually indicates that the reviews are not being surfaced in a format the AI retrieves as authoritative, or that the review sentiment is not reinforced by any third-party editorial coverage.


The relationship between the Citation Rubric and the 5-Layer Framework

The Citation Rubric is a measurement tool. The 5-Layer Framework is the build system. They work together.

Each dimension of the rubric maps to one or more layers of the framework.

Factual accuracy maps to structured data and crawlability. If the AI cannot read a site reliably, or if the structured data contradicts the page content, accuracy suffers. Completeness maps to entity signals and structured data: the richer and more consistent the entity profile, the more complete the AI's response tends to be. Sentiment maps to entity signals, specifically the review and third-party coverage component. Source attribution maps to llms.txt and registry submissions, the signals that tell AI systems where to look and what to trust.
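
That mapping reads naturally as a lookup table. A sketch; the layer names are taken from the paragraph above rather than a published specification.

```python
# Rubric dimension -> 5-Layer Framework layers, as described above.
DIMENSION_TO_LAYERS = {
    "factual accuracy":   ["structured data", "crawlability"],
    "completeness":       ["entity signals", "structured data"],
    "sentiment":          ["entity signals"],  # reviews, third-party coverage
    "source attribution": ["llms.txt", "registry submissions"],
}
```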

Running the Citation Rubric before building signals gives you a baseline. Running it again after gives you a measurement of what changed. That is the Four Numbers framework in practice.


What the rubric does not measure

The Citation Rubric measures the quality of AI responses about a business. It does not measure search engine rankings. It does not measure human-authored content quality. It does not measure website conversion rates.

It also does not measure every AI tool in existence. The assessment focuses on the tools that currently account for the most AI-assisted discovery: ChatGPT, Perplexity, and Claude. Coverage may expand as the landscape changes.

And it does not predict exactly what a given user will see on a given day. AI responses vary by phrasing, context, and the tool's current retrieval state. The rubric captures a representative sample, not a definitive census. The patterns it reveals are real. The specific numbers should be read as directional, not absolute.


Running your own assessment

The fastest way to get a baseline is to run the free visibility check on beknown.world. The 16-Probe Scan covers the structural signals that underpin Citation Rubric performance. It will not give you a full rubric scorecard, but it will tell you whether the foundations are in place for AI tools to describe your business accurately at all.

If the foundations are not in place, the rubric score is almost certainly low. That is a reasonable starting assumption for any business that has not explicitly built AI visibility signals.

The Citation Rubric is included in beknown's full assessment. It is the part that tells you not just whether AI tools can find you, but what they say when they do.

Those are different questions. Both matter.


Frequently asked questions

What is the Citation Rubric?

The Citation Rubric is beknown's framework for scoring how AI tools describe a business when asked about it. It evaluates responses across four dimensions: factual accuracy, completeness, sentiment, and source attribution. A business that scores well is being described correctly, fully, positively, and with traceable sources. A business that scores poorly is being misrepresented, ignored, or cited from unreliable places.

Why does it matter how AI tools describe my business?

Because that description is often the first thing a potential customer sees. When someone asks an AI tool for a recommendation or supplier, the AI's response functions like a word-of-mouth referral. If that referral contains errors, gaps, or a flat tone, it affects whether the person clicks through or moves on. You cannot correct what you cannot measure.

How is the Citation Rubric different from a standard SEO audit?

A standard SEO audit measures signals that affect how search engines rank pages. The Citation Rubric measures something different: what AI tools actually say about your business in conversational responses. Those are distinct systems with distinct inputs. A business can rank well in search and still be described poorly, vaguely, or incorrectly by AI tools.

What does a low completeness score mean in practice?

It means the AI's response about your business is missing material facts. That might be your core service, your location, your specialisation, or your differentiators. The AI is not lying. It simply does not have the information, because the signals that would convey it are absent from your digital presence. Completeness improves when those signals are built.

Can I improve my Citation Rubric score?

Yes. Every dimension of the rubric maps to specific, buildable signals. Factual accuracy improves with structured data and entity-level corrections. Completeness improves with richer content and schema coverage. Sentiment improves with review signals and authoritative third-party coverage. Source attribution improves with registry submissions and a well-formed llms.txt file. These are the same signals covered in the 5-Layer Framework.

How often should I run a Citation Rubric assessment?

At minimum, quarterly. AI tools update their training data and retrieval sources continuously. A response that was accurate and complete three months ago may have degraded. New competitors enter the space. Source authority shifts. The rubric is most useful when run on a rhythm, not as a one-off exercise.

What is a good Citation Rubric score?

Each of the four dimensions is scored from 0 to 25, and together they sum to a composite score out of 100. There is no universal benchmark, because the relevant comparison is your category, not an abstract ideal. A composite score above 70 indicates solid AI visibility. Below 40 suggests material gaps. The dimension breakdown matters more than the composite: a business can score 75 overall but 20 on source attribution, which is a specific, fixable problem.

Tags: citation rubric, AI search, AI visibility, entity signals, AEO
