How do you track AI search visibility? You choose a fixed set of questions your buyers actually ask, you put those questions to the AI engines on a regular schedule, and you record what comes back in a structured way. That is the entire method. There is no dashboard that does it for you by default, no equivalent of a rankings report that just appears. You build the measurement yourself, and the brands that do it have a quiet advantage over the much larger group that is flying blind.

The reason this matters is that AI search has become a real discovery channel and almost nobody is measuring their place in it. A buyer asks ChatGPT or Perplexity which company to use, the engine names three, and if you are not one of them you have lost a sale you never saw. With traditional search you at least knew your position. With AI search, most brands have no idea whether they are mentioned, ignored, or described badly. This guide gives you a system to find out: how to build a query set, a framework called the AI visibility ledger for recording results, and how to read what the ledger tells you.

Start with the questions, not the tools

The instinct when you decide to track AI search visibility is to go shopping for a tool. Resist it. A tool measures what you tell it to measure, and if you do not yet know which questions matter, the tool just gives you a faster way to track the wrong things.

Begin with the questions instead. AI search visibility is not one number. It is your presence across hundreds of specific prompts a buyer might type, and most of those prompts do not matter to you. “Best running shoes” matters to a shoe brand. It is irrelevant to an accounting firm. The work is identifying the small set of questions where being named or being absent actually changes your revenue, and that is judgment, not software.

Those questions fall into a few groups. There are direct category questions, “best X for Y,” the prompts where a buyer is asking the engine to recommend. There are problem questions, where a buyer describes a pain and asks what to do, and your category is a possible answer. There are comparison questions, “X versus Y,” where buyers weigh options. And there are brand questions, where someone asks about you directly. Each group is a different part of the buying journey. You cannot meaningfully track AI search visibility until you have decided which questions in which groups are worth watching, and no tool can decide that for you.

Build your query set

A query set is the fixed list of prompts you will test every cycle. Fixed is the important word. The whole method depends on asking the same questions over time so you can see movement, and a list that changes every month measures nothing.

Aim for fifteen to thirty queries. That range is large enough to cover your main buyer questions and small enough that you will actually run the full check on schedule. A query set of a hundred prompts is a query set you will test once and abandon. Write each prompt the way a real person would type it, in natural language, not keyword fragments. “What is the best AEO agency for a small SaaS company” is a real prompt. “AEO agency SaaS” is a search-engine habit nobody uses with an AI assistant.

Cover the four groups. Include several category recommendation prompts, since those are where buyers ask to be sold to. Include a few problem prompts that describe a situation without naming your category. Include two or three comparison prompts, especially ones naming you against a known competitor. And include a couple of brand prompts, asking the engine directly about your company, because how an AI describes you unprompted is its own signal. Once this list exists, freeze it. You can add a query later if your market shifts, but treat changes as rare events. The query set is the backbone of everything that follows, and a stable backbone is what lets you track AI search visibility as a trend instead of a series of disconnected snapshots.
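
As a rough illustration, a frozen query set can be kept as a small, immutable list and checked against the guidelines above. This is only a sketch: the group labels, the function name `review_query_set`, and the sample prompts (including the made-up companies "ExampleCo" and "RivalCo") are assumptions for the example, not part of any tool.

```python
from collections import Counter

# The four groups described above; the short labels are this sketch's own naming.
GROUPS = ("category", "problem", "comparison", "brand")

# Hypothetical prompts. A tuple is used so the list cannot be edited casually.
QUERY_SET = (
    ("category", "What is the best AEO agency for a small SaaS company?"),
    ("category", "Which agencies help B2B software companies show up in AI answers?"),
    ("problem", "Our SaaS product never gets mentioned by ChatGPT. What should we do?"),
    ("comparison", "ExampleCo versus RivalCo for AI search optimization"),
    ("brand", "What does ExampleCo do, and is it any good?"),
)


def review_query_set(queries, min_size=15, max_size=30):
    """Return a list of problems with the query set's size and coverage."""
    problems = []
    if not (min_size <= len(queries) <= max_size):
        problems.append(f"size {len(queries)} is outside {min_size}-{max_size}")
    counts = Counter(group for group, _prompt in queries)
    for group in GROUPS:
        if counts[group] == 0:
            problems.append(f"no {group} prompts")
    for group in counts:
        if group not in GROUPS:
            problems.append(f"unknown group: {group}")
    prompts = [prompt for _group, prompt in queries]
    if len(set(prompts)) != len(prompts):
        problems.append("duplicate prompts")
    return problems


print(review_query_set(QUERY_SET))
```

With only five sample prompts, the check reports that the set is too small. A real set of fifteen to thirty prompts covering all four groups would return an empty list.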

The AI visibility ledger

The AI visibility ledger is the recording system, and it is deliberately a plain spreadsheet, because the discipline matters more than the technology.

Set it up like this. Each row is one query from your frozen set on one engine. Every query is repeated once per engine you track, so “best AEO agency for SaaS” has a row for ChatGPT, a row for Perplexity, a row for Gemini, and so on. Each time you run a check, you fill in a dated set of columns for every row. Over months, the ledger becomes a history: the same questions, the same engines, scored the same way, with the dates lined up so you can see what moved.
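
If you would rather generate the empty sheet than type it, the grid can be sketched in a few lines. Note the assumption: this version writes one row per query, engine, and check date (a "long" layout) rather than adding a new set of dated columns to existing rows, simply because that is easier to append to. The function names and score column names are illustrative.

```python
import csv
import io
from datetime import date
from itertools import product


def blank_check_sheet(queries, engines, check_date, score_columns):
    """One row per (query, engine) pair for a single dated check, scores left empty."""
    rows = []
    for query, engine in product(queries, engines):
        row = {"check_date": check_date.isoformat(), "query": query, "engine": engine}
        row.update({column: "" for column in score_columns})
        rows.append(row)
    return rows


def to_csv(rows):
    """Render the rows as CSV text that any spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical inputs for illustration.
rows = blank_check_sheet(
    queries=["best AEO agency for SaaS", "how do I get my brand into AI answers"],
    engines=["ChatGPT", "Perplexity", "Gemini"],
    check_date=date(2025, 3, 1),
    score_columns=["presence", "position", "sentiment", "citation"],
)
print(to_csv(rows))
```

Two queries across three engines yields six rows per check. Running the same function next month with a new date appends six more, which is what builds the history.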

The ledger works because it forces consistency on a medium that resists it. AI answers are slippery. They change wording every time, they vary by engine, they shift as models update. If you just “check on AI sometimes,” you get vibes, not data. The ledger converts a slippery thing into a row-and-column record you can actually read. It is the difference between “I think we show up less than we used to” and “our presence on the five category queries in Perplexity dropped from four to one between March and May.” The first is anxiety. The second is something you can act on. The next section covers exactly what goes in the columns.

What to record: the four columns

For every query, on every engine, on every check, record four things. Four columns, scored the same way each time.

The first column is presence. Is your brand named in the answer at all? This is a simple yes or no, and it is the most important single data point in the ledger. Presence is the floor. If you are not present, nothing else matters.

The second column is position. When you are present, where do you appear? Use a short fixed scale: named first or most prominently, named among others, or mentioned only in passing. Position matters because being the third of five names recommended carries far less weight than being the answer the engine leads with.

The third column is sentiment. How are you described? Record the actual framing in a few words: “described as a strong choice for small teams,” or “mentioned but called expensive,” or “listed with no description.” Sentiment is where you catch the quiet problem of being present but described in a way that loses the sale.

The fourth column is citation. Did the engine link to or cite your own website as a source for the claim? Many AI answers footnote their sources. Being cited means the engine is treating your content as evidence, not just repeating your name, and that is the strongest form of AI search visibility there is. Four columns, scored the same way every cycle. That consistency is what makes the ledger readable a year from now.
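
For readers who keep the ledger in code rather than a spreadsheet, one entry could look something like the sketch below. The class name `LedgerEntry`, the field names, and the three position labels are assumptions made for this example. The validation only enforces the fixed position scale described above, so the same values get used every cycle.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Short labels for the three-step position scale; the labels are this sketch's own.
POSITIONS = ("first", "among_others", "in_passing")


@dataclass(frozen=True)
class LedgerEntry:
    check_date: date
    query: str
    engine: str
    present: bool                   # column 1: named in the answer at all?
    position: Optional[str] = None  # column 2: one of POSITIONS, only when present
    sentiment: str = ""             # column 3: the framing, in a few words
    cited: bool = False             # column 4: did the engine cite your own site?

    def __post_init__(self):
        # Keep scoring consistent: a present brand always gets a position from the scale.
        if self.present and self.position not in POSITIONS:
            raise ValueError(f"position must be one of {POSITIONS}")
        # Position and sentiment describe a mention, so they are empty when absent.
        if not self.present and (self.position is not None or self.sentiment):
            raise ValueError("position and sentiment only apply when present")


# A made-up entry for illustration.
entry = LedgerEntry(
    check_date=date(2025, 5, 1),
    query="What is the best AEO agency for a small SaaS company?",
    engine="Perplexity",
    present=True,
    position="among_others",
    sentiment="mentioned but called expensive",
    cited=False,
)
print(entry)
```

Citation is deliberately left independent of presence in this sketch, on the assumption that an engine could footnote your site without naming the brand in the answer text.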

How often should you run the checks?

Monthly is the right cadence for most brands. Often enough to catch a real decline before it costs you a quarter of sales, infrequent enough that you will keep doing it.

Weekly is too much. AI answers have natural variance: the same prompt returns slightly different results from hour to hour, so checking weekly mostly means watching noise and overreacting to it. You will see a drop that is just regeneration randomness, panic, change something, and then misread the next week’s recovery as proof your change worked. Monthly smooths that out. Quarterly is too slow in the other direction, because AI search is moving fast enough that a quarter of invisibility is a serious, expensive gap.

There is one rule that matters more than cadence: run every query the same way each time. Same wording, same engines, ideally a fresh session with no prior chat context skewing the answer, and runs grouped close together rather than spread across two weeks. AI answers are sensitive to context and to model updates, so you are trying to hold everything constant except the calendar. When you track AI search visibility, you are not measuring a single answer. You are measuring a pattern across regenerations, and a clean monthly cycle with consistent method is what makes that pattern legible.

Reading the ledger: presence, sentiment, citation

A ledger with three or four months of history starts answering real questions. Read it in layers.

The first layer is presence over time. For each query, is your yes-or-no presence holding, rising, or falling? A query where you were present in March and April but absent in May and June is a flashing light. Something changed, and you want to find out what before the absence becomes permanent. The second layer is sentiment drift. You can be present the whole time and still be losing, if the description quietly degrades from “a strong option” to “one of several” to “an option, though pricey.” Sentiment drift is invisible to a presence-only check and lethal to conversion, which is the entire reason sentiment earns its own column.

The third layer is citation, and it is the leading indicator. Citation tends to move before presence does. When an engine starts citing your site as a source, durable presence usually follows, because the engine is now treating your content as trustworthy. When citations fade, presence often weakens next. The fourth layer is competitive: because your query set includes comparison and category prompts, the ledger also shows you who is being named instead of you, and that tells you exactly which competitors are winning the AI channel. Reading all four layers turns the ledger from a record into a diagnosis.
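
The first layer, presence over time, is the easiest to read mechanically. The sketch below uses made-up rows and hypothetical function names. It counts presence per month on one engine and flags any query that was present earlier but absent in the two most recent checks, which is the "flashing light" pattern described above.

```python
from collections import defaultdict

# Made-up history for illustration: one presence flag per monthly check.
MONTHS = ["2025-03", "2025-04", "2025-05", "2025-06"]
HISTORY = {
    "query A": [True, True, False, False],
    "query B": [True, True, True, True],
    "query C": [False, False, False, False],
}
ROWS = [
    (month, query, "Perplexity", present)
    for query, flags in HISTORY.items()
    for month, present in zip(MONTHS, flags)
]


def presence_by_month(rows, engine):
    """Count how many queries named the brand on this engine in each month."""
    counts = defaultdict(int)
    for month, _query, row_engine, present in rows:
        if row_engine == engine:
            counts[month] += int(present)
    return dict(sorted(counts.items()))


def lost_presence(rows, engine, last_n=2):
    """Queries present at some earlier check but absent in each of the last_n checks."""
    history = defaultdict(dict)
    for month, query, row_engine, present in rows:
        if row_engine == engine:
            history[query][month] = present
    flagged = []
    for query, by_month in history.items():
        months = sorted(by_month)  # ISO-style "YYYY-MM" strings sort chronologically
        if len(months) <= last_n:
            continue
        earlier, recent = months[:-last_n], months[-last_n:]
        if any(by_month[m] for m in earlier) and not any(by_month[m] for m in recent):
            flagged.append(query)
    return sorted(flagged)


print(presence_by_month(ROWS, "Perplexity"))
print(lost_presence(ROWS, "Perplexity"))
```

In this toy data only "query A" is flagged. "query C" was never present, so it is a different kind of problem and is left out of the list. Sentiment drift still has to be read by eye, because it lives in the wording.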

Tools can help, but the manual check still wins

A growing set of paid tools will track AI search visibility for you, running query sets across engines on a schedule and charting the history. At scale they are useful, and once your ledger gets heavy, dozens of queries across five engines every month, a tool is a reasonable upgrade.

But start manual, and here is why. Running the checks yourself, by hand, for the first few months teaches you things a dashboard hides. You see the actual wording of the answers. You notice that one engine consistently misunderstands what your company does. You catch a competitor’s exact phrasing that is winning. You develop a feel for the difference between meaningful change and regeneration noise. A tool gives you a number; doing it by hand gives you understanding, and understanding is what tells you whether a number matters.

There is also a trust issue. Tools approximate. They sample, they interpret, they sometimes measure a slightly different thing than they claim. If you have run the manual ledger first, you can sanity-check a tool’s output against your own eyes and know whether to believe it. If you start with the tool, you are trusting a black box to measure something you have never observed directly. Begin with the manual ledger. Add a tool when the manual work becomes the bottleneck, not before.

Turn the ledger into action

Tracking is worthless if it does not change what you do. The ledger exists to drive three kinds of action.

When presence is low or falling on a query that matters, the action is content and authority work. The engine is not selecting you because it does not have enough trustworthy material connecting your brand to that question. That means publishing clear, specific content that answers the question directly, and earning third-party mentions that corroborate it. When presence is fine but sentiment is weak, the action is correction. The engine is repeating an outdated or wrong framing, often pulled from stale pages or old third-party content, and the fix is updating your own material and the sources the engine leans on. When citation is missing, the action is making your content more quotable: direct answers, clear structure, specific claims an engine can lift with confidence.

Above all, let the ledger set priority. You cannot fix everything, so fix the queries where the gap between their revenue importance and your visibility is widest. A query you score badly on but that buyers rarely ask can wait. A category recommendation prompt your best customers use, where a competitor is named and you are absent, is this month’s whole job. The ledger does not just tell you that you have an AI search visibility problem. Read with the four columns and the four layers, it tells you which problem to solve first, and that focus is the real return on the work.
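
One way to make that prioritization concrete is to put a rough number on the gap. Everything in this sketch is an assumption: importance is a judgment score you assign between 0 and 1, visibility is the share of tracked engines that named you on the latest check, and the gap is simply importance minus visibility. The prompts are made up.

```python
def visibility(present_on):
    """Share of tracked engines that named the brand on the latest check."""
    return sum(present_on) / len(present_on)


def rank_by_gap(queries):
    """Order prompts from widest to narrowest gap between importance and visibility."""
    scored = [
        (query["importance"] - visibility(query["present_on"]), query["prompt"])
        for query in queries
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [prompt for _gap, prompt in scored]


# Hypothetical scores; present_on holds one flag per engine tracked.
QUERIES = [
    {"prompt": "ExampleCo versus RivalCo", "importance": 0.6,
     "present_on": [True, False, True]},
    {"prompt": "best AEO agency for a small SaaS company", "importance": 1.0,
     "present_on": [False, False, False]},
    {"prompt": "a question buyers rarely ask", "importance": 0.2,
     "present_on": [False, False, False]},
]

for prompt in rank_by_gap(QUERIES):
    print(prompt)
```

The high-importance prompt where you are absent everywhere comes out on top, and the rarely asked one ranks well below it even though its visibility is just as poor. The scores are a prompt for judgment, not a replacement for it.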