How Intent Data Is Collected and Scored

By Priya Nair · 2026-06-14 · 11 min read

A technical look at how intent data providers collect signals — bidstream, co-ops, reverse-IP, panels — and the scoring math behind intent scores.

Intent data is collected by watching where research happens — on publisher networks, in programmatic bidstream telemetry, through opt-in panels, and via reverse-IP and identity resolution — then it is scored by comparing each account's current topic activity against its own baseline. The "intent score" or "topic surge" you see is rarely a single number off a single source: it is a pipeline that maps raw web events to topics, weights them by recency and volume, measures how many standard deviations this week's activity sits above normal, and (in newer systems) feeds the whole thing to a model that predicts purchase probability. Understanding that pipeline is the only way to judge whether a vendor's score is signal or theater.

The Four Ways Intent Data Is Collected

Every provider's accuracy, freshness, and compliance profile is decided at the collection layer. There are four dominant mechanics, and most vendors blend several.

Publisher co-ops (data consortiums). A network of B2B sites tags their content by topic and shares anonymized consumption back to a central pool. When a known company reads three articles on "data warehouse migration," that consumption becomes a topic signal. Accuracy is decent because the content is explicitly topic-labeled, but coverage is skewed toward whichever publishers joined the co-op.
Bidstream telemetry. Every time a page loads a programmatic ad, a real-time bidding request carries the URL, page context, IP, and device data to ad exchanges and on to the many bidders connected to them. Providers tap that firehose and infer topic interest from the pages an IP loaded. The volume is enormous and near-real-time, but the quality is noisier and regulatory scrutiny of bidstream collection is rising sharply.
Reverse-IP and identity resolution. Anonymous web traffic is mapped back to a company (reverse-IP) or, more aggressively, to a person (cookie, device graph, hashed-email match). Reverse-IP is reliable for large enterprises with stable IP ranges and weak for remote workers and small firms behind consumer ISPs. Person-level resolution is the most fragile and the most compliance-sensitive link in the chain.
Opt-in research panels. A relatively small set of professionals agree to have their browsing tracked. Their behavior is then extrapolated to a much larger universe of look-alike accounts — which is exactly where a large share of false positives is born.
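
To make the reverse-IP step concrete, here is a minimal sketch of matching a visitor IP against known company ranges. The range table and company names are hypothetical (the addresses come from reserved documentation ranges), and a real provider's table would be far larger and probabilistic rather than a clean lookup.

```python
import ipaddress
from typing import Optional

# Hypothetical CIDR-to-company table, for illustration only.
IP_RANGES = {
    "203.0.113.0/24": "Example Corp",
    "198.51.100.0/25": "Sample Industries",
}

# Parse the ranges once so each lookup is a simple membership test.
NETWORKS = [(ipaddress.ip_network(cidr), name) for cidr, name in IP_RANGES.items()]


def resolve_company(ip: str) -> Optional[str]:
    """Return the company whose range contains the IP, or None if unmatched."""
    addr = ipaddress.ip_address(ip)
    for network, name in NETWORKS:
        if addr in network:
            return name
    return None  # e.g. a remote worker on a consumer ISP


print(resolve_company("203.0.113.7"))
print(resolve_company("192.0.2.44"))
```

The unmatched case is the important one: every lookup that returns nothing is traffic that never becomes an account-level signal.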

The practical takeaway: ask any vendor which of these they own versus license, because a reseller stacking three upstream panels has a very different freshness and accuracy story than a platform built on first-party or observable public signals. We unpack the buyer side of that question in our guide to intent data providers.

From Raw Events to Topics: The Processing Layer

Collected events are not yet "intent." A bidstream hit is just an IP and a URL; a co-op record is a hashed company and a page. Turning that into "Account X is researching Topic Y" takes several processing steps that quietly determine how trustworthy the final score is.

Identity stitching. The raw event is resolved to an account (and sometimes a person). This is where match-rate and match-confidence live — a 40% match rate means most of the firehose never becomes usable signal.
Topic classification (NLP). The content the account engaged with is run through natural-language processing — keyword extraction, entity recognition, and increasingly transformer-based classifiers — to assign it to one or more topics in a fixed taxonomy. A page about "Snowflake pricing" maps to topics like data warehousing and cloud cost management.
Topic clustering. Related keywords and pages are grouped so that ten different long-tail searches all roll up to the same buyable theme. Good clustering is what lets a provider report B2B keyword intent data at a granularity reps can act on, instead of vague category surges.
Deduplication and bot filtering. The same account loading the same page from five offices, plus crawler and bot traffic, must be collapsed and stripped or the volume counts become meaningless.

Only after this pipeline produces a clean, per-account, per-topic time series can scoring begin. Every weakness upstream — a bad match, a mis-classified page, an unfiltered bot — propagates straight into the score.
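
The bot-filtering, deduplication, and aggregation steps can be sketched in a few lines. The event shape, the bot markers, and the "one count per account, page, and day" rule are all assumptions chosen for illustration; identity stitching and topic classification are assumed to have already happened upstream.

```python
from collections import Counter
from datetime import date

# Hypothetical, already-resolved events: (account, topic, url, day, user_agent)
events = [
    ("acme.com", "data warehousing", "/snowflake-pricing", date(2026, 6, 1), "Mozilla/5.0"),
    ("acme.com", "data warehousing", "/snowflake-pricing", date(2026, 6, 1), "Mozilla/5.0"),
    ("acme.com", "data warehousing", "/warehouse-migration", date(2026, 6, 3), "Mozilla/5.0"),
    ("acme.com", "data warehousing", "/snowflake-pricing", date(2026, 6, 2), "ExampleBot/1.0"),
    ("globex.com", "cloud cost management", "/finops-guide", date(2026, 6, 9), "Mozilla/5.0"),
]

BOT_MARKERS = ("bot", "crawler", "spider")  # crude assumption; real filters are richer


def is_bot(user_agent):
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def weekly_topic_counts(events):
    """Collapse raw events into counts keyed by (account, topic, ISO year-week)."""
    seen = set()
    counts = Counter()
    for account, topic, url, day, user_agent in events:
        if is_bot(user_agent):
            continue  # strip crawler traffic
        key = (account, url, day)
        if key in seen:
            continue  # same account, page, and day counts once
        seen.add(key)
        iso = day.isocalendar()
        counts[(account, topic, (iso[0], iso[1]))] += 1
    return counts


for key, n in weekly_topic_counts(events).items():
    print(key, n)
```

Of the five raw events, only three survive: one is a duplicate and one is a crawler. That gap between raw volume and usable volume is what the scoring layer inherits.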

The Scoring Math: Baseline, Surge, and Lift

The core idea behind almost every intent score is deviation from baseline, not raw volume. Here is the math most providers run, stripped of marketing language.

Baseline modeling. For each account and topic, the provider computes what "normal" research volume looks like — typically a rolling average over a trailing window (say 8–12 weeks), sometimes seasonally adjusted. This baseline is the single most important and least-discussed part of the system. Without it, a large company looks permanently "in-market" simply because it generates more traffic than a small one.

Surge detection via standard deviations (z-score). This week's observed volume is compared to the baseline and expressed as a lift — a common formulation is a z-score:

z = (observed_this_week − baseline_mean) / baseline_std_dev

An account whose current activity sits two or three standard deviations above its own historical mean is "surging." Because the comparison is against the account's own baseline, a mid-size firm doubling its normal research can outrank an enterprise running at its usual high volume — which is the entire point.
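
A minimal sketch of that comparison, using made-up weekly counts for two accounts. The eight-week window, the use of the sample standard deviation, and the handling of a flat history are assumptions; providers differ on all three.

```python
from statistics import mean, stdev


def surge_z_score(baseline_weeks, observed_this_week):
    """z = (observed_this_week - baseline_mean) / baseline_std_dev."""
    baseline_mean = mean(baseline_weeks)
    baseline_std_dev = stdev(baseline_weeks)  # sample standard deviation
    if baseline_std_dev == 0:
        return 0.0  # assumption: a perfectly flat history yields no measurable surge
    return (observed_this_week - baseline_mean) / baseline_std_dev


# Illustrative trailing 8 weeks of topic events (invented numbers).
mid_size_baseline = [4, 5, 6, 5, 4, 6, 5, 5]
enterprise_baseline = [190, 210, 200, 205, 195, 200, 190, 210]

z_mid = surge_z_score(mid_size_baseline, 10)     # doubled its usual volume
z_ent = surge_z_score(enterprise_baseline, 205)  # business as usual

print(f"mid-size z = {z_mid:.2f}, enterprise z = {z_ent:.2f}")
```

The mid-size account lands well above 2 while the enterprise stays below 1, even though the enterprise generated roughly twenty times the raw volume.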

Recency decay. Intent is perishable, so recent events count for more. Providers apply a decay function — often exponential — so a signal's weight halves over a fixed half-life:

weight(age_in_days) = e^(−λ · age_in_days), where λ = ln(2) / half_life_in_days

A page read today contributes its full value; one from three weeks ago contributes a fraction. This is why a surge that looked hot in last month's batch is effectively background context today.
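
As a worked example, the decay constant follows from the half-life as λ = ln(2) / half-life. The 7-day half-life below is an assumption for illustration, not a figure any particular provider publishes.

```python
import math


def decay_weight(age_in_days, half_life_days=7.0):
    """Exponential recency weight: 1.0 today, 0.5 after one half-life."""
    lam = math.log(2) / half_life_days  # decay constant derived from the half-life
    return math.exp(-lam * age_in_days)


for age in (0, 7, 14, 21):
    print(age, round(decay_weight(age), 3))
```

With a 7-day half-life, an event from three weeks ago has passed three half-lives and keeps one eighth of its original weight.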

Volume and source weighting. Not all events are equal. A documented, topic-tagged co-op read is weighted more heavily than a noisy bidstream inference; a deep engagement (a long dwell, a whitepaper download) counts more than a single bounce. Many models also weight by the number of distinct researchers at an account, since five people researching a topic is a stronger buying-committee signal than one.

The final intent score is usually a normalized blend of these factors — surge magnitude × recency weight × source/volume weight — mapped onto a 0–100 scale or an A/B/C/D grade. The grade hides the math, which is why you should always ask to see the components. The same deviation-from-baseline logic underpins good predictive lead scoring, and the discipline of weighting signals in lead scoring applies directly to how much trust each intent source should earn.
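
One way such a blend could look is sketched below. The cap on the z-score, the assumption that both weights lie between 0 and 1, and the grade cutoffs are all hypothetical; the point is that the function returns the components alongside the score instead of hiding them.

```python
def to_grade(score):
    """Map a 0-100 score to a letter grade (cutoffs are illustrative)."""
    for cutoff, letter in ((75, "A"), (50, "B"), (25, "C")):
        if score >= cutoff:
            return letter
    return "D"


def intent_score(z_score, recency_weight, source_weight, z_cap=4.0):
    """Blend surge magnitude, recency weight, and source weight into 0-100."""
    # Clamp the surge to [0, 1]; z_cap is an assumed ceiling, not a standard.
    surge = max(0.0, min(z_score, z_cap)) / z_cap
    blended = surge * recency_weight * source_weight  # weights assumed in [0, 1]
    score = round(100 * blended)
    return {
        "score": score,
        "grade": to_grade(score),
        "components": {
            "surge": surge,
            "recency_weight": recency_weight,
            "source_weight": source_weight,
        },
    }


result = intent_score(z_score=3.0, recency_weight=0.5, source_weight=0.8)
print(result["score"], result["grade"], result["components"])
```

Here a strong surge (z = 3.0) still ends up as a middling grade because half its recency weight has already decayed, which is exactly the detail a bare letter grade would hide.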

Rules-Based Thresholds vs. Machine-Learning Models

Two philosophies determine how the scored components become a final verdict, and vendors are quietly migrating from the first to the second.

Rules-based scoring uses human-set thresholds: flag the account if its z-score exceeds 2.0 on a priority topic in the last 14 days. It is transparent, easy to audit, and easy to explain to a skeptical rep — but it is brittle. Thresholds drift out of date, and a fixed cutoff treats every topic and industry identically.
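
The rule described above fits in a single function. The field names and sample signals are hypothetical; the thresholds mirror the example in the text.

```python
def flag_account(signals, priority_topics, z_threshold=2.0, window_days=14):
    """Return the priority topics that trip the rule for one account."""
    return [
        s["topic"]
        for s in signals
        if s["topic"] in priority_topics
        and s["z_score"] > z_threshold
        and s["days_since_last_event"] <= window_days
    ]


# Invented per-topic signals for a single account.
signals = [
    {"topic": "data warehousing", "z_score": 2.6, "days_since_last_event": 3},
    {"topic": "cloud cost management", "z_score": 2.0, "days_since_last_event": 1},
    {"topic": "etl tooling", "z_score": 3.1, "days_since_last_event": 30},
    {"topic": "office furniture", "z_score": 4.0, "days_since_last_event": 2},
]
priority = {"data warehousing", "cloud cost management", "etl tooling"}

flagged = flag_account(signals, priority)
print(flagged)
```

Only one topic is flagged. The account sitting exactly at 2.0 is dropped by the strict cutoff, a small demonstration of how brittle a fixed threshold can be.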

Machine-learning scoring trains a model (gradient-boosted trees and logistic regression are common; some vendors use neural nets) on historical outcomes, meaning which surging accounts actually became pipeline, so the model learns the topic combinations, sequences, and magnitudes that correlate with real buying. It adapts and usually predicts better, but it is much harder to interrogate: when a rep asks "why is this account a 92?", the answer is too often "the model said so."
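
For contrast with a hand-set threshold, here is a toy logistic regression trained by plain gradient descent on invented outcomes. The two features, the training rows, and the hyperparameters are all made up for illustration; a production model would use a real library, far more data, and proper validation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Invented history: [z_score, distinct_researchers] -> became pipeline (1) or not (0)
rows = [[0.2, 1], [0.5, 1], [1.0, 2], [0.8, 1], [2.5, 3], [3.0, 4], [3.5, 5], [2.8, 2]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic(rows, labels)
p_hot = predict(weights, bias, [3.2, 4])
p_cold = predict(weights, bias, [0.3, 1])

# Per-feature contributions to the logit, one simple way to expose components.
contributions = [w * xi for w, xi in zip(weights, [3.2, 4])]
print(round(p_hot, 2), round(p_cold, 2), contributions)
```

A linear model like this one can at least report how much each feature pushed the score; tree ensembles and neural nets need extra tooling to offer a comparable explanation.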

The trade-off is transparency versus adaptive accuracy. The most useful systems keep the ML score and expose the underlying components, so a human can sanity-check the machine. A score you cannot interrogate is a score reps will eventually ignore. For a topic-by-topic walk-through of turning surges into rep action, see how to prioritize buying signals for outbound.

Where the Score Goes Wrong

Even a mathematically sound pipeline produces bad scores when the inputs are weak, and the failure modes are predictable.

Baseline contamination. If the baseline window includes a previous surge, the model normalizes against an inflated mean and an inflated standard deviation, and under-reports the next real spike.
Extrapolation error. Panel-based providers infer the behavior of thousands of accounts from a few tracked panelists. One mis-weighted panelist can light up an entire account that never researched anything.
Topic mis-classification. An analyst, a job seeker, a student, or a competitor's campaign can all trip the same topic. The math says "surge"; the reality is a non-buyer.
Stale delivery. A model can be perfect and still useless if the feed arrives in weekly batches with multi-day lag, so reps act on research that decayed away two weeks ago.
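
Baseline contamination is easy to reproduce with the z-score formula from earlier. The weekly counts below are invented; the only difference between the two baselines is a single earlier surge week left inside the window.

```python
from statistics import mean, stdev


def z_score(baseline_weeks, observed):
    return (observed - mean(baseline_weeks)) / stdev(baseline_weeks)


clean_baseline = [5, 6, 4, 5, 6, 5, 4, 5]
contaminated_baseline = [5, 6, 4, 20, 6, 5, 4, 5]  # week 4 was a prior surge
observed_this_week = 12

z_clean = z_score(clean_baseline, observed_this_week)
z_contaminated = z_score(contaminated_baseline, observed_this_week)

print(f"clean: {z_clean:.2f}, contaminated: {z_contaminated:.2f}")
```

The same observed week reads as an unmistakable surge against the clean baseline and falls below a 2.0 threshold against the contaminated one, because the old spike raised both the mean and the standard deviation.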

This is why a raw topic surge should be treated as one input, never a standalone targeting list — corroborate it with a verified contact and a discrete, dated event before a rep ever dials. The full range of clues to weigh against a surge is laid out in the field guide to B2B intent signals.

How Lead Seeker Approaches the Same Problem

Lead Seeker is deliberately built on observable public signals — hires, funding rounds, job postings, leadership changes, tech-stack moves — rather than a probability index extrapolated from bidstream and panels. The scoring difference is fundamental: a funding announcement or a posted role is a discrete, timestamped, source-backed event, not a smoothed statistical inference.

Auditable, not extrapolated. Each signal in a Prospect Dossier links to the underlying evidence, so a rep verifies the event instead of trusting a colored grade.
Freshness is a fact, not a batch schedule. Public events carry their own timestamps, so recency is observed rather than estimated.
Lower false positives. A discrete event either happened or it didn't — there is no panel extrapolation to inflate it.

It is a different bet than a topic-surge feed: fewer, higher-confidence signals you can stand behind, blended with verified contacts and ICP fit. Browse more intent data insights for the wider picture, or review our transparent monthly pricing to model the economics.

Frequently Asked Questions

How do intent data providers collect intent data?

Providers collect intent data four main ways: publisher co-ops (networks of B2B sites sharing topic-tagged content consumption), bidstream telemetry (topic inferences pulled from programmatic ad-bid requests), reverse-IP and identity resolution (mapping anonymous traffic back to a company or person), and opt-in research panels (a small tracked group extrapolated to a larger universe). Most vendors blend several, and the mix determines accuracy, freshness, and compliance risk.

How is an intent score actually calculated?

An intent score is built by establishing a baseline of normal research volume for each account and topic, measuring how far current activity sits above that baseline (often as a z-score, the number of standard deviations above the baseline mean), then weighting the result by recency decay, event volume, and source quality. Those components are blended and normalized onto a 0–100 scale or an A–D grade. The grade hides the math, so ask any vendor to show the underlying components.

What is a topic surge and how is it measured?

A topic surge is a statistically significant rise in an account's research on a topic relative to its own historical baseline — not just a high absolute volume. It is typically measured as a z-score: this week's observed volume minus the baseline mean, divided by the baseline standard deviation. An account two or three standard deviations above its own norm is surging, which lets a mid-size firm doubling its activity outrank an enterprise running at its usual high volume.

Do intent data providers use machine learning or rules to score intent?

Both, and vendors are shifting from rules toward machine learning. Rules-based scoring uses human-set thresholds (for example, flag any account above a z-score of 2.0 on a priority topic) — transparent and auditable but brittle. Machine-learning scoring trains models on which past surges became real pipeline, so it adapts and usually predicts better, but it is harder to interrogate. The strongest systems keep the ML score while still exposing the components behind it.

How does recency decay change an intent score?

Recency decay reduces the weight of older events, usually with an exponential function so a signal's contribution halves over a fixed half-life. A page read today counts at full value; one from three weeks ago counts for a fraction. This reflects how perishable intent is — surges lose most of their predictive value within 7–14 days — which is why a feed that delivers in slow weekly batches can hand you "intent" that has already decayed away.

Why does the baseline matter more than the raw signal volume?

Because intent is about deviation from normal, not absolute activity. Without a per-account baseline, large companies look permanently in-market simply because they generate more traffic than small ones. The baseline calibrates each account against its own history, so the score reflects a genuine change in behavior. A poorly built baseline — too short, or contaminated by a prior surge — quietly corrupts every score downstream, which is why it is the first thing to scrutinize in any provider.

Next Steps

If you want to judge scoring quality for yourself rather than read a methodology deck, look at how a source-backed event appears in a Prospect Dossier and compare it to the colored grade a topic-surge feed hands you. The fastest way to tell a real intent score from an expensive one is whether you can trace it back to the event that created it.