Research and Decision Science/Data glossary/Clickthrough Rate
This document was created as part of Signals and Data Services Objective 2 Key Result (KR) 2 from the Wikimedia Foundation's 2024-2025 Annual Plan. The KR focused on developing a set of product and experimentation-focused essential metrics. This document defines three (3) ways to measure clickthrough rate (CTR), each with its own scenario/use-case where it is better suited than the others for measuring click-based user engagement with a feature or element of the user interface (e.g. a link or button): overall CTR, average CTR, and unique CTR. This document also provides example queries (written in Spark SQL and Presto) and specifies some requirements and recommendations regarding the implementation (measurement) of the metrics, such as taking element visibility into consideration. Finally, this document also recommends methodologies for analyzing the CTRs in the context of measuring baselines with confidence intervals and evaluating experiments (A/B tests).
Glossary
- Unit: A single source of related interactions, such as a session, user, or install.
- Impression: When an element (such as a link or a button) is loaded and shown to the user.
- Click: When the user clicks on the shown element, including tapping on a touch screen.
- Engagement: When the user interacts with content in ways that include clicking but are not exclusively clicking, such as typing text, scrolling, or hovering over elements.
- CTR: Clickthrough rate, usually represented as a percentage (rather than a value between 0 and 1).
Metric definitions
We use three different definitions/notions of a clickthrough rate (CTR) for interface elements and features.
There are other contexts where we are interested in more specific CTRs, such as when the user is searching for content and is shown search results or recommendations that they can click on or is presented with additional search features that they can interact with.
Overall clickthrough rate
Or just "clickthrough rate":
This is the simplest one to track and calculate and there is no need for a unit identifier (e.g. session ID, app install ID, user ID). You simply count the total number of click events recorded and divide it by the count of impression events recorded:
SELECT
SUM(IF(action = 'click', 1, 0)) / SUM(IF(action = 'impression', 1, 0))
FROM events -- placeholder name for the instrument's event table
WHERE action IN('click', 'impression')
This metric is very sensitive to the presence of automated agents (bots) in the data. An inflation of impressions, or an inflation of both impressions and clicks, will substantially affect the estimates. When it is possible to include additional information about the unit (such as a session ID or install/user ID), the average clickthrough rate is preferred.
Average clickthrough rate
In some cases we may want to measure a CTR that is more robust to potential automated agent (bot) activity that may:
- generate substantially more impressions than click-throughs (e.g. none), which would cause us to underestimate the actual CTR if using the overall method
- generate a perfect, one-to-one ratio of impressions to click-throughs, which would cause us to overestimate the actual CTR if using the overall method
We can account for this potential behavior by calculating the CTR at the unit level (such as session-by-session or user-by-user) and then calculating the average of those per-unit CTRs. This makes for a substantially more resilient/robust measurement when the data contains bots.
-- Average across sessions:
WITH per_session_ctrs AS (
SELECT
session_id,
SUM(IF(action = 'click', 1, 0)) / SUM(IF(action = 'impression', 1, 0)) AS ctr
FROM events -- placeholder name for the instrument's event table
WHERE action IN('click', 'impression')
GROUP BY session_id
)
SELECT
AVG(ctr)
FROM per_session_ctrs
The proportion of automated to non-automated agents will determine how influential the bots' individual CTRs are in the calculation of the group CTR; the hope is that the volume of non-automated agents vastly outweighs the volume of automated ones.
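To illustrate why the per-unit average is more robust, here is a minimal R sketch using made-up numbers (1,000 human sessions with a roughly 10% per-impression click probability, plus 50 bot sessions that each generate many impressions and never click); all of the counts and probabilities are assumptions chosen purely for illustration:

# Simulated sessions (assumed numbers): humans click ~10% of the time,
# bots generate many impressions and no clicks.
set.seed(42)
human <- data.frame(
  session_id  = sprintf("human_%04d", 1:1000),
  impressions = 5,
  clicks      = rbinom(1000, size = 5, prob = 0.1)
)
bots <- data.frame(
  session_id  = sprintf("bot_%03d", 1:50),
  impressions = 500,
  clicks      = 0
)
sessions <- rbind(human, bots)

# Overall CTR: total clicks divided by total impressions
overall_ctr <- sum(sessions$clicks) / sum(sessions$impressions)

# Average CTR: mean of the per-session CTRs
average_ctr <- mean(sessions$clicks / sessions$impressions)

overall_ctr  # ~0.017, dragged far below the true human CTR by the bot impressions
average_ctr  # ~0.095, much closer to the true human CTR

In this simulated scenario the bot impressions pull the overall CTR far below the human CTR, while the average CTR stays close to it because each bot session contributes only a single zero to the mean.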
Unique clickthrough rate
In some cases we may not care about the total number of times content was seen and then clicked on, but rather how many sessions or users saw the content and what proportion of them clicked on it after seeing it, like the first stage of a multi-stage funnel.
It's especially useful in a conversion scenario where the user can convert only once in the unit's lifetime, such as registering a new account or signing up to participate in an organized event. We may, for example, be interested in capturing the notion of "how many users demonstrate intent to sign up for an account" by measuring the unique CTR on a Create Account link/button.
This is fundamentally different from both the overall CTR and the average CTR.
WITH sessions_that_clicked AS (
SELECT DISTINCT
session_id,
true AS clicked
FROM events -- placeholder name for the instrument's event table
WHERE action = 'click'
), sessions_that_saw AS (
SELECT DISTINCT session_id
FROM events
WHERE action = 'impression'
), sessions AS (
SELECT
i.session_id,
COALESCE(clicked, false) AS click_through
FROM sessions_that_saw i
LEFT JOIN sessions_that_clicked c
ON i.session_id = c.session_id
)
SELECT
SUM(IF(click_through, 1, 0)) / COUNT(1)
FROM sessions
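To make the distinction concrete, here is a small R sketch that computes all three definitions from the same hypothetical event-level data (three sessions with different numbers of impressions and clicks, invented for illustration):

library(dplyr)

# Hypothetical events: s1 sees the element twice and clicks once, s2 sees it
# three times and clicks twice, s3 sees it once and never clicks.
events <- data.frame(
  session_id = c("s1", "s1", "s1", "s2", "s2", "s2", "s2", "s2", "s3"),
  action     = c("impression", "impression", "click",
                 "impression", "impression", "impression", "click", "click",
                 "impression")
)

clicks      <- sum(events$action == "click")
impressions <- sum(events$action == "impression")

# Overall CTR: 3 clicks / 6 impressions = 0.5
overall_ctr <- clicks / impressions

# Average CTR: mean of per-session CTRs = (0.5 + 0.667 + 0) / 3, about 0.39
average_ctr <- events |>
  group_by(session_id) |>
  summarise(ctr = sum(action == "click") / sum(action == "impression")) |>
  pull(ctr) |>
  mean()

# Unique CTR: 2 of the 3 sessions that saw the element clicked at least once, about 0.67
unique_ctr <- events |>
  group_by(session_id) |>
  summarise(clicked = any(action == "click")) |>
  pull(clicked) |>
  mean()

The same events yield three different numbers, which is why it matters to be explicit about which definition a reported "clickthrough rate" refers to.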
Measuring
Data collection (instrumentation)
Data contract
This section assumes the instrumentation is done using the Metrics Platform.
In addition to action, the instrument needs to record element_friendly_name, as we are interested in calculating the clickthrough rate (whether overall, average, or unique) for specific elements (such as a particular link or button) shown to the user. If the user is enrolled in any experiments (A/B tests), the instrument needs to record that information in experiments (cf. T368326).
Element visibility
Whenever possible, an "impression" should correspond to content actually shown to the user (e.g. the DOM element is visible in the browser viewport; the View is visible on Android). This matters especially for buttons/links that are below the fold and thus may be clicked on substantially less simply because they are seen less often.
An "impression" event should not automatically fire upon the instrument's initialization (which should emit an "init" event instead) but rather only if the attached UI element is actually visible.
Example: If there are 10 elements on a page with CTR instrumentation attached and 8 are visible right away and 2 will be visible if the user scrolls down, then there would be 8 impression events right away and 2 impression events if the user scrolls down.
If we do not implement this requirement, it hurts the accuracy/reliability of CTR measurements and makes it impossible to compare the CTR of elements above the fold with that of elements below the fold. In an A/B test setting, however, this requirement matters less: if the instrumentation is flawed (in that it does not account for element visibility when emitting an impression event) but is flawed consistently and universally (across all experiment subjects/groups), we can still measure differences in CTR between experiment groups.
Standalone instrumentation
Clicks and impressions must come from the same instrument. We should not combine clicks from one instrument and impressions from another instrument.
If we have JavaScript code that sends a "click" event when a link or a button on a page is clicked, we should not use the Webrequest-based count of pageviews for that page as our count of impressions; instead, the JavaScript code should send an "impression" event. This is particularly important because server-side instrumentation of impressions is immune to No-JS (JavaScript disabled in the browser) and to ad/tracker-blockers (which can and do block intake-analytics.wikimedia.org), while client-side click events are not, so mixing the two would undercount clicks relative to impressions and deflate the measured CTR.
We also should not rely on multiple instruments because those instruments can be loaded and behave differently from one another. If we have a modular, instrumented feature and each module is instrumented separately, each module's instrument should still record impressions even if the overall feature's instrument already records impressions.
If there are multiple elements on the page (or screen) that we attach a clickthrough tracking instrument to, we would send an impression event for each of those elements. We would not "re-use" impression events.
No-JS support
It should be noted that for technical reasons we would not be able to instrument click-throughs server-side, only client-side. This means that on the web we would not be able to measure CTR among users who have JavaScript disabled, or to robustly measure CTR for links/buttons when an ad/tracker-blocker is present.
Note: we already have plenty of features that require JS (e.g. if they're using Codex for UI), so it is reasonable to focus on insights about users who have JS enabled and exclude a very small proportion of users who don't. Refer to No-JavaScript notes for more information.
Example queries
This section provides queries written in the Spark SQL and Presto dialects/implementations of SQL. The queries use a demo table called user_events with simulated data that looks like:
user_id (string) | assigned (string) | session_id (string) | event_id (string) | action (string) |
---|---|---|---|---|
9d607a663f3e9b0a90c3c8d4426640dc | a | 02a8effc4e091311 | 0616f417fa837c00067ca0 | impression |
9d607a663f3e9b0a90c3c8d4426640dc | a | 02a8effc4e091311 | 8ee22e2c2f64ca97275c3 | click |
894f782a148b33af1e39a0efed952d69 | b | 9a8fc1bc918b49ce | 213c9131ba603cc645663 | impression |
Where assigned is a value that would be in the experiments.assigned array in actual data collected with a Metrics Platform-based instrument.
Overall CTR
Spark SQL:
SELECT
assigned,
SUM(IF(action = 'click', 1, 0)) / SUM(IF(action = 'impression', 1, 0)) AS overall_ctr
FROM user_events
WHERE action IN('click', 'impression')
GROUP BY assigned
Presto:
SELECT
assigned,
CAST(SUM(IF(action = 'click', 1, 0)) AS DOUBLE) / SUM(IF(action = 'impression', 1, 0)) AS overall_ctr
FROM user_events
WHERE action IN('click', 'impression')
GROUP BY assigned
Note the use of CAST(... AS DOUBLE) when dividing the count of clicks by the count of impressions. Without it, the CTR ends up being 0, because both counts are integers and the division truncates.
Average CTR
These example queries use session as the analysis unit, but other units such as install or user can be used if needed and if appropriate identifiers/tokens are available.
Spark SQL:
WITH per_session_ctrs AS (
SELECT
assigned,
session_id,
SUM(IF(action = 'click', 1, 0)) / SUM(IF(action = 'impression', 1, 0)) AS ctr
FROM user_events
WHERE action IN('click', 'impression')
GROUP BY assigned, session_id
)
SELECT
assigned,
AVG(ctr) AS average_ctr
FROM per_session_ctrs
GROUP BY assigned
Presto:
WITH per_session_ctrs AS (
SELECT
assigned,
session_id,
CAST(SUM(IF(action = 'click', 1, 0)) AS DOUBLE) / SUM(IF(action = 'impression', 1, 0)) AS ctr
FROM user_events
WHERE action IN('click', 'impression')
GROUP BY assigned, session_id
)
SELECT
assigned,
AVG(ctr) AS average_ctr
FROM per_session_ctrs
GROUP BY assigned
Unique CTR
These example queries use session as the analysis unit, but other units such as install or user can be used if needed and if appropriate identifiers/tokens are available.
Spark SQL:
WITH sessions_that_clicked AS (
SELECT DISTINCT
assigned, session_id,
true AS clicked
FROM user_events
WHERE action = 'click'
), sessions_that_saw AS (
SELECT DISTINCT assigned, session_id
FROM user_events
WHERE action = 'impression'
), sessions AS (
SELECT
i.assigned,
i.session_id,
COALESCE(clicked, false) AS click_through
FROM sessions_that_saw i
LEFT JOIN sessions_that_clicked c
ON i.assigned = c.assigned
AND i.session_id = c.session_id
)
SELECT
assigned,
SUM(IF(click_through, 1, 0)) / COUNT(1) AS unique_ctr
FROM sessions
GROUP BY assigned
Presto:
WITH sessions_that_clicked AS (
SELECT DISTINCT
assigned, session_id,
true AS clicked
FROM user_events
WHERE action = 'click'
), sessions_that_saw AS (
SELECT DISTINCT assigned, session_id
FROM user_events
WHERE action = 'impression'
), sessions AS (
SELECT
i.assigned,
i.session_id,
COALESCE(clicked, false) AS click_through
FROM sessions_that_saw i
LEFT JOIN sessions_that_clicked c
ON i.assigned = c.assigned
AND i.session_id = c.session_id
)
SELECT
assigned,
SUM(IF(click_through, 1, 0)) / CAST(COUNT(1) AS DOUBLE) AS unique_ctr
FROM sessions
GROUP BY assigned
Note the use of CAST(... AS DOUBLE) when dividing the count of clicks by the count of impressions. Without it, the CTR ends up being 0.
Analysis recommendations
Baseline estimation
Confidence intervals (CIs) for both overall CTR and unique CTR can be calculated using the binomial proportion confidence interval method:

$\hat{p} \pm z_{1-\alpha/2} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

where $\hat{p}$ is the sample proportion (the CTR) and $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of a standard normal distribution. For example, for a 95% CI ($\alpha = 0.05$), $z_{1-\alpha/2}$ would be 1.96. In the case of overall CTR, $n$ is the total number of impressions. In the case of unique CTR, $n$ is the total number of units where at least 1 impression was recorded.
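As a minimal sketch, this interval can be computed directly in R; the click and impression counts below are hypothetical:

# Hypothetical counts for illustration: 1,200 clicks out of 11,000 impressions
clicks <- 1200
impressions <- 11000

p_hat  <- clicks / impressions            # sample proportion (the CTR)
z      <- qnorm(0.975)                    # ~1.96 for a 95% CI (alpha = 0.05)
margin <- z * sqrt(p_hat * (1 - p_hat) / impressions)

c(lower = p_hat - margin, estimate = p_hat, upper = p_hat + margin)
# prop.test(clicks, impressions) and binom.test(clicks, impressions) provide
# alternative (Wilson-type and exact) intervals for the same proportion.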
The CI for average CTR can be calculated via:

$\bar{x} \pm z_{1-\alpha/2} \dfrac{s}{\sqrt{n}}$

where $\bar{x}$ is the sample mean (the average CTR), $s$ is the sample standard deviation (calculated from the per-unit CTRs, just like the sample mean), $n$ is the total number of units (e.g. sessions) used in the calculation, and $z_{1-\alpha/2}$ is the quantile of a standard normal distribution (same as above).
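A matching sketch for the average CTR interval; the per-session CTRs below are simulated stand-ins for the ctr column produced by the per-session query shown earlier:

# Simulated per-session CTRs (arbitrary Beta distribution, for illustration only)
set.seed(1)
per_session_ctr <- rbeta(5000, shape1 = 2, shape2 = 30)

x_bar <- mean(per_session_ctr)    # sample mean, i.e. the average CTR
s     <- sd(per_session_ctr)      # sample standard deviation of per-unit CTRs
n     <- length(per_session_ctr)  # number of units (sessions)
z     <- qnorm(0.975)             # ~1.96 for a 95% CI

c(lower = x_bar - z * s / sqrt(n), estimate = x_bar, upper = x_bar + z * s / sqrt(n))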
Planning experiments
Since unique CTR, like overall CTR, is just a proportion, the power analysis would be the same. In both cases we would use the standard effect size for proportions – Cohen's h.
For a change in overall CTR from 10% to 11%, Cohen's h would be 0.0326. To estimate the sample size we need to reach 80% power (with 5% significance level):
library(pwr)  # power analysis functions

# h = 0.0326 is Cohen's h for a change in proportion from 0.10 to 0.11,
# i.e. ES.h(0.11, 0.10); the significance level defaults to 0.05.
pwr.2p.test(
  h = 0.0326,
  power = 0.8,
  alternative = "greater"
)
We would need at least 11,635 impressions in total (across both experiment groups) to be able to detect a minimum improvement of 1pp.
Tip: In practice we would be dealing with very, very small clickthrough rates and correspondingly very small improvements – and thus, very small effect sizes. For example, the effect size – as measured by Cohen's h – between a 0.1% CTR and a 0.11% CTR (a 10% relative improvement, or a 0.01pp increase) is 0.0031. The number of impressions required in this case would be about 1.30 M.
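The tip's numbers can be reproduced with the pwr package; this is a sketch assuming a baseline proportion of 0.001 (a 0.1% CTR) improving to 0.0011:

library(pwr)

# Assumed baseline proportion 0.001 improving to 0.0011 (a 10% relative lift)
h <- ES.h(0.0011, 0.001)  # Cohen's h, ~0.0031
result <- pwr.2p.test(h = h, power = 0.8, alternative = "greater")
c(h = h, total_impressions = 2 * result$n)  # ~1.3 million impressions across both groups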
Average CTR, on the other hand, uses Cohen's d as the effect size. Suppose the baseline average CTR is 5.8% (with a 4.8% standard deviation). To detect a 0.1pp increase (a 1.72% relative increase) – and assuming the same standard deviation in the treatment group (the one receiving a different experience) – we would want to detect a Cohen's d of 0.02083.
# Cohen's d = (0.059 - 0.058) / 0.048, i.e. about 0.0208; sig.level defaults to 0.05
pwr.t.test(
d = ((0.059 - 0.058) / 0.048),
power = 0.95,
alternative = "greater"
)
To detect that effect size with 95% power we would need about 50K sessions per group in the experiment.
Evaluating experiments
Statistical analysis examples are coded in R.
In these examples we are analyzing 3 groups:
- A: First variant, whose true, latent overall CTR is 15% (a 5pp increase over the baseline).
- B: Second variant, whose true, latent overall CTR is 11% (a 1pp increase over the baseline).
- C: Control group, whose true, latent overall CTR is 10%.
Analyzing proportions
Since unique CTR, like overall CTR, is just a proportion, the statistical analysis would be the same.
First, we can test whether there is at least one group that is different from the others:
# clicks and impressions are vectors of length 3
prop.test(
x = clicks,
n = impressions
)
3-sample test for equality of proportions without continuity correction

data:  clicks out of impressions
X-squared = 111721, df = 2, p-value < 2.2e-16
alternative hypothesis: two.sided
sample estimates:
    prop 1     prop 2     prop 3
0.10641417 0.15366222 0.09478002
We reject $H_0$. Then we compare A vs C, B vs C, and A vs B:
Important: Since we are making multiple comparisons (including the one we already did), we should use a Bonferroni correction, meaning that whatever significance level $\alpha$ we decide to use, we would divide it by the number of hypotheses $m$. With the omnibus test plus the three pairwise comparisons, $m = 4$, so $\alpha = 0.05$ gives us a new $\alpha = 0.05 / 4 = 0.0125$ to use.
# clicks and impressions are named vectors of length 3
pairwise.prop.test(
x = clicks,
n = impressions,
p.adjust.method = "bonferroni"
)
Pairwise comparisons using Pairwise comparison of proportions

data:  clicks out of impressions

  b      a
a <2e-16 -
c <2e-16 <2e-16
Analyzing averages
Using ANOVA to test whether all three group means are equal to each other:
library(dplyr)  # for mutate(), filter(), and pull()

# session_ctrs is a data frame of per-session CTRs with columns `assigned` and `ctr`
session_ctrs <- session_ctrs |>
  mutate(assigned = factor(assigned, c("a", "b", "c"))) |>
  mutate(assigned = relevel(assigned, ref = "c"))

fit <- lm(ctr ~ assigned, data = session_ctrs)
anova(fit)
Analysis of Variance Table

Response: ctr
             Df Sum Sq Mean Sq F value    Pr(>F)
assigned      2 13.691  6.8456  2383.2 < 2.2e-16 ***
Residuals 26476 76.049  0.0029
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We reject $H_0$. Then we compare A vs C, B vs C, and A vs B. For demonstration purposes, only the first pairwise comparison (A vs C) is included:
# Is the average CTR in variant A better than the average CTR in the control group?
t.test(
x = session_ctrs |>
filter(assigned == "a") |>
pull(ctr),
y = session_ctrs |>
filter(assigned == "c") |>
pull(ctr),
alternative = "greater"
)
Welch Two Sample t-test

data:  pull(filter(session_ctrs, assigned == "a"), ctr) and pull(filter(session_ctrs, assigned == "c"), ctr)
t = 65.094, df = 16565, p-value < 2.2e-16
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 0.0525283       Inf
sample estimates:
 mean of x  mean of y
0.11149323 0.05760311
Note: alternative = "greater" is the alternative that x has a larger mean than y, which is why we are setting y as the control group's clickthrough rates.