Campaign Sign-up Rate
This document was created as part of Signals and Data Services Objective 2 Key Result (KR) 2 from the Wikimedia Foundation's 2024-2025 Annual Plan. The KR focused on developing a set of product and experimentation-focused essential metrics. This document defines a metric for measuring the rate of users signing up for a campaign, with application as a secondary metric for Growth's Community Updates module experiment. This metric is an implementation of a conversion rate metric.
Glossary
- Unit: A single source of related interactions, such as a session, user, or install.
- Impression: When an element (such as a link or a button) is loaded and shown to the user.
- Conversion: When the user takes the desired action to convert themselves to the desired state (e.g. registering an account converts a reader to a registered user).
- CSR: Campaign Sign-up Rate, usually represented as a percentage (rather than a value 0–1).
- Homepage: The Newcomer Homepage, available at Special:Homepage on every Wikipedia wiki by default for an account registered on that wiki.
Metric definition
Campaign sign-up rate is an example of a conversion rate metric. These metrics always take the form of a fraction. The numerator in this fraction is based on the conversion action, an action the user took to convert into a desired state (e.g. signing up for a campaign). The denominator, hereafter called the impression action, is an action signifying that someone was shown an intervention or call to action, meaning that they are part of a group that we are interested in converting. The metric can further be modified by requiring that these actions occur within some amount of time, and by calculating it for different units of measurement. We define this in more detail below.
Represented as a mathematical formula, the basic form of the metric is as follows:

conversion rate = (number of units with a conversion action) / (number of units with an impression action)
Community Updates module experiment definition
For the use of "Campaign sign-up rate" in the Growth team's experiment with the Community Updates module in Q2 FY 24/25, we define the metric as follows:
- The conversion action is signing up for a campaign.
- The impression action is visiting the Newcomer Homepage.
- We require conversion to occur within 1 hour; this is a typical session-length limit used in web analytics.[1]
- The unit of measurement is a user, meaning that each user counts only once towards the result even if they signed up for more than one campaign or visited the Homepage multiple times.
This translates into the following formula for Campaign Sign-up Rate (CSR):

CSR = (users who signed up for a campaign within 1 hour of visiting the Homepage) / (users who visited the Homepage) × 100%
We can see a visual representation of how this works in Figure 1. The diagram shows five users on a wiki during an imaginary six-hour experiment. There is a seventh hour because the session of User 4 lasts into that hour. Two users, User 1 and User 2, are converted because they signed up for a campaign within the one-hour window. User 3 is not converted even though they signed up, because that sign-up occurred outside the window. User 4 never signed up and is therefore not converted. User 5 never visited the Homepage and is therefore not part of the calculation of the metric. From the formula we then get a CSR of 50% (2/4 = 0.5).
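The worked example above can be sketched in Python. The user names and timestamps below are illustrative stand-ins for the five users in Figure 1, not real data:

```python
from datetime import datetime, timedelta

# Time of each user's (first) Homepage visit during the experiment
visits = {
    "user1": datetime(2025, 1, 1, 0, 10),
    "user2": datetime(2025, 1, 1, 1, 30),
    "user3": datetime(2025, 1, 1, 2, 0),
    "user4": datetime(2025, 1, 1, 3, 45),
    # user5 never visits the Homepage, so they are excluded entirely
}

# Time of each user's campaign sign-up; user4 never signs up
signups = {
    "user1": datetime(2025, 1, 1, 0, 40),  # within 1 hour of the visit
    "user2": datetime(2025, 1, 1, 2, 0),   # within 1 hour of the visit
    "user3": datetime(2025, 1, 1, 4, 30),  # outside the 1-hour window
    "user5": datetime(2025, 1, 1, 5, 0),   # no Homepage visit, not counted
}

window = timedelta(hours=1)

# A user converts if they signed up within 1 hour after their visit
converted = {
    user
    for user, visit_dt in visits.items()
    if user in signups and timedelta(0) <= signups[user] - visit_dt <= window
}

csr = 100 * len(converted) / len(visits)
print(f"CSR = {csr}%")  # 2 converted out of 4 impressed -> 50.0%
```

As in the figure, User 5 never enters the denominator, and User 3's late sign-up does not count towards the numerator.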
We choose the 1-hour limit on conversion in order to establish a link between the exposure and the subsequent sign-up for those users who see the Community Updates module. For users in the control group, who do not see the module, requiring them to visit the Homepage in combination with this 1-hour limit defines a notion of them "being active on the wiki" when a sign-up might occur. Taken together, we end up with similar restrictions for both the treatment and control groups in the experiment.
We choose the user as the unit of measurement so that individual Homepage visits do not each affect the metric: it would be unreasonable to expect a user to be affected by the Community Updates module's call to action on every single visit. With these choices we aim to measure the impact of the Community Updates module over time.
This metric is only defined for logged-in users because an account is required for both visiting the Newcomer Homepage and for signing up to a campaign.
Other approaches
While the definition for the Growth team's experiment uses a specific unit of measurement, the metric is not restricted to that unit. Which unit to choose depends on one's needs. Here are some observations about different units of measurement:
- Per-user, where a single user only counts once.
- Per-session, where every session/visit counts.
Using the user as the unit of measurement is useful when users can sign up to multiple campaigns, or when there is a difference in expected outcome for a finer-level unit. For example, in the case of the Community Updates module experiment, we do not expect users to be equally interested in signing up for a campaign on every visit to the Homepage, particularly if someone is highly active and visits the Homepage a lot. If we did not calculate the metric on a per-user basis, the visits of highly active users would have an undue influence on the result.
Using the session as the unit of measurement is meaningful when the number of sessions per user is low, when the call to action changes with every session, or when a user can sign up to multiple campaigns. In the latter case, for example, we want to ensure that a user who signs up for multiple campaigns is counted multiple times.
Rates can also be measured for a single campaign or across multiple campaigns. In the latter case, to measure an average sign-up rate, one can either average the rate across all campaigns, or calculate a weighted average. If the campaigns are roughly equal in size, taking the average across them is a straightforward way to calculate the average. In cases where the sizes of the campaigns are very different, defining a unit of measurement for these campaigns (e.g. per-user) and calculating the rate across all units for all campaigns enables larger campaigns to have more influence on the result.
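The difference between the two averaging approaches can be sketched as follows; the campaign names and counts here are made up for illustration:

```python
# Illustrative per-campaign data: (users impressed, users converted)
campaigns = {
    "campaign_a": (1000, 50),  # 5% sign-up rate, large campaign
    "campaign_b": (10, 4),     # 40% sign-up rate, small campaign
}

# Simple average of the per-campaign rates: each campaign weighs equally
simple_avg = sum(c / i for i, c in campaigns.values()) / len(campaigns)

# Pooled (weighted) rate: each unit (here, user) weighs equally,
# so larger campaigns have more influence on the result
total_impressed = sum(i for i, _ in campaigns.values())
total_converted = sum(c for _, c in campaigns.values())
pooled = total_converted / total_impressed

print(f"simple average: {simple_avg:.1%}")  # (5% + 40%) / 2 = 22.5%
print(f"pooled rate:    {pooled:.1%}")      # 54 / 1010, about 5.3%
```

The small, high-rate campaign dominates the simple average, while the pooled rate stays close to the large campaign's rate.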
Data aggregation
General case
In the general case, aggregating data and calculating the conversion rate consists of identifying all impression actions, all conversion actions, and calculating the proportion. In SQL this would follow the framework outlined below:
WITH impressions AS (
    SELECT
        pageview_id,
        1 AS was_impressed
    FROM {table}
    -- define the timeframe of data gathering
    WHERE {Condition of timeframe}
    -- define the impression action
    AND action = '{impression action name}'
),
conversions AS (
    SELECT
        pageview_id,
        1 AS was_converted
    FROM {table}
    -- define the timeframe of data gathering
    WHERE {Condition of timeframe}
    -- define a conversion window, if needed
    AND {Condition of conversion window}
    -- define the conversion action
    AND action = '{conversion action name}'
)
SELECT
    -- impressions without a matching conversion count as 0 in the numerator
    SUM(COALESCE(was_converted, 0)) / SUM(was_impressed) AS conversion_rate
FROM impressions
LEFT JOIN conversions
USING (pageview_id)
Community Updates module experiment
The data gathering and calculation of Campaign Sign-up Rate for the Community Updates experiment differs from the general case because it requires data from three different sources: impression events from the Newcomer Homepage instrumentation, campaign sign-up data from central storage, and a mapping from local wiki user ids to global user ids from central authentication. We therefore end up with three different queries, and the calculation requires some additional processing to merge the data. You can find a prepared example calculation in this Jupyter notebook.
The general structure of the three queries is as follows.
Step 1: Gather visits from the Newcomer Homepage
This query can be run using Spark or Presto, depending on your preference:
SELECT
wiki,
event.user_id,
dt AS visit_dt
FROM event.homepagevisit
-- define the timeframe of data gathering
WHERE {Condition of timeframe}
-- define a set of wikis to gather data from, if needed
AND wiki IN ({comma-separated list of wikis})
Step 2: Convert local user ids to global user ids
This query needs to be run against the "centralauth" MariaDB database. To keep the query easy to read and work with, it does the conversion for a list of users from a specific wiki, and therefore needs to be run once per wiki.
SELECT
lu_wiki AS wiki,
lu_local_id AS user_id,
lu_global_id AS global_user_id
FROM localuser
-- define the wiki we're converting users for
WHERE lu_wiki = '{wiki database name}'
-- define the list of local user ids we are converting
AND lu_local_id IN ({comma-separated list of local user ids})
Step 3: Look up campaign sign-ups
This query needs to be run against the "wikishared" MariaDB database on the "x1" database server. To keep the amount of data queried small, it requires the list of global user ids that we gathered in Step 2.
SELECT
cep_user_id,
cep_registered_at
FROM ce_participants
-- define the list of global user ids
-- that we want to identify having signed up for a campaign
WHERE cep_user_id IN ({comma-separated list of global user ids})
-- define a specific campaign if needed
AND cep_event_id = {event id}
Step 4: Calculate the metric
To calculate the metric, we first merge the results from Step 1 and Step 2 to get a list of impressions that also contains global user ids. We then merge that list with the result from Step 3 to get the list of users who signed up; it is at this point that we can enforce the 1-hour time window between visit and sign-up. Finally, we calculate the metric by dividing the number of users who signed up within the window by the number of users from Step 1.
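The merging and calculation described above can be sketched with pandas. The column names follow the three queries above, but the data frames here hold small illustrative stand-ins rather than real query results, and the published notebook may differ in its details:

```python
import pandas as pd

# Step 1 stand-in: Homepage visits (wiki, local user id, visit time)
visits = pd.DataFrame({
    "wiki": ["wiki1", "wiki1", "wiki2"],
    "user_id": [1, 2, 1],
    "visit_dt": pd.to_datetime([
        "2025-01-01 00:10", "2025-01-01 01:30", "2025-01-01 02:00",
    ]),
})

# Step 2 stand-in: mapping from local to global user ids
user_map = pd.DataFrame({
    "wiki": ["wiki1", "wiki1", "wiki2"],
    "user_id": [1, 2, 1],
    "global_user_id": [101, 102, 103],
})

# Step 3 stand-in: campaign sign-ups by global user id
signups = pd.DataFrame({
    "cep_user_id": [101, 103],
    "cep_registered_at": pd.to_datetime([
        "2025-01-01 00:40",  # within 1 hour of the visit
        "2025-01-01 04:30",  # outside the 1-hour window
    ]),
})

# Merge Step 1 with Step 2 to attach global user ids to the visits
impressions = visits.merge(user_map, on=["wiki", "user_id"])

# Merge with Step 3 and enforce the 1-hour conversion window
merged = impressions.merge(
    signups, left_on="global_user_id", right_on="cep_user_id"
)
in_window = merged[
    (merged.cep_registered_at >= merged.visit_dt)
    & (merged.cep_registered_at <= merged.visit_dt + pd.Timedelta(hours=1))
]

# Per-user metric: count each user at most once in both terms
csr = (
    100
    * in_window.global_user_id.nunique()
    / impressions.global_user_id.nunique()
)
print(f"CSR = {csr:.1f}%")  # 1 of 3 users converted
```

Counting distinct global user ids in both the numerator and the denominator implements the per-user unit of measurement from the definition above.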
Planning experiments
Conducting a power analysis – determining the minimum sample size required to detect an effect of a given size with a given statistical power – can be done as for overall clickthrough rate (CTR), which is also a proportion metric.
For example, the CSR in the Community Updates module experiment was less than 5%.[2] If we wanted to test an intervention that we think would improve CSR from, say, 4% to 5% (a 1pp increase), and we wanted to be able to detect at least that improvement with 90% statistical power and a 5% significance level, we would need at least 7339 impressions on the page where the sign-up can occur. See the R code below for the calculation with the pwr package:
library(pwr)
pwr.2p.test(
  h = ES.h(0.05, 0.04), # Cohen's h as the effect size
  sig.level = 0.05,
  power = 0.9,
  alternative = "greater" # we would test that p1 > p2, not p1 != p2
)
# n in the output is the required sample size *per group*
References
- ↑ Halfaker, Aaron; Keyes, Os; Kluver, Daniel; Thebault-Spieker, Jacob; Nguyen, Tien; Grandprey-Shores, Kate; Uduwage, Anuradha; Warncke-Wang, Morten (2015-05-18). "User Session Identification Based on Strong Regularities in Inter-activity Time". Proceedings of the 24th International Conference on World Wide Web. WWW '15 (Republic and Canton of Geneva, CHE: International World Wide Web Conferences Steering Committee): 410–418. ISBN 978-1-4503-3469-3. doi:10.1145/2736277.2741117.
- ↑ "Growth/Community Updates". MediaWiki. 2024-02-23. Retrieved 2025-01-10.