Research and Decision Science/Data glossary/Campaign Sign-up Rate

This document was created as part of Signals and Data Services Objective 2 Key Result (KR) 2 from the Wikimedia Foundation's 2024-2025 Annual Plan. The KR focused on developing a set of product and experimentation-focused essential metrics. This document defines a metric for measuring the rate of users signing up for a campaign, with application as a secondary metric for Growth's Community Updates module experiment. This metric is an implementation of a conversion rate metric.

Glossary

Unit
A single source of related interactions such as a session, user, or install.
Impression
When an element (such as a link or a button) is loaded and shown to the user.
Conversion
When the user takes the desired action, converting them to a desired state (e.g. registering an account converts a reader to a registered user).
CSR
Campaign Sign-up Rate, usually represented as a percentage (rather than a value between 0 and 1).
Homepage
The Newcomer Homepage, available at Special:Homepage by default on every Wikipedia wiki for accounts registered on that wiki.

Metric definition

Campaign sign-up rate is an example of a conversion rate metric. These metrics always take the form of a fraction. The numerator is based on the conversion action, an action the user took to convert into a desired state (e.g. signing up for a campaign). The denominator is based on the impression action, an action signifying that someone was shown an intervention or call to action, meaning that they are part of a group that we are interested in converting. The metric can be further modified by requiring that these actions occur within some amount of time, or by calculating it for different units of measurement. We define this in more detail below.

Represented as a mathematical formula, the basic form of the metric is as follows:

\text{conversion rate} = \frac{\text{number of units with a conversion action}}{\text{number of units with an impression action}}

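As a minimal illustration of this formula, here is a Python sketch with made-up units (the identifiers are hypothetical, not real instrumentation data):

# Units (here: pageview ids) that performed the impression action.
impressed_units = {"pv-1", "pv-2", "pv-3", "pv-4"}
# Units that performed the conversion action.
converted_units = {"pv-1", "pv-4"}

# Only units that were impressed can count towards the numerator.
conversion_rate = len(impressed_units & converted_units) / len(impressed_units)
print(f"Conversion rate: {conversion_rate:.0%}")  # 50%
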
Community Updates module experiment definition

For the use of "Campaign sign-up rate" in the Growth team's experiment with the Community Updates module in Q2 FY 24/25, we define the metric as follows:

  • The conversion action is signing up for a campaign.
  • The impression action is visiting the Newcomer Homepage.
  • We require conversion to occur within 1 hour, a typical session-length limit used in web analytics.[1]

The unit of measurement is a user, meaning that each user only counts once towards the result even if they signed up for more than one campaign, or visited the Homepage multiple times.

This translates into the following formula for Campaign Sign-up Rate (CSR):

\text{CSR} = \frac{\text{users who signed up for a campaign within 1 hour of a Homepage visit}}{\text{users who visited the Homepage}} \times 100\%

[Figure 1: Visual representation of conversions during a six-hour experiment]

We can see a visual representation of how this works in Figure 1. The diagram shows five users on a wiki during an imaginary six-hour experiment. (There is a seventh hour because User 4's session lasts into it.) Two users, User 1 and User 2, are converted because they signed up for a campaign within the one-hour window. User 3 is not converted even though they signed up, because that sign-up occurred outside the window. User 4 never signed up and is therefore not converted. User 5 never visited the Homepage and is therefore not part of the calculation. From the formula we then get a CSR of 50% (2/4 = 0.5).
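
The scenario in Figure 1 can be reproduced with the following Python sketch (the timestamps are hypothetical, chosen to match the diagram's description):

from datetime import datetime, timedelta

# Hypothetical per-user event times reproducing the Figure 1 scenario.
# A visit is a Homepage impression; a sign-up is the conversion action.
t0 = datetime(2024, 11, 1, 0, 0)
visits = {
    "user1": t0,
    "user2": t0 + timedelta(hours=1),
    "user3": t0 + timedelta(hours=2),
    "user4": t0 + timedelta(hours=5),
    # user5 never visits the Homepage, so they are excluded entirely
}
signups = {
    "user1": t0 + timedelta(minutes=30),           # within 1 hour: converted
    "user2": t0 + timedelta(hours=1, minutes=45),  # within 1 hour: converted
    "user3": t0 + timedelta(hours=4),              # outside the window: not converted
    # user4 never signs up
    "user5": t0 + timedelta(hours=3),              # no Homepage visit: ignored
}

window = timedelta(hours=1)
converted = {
    user for user, visit_dt in visits.items()
    if user in signups and timedelta(0) <= signups[user] - visit_dt <= window
}

csr = len(converted) / len(visits)
print(f"CSR: {csr:.0%}")  # 2 of 4 Homepage visitors converted -> 50%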

We choose the 1-hour limit on conversion in order to establish a link between the exposure and the subsequent sign-up for those users who see the Community Updates module. For users in the control group, who do not see the module, requiring them to visit the Homepage in combination with this 1-hour limit defines a notion of them "being active on the wiki" when a sign-up might occur. Taken together, we end up with similar restrictions for both the treatment and control groups in the experiment.

We choose the user as the unit of measurement so that individual Homepage visits do not each affect the metric, since it would be unreasonable to expect a user to respond to the Community Updates module's call to action on every visit. With these choices we aim to measure the impact of the Community Updates module over time.

This metric is only defined for logged-in users because an account is required for both visiting the Newcomer Homepage and for signing up to a campaign.

Other approaches

While the definition for the Growth team's experiment uses a specific unit of measurement, the metric is not restricted to that unit. Which unit to choose depends on one's needs. Here are some observations about two common units of measurement:

  • Per-user, where a single user only counts once.
  • Per-session, where every session/visit counts.

Using the user as the unit of measurement is useful when users can sign up to multiple campaigns, or when the expected outcome differs across finer-grained units. For example, in the case of the Community Updates module experiment, we do not expect users to be equally interested in signing up for a campaign on every visit to the Homepage, particularly if someone is highly active and visits the Homepage often. If we did not calculate the metric on a per-user basis, the visits of highly active users would have an undue influence on the result.

Using the session as the unit of measurement is meaningful when the number of sessions per user is low, where the call to action changes with every session, or where a user can sign up to multiple campaigns. In the latter case for example, we want to make sure that a user who signs up to multiple campaigns is counted multiple times.
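
To illustrate how the choice of unit changes the result, here is a Python sketch with made-up session data (the users and counts are hypothetical):

# Hypothetical sessions: (user, converted_in_session) pairs.
# One highly active user has many sessions but converts only once.
sessions = [
    ("alice", True),
    ("alice", False), ("alice", False), ("alice", False),
    ("bob", True),
    ("carol", False),
]

# Per-session: every session counts once.
per_session = sum(converted for _, converted in sessions) / len(sessions)

# Per-user: each user counts once, converted if any of their sessions converted.
users = {user for user, _ in sessions}
converted_users = {user for user, converted in sessions if converted}
per_user = len(converted_users) / len(users)

print(f"Per-session rate: {per_session:.0%}")  # 2/6 ≈ 33%
print(f"Per-user rate: {per_user:.0%}")        # 2/3 ≈ 67%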

Rates can also be measured for a single campaign or across multiple campaigns. In the latter case, to measure an average sign-up rate, one can either take the unweighted average of the per-campaign rates, or calculate a weighted average. If the campaigns are roughly equal in size, the unweighted average is the straightforward choice. If the campaign sizes differ greatly, defining a unit of measurement for these campaigns (e.g. per-user) and calculating the rate across all units for all campaigns lets larger campaigns have more influence on the result.
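
A small worked example in Python shows how the two approaches diverge when campaign sizes differ (the numbers below are made up):

# Hypothetical campaigns: (impressed_users, converted_users).
campaigns = [
    (1000, 50),  # large campaign, 5% sign-up rate
    (100, 10),   # small campaign, 10% sign-up rate
]

# Unweighted average of per-campaign rates: each campaign counts equally.
rates = [converted / impressed for impressed, converted in campaigns]
unweighted = sum(rates) / len(rates)

# Pooled (weighted) rate: calculated across all units, so the larger
# campaign has more influence on the result.
total_impressed = sum(impressed for impressed, _ in campaigns)
total_converted = sum(converted for _, converted in campaigns)
pooled = total_converted / total_impressed

print(f"Unweighted average: {unweighted:.1%}")  # (5% + 10%) / 2 = 7.5%
print(f"Pooled rate: {pooled:.1%}")             # 60 / 1100 ≈ 5.5%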

Data aggregation

General case

In the general case, aggregating data and calculating the conversion rate consists of identifying all impression actions, all conversion actions, and calculating the proportion. In SQL this would follow the framework outlined below:

WITH impressions AS (
    SELECT
        pageview_id,
        1 AS was_impressed
    FROM {table}
    -- define the timeframe of data gathering
    WHERE {Condition of timeframe}
    -- define the impression action
    AND action = '{impression action name}'
),
conversions AS (
    SELECT
        impressions.pageview_id,
        impressions.was_impressed,
        -- flag units that have a matching conversion action; non-converted
        -- units are kept by the LEFT JOIN with a 0 flag
        IF(convert_action.pageview_id IS NOT NULL, 1, 0) AS was_converted
    FROM impressions
    LEFT JOIN {table} AS convert_action
    -- the conversion conditions live in the ON clause so that
    -- non-converted impressions are not filtered out
    ON impressions.pageview_id = convert_action.pageview_id
    -- define the timeframe of data gathering
    AND {Condition of timeframe}
    -- define a conversion window, if needed
    AND {Condition of return window}
    -- define the conversion action
    AND convert_action.action = '{conversion action name}'
)

SELECT
    SUM(was_converted) / SUM(was_impressed) AS conversion_rate
FROM conversions

Community Updates module experiment

The data gathering and calculation of Campaign Sign-up Rate for the Community Updates experiment differs from the general case because it requires data from three different sources: impression events from the Newcomer Homepage instrumentation, campaign sign-up data from central storage, and a mapping from local wiki user ids to global user ids from central authentication. We therefore end up with three different queries, and the calculation requires some additional processing to merge the data. You can find a prepared example calculation in this Jupyter notebook.

The general structure of the three queries is as follows.

Step 1: Gather visits from the Newcomer Homepage

This query can be run using Spark or Presto depending on your preferences:

SELECT
    wiki,
    event.user_id,
    dt AS visit_dt
FROM event.homepagevisit
-- define the timeframe of data gathering
WHERE {Condition of timeframe}
-- define a set of wikis to gather data from, if needed
AND wiki IN ({comma-separated list of wikis})

Step 2: Convert local user ids to global user ids

This query needs to be run using the "centralauth" MariaDB database. In order to make the query easier to read and work with, it does this conversion for a list of users from a specific wiki, and will therefore need to be run for every wiki.

SELECT
    lu_wiki AS wiki,
    lu_local_id AS user_id,
    lu_global_id AS global_user_id
FROM localuser
-- define the wiki we're converting users for
WHERE lu_wiki = "{wiki database name}"
-- define the list of local user ids we are converting
AND lu_local_id IN ({comma-separated list of local user ids})
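
Since this query has to be run once per wiki, a small driver loop along the following lines could orchestrate it (run_centralauth_query is a hypothetical placeholder for however queries are executed against the "centralauth" database in your environment):

QUERY_TEMPLATE = """
SELECT
    lu_wiki AS wiki,
    lu_local_id AS user_id,
    lu_global_id AS global_user_id
FROM localuser
WHERE lu_wiki = '{wiki}'
AND lu_local_id IN ({user_ids})
"""

def map_local_to_global(local_ids_by_wiki, run_centralauth_query):
    # local_ids_by_wiki: dict mapping wiki database name -> list of local user ids
    mappings = []
    for wiki, user_ids in local_ids_by_wiki.items():
        query = QUERY_TEMPLATE.format(
            wiki=wiki,
            user_ids=", ".join(str(uid) for uid in user_ids),
        )
        mappings.extend(run_centralauth_query(query))
    return mappings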

Step 3: Look up campaign sign-ups

This query needs to be run using the "wikishared" MariaDB database on the "x1" database server. In order to keep the amount of data queried small, it requires a list of global user ids, which is what we gathered in Step 2.

SELECT
    cep_user_id,
    cep_registered_at
FROM ce_participants
-- define the list of global user ids
-- that we want to identify having signed up for a campaign
WHERE cep_user_id IN ({comma-separated list of global user ids})
-- define a specific campaign if needed
AND cep_event_id = {event id}

Step 4: Calculate the metric

To calculate the metric, we first merge the results from Step 1 and Step 2 to get a list of impressions that also contains global user ids. We then merge that list with the result from Step 3 to get the list of users who signed up; it is at this point that we can enforce the 1-hour window between visit and sign-up. Finally, we calculate the metric by dividing the number of users who signed up within the window by the number of users from Step 1.
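
As a sketch of how this merging could be done in Python with pandas (the DataFrame and function names below are assumptions for illustration, not the prepared notebook's actual code):

import pandas as pd

def calculate_csr(visits, user_map, signups):
    # Assumed inputs, one DataFrame per query above:
    #   visits:   wiki, user_id, visit_dt        (Step 1)
    #   user_map: wiki, user_id, global_user_id  (Step 2)
    #   signups:  cep_user_id, cep_registered_at (Step 3)

    # Attach global user ids to the Homepage visits.
    impressions = visits.merge(user_map, on=["wiki", "user_id"])

    # Attach sign-ups, then enforce the 1-hour conversion window.
    merged = impressions.merge(
        signups, left_on="global_user_id", right_on="cep_user_id"
    )
    delta = (
        pd.to_datetime(merged["cep_registered_at"])
        - pd.to_datetime(merged["visit_dt"])
    )
    in_window = merged[(delta >= pd.Timedelta(0)) & (delta <= pd.Timedelta(hours=1))]

    # Each user counts once in both the numerator and the denominator.
    n_converted = in_window["global_user_id"].nunique()
    n_impressed = impressions["global_user_id"].nunique()
    return n_converted / n_impressed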

Planning experiments

Conducting a power analysis – determining the minimum sample size required to detect an effect of a certain size with a certain statistical power – can be done in the same way as for overall clickthrough rate (CTR), which is also a proportion metric.

For example, the CSR in the Community Updates module experiment was less than 5%.[2] If we wanted to test an intervention that we think would improve CSR from, say, 4% to 5% (a 1 percentage point increase), and we wanted to be able to detect at least that improvement with 90% statistical power at a 5% significance level, we would need at least 7339 impressions on the page where the sign-up can occur. See the R code below for the calculation with the pwr package:

library(pwr)

pwr.2p.test(
  h = ES.h(0.05, 0.04),   # Cohen's h as the effect size
  sig.level = 0.05,       # significance level (the pwr default)
  power = 0.9,
  alternative = "greater" # we would test that p1 > p2, not p1 != p2
)
# n in the output is the required sample size per group

References

  1. Halfaker, Aaron; Keyes, Os; Kluver, Daniel; Thebault-Spieker, Jacob; Nguyen, Tien; Grandprey-Shores, Kate; Uduwage, Anuradha; Warncke-Wang, Morten (2015-05-18). "User Session Identification Based on Strong Regularities in Inter-activity Time". Proceedings of the 24th International Conference on World Wide Web. WWW '15 (Republic and Canton of Geneva, CHE: International World Wide Web Conferences Steering Committee): 410–418. ISBN 978-1-4503-3469-3. doi:10.1145/2736277.2741117. 
  2. "Growth/Community Updates". MediaWiki. 2024-02-23. Retrieved 2025-01-10.