This document was created as part of Signals and Data Services Objective 2 Key Result (KR) 2 from the Wikimedia Foundation's 2024-2025 Annual Plan. The KR focused on developing a set of product and experimentation-focused essential metrics. This document outlines frameworks for defining retention rates for both single cohort and multiple cohorts and provides relevant use cases. It also specifies the instrumentation requirements for accurate measurement. Finally, this document details the analysis methodologies for evaluating retention rates in A/B testing.

Glossary

Unit: A single source of related interactions such as a session, user, or install.
Cohort: A set of units that share a common characteristic or experience within a specific timeframe.

Metric definitions and measurement

We explored the retention rate metrics commonly used in the industry and examined the measurement use cases typically handled by the WMF product team. We then integrated this with our in-house data tracking methods on app users and proposed four definitions of retention rate: retention rate of one cohort, cumulative retention rate, average retention rate of multiple cohorts, and retention rate with two return windows.

Retention rate of one cohort

${\text{Retention Rate}}={\frac {\text{Number of units who returned over a given period}}{\text{Number of units in a cohort}}}\times 100\%$

In general, the retention rate measures the percentage of users who continue to use an app after their initial installation or interaction over a specific period. There are three major steps to calculate it: 1) measure the number of units in the cohort, 2) measure the number of returning units in the cohort over a given period, and 3) calculate the proportion. In the first two steps, specific details need to be defined based on your analysis needs and project requirements.

Step1: Measure the cohort size

When measuring the cohort size, there are two factors to define based on your application: the initial action and the cohort window.

Initial action: this could be an event such as app installation or the first interaction with the app.

Cohort window: this refers to the time period during which you want to sample the units that performed the initial action and group them into one cohort. The cohort window could be as short as a day or a few days, or it could be longer, like 1 week, 2 weeks, 1 month, etc. The length of the window depends on the unit’s behavior patterns and the purpose of the measurement.

Cohort Window Case Example

For shorter-term measurements, such as daily monitoring, the most commonly used cohort window is one day. For medium-term measurement, such as A/B testing (which typically runs 2-4 weeks in WMF), the ideal cohort timing window could be a few days or a couple of weeks to ensure sufficient data for return measurement, which is the numerator in the equation. For longer-term measurements, such as high-level metrics, the cohort timing window can be one month or even several months. Since some unit identifiers may expire or be hashed after a certain period, they should also be examined when selecting the window.

To measure the cohort size, count the total number of unique units that performed the initial action during the cohort window. Here are a few examples.

If app installation is the initial action and 1 day is the cohort window, the cohort is counted as the number of units that installed the app on a specific day.
If app opening is the initial action and 1 week is the cohort window, the cohort is counted as the number of unique units that opened the app during a specific week.
If app engagement is the initial action and 1 month is the cohort window, the cohort is counted as the number of unique units that engaged in the app in the given month.

In the example shown above, User 1,2,3,4 are included in the cohort, while User 5 is not included because they did not take initial action within the cohort window.

Step 2: Measure the returned units in cohort over a given time period

When measuring the returned units, there are five factors to to define based on your application: unit, return action and its frequency, retention length and return window

Units: these units should belong to the cohort you selected in step1.

Return action: This is an interaction with the app that indicates users have reused it. Typically, this action is the same as the initial action.

Frequency of the return action: this refers to the number of return actions the unit must take to be considered retained.

Retention length: This refers to the duration over which you want to assess retention. Depending on varying measurement interests, retention length can range from hours to days, weeks, or even months.

Return window: This refers to the time frame during which the user’s return actions are measured. For the retention rate of one cohort, the return window is defined with the same length for all units. While the length of the return window is consistent, the start date may vary based on when the user’s initial action occurred, and the end day may differ due to these varying start dates. With the return window defined in this way, each unit has the same amount of time to complete their return action. This method is often used to measure N-th day/week retention rates.

Here are a few examples.

If the retention length is 1 day and the return window lasts for 1 day, the return window starts 1 day after the unit’s initial action, covering the entire 2nd day. In the example shown below, User 2 and User 3 are considered retained because they took actions within their return windows.

Return Window Case Example 1

If the retention length is 7 days and the return window lasts for 1 day, the return window starts 7 days after the unit’s initial action, covering the entire 8th day. In the example shown below, User 1 is considered retained because they took actions within their return windows.

Return Window Case Example 2
If the retention length is 7 days and the return window lasts for 3 days, the return window starts 7 days after the unit’s initial action, covering the period between 8th day and 10th day. In the example shown below, User 1, User 2 are considered retained because they took actions within their return windows.

Return Window Case Example 3
If the retention length is 2 weeks and the return window lasts for 1 week, the return window starts two weeks after the unit’s initial action, covering the entire 3rd week.

We suggest using this definition for A/B testing, as it offers a more accurate and equitable return status for all units. But it increases query complexity and demands more data over a longer period.

For example, during a 3-week A/B test period, from August 1 to August 21. If we choose a cohort that made the initial action between August 1 and August 7, the retention length is 7 days, and the length of return windows is also 7 days, then return window varies as follows:

For a unit that took the initial action on August 1 at 00:00:01, (i.e., as soon as the experiment goes live at 00:00:00), the return window is from July 8, at 00:00:01 (inclusive) to July 15, at 00:00:01 (exclusive).
For a unit that took the initial action on August 7 at 23:59:59, (i.e., the last possible second to be in the cohort), the return window is from July 14, at 23:59:59 (inclusive) to July 21, at 23:59:59 (exclusive).

This means that for a 1-week retention rate, you need to collect data for 2 weeks, and for a 2-week retention rate, you need to collect data for 3 weeks. Please keep this in mind when planning your A/B testing.

With the above defined elements, you can measure the returned units by counting the total number of unique units that performed the return action during their own return windows.

Step 3: Calculate the proportion

Using the results from the previous two steps, you can calculate the retention rate by dividing the number of returned units by the number of units in cohort at the start. For example, the retention rate might be the percentage of units that opened the app on one day and then used it again the next day, known as the second-day retention rate. Another example would be the percentage of units that opened or engaged with the app on August 1st and then reopened or engaged with it in the second week– this is known as the second-week retention rate. In summary, if you are interested in the Nth-week retention rate, it is defined as the percentage of units that opened or engaged with the app in week 1 and then reopened or engaged with it in the Nth week.

Data aggregation

This is the query framework. Analyst needs to define the query conditions for 1) cohort window, 2) initial action, 3) return action, 4) return window, 5) return frequency.

Query framework

-- Query framework
WITH t_cohort AS (
	SELECT
         cohort,
         app_install_id,
         first_action_dt
      FROM {table} AS init_action
      -- define the cohort window
      WHERE {Condition of cohort window} 
      -- define the initial action if needed
            AND action='{initial action name}'
      GROUP BY
         app_install_id,
         cohort
), 
t_return AS (
 SELECT cohort, app_install_id, count(return_actions) AS returns
  FROM t_cohort
  LEFT JOIN {table} AS return_action
  ON t_cohort.app_install_id = return_action.app_install_id
        -- define the return window
        AND {Condition of return window} 
        -- define the return action if needed
	 AND action='{return action name}'
  GROUP BY cohort, t_cohort.app_install_id
)


SELECT 
   cohort, 
    -- define the return frequency, i.e. number of return actions
    --  returns > {threshold}
   SUM(IF(returns > {threshold}, 1,0))/ COUNT(app_install_id)
FROM t_return
GROUP BY cohort

Query example

Here is a use case of the second-day retention rate for app users. We measured the retention rate of the cohort ‘2024-07-01’, which represents the percentage of users who took the initial action on 2024-07-01 and returned on the second day, 2024-07-02.

WITH t_cohort AS (
      SELECT
         DATE_FORMAT(init_action.meta.dt, 'yyyy-MM-dd') AS cohort,
         init_action.app_install_id AS app_install_id,
         TO_TIMESTAMP(MIN(init_action.meta.dt)) AS first_action_dt
      FROM event.android_breadcrumbs_event AS init_action
      -- define the cohort window, here is one day
      WHERE year = 2024 AND month = 7 AND day =1
      -- define the initial action if needed
      --     AND action='{initial action name}'
      GROUP BY
         init_action.app_install_id,
         DATE_FORMAT(init_action.meta.dt, 'yyyy-MM-dd')
 ),
t_return AS (
    SELECT
        t_cohort.cohort, 
        t_cohort.app_install_id AS app_install_id,
        COUNT(return_action.dt) AS returns
    FROM t_cohort
    LEFT JOIN event.android_breadcrumbs_event AS return_action
        ON t_cohort.app_install_id = return_action.app_install_id
        -- define the return window
        AND TO_TIMESTAMP(return_action.meta.dt) >=
                t_cohort.first_action_dt + INTERVAL '1' DAY
        AND TO_TIMESTAMP(return_action.meta.dt) < 
                t_cohort.first_action_dt + INTERVAL '2' DAY
        -- define the return action if needed
	 -- AND action='{return action name}'
    GROUP BY cohort, t_cohort.app_install_id 
)

SELECT 
    cohort, 
    SUM(IF(returns>0, 1,0)) AS returned_users, COUNT(app_install_id) AS cohort_users,
    ROUND(SUM(IF(returns>0, 1,0)) / COUNT(app_install_id) , 4) AS retention_rate
FROM t_return
GROUP BY cohort

Query example

Here is a use case of second-week retention rate for app users. We measured the retention rate of the cohort ‘2024-07-01 ~ 2024-07-07’, which represents the percentage of users who took the initial action during the week of 2024-07-01 and returned in the second week.

WITH t_cohort AS (
    SELECT
            init_action.app_install_id AS app_install_id,
            TO_TIMESTAMP(MIN(init_action.meta.dt)) AS first_action_dt
    FROM event.android_breadcrumbs_event AS init_action
    WHERE
            year = 2024 AND month = 7 AND day BETWEEN 1 AND 7
    GROUP BY
            init_action.app_install_id
),
t_return AS (
    SELECT
        t_cohort.app_install_id AS app_install_id,
        COUNT(return_action.dt) AS returns
    FROM t_cohort
    LEFT JOIN event.android_breadcrumbs_event AS return_action
        ON t_cohort.app_install_id = return_action.app_install_id
        AND TO_TIMESTAMP(return_action.meta.dt) >=
                t_cohort.first_action_dt + INTERVAL '7' DAY
        AND TO_TIMESTAMP(return_action.meta.dt) < 
                t_cohort.first_action_dt + INTERVAL '14' DAY
    GROUP BY t_cohort.app_install_id 
)


SELECT 
  '2024-07-01' AS cohort_start_date, 
  '2024-07-07' AS cohort_end_date,
  ROUND(SUM(IF(returns>0, 1,0))/COUNT(app_install_id), 4) AS retention_rate
FROM t_return

Cumulative retention rate of one cohort

${\text{Retention Rate}}={\frac {\text{Number of units who returned by a specific date}}{\text{Number of units in a cohort}}}\times 100\%$

In real projects, analysts often face limited data points, short experiment durations, and a small number of returned units. To include as many returning units as possible, analysts might loosen the time criteria and set a return window as long as possible, covering all potential time periods and ending at the same time for all units, usually the last day of the available data. It’s often referred to as the cumulative retention rate.

The definition of cumulative retention rate follows the same framework mentioned in ‘ Retention rate of one cohort’, except for how the return window is defined. The return window is set to end at the same time for all units. In this scenario, the start date may vary based on when the user’s initial action occurred, but the end date is the same for everyone. This implies that units that initiate their actions later have less time to complete their return action.

Here is an example of a 10-day cumulative retention rate. The return window starts on the 2nd day of the unit’s initial action and ends on day 10. As long as the unit completes the return action by the end of day 10, it is counted as retained.

Cumulative Retention Rate Case Example

Data aggregation

The query framework is the same as the one used for the retention rate of one cohort. The analyst needs to define the query conditions for the following: 1) cohort window, 2) initial action, 3) return action, 4) return window, and 5) return frequency. The major difference lies in how the return window is defined: the return window ends on the same day, rather than shifting over a period of the same length. For more details, refer to the example below.

Query example

Here is a use case of 14-day cumulative retention rate for app users. We measured the retention rate of the cohort ‘2024-07-01 ~ 2024-07-07’, which represents the percentage of users who took the initial action during the week of 2024-07-01 and returned at least twice ( return frequency definition ) by the end of 2024-07-14 (return window definition).

WITH t_cohort AS (
  SELECT
           init_action.app_install_id AS app_install_id,
            TO_TIMESTAMP(MIN(init_action.meta.dt)) AS first_action_dt
        FROM event.android_breadcrumbs_event AS init_action
        WHERE
            year = 2024 AND month = 7 AND day BETWEEN 1 AND 7
        GROUP BY
            init_action.app_install_id
), 
t_return AS (
SELECT
    t_cohort.app_install_id AS app_install_id,
    COUNT(return_action.dt) AS returns
FROM t_cohort
LEFT JOIN event.android_breadcrumbs_event AS return_action
    ON t_cohort.app_install_id = return_action.app_install_id
    AND TO_TIMESTAMP(return_action.meta.dt) >=
            t_cohort.first_action_dt + INTERVAL '1' DAY
     -- Define the end date of the return window. It ends on the same day for all units.
    AND TO_TIMESTAMP(return_action.meta.dt) < '2024-07-15'
GROUP BY t_cohort.app_install_id 
) 


SELECT '2024-07-01' AS cohort_start_date, '2024-07-07' AS cohort_end_date,
 SUM(IF(returns>1, 1,0)) AS returned_users, COUNT(app_install_id) AS cohort_users
FROM t_return

Average retention rate of multiple cohorts

In some cases, you might want to measure the retention rate across multiple cohorts, for purposes like daily monitoring and baseline collection. To do this, you can calculate the average retention rate for the multiple cohorts, which may be categorized based on different initial action windows or other criteria, such as by wikis. It is important to use the same definition for calculating the retention rate across all cohorts before averaging them. For example, If you are interested in the average Nth-week retention rate, make sure to use the same retention length (N weeks) for all cohorts before averaging the results.

Retention rate of one cohort: ${\begin{aligned}r_{i}&={\frac {{\text{ Number of units who returned over a given period in chort}}~i}{{\text{Number of units in cohort}}~i}},~i=1,\ldots ,n\\\\{\text{Avg Retention Rate}}&={\frac {\sum _{i=1}^{n}r_{i}}{n}}\times 100\%\end{aligned}}$

Data aggregation

This is the query framework. Analyst needs to define the query conditions for 1) cohort window, 2) initial action, 3) return action, 4) return window, 5) return frequency.

Query framework

-- Query framework
WITH t_cohort AS (
	SELECT
         cohort,
         app_install_id,
         first_action_dt
      FROM {table} AS init_action
      -- define the cohort window
      WHERE {Condition of cohort window} 
      -- define the initial action if needed
            AND action='{initial action name}'
      GROUP BY
         app_install_id,
         cohort
),    
t_per_cohort_retention_rate AS (           
    SELECT cohort, 
 -- define the return frequency, i.e. number of return actions
 --  returns > {threshold}
    SUM(IF(returns>{threshold}, 1,0))/COUNT(app_install_id) AS retention_rate
    FROM 
    (
    SELECT
        t_cohort.cohort, 
        t_cohort.app_install_id AS app_install_id,
        COUNT(return_action.dt) AS returns
    FROM t_cohort
    LEFT JOIN {table} AS return_action
        ON t_cohort.app_install_id = return_action.app_install_id
        -- define the return window
        AND {Condition of return window} 
        -- define the return action if needed
	 AND action='{return action name}'
    GROUP BY cohort, t_cohort.app_install_id 
    ) AS t_return
    GROUP BY cohort
    ORDER BY cohort
)

SELECT avg(retention_rate)
FROM t_per_cohort_retention_rate

Query example

Here is a use case of the average 7th-day retention rate of app users. We measure the 7th-day retention rate of 7 cohorts: ‘2024-07-01’,‘2024-07-02’,... ‘2024-07-07’, and then calculate the average of these rates.

WITH t_cohort AS (
    SELECT
        DATE_FORMAT(init_action.meta.dt, 'yyyy-MM-dd') AS cohort,
        init_action.app_install_id AS app_install_id,
        TO_TIMESTAMP(MIN(init_action.meta.dt)) AS first_action_dt
    FROM event.android_breadcrumbs_event AS init_action
    WHERE
            -- define the cohort window
            year = 2024 AND month = 7 AND day BETWEEN 1 AND 7
    GROUP BY
            init_action.app_install_id,
            DATE_FORMAT(init_action.meta.dt, 'yyyy-MM-dd')
), 

t_per_cohort_retention_rate AS (           
    SELECT cohort, 
     SUM(IF(returns>0, 1,0)) AS returned_users, 
COUNT(app_install_id) AS cohort_users,
     ROUND(SUM(IF(returns>0, 1,0))/COUNT(app_install_id), 4) AS retention_rate
    FROM 
    (
    SELECT
        t_cohort.cohort, 
        t_cohort.app_install_id AS app_install_id,
        COUNT(return_action.dt) AS returns
    FROM t_cohort
    LEFT JOIN event.android_breadcrumbs_event AS return_action
        ON t_cohort.app_install_id = return_action.app_install_id
        -- define the return window
        AND TO_TIMESTAMP(return_action.meta.dt) >=
                t_cohort.first_action_dt + INTERVAL '7' DAY
        AND TO_TIMESTAMP(return_action.meta.dt) < 
                t_cohort.first_action_dt + INTERVAL '8' DAY
    GROUP BY cohort, t_cohort.app_install_id 
    ) AS t_return
    GROUP BY cohort
    ORDER BY cohort
    LIMIT 100000
)

SELECT avg(retention_rate)
FROM t_per_cohort_retention_rate

Retention rate with two return windows

${\text{Retention Rate}}={\frac {\text{Number of units who returned in two timing windows}}{\text{Number of units in a cohort}}}\times 100\%$

In the interest of user stickiness, analysts may want to measure retention with two return windows. Specifically, you might want to know the percentage of units that returned during window 1 and then also returned during window 2. To do this, most steps are the same as calculating the retention rate for one cohort. However, when calculating the number of returned units, you need to add the condition that users must complete return actions in both windows.

Here is an example of a 7-day retention rate with 2 return windows. To be counted as retained, the unit needs to complete a return action on the second day and another between the 8th and 10th days.

Two Return Windows Case Example

Data aggregation

The query framework is the same as the one used for the retention rate of one cohort. The analyst needs to define the query conditions for the following: 1) cohort window, 2) initial action, 3) return action, 4) return window, and 5) return frequency. The major difference lies in the number of return windows defined. You define two return windows instead of one. For more details, refer to the example below.

Query example

Here is a use case for the retention rate with two return windows for app users. We measured the retention rate of the cohort ‘2024-07-01 ~ 2024-07-07’, which represents the percentage of users who took the initial action during the week of 2024-07-01 and returned on both the 2nd day and sometime between the 8th and 10th days.

WITH t_cohort AS (
    SELECT
            init_action.app_install_id AS app_install_id,
            TO_TIMESTAMP(MIN(init_action.meta.dt)) AS first_action_dt
    FROM event.android_breadcrumbs_event AS init_action
    WHERE
            year = 2024 AND month = 7 AND day BETWEEN 1 AND 7
    GROUP BY
            init_action.app_install_id
),  
t_return AS
(
    SELECT
        t_cohort.app_install_id AS app_install_id,
        COUNT(DISTINCT return_action1.dt) AS win1_returns, 
        COUNT(DISTINCT return_action2.dt) AS win2_returns
    FROM t_cohort
    -- Look for return actions in window 1
    LEFT JOIN event.android_breadcrumbs_event AS return_action1
        ON t_cohort.app_install_id = return_action1.app_install_id
        AND TO_TIMESTAMP(return_action1.meta.dt) >=
                t_cohort.first_action_dt + INTERVAL '1' DAY
        AND TO_TIMESTAMP(return_action1.meta.dt) < 
                t_cohort.first_action_dt + INTERVAL '2' DAY
 AND return_action1.year=2024 AND return_action1.month=7
    -- Look for return actions in window 2
    LEFT JOIN event.android_breadcrumbs_event AS return_action2
        ON t_cohort.app_install_id = return_action2.app_install_id
        AND TO_TIMESTAMP(return_action2.meta.dt) >=
                t_cohort.first_action_dt + INTERVAL '8' DAY
        AND TO_TIMESTAMP(return_action2.meta.dt) < 
                t_cohort.first_action_dt + INTERVAL '11' DAY
 AND return_action2.year=2024 AND return_action2.month=7
    GROUP BY t_cohort.app_install_id 
) 

SELECT
    '2024-07-01' AS cohort_start_date,
    '2024-07-07' AS cohort_end_date,
    SUM(IF((win1_returns > 0) AND (win2_returns > 0), 1, 0)) / COUNT(app_install_id) AS retention_rate
FROM t_return

Data Collection (Instrumentation)

To measure retention rate, instrumentation must log unit identifiers, like user ID, app installation ID, session ID, and so on. As some types of the IDs may have a limited lifespan for privacy or other reasons and could be hashed or regenerated for individual users after a certain period, this needs to be carefully planned along with defining the cohort window and return window.

Instrumentation must also log the actions taken and their timestamps.

For the purpose of A/B testing, a column indicating the experiment assignment is also needed.

Statistical analysis

The recommendations of methods for calculating confidence intervals for baselines and analyzing A/B tests are documented at https://people.wikimedia.org/~jiawang/notes/05_demo_AB_test_analysis.html

Repo:https://gitlab.wikimedia.org/jiawang/sds2_2024_retention_rate/

Research and Decision Science/Data glossary/Retention Rate

Contents

Glossary

Metric definitions and measurement

Retention rate of one cohort

Data aggregation

Cumulative retention rate of one cohort

Data aggregation

Average retention rate of multiple cohorts

Data aggregation

Retention rate with two return windows

Data aggregation

Data Collection (Instrumentation)

Statistical analysis