Research:New editor

New editor
Specification
A new editor is a newly registered user completing n edits to pages in any namespace of a Wikimedia project within t days since registration.
WMF Standard
  • n = 1 edit
  • t = 1 day
Related metrics
Newly registered user
Status
completed
SQL
SET @t = 1; /* time cutoff in days */
SET @n = 1; /* edits threshold */
SET @start_date = "20140101"; /* January 1st, 2014 after midnight */
SET @end_date = "20140102"; /* January 2nd, 2014 at midnight, i.e. a one-day registration window */
 
/* Results in a set of "new editors" */
SELECT
  user_id,
  user_name,
  user_registration
FROM
  (
    /* Count revisions to pages (any namespace) that are still visible */
    SELECT
      user_id,
      user_name,
      user_registration,
      SUM(rev_id IS NOT NULL) AS revisions
    FROM user
    INNER JOIN logging ON /* Keep only accounts logged as self-registered */
      log_user = user_id AND
      log_type = "newusers" AND
      log_action = "create"
    LEFT JOIN revision ON
        rev_user = user_id AND
        rev_timestamp <= DATE_FORMAT(
            DATE_ADD(user_registration, INTERVAL @t DAY),
            '%Y%m%d%H%i%S')
    WHERE user_registration BETWEEN @start_date AND @end_date
    GROUP BY 1,2,3
 
    UNION ALL
 
    /* Count revisions to pages (any namespace) that have been archived, i.e. deleted */
    SELECT
      user_id,
      user_name,
      user_registration,
      SUM(ar_id IS NOT NULL) AS revisions /* Note that ar_rev_id is sometimes set to NULL :( */
    FROM user
    INNER JOIN logging ON /* Keep only accounts logged as self-registered */
      log_user = user_id AND
      log_type = "newusers" AND
      log_action = "create"
    LEFT JOIN archive ON 
      ar_user = user_id AND
      ar_timestamp <= DATE_FORMAT(
          DATE_ADD(user_registration, INTERVAL @t DAY),
          '%Y%m%d%H%i%S')
    WHERE user_registration BETWEEN @start_date AND @end_date
    GROUP BY 1,2,3
  ) AS user_content_revision_count
GROUP BY 1,2,3
HAVING SUM(revisions) >= @n;

New editor is a proposed standardized user class used to measure the number of first-time editors in a wiki project over time. It's used as a proxy for editor activation, and to a lesser extent, editor productivity. A "new editor" is a newly registered user who makes contributions within a given activation period since registration.
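The definition can be illustrated with a minimal Python sketch. It is not the reference implementation (the SQL query above plays that role); the function name `is_new_editor` and the sample timestamps are hypothetical, and the sketch assumes that a user's registration time and edit timestamps are already available as `datetime` objects.

```python
from datetime import datetime, timedelta


def is_new_editor(registration, edit_times, n=1, t_days=1):
    """Return True if at least n edits fall within t_days of registration."""
    cutoff = registration + timedelta(days=t_days)
    # Inclusive upper bound, mirroring the `<=` comparison in the SQL above.
    in_window = [ts for ts in edit_times if registration <= ts <= cutoff]
    return len(in_window) >= n


# Hypothetical user: registers at noon, edits one hour later and four days later.
registration = datetime(2014, 1, 1, 12, 0)
edit_times = [datetime(2014, 1, 1, 13, 0), datetime(2014, 1, 5, 9, 0)]

print(is_new_editor(registration, edit_times))                 # True
print(is_new_editor(registration, edit_times, n=2))            # False
print(is_new_editor(registration, edit_times, n=2, t_days=7))  # True
```

With the WMF standard parameters (n = 1, t = 1 day) the sample user qualifies; raising n to 2 only qualifies them once t is widened to a week.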

Discussion

The majority of new user accounts registered on Wikipedia either never attempt an edit or fail to save one. When discussing the rate at which new editors are entering Wikipedia, it is therefore more relevant to measure the subset of new users who end up editing.

The n edits threshold

What amount of activity is necessary? This choice is arbitrary to a large extent. The higher the threshold, the fewer newly registered editors will cross it.
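As a small illustration of that last point, the sketch below counts how many users in a cohort cross each threshold. The per-user edit counts are made up for the example, and `count_crossing` is a hypothetical helper, not part of any Wikimedia tooling.

```python
def count_crossing(edit_counts, thresholds=(1, 5, 10)):
    """Map each threshold n to the number of users with at least n edits."""
    return {n: sum(1 for c in edit_counts if c >= n) for n in thresholds}


# Hypothetical cohort: edits saved by each user within the time cutoff.
edit_counts = [0, 0, 0, 1, 1, 2, 5, 7, 12]

result = count_crossing(edit_counts)
print(result)  # {1: 6, 5: 3, 10: 1}
```

Because every user with at least 10 edits also has at least 5, the counts can only stay equal or shrink as n grows.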

The t time cutoff

Since it is theoretically possible that a newly registered user may take years to make their first edit, and observations at any time would truncate such future edits,[1] we artificially censor all observations using some time bound t since the user signed up for a new account. By specifying a t cutoff, we hold all new editors to the same standard, regardless of when they registered and when they make their first contribution.
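The effect of the cutoff can be sketched as follows. Without a fixed window, whether a user counts as having edited depends on when the data is observed; with a window of t days, the answer is fixed once those t days have passed. Both function names and all dates are hypothetical.

```python
from datetime import datetime, timedelta


def has_edited_by(first_edit, observed_at):
    """Uncensored view: the answer depends on the observation time."""
    return first_edit is not None and first_edit <= observed_at


def has_edited_within(registration, first_edit, t_days):
    """Censored view: only edits within t_days of registration count."""
    if first_edit is None:
        return False
    return first_edit <= registration + timedelta(days=t_days)


# Hypothetical user who waits two months before making a first edit.
registration = datetime(2014, 1, 1)
first_edit = datetime(2014, 3, 1)

print(has_edited_by(first_edit, datetime(2014, 2, 1)))        # False
print(has_edited_by(first_edit, datetime(2014, 4, 1)))        # True
print(has_edited_within(registration, first_edit, t_days=1))  # False
```

The first two calls disagree only because the observation date moved; the third gives the same answer no matter when it is evaluated.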

Newly registered users only

An attached user is not considered a newly registered user and as a result is not counted as a new editor after completing any given number of edits.[2]

Since newly registered users may include accounts created for bot users if they are not registered by proxy, these users are also included in the new editor definition.

Edits across all namespaces

We propose to include in the definition of a new editor edits made to any namespace. When only edits to pages in a project's content namespace(s) are counted, we refer instead to a new content editor. In English Wikipedia, the only content namespace is the "article namespace", also known as namespace 0. Under the proposed "new editor" definition, contributions made to talk or user pages are considered edits that qualify towards "new editor" status.
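The difference between the two variants can be sketched in a few lines of Python. The sketch assumes each edit is represented only by the numeric namespace ID of the page it touched; `CONTENT_NAMESPACES` and `count_edits` are hypothetical names, and the set of content namespaces would differ per project.

```python
# Assumption: English Wikipedia, where namespace 0 is the only content namespace.
CONTENT_NAMESPACES = {0}


def count_edits(edit_namespaces, content_only=False):
    """Count edits, optionally restricted to content namespaces."""
    if content_only:
        return sum(1 for ns in edit_namespaces if ns in CONTENT_NAMESPACES)
    return len(edit_namespaces)


# Hypothetical user who only edited a user page (2) and a user talk page (3).
edit_namespaces = [2, 3]

n = 1
print(count_edits(edit_namespaces) >= n)                     # True
print(count_edits(edit_namespaces, content_only=True) >= n)  # False
```

This user is a new editor under the proposed definition but not a new content editor.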

Edits to deleted pages

The proposed definition includes activity on pages that are later deleted (including page creation edits) as counting towards "new editor" status. This ensures that we provide a quantitative measurement of activation independent of the productivity or quality of contributions by a newly registered user (which we aim to measure using different metrics). Including activity on deleted pages also ensures that this measurement is not subject to censorship (historical data doesn't change as a function of a future deletion event). See this related discussion on the implications of counting or discounting activity on deleted pages.
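A rough Python analogue of how the SQL query combines visible and archived revisions is sketched below. The per-user counts are invented for illustration, and `new_editors` is a hypothetical helper; the point is only that a user whose sole edit was later deleted still qualifies.

```python
from collections import Counter


def new_editors(visible_counts, archived_counts, n=1):
    """Return users whose visible plus archived edits reach the threshold n."""
    totals = Counter(visible_counts)
    totals.update(archived_counts)  # adds the archived counts per user
    return sorted(user for user, count in totals.items() if count >= n)


# Hypothetical counts of edits made within the time cutoff.
visible_counts = {"UserA": 0, "UserB": 2, "UserC": 0}
archived_counts = {"UserA": 1, "UserB": 0, "UserC": 0}

print(new_editors(visible_counts, archived_counts))  # ['UserA', 'UserB']
print(new_editors(visible_counts, {}))               # ['UserB']
```

Ignoring the archived counts would drop UserA, and the historical count for that cohort would change whenever a page is deleted.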

Time lag

This metric can be generated t days after user registration. In the case of the WMF standardized parameterization, this is 1 day.
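For a cohort of users, the lag applies to the last registration in the cohort. The arithmetic is sketched below with a hypothetical helper name and a hypothetical cohort of users registered during January 2014.

```python
from datetime import datetime, timedelta


def earliest_complete_measurement(last_registration, t_days=1):
    """Earliest moment at which every user in the cohort has had t_days to edit."""
    return last_registration + timedelta(days=t_days)


# Hypothetical cohort: the last registration happens just before February.
last_registration = datetime(2014, 1, 31, 23, 59, 59)

print(earliest_complete_measurement(last_registration))            # 2014-02-01 23:59:59
print(earliest_complete_measurement(last_registration, t_days=7))  # 2014-02-07 23:59:59
```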

Analysis

There are three variables that need to be chosen in order to apply this metric:

  • The value of n
  • The value of t
  • Whether edits outside content namespaces should be counted

To check how decisions about each of these parameters affect counts of the number of new editors over time, several variations of these parameters were tested on a sample of projects.

English Wikipedia

 
n = 1 (enwiki). The monthly count of newly registered users performing at least 1 edit in 24 hours and 7 days since registration.
 
n = 5 (enwiki). The monthly count of newly registered users performing at least 5 edits in 24 hours and 7 days since registration.
 
n = 10 (enwiki). The monthly count of newly registered users performing at least 10 edits in 24 hours and 7 days since registration.

Portuguese Wikipedia

 
n = 1 (ptwiki). The monthly count of newly registered users performing at least 1 edit in 24 hours and 7 days since registration.
 
n = 5 (ptwiki). The monthly count of newly registered users performing at least 5 edits in 24 hours and 7 days since registration.
 
n = 10 (ptwiki). The monthly count of newly registered users performing at least 10 edits in 24 hours and 7 days since registration.

German Wikipedia

 
n = 1 (dewiki). The monthly count of newly registered users performing at least 1 edit in 24 hours and 7 days since registration.
 
n = 5 (dewiki). The monthly count of newly registered users performing at least 5 edits in 24 hours and 7 days since registration.
 
n = 10 (dewiki). The monthly count of newly registered users performing at least 10 edits in 24 hours and 7 days since registration.

Comparison

 
Content vs. all. The proportion of newly registered users who made at least one edit in 24 hours who also made at least one content namespace edit in 24 hours is plotted.
 
t = day vs. week. The proportion of newly registered users who made at least one edit in 1 week who also made at least 1 edit in 24 hours is plotted.
 
n = 1 vs. 10 edits. The proportion of newly registered users who made at least one edit in 1 week who also made at least 10 edits in one week is plotted.

The figures above help us visualize the effects of differences between parameters. When a proportion remains constant over time, that suggests that one metric is proportional to another. That means that both versions of the metric capture the exact same trend information at different scales.
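The reasoning can be sketched numerically. In the Python example below, all monthly counts are invented, and the helper names and the 0.05 tolerance are arbitrary assumptions; the sketch only shows how a flat ratio signals proportional metrics while a drifting ratio does not.

```python
def proportion_series(subset_counts, total_counts):
    """Per-period ratio of one metric variant to another (None if total is 0)."""
    return [s / t if t else None for s, t in zip(subset_counts, total_counts)]


def is_roughly_constant(series, tolerance=0.05):
    """True if the spread of the defined ratios stays within the tolerance."""
    values = [v for v in series if v is not None]
    if not values:
        return True
    return max(values) - min(values) <= tolerance


# Hypothetical monthly counts for three variants of the metric.
all_namespaces = [1000, 1200, 900]
content_only = [800, 960, 720]
ten_edits = [500, 480, 270]

flat = proportion_series(content_only, all_namespaces)
drifting = proportion_series(ten_edits, all_namespaces)

print(flat, is_roughly_constant(flat))          # [0.8, 0.8, 0.8] True
print(drifting, is_roughly_constant(drifting))  # [0.5, 0.4, 0.3] False
```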

The #Content vs. all and #t = day vs. week plots are mostly horizontal. This suggests that neither the type of edits that count nor the timescale t considered when generating stats for new editors will affect overall trends.

However, #n = 1 vs. 10 edits shows a strong trend over time in the proportion of editors who make it to each threshold. This result suggests that different values for the n threshold can change what this metric measures.

Discussion

The number of new editors drops about an order of magnitude for each step in n: 1, 5, and 10. While n = 1 appears to be largely flat after 2008, n = 5 and n = 10 tell a different story: one of a steady decline since 2008 for Portuguese and since 2007 for German (see #n = 1 vs. 10 edits). The metric seems less sensitive to the value of t and to whether edits outside of content namespaces are counted (see #t = day vs. week and #Content vs. all).

Historical definition

Wikistats, the Wikimedia reportcard and the editor trends study define a "New Editor" or "New Wikipedian" as:

A registered and logged-in person (not known as a bot) who has made their 10th edit during the time-period under consideration. Number of edits is a cumulative count across all of time on one wiki.

The canonical restrictions apply to this definition: only edits on countable pages on content namespaces are considered.
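A minimal sketch of the historical rule, under simplifying assumptions: it ignores the bot, namespace and countable-page restrictions and works only on a user's edit timestamps. The function name `new_wikipedian_month` and the sample dates are hypothetical.

```python
from datetime import datetime


def new_wikipedian_month(edit_times, threshold=10):
    """Return (year, month) of the cumulative threshold-th edit, or None."""
    ordered = sorted(edit_times)
    if len(ordered) < threshold:
        return None
    milestone = ordered[threshold - 1]
    return (milestone.year, milestone.month)


# Hypothetical user: nine edits in May 2010, then a tenth edit in March 2014.
early_edits = [datetime(2010, 5, day) for day in range(1, 10)]
late_edit = [datetime(2014, 3, 15)]

print(new_wikipedian_month(early_edits))              # None
print(new_wikipedian_month(early_edits + late_edit))  # (2014, 3)
```

In this example the user is counted as a "New Wikipedian" in March 2014, almost four years after their first edit, because the rule does not refer to the registration date at all.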

Issues

  • Because this metric considers a user a "new editor" when the 10th edit milestone is reached, regardless of the user's registration time, it doesn't inform us about the behavior of newly registered users. The historical definition of a new editor is a hybrid metric, partly driven by new user activation, partly by existing user retention.
  • The canonical definition doesn't distinguish between genuine new users and attached users, i.e., users with an existing record of contributions to their home project and starting for the first time to edit on another project.
  • The definition doesn't include activity on pages that are later deleted as counting towards "new editor" status. See this related discussion on the implications of discounting activity on deleted pages.
  • The definition applies a conventional 10-edit threshold and doesn't allow measuring how many users hit different thresholds that may be equally or more informative.

Comparison with New Wikipedians

The monthly count of New Wikipedians and New editors (ns=0 & t=24 hours) is plotted below for several wikis.

New editors vs. New Wikipedians (enwiki).

New editors vs. New Wikipedians (ptwiki).

New editors vs. New Wikipedians (dewiki).

New editors vs. New Wikipedians (eswiki).

New editors vs. New Wikipedians (frwiki).

The factor of difference between New Wikipedians and New editors is plotted below to help visualize deviations. The following function explains how the factor plotted is related to New Wikipedians and New editors.

 

Factor delta (n=1). The factor of the difference between the monthly count of New Wikipedians and new editors (ns=0, n=1, t=24 hours) is plotted.

Factor delta (n=5). The factor of the difference between the monthly count of New Wikipedians and new editors (ns=0, n=5, t=24 hours) is plotted.

Notes

  1. see en:Censoring_(statistics), specifically "right censoring"
  2. Analysis of Wikipedia editor activation should be limited to users registered after 2006 because of inconsistencies in how the logging table recorded new registrations before 2006.