This page documents a completed research project.

This project builds on earlier quantitative research characterizing messages sent to new users. Those studies (on community participation and deletion notifications) found that many good-faith users are immediately thrust into high-level processes such as article deletions, image copyright issues, username concerns, conflict of interest allegations, and edit wars. Many of these processes require a working understanding not just of Wikipedia's encyclopedic standards (such as NPOV or reliable sources), but also of the social norms and procedures around the administration of the encyclopedia project. Furthermore, participation in such spaces requires technical understanding, as users must edit specific pages in a rigidly defined manner in order to make themselves visible to Wikipedia's administrative corps.

Questions

Broadly, what are the various ways in which new users encounter and interact with the speedy deletion process?

More specifically:

  • How many new articles that were created by new users and tagged for speedy deletion are kept versus deleted?
  • How long does it take for a newly-created article to be tagged for deletion, for a user to be warned of a speedy deletion tag, and for the article to be deleted?
  • How many new users are receiving speedy deletion notifications (as a raw count and as a percentage of those who create new articles)?
  • Are users responding to speedy deletion notices?
  • Are users editing articles and talk pages -- both those of the tagged article and those of other articles -- after a speedy deletion tag and subsequent deletion?

Methods

This project will continue using the previous dataset of 3,000 randomly selected new users between 2004 and 2010 (inclusive) who made at least one edit and received at least one edit to their user talk page. I will seek to finish coding each of these 3,000 users based on the schema previously used in studies of first messages and deletion notifications -- specifically whether a user's first message was a template, personalized, welcome, warning, or deletion notification. (Note: This will enable a re-analysis of the previous weeks' questions, as well as create a comprehensive corpus of new users for further projects.)

In addition, this first round of coding new user experiences will involve marking whether or not a user was notified of a speedy deletion on their user talk page. Once this is complete, these cases will be examined to see the different ways in which new users respond, both quantitatively and qualitatively. The following variables were hand-coded:

  • Time of the user's first speedy deletion notification
  • Time when the page was first tagged for speedy deletion (if any)
  • Time when the user created the page
  • Time when the article was deleted (if at all)
    • Whether the article was userfied instead of deleted

Results and discussion

Speedy deletions

The types of speedy deletions, main namespace deletions only, Jan 2001 to June 2011. The following is based on complete data from the English Wikipedia database.

| CSD code | Rationale | # of deleted articles | % of all CSDs | % of all deleted articles |
|---|---|---|---|---|
| A7 | No indication of importance (individuals, animals, organizations, web content) | 497397 | 37.13% | 22.16% |
| G11 | Unambiguous advertising or promotion | 101723 | 7.59% | 4.53% |
| G1 | Patent nonsense | 88084 | 6.58% | 3.92% |
| A1 | No context | 79139 | 5.91% | 3.53% |
| G3 | Pure vandalism and blatant hoaxes | 68454 | 5.11% | 3.05% |
| G10 | Attack pages | 62449 | 4.66% | 2.78% |
| A3 | No content | 59889 | 4.47% | 2.67% |
| G12 | Unambiguous copyright infringement | 58254 | 4.35% | 2.59% |
| R1 | Redirects to non-existent pages | 57091 | 4.26% | 2.54% |
| G6 | Technical deletions | 54750 | 4.09% | 2.44% |
| G7 | Author requests deletion | 53595 | 4.00% | 2.39% |
| G8 | Pages dependent on a non-existent or deleted page | 45331 | 3.38% | 2.02% |
| G2 | Test pages | 24388 | 1.82% | 1.09% |
| R3 | Implausible typos | 23343 | 1.74% | 1.04% |
| R2 | Redirects, apart from shortcuts, from the main namespace to any other namespace | 9152 | 0.68% | 0.41% |
| G5 | Creations by banned or blocked users | 5782 | 0.43% | 0.26% |
| A9 | No indication of importance (musical recordings) | 3972 | 0.30% | 0.18% |
| G4 | Recreation of a page that was deleted per a deletion discussion | 2872 | 0.21% | 0.13% |
| A6 | Attack articles | 2431 | 0.18% | 0.11% |
| A2 | Foreign language articles that exist on another Wikimedia project | 2182 | 0.16% | 0.10% |
| A5 | Transwikied articles | 2006 | 0.15% | 0.09% |
| A8 | Blatant copyright infringement articles | 1321 | 0.10% | 0.06% |
| A4 | Attempts to correspond with the person or group named by its title | 109 | 0.01% | 0.00% |
| G9 | Office actions | 50 | 0.00% | 0.00% |
| X | CSD -- Not specified | 35685 | 2.66% | 1.59% |
| XX | All CSDs | 1339449 | 100.00% | 59.66% |
| XXX | All deleted articles | 2244952 | | 100.00% |
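As a worked check of how the two percentage columns relate to the counts, the sketch below recomputes the A7 row and the "All CSDs" row from the totals in the table. The helper name `share` is an illustrative choice, not something taken from the original analysis.

```python
def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Totals and the A7 count, copied from the table above.
ALL_CSDS = 1339449
ALL_DELETED = 2244952
A7 = 497397

print(share(A7, ALL_CSDS))           # % of all CSDs for A7
print(share(A7, ALL_DELETED))        # % of all deleted articles for A7
print(share(ALL_CSDS, ALL_DELETED))  # CSDs as a share of all deletions
```

The three values printed are 37.13, 22.16, and 59.66, matching the table: each code's share of all deleted articles is its share of all CSDs scaled by the 59.66% of deletions that were speedy.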

Sample

The sample was not completely coded: of the 258 users who received CSD notices in our sample of 2133 users (from 2004 to 2010), only 125 were coded. This number is low, which means that the margins of error are large in this preliminary study; however, the sample was coded in random order, so the coded cases are themselves a random subset.
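To give a rough sense of how large those margins are, the sketch below computes a 95% margin of error for a proportion using the normal approximation. This is an assumption made for illustration: the study does not state how its margins were estimated, and the number of cases behind each individual figure may differ from 125.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width, in percentage points, of the normal-approximation
    confidence interval for a proportion p observed in n cases."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


n_coded = 125  # coded users who received CSD notices, from the text above

# p = 0.5 is the worst case, giving the widest interval for a given n.
print(round(margin_of_error(0.5, n_coded), 1))
```

Under these assumptions the worst-case margin is about 8.8 percentage points, so differences of a few points between the reported percentages should be read with caution.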

Results

Note: All figures are based on a random sample of the population of users who received a speedy deletion notice on their talk page within 30 days of their first edit. If a user had multiple CSD notices, only the outcome of the first article tagged for speedy deletion was examined.

Time (in minutes) between...

| | created to tagged | created to deleted | tagged to deleted | tagged to warned | warned to deleted |
|---|---|---|---|---|---|
| median | 2 | 34.5 | 36 | 0 | 23 |
| mean | 883.83 | 1057.49 | 300.31 | 36.16 | 232.90 |
| stddev | 4744.12 | 4845.67 | 1352.12 | 1278.33 | 1780.07 |

Percent of new users who received a speedy deletion notice in their first 30 days: 12.28%

Percent of new users who created an article and received a speedy deletion notice in their first 30 days: [tbd]

Of the new users who received a speedy deletion notice, the percent who:

  • received it within 10 minutes of creating their article: 70.08%
  • had the article deleted within 10 minutes of creating it: 24.78%
  • had the page deleted: 94.02%
  • were given a CSD warning after their article was deleted: 18.80% (Note: this is probably due to the sheer speed of CSD activity, as well as to admins speedily deleting articles that had not previously been tagged by other editors -- see the next result)
  • were given a CSD warning, but their article was not tagged: 9.41%

Discussion

Unequal distribution

The results for each of the time intervals are highly skewed, with mean values far higher than the medians (for created to tagged, a mean of 883.83 minutes against a median of 2 minutes) and standard deviations substantially higher than the mean values. This means that there are a large number of small values, which are offset by a small number of extremely large values. There were several cases in the sample where a user created an article and days or weeks passed before it was tagged for speedy deletion, after which it was deleted less than an hour later. There were also cases where a user tagged an article for speedy deletion and there was a lag of a few days before it was deleted -- sometimes this was because the case was sent to AfD or another space where it was debated.
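The effect of a few very long delays on the mean can be illustrated with a small example. The values below are invented for illustration and are not drawn from the study's data; they show how a single two-week delay among otherwise quick taggings pulls the mean and standard deviation far above the median.

```python
import statistics

# Hypothetical created-to-tagged intervals, in minutes.
# Nine articles are tagged within minutes; one is tagged after 14 days.
minutes = [0, 1, 1, 2, 2, 2, 3, 5, 8, 14 * 24 * 60]

median = statistics.median(minutes)
mean = statistics.mean(minutes)
stdev = statistics.stdev(minutes)  # sample standard deviation

print("median:", median)
print("mean:", mean)
print("stddev:", round(stdev, 2))
```

In this toy sample the median stays at 2 minutes while the mean exceeds 2000 minutes, which is why the median is the more informative summary for intervals like these.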

Future work

  • Expand to all time periods, separate by time periods
  • Separate by CSD category (are the quick speedy deletions attack pages and copyvios?)
  • Examine what happens to users whose articles are not deleted -- do they act differently?
  • Count how many edits are made (by the user and/or to the article) between when the page is created, when the page is tagged, when the author is notified, and when the page is deleted
  • Examine alternative speedy deletion processes (notified but no tag left, tagged but not notified, deleted without a tag or notification)