Days since first edit
If you randomly sample revisions from the English Wikipedia, and find out how long each editor has been with the project, what is the trend? Here is the average of 222773 such samples.
In detail: samples were chosen by taking one edit per 10-minute period, namely the earliest edit after the start of the period. Duplicate samples chosen in this manner were removed. For each edit, the user_registration field was consulted. This is the date of the first edit for accounts created before December 25, 2005, and the actual date of registration for accounts created after it. A monthly average was then calculated.
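The sampling procedure above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `(rev_id, edit_time, registered)` tuple layout and the explicit `start`/`end` arguments are hypothetical, and the tool originally used to draw the samples is not known.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta


def sample_edits(edits, start, end, period=timedelta(minutes=10)):
    """Pick, for each period, the earliest edit at or after the period start.

    edits: iterable of (rev_id, edit_time, registered) tuples (assumed layout).
    A quiet period can select the same revision as the following period;
    keying the result by rev_id removes such duplicates.
    """
    edits = sorted(edits, key=lambda e: e[1])
    times = [e[1] for e in edits]
    chosen = {}
    t = start
    while t < end:
        i = bisect.bisect_left(times, t)  # first edit with edit_time >= t
        if i < len(edits):
            chosen[edits[i][0]] = edits[i]
        t += period
    return list(chosen.values())


def monthly_average_age(samples):
    """Return {(year, month): (mean days since registration, sample size)}."""
    totals = defaultdict(lambda: [0.0, 0])
    for _rev_id, edit_time, registered in samples:
        age_days = (edit_time - registered).total_seconds() / 86400
        bucket = totals[(edit_time.year, edit_time.month)]
        bucket[0] += age_days
        bucket[1] += 1
    return {month: (total / n, n) for month, (total, n) in totals.items()}
```

Removing duplicates by revision ID matters because a period without any edits selects the first edit of a later period, which that later period then selects again.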
Here is the source data for the graph:
Period   Days          Sample size
Jan-02   96.66948564   3117
Feb-02   109.1223501   2973
Mar-02   119.9344065   3797
Apr-02   143.5780189   3569
May-02   162.827694    3182
Jun-02   155.5267995   3532
Jul-02   193.36359     3589
Aug-02   211.2267589   4380
Sep-02   208.2957376   4256
Oct-02   181.4799633   4410
Nov-02   203.0105795   4222
Dec-02   190.9854214   4355
Jan-03   226.6011664   4405
Feb-03   229.0190045   3867
Mar-03   232.1456002   4367
Apr-03   228.5744291   4160
May-03   246.7534909   4390
Jun-03   284.5185389   4311
Jul-03   272.2861387   4388
Aug-03   270.176214    4450
Sep-03   286.1736725   4221
Oct-03   274.4471518   4207
Nov-03   274.7829859   4311
Dec-03   257.9103377   3855
Jan-04   248.3024366   4410
Feb-04   265.1058562   4156
Mar-04   254.8133977   4450
Apr-04   248.8855159   4299
May-04   255.5882108   4382
Jun-04   264.8154059   4052
Jul-04   241.102758    4460
Aug-04   248.9399052   4460
Sep-04   277.7529204   4319
Oct-04   278.9730357   4444
Nov-04   277.671123    4320
Dec-04   311.1636362   4434
Jan-05   294.0872911   4444
Feb-05   314.9714574   3878
Mar-05   300.7735358   4369
Apr-05   294.9082006   4318
May-05   311.2628278   4459
Jun-05   294.7128944   4118
Jul-05   298.4991302   4462
Aug-05   290.3381185   4460
Sep-05   299.6622528   4301
Oct-05   311.4702595   4464
Nov-05   301.4848687   4320
Dec-05   310.3590844   4464
Jan-06   321.7717266   4464
Feb-06   301.4229457   4002
Mar-06   283.9253029   4455
Apr-06   301.9083951   4255
Model
In the English Wikipedia the number of contributors is still growing exponentially (in contrast to the German Wikipedia, see [1]). So you can model the number of contributors at time t with:
If you assume that each contributor makes one edit per timestep then the sum of the age of all edits is the integral of users:
So the average number of days since the first edit of the contributor of an average edit was:
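One way to write out the three steps above is sketched here. The notation is an assumption and not necessarily the original author's: N_0 is the number of contributors at t = 0, k is the growth rate per day, S(t) is the summed age of all edits made at time t, and A(t) is the resulting average. The sketch also assumes the N_0 initial contributors all made their first edit at t = 0.

```latex
\begin{align}
% Exponential growth of the number of contributors
N(t) &= N_0 \, e^{k t} \\
% One edit per contributor per timestep; a contributor who joined at time s
% has age t - s. Integration by parts turns the sum into the integral of N.
S(t) &= N_0 \, t + \int_0^t (t - s) \, N'(s) \, ds
      = \int_0^t N(s) \, ds
      = \frac{N_0}{k} \left( e^{k t} - 1 \right) \\
% Average age of the contributor behind an edit made at time t
A(t) &= \frac{S(t)}{N(t)} = \frac{1 - e^{-k t}}{k}
\end{align}
```

Under these assumptions A(t) grows roughly like t at first and then levels off at 1/k.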
But the number of edits per user is distributed very unevenly, so you get a more logarithmic growth (but how?)
Histogram
The number of days is calculated in bins of 10, so 100 days is 95-105. The distribution can be modeled ( ) with
with and . Vice versa you can calculate the frequency of edits with a given age of its contributor (by number of days since its first edit) with
with .
Around 25% of all edits are made by users whose first edit was less than 100 days earlier.
This statistic does not hold any information about the quality of edits.
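The binning rule and the share of edits by recent contributors described above can be sketched as follows. The function names are hypothetical. The sketch treats each bin as half-open (95 <= age < 105 maps to 100), which is an assumption, since the text does not say which bin a boundary value belongs to.

```python
import math
from collections import Counter

BIN_WIDTH = 10  # days per bin, as stated in the text


def bin_center(age_days, width=BIN_WIDTH):
    """Return the centre of the bin holding age_days (95 <= age < 105 -> 100)."""
    return math.floor(age_days / width + 0.5) * width


def histogram(ages, width=BIN_WIDTH):
    """Count edits per bin, keyed by bin centre in days."""
    return Counter(bin_center(age, width) for age in ages)


def share_younger_than(ages, limit_days=100):
    """Fraction of edits whose contributor first edited under limit_days ago."""
    ages = list(ages)
    return sum(1 for age in ages if age < limit_days) / len(ages)
```

Called on the list of sampled ages in days, `share_younger_than(ages)` gives the kind of figure quoted above, and `histogram(ages)` gives the binned counts.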