Research talk:Anonymous mobile editing in Italian Wikipedia/Work log/2014-11-15
Saturday, November 15, 2014
editToday, I'm just trying to get a sense for how the rate of anonymous mobile editing has changed over time. We should be able to see two phase shifts -- one when the change was deployed on 20141101 and another when caches were cleared on 20141103.
Edits
edithttp://quarry.wmflabs.org/query/948
SELECT LEFT(rev_timestamp, 8) AS date, COUNT(*) FROM revision INNER JOIN change_tag ON ct_rev_id = rev_id AND ct_tag = "mobile edit" WHERE rev_timestamp BETWEEN "20141020" AND "20141114" AND rev_user = 0 GROUP BY date; +----------+----------+ | date | COUNT(*) | +----------+----------+ | 20141020 | 48 | | 20141021 | 54 | | 20141022 | 43 | | 20141023 | 45 | | 20141024 | 70 | | 20141025 | 40 | | 20141026 | 67 | | 20141027 | 46 | | 20141028 | 63 | | 20141029 | 39 | | 20141030 | 97 | | 20141031 | 66 | | 20141101 | 161 | | 20141102 | 235 | | 20141103 | 140 | | 20141104 | 138 | | 20141105 | 151 | | 20141106 | 148 | | 20141107 | 173 | | 20141108 | 150 | | 20141109 | 226 | | 20141110 | 167 | | 20141111 | 151 | | 20141112 | 184 | | 20141113 | 166 | +----------+----------+ 25 rows in set (0.60 sec)
Well, we can clearly see the step, but it looks like there was a substantial amount of anon mobile edits before. How did this happen? Let's go look at some.
http://quarry.wmflabs.org/query/949
SELECT rev_id, rev_user, rev_user_text FROM revision INNER JOIN change_tag ON ct_rev_id = rev_id AND ct_tag = "mobile edit" WHERE rev_timestamp BETWEEN "20141025" AND "20141026" AND rev_user = 0 LIMIT 10 +----------+----------+----------------+ | rev_id | rev_user | rev_user_text | +----------+----------+----------------+ | 68818976 | 0 | 79.37.191.166 | | 68819692 | 0 | 37.119.239.89 | | 68819834 | 0 | 37.119.239.89 | | 68820067 | 0 | 151.47.42.64 | | 68820386 | 0 | 5.90.73.55 | | 68820978 | 0 | 31.156.208.77 | | 68821394 | 0 | 2.36.220.222 | | 68821622 | 0 | 151.18.23.60 | | 68821628 | 0 | 89.204.130.194 | | 68822287 | 0 | 87.18.148.193 | +----------+----------+----------------+ 10 rows in set (4.55 sec)
... OK. So it looks like these are tagged with "mobile edit" and "mobile app edit". So, it looks like I need to filter those app edits out.
SELECT LEFT(rev_timestamp, 8) AS date, COUNT(*) FROM revision INNER JOIN tag_summary ON ts_rev_id = rev_id AND ts_tags LIKE "%mobile edit%" AND ts_tags NOT LIKE "%mobile app edit%" WHERE rev_timestamp BETWEEN "20141028" AND "20141114" AND rev_user = 0 GROUP BY date; +----------+----------+ | date | COUNT(*) | +----------+----------+ | 20141101 | 74 | | 20141102 | 93 | | 20141103 | 92 | | 20141104 | 72 | | 20141105 | 77 | | 20141106 | 62 | | 20141107 | 80 | | 20141108 | 86 | | 20141109 | 114 | | 20141110 | 116 | | 20141111 | 103 | | 20141112 | 110 | | 20141113 | 114 | +----------+----------+ 13 rows in set (0.40 sec)
OK. That looks much better. We don't see any edits at all until day 1. However, if there is a second event, it looks like it might have happened on 11/09, a Sunday.
Next, I wonder how this compares to the overall rate of editing activity.
http://quarry.wmflabs.org/query/951
SELECT date, mobile_edits.count, edits.count, mobile_edits.count/edits.count AS proportion FROM ( SELECT LEFT(rev_timestamp, 8) AS date, COUNT(*) AS count FROM revision INNER JOIN tag_summary ON ts_rev_id = rev_id AND ts_tags LIKE "%mobile edit%" AND ts_tags NOT LIKE "%mobile app edit%" WHERE rev_timestamp BETWEEN "20141028" AND "20141114" AND rev_user = 0 GROUP BY date ) AS mobile_edits INNER JOIN ( SELECT LEFT(rev_timestamp, 8) AS date, COUNTimestamp BETWEEN "20141101" AND "20141114" AND rev_user = 0 GROUP BY date ) AS edits USING (date);(*) AS count FROM revision WHERE rev_timestamp BETWEEN "20141101" AND "20141114" AND rev_user = 0 GROUP BY date ) AS edits USING (date); +----------+-------+-------+------------+ | date | count | count | proportion | +----------+-------+-------+------------+ | 20141101 | 74 | 3553 | 0.0208 | | 20141102 | 93 | 3628 | 0.0256 | | 20141103 | 92 | 3654 | 0.0252 | | 20141104 | 72 | 3609 | 0.0200 | | 20141105 | 77 | 3341 | 0.0230 | | 20141106 | 62 | 3790 | 0.0164 | | 20141107 | 80 | 3640 | 0.0220 | | 20141108 | 86 | 3304 | 0.0260 | | 20141109 | 114 | 3899 | 0.0292 | | 20141110 | 116 | 3930 | 0.0295 | | 20141111 | 103 | 3758 | 0.0274 | | 20141112 | 110 | 3241 | 0.0339 | | 20141113 | 114 | 3254 | 0.0350 | +----------+-------+-------+------------+ 13 rows in set (0.36 sec)
It looks like anonymous mobile edits make up 2-3% of all anonymous activity.
- At least for recent edits, you can just use the tag "mobile web edit".[1] I don't remember the exact day when the mobile tag was "split". --Nemo 09:09, 16 November 2014 (UTC)
Registrations
editSELECT LEFT(timestamp, 8) AS date, COUNT(*) AS registrations FROM ServerSideAccountCreation_5487345 WHERE wiki = "itwiki" AND event_isSelfMade = 1 AND event_displayMobile = 1 AND timestamp BETWEEN "20141028" AND "20141107" GROUP BY date; +----------+---------------+ | date | registrations | +----------+---------------+ | 20141028 | 77 | | 20141029 | 80 | | 20141030 | 73 | | 20141031 | 51 | | 20141101 | 64 | | 20141102 | 80 | | 20141103 | 66 | | 20141104 | 66 | | 20141105 | 63 | | 20141106 | 70 | +----------+---------------+ 10 rows in set (0.03 sec)
There doesn't seem to be any dramatic dip in the raw number of registrations, but it could be dropping. I'll need to do some timeseries analysis to extract the seasonal component.
OK. Time to make a datafile. --EpochFail (talk) 19:09, 15 November 2014 (UTC)
Timeseries analysis
editOK. So I did some simple component extraction (7 days) and plotted the trends.
Nothing surprising here. Regretfully, we have some trendy behavior well before the test started, so it's hard to see what effect anonymous editing is having. However, there's no clear deviation from the trend for mobile registrations -- which makes sense given the strong wording between mobile editors and saving an anonymous edit.
Next time I want to look at RC patroller workload. --EpochFail (talk) 20:30, 15 November 2014 (UTC)
- I wonder if the October drop in edits corresponds to the CentralNotice fundraising campaign which run for mobile and desktop. CentralNotice is known to have a heavy impact on performance, among other things. According to the fundraising CSV data, it run from 2014-09-29 to 2014-11-03 (but it probably had a varying intensity in the period):
medium campaign count usdstd stop_date usdmax sum avg start_date sitenotice C14_itIT_dsk_FR 77050 10.505104 2014-11-01T10:12:45 630.1 608259.93 7.894353 2014-09-29T08:04:40 sitenotice C14_itIT_mob_FR 33199 8.075508 2014-11-03T21:50:44 136.08 204223.4 6.151493 2014-09-29T08:04:46