Research talk:VisualEditor's effect on newly registered editors/May 2015 study/Work log/2015-05-28
Thursday, May 28, 2015
editToday, I hope to gather some basic stats on block rates and do a basic analysis of time to complete edits and edit completion rates.
Block rates and types
editSince it's morning and I'm still working on my coffee, let's do the easy one. Block rates.
> select user_id, user_name, log_comment, log_params from enwiki.logging INNER JOIN enwiki.user ON REPLACE(log_title, "_", " ") = user_name INNER JOIN staging.ve2_pilot_users USING (user_id) WHERE log_type = "block" and log_action = "block" and log_timestamp BETWEEN "2015052115" AND "2015052215" limit 3; +----------+----------------------------------------------------------------------------+---------------------------------------------------------------------+---------------------------------------------------------------------------+ | user_id | user_name | log_comment | log_params | +----------+----------------------------------------------------------------------------+---------------------------------------------------------------------+---------------------------------------------------------------------------+ | 25251960 | FastechMedia1 | {{uw-spamublock}} | a:2:{s:11:"5::duration";s:10:"indefinite";s:8:"6::flags";s:8:"nocreate";} | | 25252283 | Dumdddum | [[WP:Vandalism-only account|Vandalism-only account]] | a:2:{s:11:"5::duration";s:10:"indefinite";s:8:"6::flags";s:8:"nocreate";} | | 25252071 | Incijanvijnijvcbi2juyhe2y8c28ybhij byv8ibbvy81bhwvye2f8cjbqjne2cbiuejnb2kb | {{uw-vaublock}} <!-- Username violation, vandalism-only account --> | a:2:{s:11:"5::duration";s:10:"indefinite";s:8:"6::flags";s:8:"nocreate";} | +----------+----------------------------------------------------------------------------+---------------------------------------------------------------------+---------------------------------------------------------------------------+ 3 rows in set (0.00 sec)
It looks like we should be able to extract information about whether the block was for vandalism or not.
> select SUM(log_comment RLIKE "(S|s)pam"), SUM(log_comment RLIKE "(V|v)and"), COUNT(*) from enwiki.logging INNER JOIN enwiki.user ON REPLACE(log_title, "_", " ") = user_name INNER JOIN staging.ve2_pilot_users USING (user_id) WHERE log_type = "block" and log_action = "block" and log_timestamp BETWEEN "2015052115" AND "2015052215" limit 3; +-----------------------------------+-----------------------------------+----------+ | SUM(log_comment RLIKE "(S|s)pam") | SUM(log_comment RLIKE "(V|v)and") | COUNT(*) | +-----------------------------------+-----------------------------------+----------+ | 43 | 30 | 124 | +-----------------------------------+-----------------------------------+----------+ 1 row in set (1.11 sec)
Hmm If not spam or vandalism, what else?
> select user_name, log_comment from enwiki.logging INNER JOIN enwiki.user ON REPLACE(log_title, "_", " ") = user_name INNER JOIN staging.ve2_pilot_users USING (user_id) WHERE log_type = "block" and log_action = "block" and log_timestamp BETWEEN "2015052115" AND "2015052215" AND log_comment NOT RLIKE "(S|s)pam" AND log_comment NOT RLIKE "(V|v)and" limit 10,10; +------------------------------------------+----------------------------------------------+ | user_name | log_comment | +------------------------------------------+----------------------------------------------+ | HURDEHGURDEHHURDEHGURDEH | {{checkuserblock-account}} | | DUNNADUNNADUNNADUNNADUNNADUNNADUNNADUNNA | {{checkuserblock-account}} | | Hgtresgtedgtredgtre | {{checkuserblock-account}} | | Clioriosa | [[WP:Disruptive editing|Disruptive editing]] | | Kittywankfaster | {{uw-vaublock}} | | Swangshit | {{uw-vaublock}} | | Donkeypunch101 | {{uw-vaublock}} | | Shittycumsquats | {{uw-vaublock}}; see the edit filter log | | Butt head 458 | {{uw-vaublock}} | | Huffgrade | [[WP:Disruptive editing|Disruptive editing]] | +------------------------------------------+----------------------------------------------+ 10 rows in set (0.56 sec)
Looks like en:Template:Uw-vaublock is about vandalism and so is Disruptive editing. Let's update our regex.
> select user_name, log_comment from enwiki.logging INNER JOIN enwiki.user ON REPLACE(log_title, "_", " ") = user_name INNER JOIN staging.ve2_pilot_users USING (user_id) WHERE log_type = "block" and log_action = "block" and log_timestamp BETWEEN "2015052115" AND "2015052215" AND log_comment NOT RLIKE "(S|s)pam" AND log_comment NOT RLIKE "((V|v)and|(D|d)isrupt|(U|u)w-vaublock|(A|a)bus(e|ing))" AND log_comment NOT RLIKE "((S|s)ock|(U|u)ser)"; +------------------------+-----------------------------------------------------------------------------------------+ | user_name | log_comment | +------------------------+-----------------------------------------------------------------------------------------+ | JamaicanStonSale | {{uw-softerblock}} | | $wagggg666420hailsatan | {{uw-ublock}} | | Hamilton4195 | Deliberately triggering the [[WP:Edit filter|Edit filter]] | | Morlvi471 | [[WP:Blocking policy#Evasion of blocks|Block evasion]] | | 420blaze itdopeswag | Clearly [[WP:NOTHERE|not here to contribute to the encyclopedia]] | | Gullswealsslslsl | [[WP:No personal attacks|Personal attacks]] or [[WP:Harassment|harassment]] | | MarkCohen65 | Creating [[WP:Attack page|attack pages]] | | Daytonowen | [[WP:Blocking policy#Evasion of blocks|Block evasion]] | | Anandasyarif | Deliberately triggering the [[WP:Edit filter|Edit filter]]: also clearly [[WP:NOTHERE]] | +------------------------+-----------------------------------------------------------------------------------------+ 9 rows in set (0.04 sec)
> select user_name, log_comment from enwiki.logging INNER JOIN enwiki.user ON REPLACE(log_title, "_", " ") = user_name INNER JOIN staging.ve2_pilot_users USING (user_id) WHERE log_type = "block" and log_action = "block" and log_timestamp BETWEEN "2015052115" AND "2015052215" AND log_comment NOT RLIKE "(S|s)pam" AND log_comment NOT RLIKE "((V|v)and|(D|d)isrupt|(U|u)w-vaublock|(A|a)bus(e|ing)|(A|a)ttack|(D|d)eliberate|NOTHERE)" AND log_comment NOT RLIKE "((S|s)ock|(C|c)heckuser|(E|e)vasion)" AND log_comment NOT RLIKE "softerblock|soft block" AND log_comment NOT RLIKE "(U|u)w-uhblock|user\.\.\.|(U|u)w-ublock"; Empty set (0.10 sec)
OK. Time to encode this into a field.
> SELECT -> LEFT(user_name, 25), -> LEFT(log_comment, 25), -> IF(log_comment RLIKE "(S|s)pam", "spam", -> IF(log_comment RLIKE "((V|v)and|(D|d)isrupt|(U|u)w-vaublock|(A|a)bus(e|ing)|(A|a)ttack|(D|d)eliberate|NOTHERE)", "vandalism", -> IF(log_comment RLIKE "((S|s)ock|(C|c)heckuser|(E|e)vasion)", "sock", -> IF(log_comment RLIKE "softerblock|soft block", "soft username", -> IF(log_comment RLIKE "(U|u)w-uhblock|user\.\.\.|(U|u)w-ublock", "hard username", -> "other" -> ))))) AS type -> FROM enwiki.logging -> INNER JOIN enwiki.user ON -> REPLACE(log_title, "_", " ") = user_name -> INNER JOIN staging.ve2_pilot_users USING (user_id) -> WHERE -> log_type = "block" AND -> log_action = "block" AND -> log_timestamp BETWEEN "2015052115" and "2015052215" -> ORDER BY RAND() -> LIMIT 25; +---------------------------+---------------------------+---------------+ | LEFT(user_name, 25) | LEFT(log_comment, 25) | type | +---------------------------+---------------------------+---------------+ | MarkCohen65 | Creating [[WP:Attack page | vandalism | | TurtleLimited | {{uw-spamublock}} <!-- Pr | spam | | Morlvi471 | [[WP:Blocking policy#Evas | sock | | Agarwalpackers moverss | [[WP:Spam|Spam]] / [[WP:N | spam | | Theherbaliststop | {{uw-spamublock}} <!-- Pr | spam | | Crave Wrestling | {{uw-spamublock}} <!-- Pr | spam | | Art Space Gallery London | {{uw-softerblock}} <!-- P | soft username | | Douglasleben | spammer/spambot | spam | | Chickenfucker1 | {{uw-uhblock}} <!-- Usern | hard username | | Johnny D in the Land of C | {{blockedsock|John Daker} | sock | | CatalystConsulting | {{uw-softerblock}} <!-- P | soft username | | Zeshanshabbirofficial | {{uw-spamublock}} <!-- Pr | spam | | Julanjg | [[WP:Vandalism-only accou | vandalism | | DUNNADUNNADUNNADUNNADUNNA | {{checkuserblock-account} | sock | | JamaicanStonSale | {{uw-softerblock}} | soft username | | CJ Software Logistics | {{uw-softerblock}} <!-- P | soft username | | Jayron32 and Nyttend are | user... | hard username | | Himynamename | Abusing [[WP:Sock puppetr | vandalism | | The Original John Daker | {{blockedsock|John Daker} | sock | | TrollntemptationHD | [[WP:Vandalism-only accou | vandalism | | Roy yuan | Using Wikipedia for [[WP: | spam | | Hamilton4195 | Deliberately triggering t | vandalism | | Cjohnson753 | [[WP:Vandalism-only accou | vandalism | | Alsancakuniforma | {{uw-spamublock}} <!-- Pr | spam | | Agarwalpackersandmoversdr | {{uw-softerblock}} <!-- P | soft username | +---------------------------+---------------------------+---------------+ 25 rows in set (1.38 sec)
OK. It looks like that is working. Now to formalize it and extract the data.
SELECT
user_id,
IF(log_comment RLIKE "(S|s)pam", "spam",
IF(log_comment RLIKE "((V|v)and|(D|d)isrupt|(U|u)w-vaublock|(A|a)bus(e|ing)|(A|a)ttack|(D|d)eliberate|NOTHERE)", "vandalism",
IF(log_comment RLIKE "((S|s)ock|(C|c)heckuser|(E|e)vasion)", "sock",
IF(log_comment RLIKE "softerblock|soft block", "soft username",
IF(log_comment RLIKE "(U|u)w-uhblock|user\.\.\.|(U|u)w-ublock", "hard username",
"other"
))))) AS type
FROM enwiki.logging
INNER JOIN enwiki.user ON
REPLACE(log_title, "_", " ") = user_name
INNER JOIN staging.ve2_pilot_users USING (user_id)
WHERE
log_type = "block" AND
log_action = "block" AND
log_timestamp BETWEEN "2015052115" and "2015052215";
head -n3 pilot_user_blocks.tsv;wc pilot_user_blocks.tsv user_id type 25251960 spam 25252283 vandalism 125 272 2182 pilot_user_blocks.tsv
Cool. Looks like that is ready to go. Now to do the hard work of looking at the logging data. --Halfak (WMF) (talk) 09:07, 28 May 2015 (UTC)
Edit session stats
editOK. Now I'm digging into stats about edit completion rates and timing.
> SELECT -> user_id, -> event_editingSessionId AS session_id, -> event_editor AS editor, -> MIN(IF(event_action IN ('init', 'ready'), timestamp, NULL)) AS session_started, -> MAX(timestamp) AS session_ended, -> MIN(IF(event_action = 'ready', timestamp, NULL)) AS editor_ready, -> MIN(IF(event_action IN ('saveAttempt', 'saveSuccess'), timestamp, NULL)) AS first_attempt, -> IF(SUM(event_action = 'abort') > 0, -> IF(SUM(`event_action.abort.type` = 'nochange') > 0, 'abort_nochange', 'abort'), -> IF(SUM(event_action = 'saveSuccess') > 0, 'success', 'other') -> ) AS outcome -> FROM staging.ve2_pilot_users -> INNER JOIN log.Edit_11448630 ON -> wiki = 'enwiki' AND -> user_id = `event_user.id` -> WHERE timestamp BETWEEN "2015052115" and "2015052215" -> GROUP BY user_id, session_id -> LIMIT 10; +----------+----------------------------------+--------------+-----------------+----------------+----------------+----------------+----------------+ | user_id | session_id | editor | session_started | session_ended | editor_ready | first_attempt | outcome | +----------+----------------------------------+--------------+-----------------+----------------+----------------+----------------+----------------+ | <snip> | 781557474d435416 | visualeditor | 20150521150126 | 20150521150601 | 20150521150126 | 20150521150500 | abort_nochange | | <snip> | 2c7ca865d011c19a6083c3363723f009 | wikitext | 20150521181116 | 20150521181130 | 20150521181130 | NULL | other | | <snip> | 027f8c94e370bf558a85873055bfcd12 | wikitext | 20150521184516 | 20150521184539 | 20150521184517 | NULL | abort_nochange | | <snip> | 0f2960a6aca5d9196d4627dcb35431b8 | wikitext | 20150521171238 | 20150521171304 | 20150521171239 | 20150521171304 | success | | <snip> | 0f36411d0d26712cbe43f2dc226b3a8a | wikitext | 20150521154401 | 20150521155150 | 20150521154402 | 20150521155150 | success | | <snip> | 0ffe919e0b3d1d54638dd80085ba9987 | wikitext | 20150521205239 | 20150521205248 | 20150521205241 | NULL | abort_nochange | | <snip> | 1a3e0b75a38bd8b5bce36acdb87da48d | wikitext | 20150521182659 | 20150521182934 | 20150521182701 | 20150521182858 | abort | | <snip> | 1bca0ecc5a3e9ac33aedc2ce07bb39ec | wikitext | 20150521161304 | 20150521161316 | 20150521161305 | NULL | abort_nochange | | <snip> | 1bcb04f925a1d54bef5d5ca3abf16a37 | wikitext | 20150521184624 | 20150521184641 | 20150521184625 | 20150521184641 | success | | <snip> | 2243957369cc389fec3cebc50c3fc7d6 | wikitext | 20150521152112 | 20150521152123 | 20150521152114 | NULL | abort_nochange | +----------+----------------------------------+--------------+-----------------+----------------+----------------+----------------+----------------+ 10 rows in set (2.14 sec)
That looks pretty good, but I still see some "other" outcomes. Gotta remind myself what event_actions are possible to know if the "other" bucket is fine.
> select distinct event_action from Edit_11448630 WHERE wiki = 'enwiki' AND timestamp BETWEEN "2015052115" and "2015052215"; +--------------+ | event_action | +--------------+ | init | | saveFailure | | saveAttempt | | saveSuccess | | ready | | abort | | saveIntent | +--------------+ 7 rows in set (12.58 sec)
Well, we should never fail on 'saveIntent', but we could on 'saveFailure', so I'll look at that.
> SELECT -> user_id, -> event_editingSessionId AS session_id, -> event_editor AS editor, -> MIN(IF(event_action IN ('init', 'ready'), timestamp, NULL)) AS session_started, -> MAX(timestamp) AS session_ended, -> MIN(IF(event_action = 'ready', timestamp, NULL)) AS editor_ready, -> MIN(IF(event_action IN ('saveAttempt', 'saveSuccess'), timestamp, NULL)) AS first_attempt, -> IF(SUM(event_action = 'abort') > 0, -> IF(SUM(`event_action.abort.type` = 'nochange') > 0, -> 'abort_nochange', -> 'abort' -> ), -> IF(SUM(event_action = 'saveSuccess') > 0, -> 'success', -> IF(SUM(event_action = 'saveFailure') > 0, -> 'failure', -> 'other' -> )) -> ) AS outcome -> FROM staging.ve2_pilot_users -> INNER JOIN log.Edit_11448630 ON -> wiki = 'enwiki' AND -> user_id = `event_user.id` -> WHERE timestamp BETWEEN "2015052115" and "2015052215" -> GROUP BY user_id, session_id -> LIMIT 10; +----------+----------------------------------+--------------+-----------------+----------------+----------------+----------------+----------------+ | user_id | session_id | editor | session_started | session_ended | editor_ready | first_attempt | outcome | +----------+----------------------------------+--------------+-----------------+----------------+----------------+----------------+----------------+ | <snip> | 781557474d435416 | visualeditor | 20150521150126 | 20150521150601 | 20150521150126 | 20150521150500 | abort_nochange | | <snip> | 2c7ca865d011c19a6083c3363723f009 | wikitext | 20150521181116 | 20150521181130 | 20150521181130 | NULL | other | | <snip> | 027f8c94e370bf558a85873055bfcd12 | wikitext | 20150521184516 | 20150521184539 | 20150521184517 | NULL | abort_nochange | | <snip> | 0f2960a6aca5d9196d4627dcb35431b8 | wikitext | 20150521171238 | 20150521171304 | 20150521171239 | 20150521171304 | success | | <snip> | 0f36411d0d26712cbe43f2dc226b3a8a | wikitext | 20150521154401 | 20150521155150 | 20150521154402 | 20150521155150 | success | | <snip> | 0ffe919e0b3d1d54638dd80085ba9987 | wikitext | 20150521205239 | 20150521205248 | 20150521205241 | NULL | abort_nochange | | <snip> | 1a3e0b75a38bd8b5bce36acdb87da48d | wikitext | 20150521182659 | 20150521182934 | 20150521182701 | 20150521182858 | abort | | <snip> | 1bca0ecc5a3e9ac33aedc2ce07bb39ec | wikitext | 20150521161304 | 20150521161316 | 20150521161305 | NULL | abort_nochange | | <snip> | 1bcb04f925a1d54bef5d5ca3abf16a37 | wikitext | 20150521184624 | 20150521184641 | 20150521184625 | 20150521184641 | success | | <snip> | 2243957369cc389fec3cebc50c3fc7d6 | wikitext | 20150521152112 | 20150521152123 | 20150521152114 | NULL | abort_nochange | +----------+----------------------------------+--------------+-----------------+----------------+----------------+----------------+----------------+ 10 rows in set (17.10 sec)
Still seeing an "other" in the first 10. Let's look at that session.
> SELECT timestamp, event_action FROM Edit_11448630 WHERE event_editingSessionId = "2c7ca865d011c19a6083c3363723f009" AND wiki = "enwiki" and timestamp BETWEEN "20150521181116" AND "20150521181130"; +----------------+--------------+ | timestamp | event_action | +----------------+--------------+ | 20150521181116 | init | | 20150521181130 | ready | +----------------+--------------+ 2 rows in set (0.01 sec)
OK. So it looks like maybe the user closed their browser/tab too fast for us to record an abort type. I think it is safe to assume that sessions like this represent a "nochange". I'll label them differently though so we can check how often they happen later. So, here is my query:
SELECT
user_id,
event_editingSessionId AS session_id,
event_editor AS editor,
MAX(`event_page.revid`) AS rev_id,
MIN(IF(event_action IN ('init', 'ready'), timestamp, NULL)) AS session_started,
MAX(timestamp) AS session_ended,
MIN(IF(event_action = 'ready', timestamp, NULL)) AS editor_ready,
MIN(IF(event_action IN ('saveAttempt', 'saveSuccess'), timestamp, NULL)) AS first_attempt,
IF(SUM(event_action = 'abort') > 0,
IF(SUM(`event_action.abort.type` = 'nochange') > 0,
'abort_nochange',
'abort'
),
IF(SUM(event_action = 'saveSuccess') > 0,
'success',
IF(SUM(event_action = 'saveFailure') > 0,
'failure',
'other_abort'
))
) AS outcome
FROM staging.ve2_pilot_users
INNER JOIN log.Edit_11448630 ON
wiki = 'enwiki' AND
user_id = `event_user.id`
WHERE timestamp BETWEEN "2015052115" and "2015052215"
GROUP BY user_id, session_id;
$ head -n 3 pilot_user_editing_sessions.tsv; wc pilot_user_editing_sessions.tsv user_id session_id editor rev_id session_started session_ended editor_ready first_attempt outcome 25251662 781557474d435416 visualeditor 663414161 20150521150126 20150521150601 20150521150126 20150521150500 abort_nochange 25251680 2c7ca865d011c19a6083c3363723f009 wikitext 0 20150521181116 20150521181130 20150521181130 NULL other_abort 3377 30393 392381 pilot_user_editing_sessions.tsv
Cool. Now to get into R and do some analysis. --Halfak (WMF) (talk) 10:46, 28 May 2015 (UTC)
Analysis!
editSo, I want to make sure that nothing looks totally crazy with this data or that we're seeing a big surge in reverts or block rates for experimental users.
Oh! Wait! While I was looking at the data, I realized that I set the time-bounds on the bucketing dates, not the editors' experimental periods. Time to fix that! --Halfak (WMF) (talk) 11:02, 28 May 2015 (UTC)
Fixing the query timebounds.
editOK. I need to redo the queries above so that their time-bounds are based on the editors' registration dates.
- Blocks
SELECT
user_id,
IF(log_comment RLIKE "(S|s)pam", "spam",
IF(log_comment RLIKE "((V|v)and|(D|d)isrupt|(U|u)w-vaublock|(A|a)bus(e|ing)|(A|a)ttack|(D|d)eliberate|NOTHERE)", "vandalism",
IF(log_comment RLIKE "((S|s)ock|(C|c)heckuser|(E|e)vasion)", "sock",
IF(log_comment RLIKE "softerblock|soft block", "soft username",
IF(log_comment RLIKE "(U|u)w-uhblock|user\.\.\.|(U|u)w-ublock", "hard username",
"other"
))))) AS type
FROM enwiki.logging
INNER JOIN enwiki.user ON
REPLACE(log_title, "_", " ") = user_name
INNER JOIN staging.ve2_pilot_users USING (user_id)
WHERE
log_type = "block" AND
log_action = "block" AND
log_timestamp BETWEEN
registration AND
DATE_FORMAT(
DATE_ADD(registration, INTERVAL 7 DAY),
"%Y%m%d%H%M%S"
);
- Editing sessions
SELECT
user_id,
event_editingSessionId AS session_id,
event_editor AS editor,
MAX(`event_page.revid`) AS rev_id,
MIN(IF(event_action IN ('init', 'ready'), timestamp, NULL)) AS session_started,
MAX(timestamp) AS session_ended,
MIN(IF(event_action = 'ready', timestamp, NULL)) AS editor_ready,
MIN(IF(event_action IN ('saveAttempt', 'saveSuccess'), timestamp, NULL)) AS first_attempt,
IF(SUM(event_action = 'abort') > 0,
IF(SUM(`event_action.abort.type` = 'nochange') > 0,
'abort_nochange',
'abort'
),
IF(SUM(event_action = 'saveSuccess') > 0,
'success',
IF(SUM(event_action = 'saveFailure') > 0,
'failure',
'other_abort'
))
) AS outcome
FROM staging.ve2_pilot_users
INNER JOIN log.Edit_11448630 ON
wiki = 'enwiki' AND
user_id = `event_user.id`
WHERE
timestamp BETWEEN
registration AND
DATE_FORMAT(
DATE_ADD(registration, INTERVAL 7 DAY),
"%Y%m%d%H%M%S"
)
GROUP BY user_id, session_id;
OK... Done --Halfak (WMF) (talk) 11:39, 28 May 2015 (UTC)
Analysis again!
editFirst, let's look at block rates. I define "blocked for damage" as either having the reason be "spam" or "vandalism". Really, all of the other blocks couldn't have anything to do with VE because they are either for sock puppets or something to do with the user name.
bucket via_mobile blocked.k blocked.p reverted.k reverted.p 1: experimental 0 67 0.04118009 177 0.10878918 2: control 0 77 0.04735547 152 0.09348093 3: control 1 8 0.01762115 52 0.11453744 4: experimental 1 7 0.01452282 57 0.11825726 blocked_for_damage.k blocked_for_damage.p n 1: 47 0.028887523 1627 2: 57 0.035055351 1626 3: 5 0.011013216 454 4: 4 0.008298755 482
It looks like, if anything, the experimental condition is seeing fewer blocks. Let's do a quick prop_test to be sure.
> prop.test(c(67, 77), c(1627, 1626)) 2-sample test for equality of proportions with continuity correction data: c(67, 77) out of c(1627, 1626) X-squared = 0.5944, df = 1, p-value = 0.4407 alternative hypothesis: two.sided 95 percent confidence interval: -0.020925457 0.008574682 sample estimates: prop 1 prop 2 0.04118009 0.04735547 > prop.test(c(47, 57), c(1627, 1626)) 2-sample test for equality of proportions with continuity correction data: c(47, 57) out of c(1627, 1626) X-squared = 0.8103, df = 1, p-value = 0.368 alternative hypothesis: two.sided 95 percent confidence interval: -0.018871933 0.006536278 sample estimates: prop 1 prop 2 0.02888752 0.03505535
Nope. No significance there. Now, how about the number of editors who submit an edit that gets reverted.
> prop.test(c(177, 152), c(1627, 1626)) 2-sample test for equality of proportions with continuity correction data: c(177, 152) out of c(1627, 1626) X-squared = 1.9314, df = 1, p-value = 0.1646 alternative hypothesis: two.sided 95 percent confidence interval: -0.006021796 0.036638291 sample estimates: prop 1 prop 2 0.10878918 0.09348093
Interestingly, it looks like we're getting close to significance here. We might find that editors who use VE are more likely to have at least one edit reverted. I'm curious whether this can be explained by VE enabling more newcomers to edit at all.
bucket via_mobile reverted.k editing.k main_editing.k 1: experimental 0 177 534 406 2: control 0 152 544 391 3: control 1 52 117 99 4: experimental 1 57 115 98 > prop.test(c(177, 152), c(406, 391)) 2-sample test for equality of proportions with continuity correction data: c(177, 152) out of c(406, 391) X-squared = 1.6421, df = 1, p-value = 0.2 alternative hypothesis: two.sided 95 percent confidence interval: -0.02356949 0.11799707 sample estimates: prop 1 prop 2 0.4359606 0.3887468
We certainly see less significance here. I'm not sure that we should expect this to be foreboding. Really, a p-value of 0.2 means that we'd expect to see this error in 1/5 trials if the real proportions were truly the same. Next up is edit success metrics. --Halfak (WMF) (talk) 11:58, 28 May 2015 (UTC)
Edit success
editTime to look at edit success rates. First things first, I must resample. The wikitext logging is sampled at a 25% rate, so I must do the same to visualeditor events. If I did this right and the presence of VE did not dramatically change the edit rates, then we should see balanced counts of events for either condition:
> ve_sessions = nrow(edit_sessions[editor == "visualeditor"]) > resampled_edit_sessions = rbind( + edit_sessions[editor == "wikitext",], + edit_sessions[editor == "visualeditor"][ + sample(ve_sessions, ve_sessions/4) + ] + ) > user_edit_sessions = merge(users, resampled_edit_sessions, by="user_id") > user_edit_sessions[, + list( + count = length(user_id) + ), + list(bucket) + ] bucket count 1: experimental 1467 2: control 1568
Well, that's close, but it's not even. We're definitely lacking events in the experimental condition. It could be that we have some users dominating the set though.
> user_edit_sessions[bucket=="experimental", + list( + count = length(bucket) + ), + list(user_id) + ][order(count, decreasing=T)][1:10] user_id count 1: 25252202 35 2: 25259984 25 3: 25256834 23 4: 25260014 23 5: 25252274 20 6: 25257424 15 7: 25257522 15 8: 25260224 15 9: 25252452 14 10: 25255896 14 > > user_edit_sessions[bucket=="control", + list( + count = length(bucket) + ), + list(user_id) + ][order(count, decreasing=T)][1:10] user_id count 1: 25255749 74 2: 25252589 41 3: 25259069 40 4: 25259511 25 5: 25258777 23 6: 25256155 21 7: 25254795 20 8: 25251911 18 9: 25251689 17 10: 25253889 15
Yup. that's definitely true. It seems like it would be best if I could look only at up-to the first <some small n> recorder edit sessions for each user. Meeting now. back later. --Halfak (WMF) (talk) 13:26, 28 May 2015 (UTC)
OK back. So, I worked out a simple way to filter the dataset down to the first few observations per user.
> first_5_sessions = ddply( + user_edit_sessions, + .(user_id), + function(sub_dt){ + sub_dt[order(sub_dt$session_started),][1:min(5, nrow(sub_dt)),] + } + ) > dim(first_5_sessions) [1] 2373 13 > dim(user_edit_sessions) [1] 3035 13
We lost about 650 observations. No biggie.
> first_5_sessions[, + list( + count = length(user_id) + ), + list(bucket) + ] bucket count 1: experimental 1185 2: control 1188
There we go! OK. Now to observe the differences.
bucket via_mobile users.n ve.k ve.p attempted.k attempted.p 1: experimental 0 582 404 0.355008787 425 0.3734622 2: control 0 586 4 0.003502627 523 0.4579685 3: experimental 1 28 9 0.191489362 11 0.2340426 4: control 1 23 0 0.000000000 13 0.2826087 successful.k successful.p changed.n n 1: 389 0.3418278 736 1138 2: 464 0.4063047 726 1142 3: 10 0.2127660 27 47 4: 13 0.2826087 40 46 > prop.test(c(425,523), c(736, 726)) 2-sample test for equality of proportions with continuity correction data: c(425, 523) out of c(736, 726) X-squared = 32.1325, df = 1, p-value = 1.44e-08 alternative hypothesis: two.sided 95 percent confidence interval: -0.19267500 -0.09320505 sample estimates: prop 1 prop 2 0.5774457 0.7203857 > prop.test(c(425,523), c(1138, 1142)) 2-sample test for equality of proportions with continuity correction data: c(425, 523) out of c(1138, 1142) X-squared = 16.4114, df = 1, p-value = 5.098e-05 alternative hypothesis: two.sided 95 percent confidence interval: -0.12569305 -0.04331948 sample estimates: prop 1 prop 2 0.3734622 0.4579685
Here, it looks like we are seeing significantly lower success rates for experimental users than control users -- and it doesn't matter if we filter out nochange aborts or not. This is *very* surprising since we see the opposite for the VE dashboard. We also do not see a reduced amount of edits coming from this group (see my analysis from yesterday). I don't think this is a show stopper, but it is something that we'll need to look into.
Oh! I have an idea. I'll add a new outcome type for switching between the editors. That's captured in the aborts as "switchwith" and "switchwithout" (see Schema:Edit). So, now to update the query:
SELECT
user_id,
event_editingSessionId AS session_id,
event_editor AS editor,
MAX(`event_page.revid`) AS rev_id,
MIN(IF(event_action IN ('init', 'ready'), timestamp, NULL)) AS session_started,
MAX(timestamp) AS session_ended,
MIN(IF(event_action = 'ready', timestamp, NULL)) AS editor_ready,
MIN(IF(event_action IN ('saveAttempt', 'saveSuccess'), timestamp, NULL)) AS first_attempt,
IF(SUM(event_action = 'abort') > 0,
IF(SUM(`event_action.abort.type` = 'nochange') > 0,
'abort_nochange',
IF(SUM(`event_action.abort.type` IN ('switchwith', 'switchwithout')) > 0,
'switch_editors',
'abort'
)),
IF(SUM(event_action = 'saveSuccess') > 0,
'success',
IF(SUM(event_action = 'saveFailure') > 0,
'failure',
'other_abort'
))
) AS outcome
FROM staging.ve2_pilot_users
INNER JOIN log.Edit_11448630 ON
wiki = 'enwiki' AND
user_id = `event_user.id`
WHERE
timestamp BETWEEN "2015052115" and "2015052915" AND
timestamp BETWEEN
registration AND
DATE_FORMAT(
DATE_ADD(registration, INTERVAL 7 DAY),
"%Y%m%d%H%M%S"
)
GROUP BY user_id, session_id;
> first_5_sessions[, + list( + users.n = length(unique(user_id)), + ve.k = sum(editor == "visualeditor"), + ve.p = sum(editor == "visualeditor")/length(session_id), + attempted.k = sum(outcome == "success" | !is.na(first_attempt)), + attempted.p = sum(outcome == "success" | !is.na(first_attempt))/length(session_id), + successful.k = sum(outcome == "success"), + successful.p = sum(outcome == "success")/length(session_id), + changed_and_noswitch.n = sum(outcome != "abort_nochange" & outcome != "switch_editors"), + changed.n = sum(outcome != "abort_nochange"), + n = length(session_id) + ), + list(bucket, via_mobile) + ] bucket via_mobile users.n ve.k ve.p attempted.k attempted.p 1: experimental 0 558 412 0.361720808 413 0.3625988 2: control 0 585 5 0.004366812 525 0.4585153 3: experimental 1 28 7 0.159090909 9 0.2045455 4: control 1 23 0 0.000000000 13 0.2826087 successful.k successful.p changed_and_noswitch.n changed.n n 1: 378 0.3318701 668 732 1139 2: 466 0.4069869 728 728 1145 3: 8 0.1818182 23 24 44 4: 13 0.2826087 40 40 46 > prop.test(c(413, 525), c(668, 728)) 2-sample test for equality of proportions with continuity correction data: c(413, 525) out of c(668, 728) X-squared = 16.2657, df = 1, p-value = 5.505e-05 alternative hypothesis: two.sided 95 percent confidence interval: -0.1535023 -0.0522784 sample estimates: prop 1 prop 2 0.6182635 0.7211538
Yup. Definitely a difference here. It looks like newcomers are just simply less likely to save an edit that they start when VE is enabled by default. Gonna have to talk this one out with Abbey. --Halfak (WMF) (talk) 15:20, 28 May 2015 (UTC)
I wanted to check something. Which editor makes up the sessions that have such a low attempt-rate? It turns out that it is the experimental condition sessions with 'wikitext'!
> first_5_sessions[, + list( + users.n = length(unique(user_id)), + attempted.k = sum(outcome == "success" | !is.na(first_attempt)), + attempted.p = sum(outcome == "success" | !is.na(first_attempt))/length(session_id), + changed_and_noswitch.n = sum(outcome != "abort_nochange" & outcome != "switch_editors"), + changed.n = sum(outcome != "abort_nochange"), + n = length(session_id) + ), + list(via_mobile, editor=as.character(editor), bucket) + ][order(via_mobile, editor, bucket)] via_mobile editor bucket users.n attempted.k attempted.p 1: 0 visualeditor control 5 6 0.6666667 2: 0 visualeditor experimental 241 177 0.4609375 3: 0 wikitext control 584 520 0.4561404 4: 0 wikitext experimental 421 226 0.3079019 5: 1 visualeditor experimental 5 3 0.4285714 6: 1 wikitext control 23 13 0.2826087 7: 1 wikitext experimental 25 7 0.1891892 changed_and_noswitch.n changed.n n 1: 7 7 9 2: 260 287 384 3: 723 723 1140 4: 392 421 734 5: 4 4 7 6: 40 40 46 7: 18 19 37
So, the hypothesis that visualeditor makes editors less likely to successfully save an edit seems to not describe what's actually going on. I that what we are seeing here is simply experimentation with the two editors. --Halfak (WMF) (talk) 15:42, 28 May 2015 (UTC)
Editing users per user with logged edit session
editI wanted to confirm that the assumptions I made earlier by checking of the proportion of users who have at least one edit against the proportion of users with at least one edit session.
> merge( + users, + merge(metrics, user_session_count, by="user_id", all=T), + by="user_id" + )[, + list( + has_editing_sessions.k = sum(!is.na(editing_sessions) | day_revisions), + editing.n = sum(day_revisions > 0) + ), + list(bucket, via_mobile) + ] bucket via_mobile has_editing_sessions.k editing.n 1: experimental 0 773 534 2: control 0 774 544 3: control 1 128 117 4: experimental 1 133 115
And it looks like we have reasonable numbers here. Nothing to get scared about anyway. --Halfak (WMF) (talk) 20:22, 28 May 2015 (UTC)