Research talk:VisualEditor's effect on newly registered editors/May 2015 study/Work log/2015-06-12
Friday, June 12, 2015
Observation period complete. Starting my final analysis. First, the measures of productivity and survival.
bucket | via_mobile | editing.k | week_editing.k | main_editing.k | week_main_editing.k | talk_editing.k | week_talk_editing.k | user_editing.k | week_user_editing.k | wp_editing.k | week_wp_editing.k | productive.k | week_productive.k | surviving.k | gt_one_hour.k | enabled.k | n |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
control | 1 | 927 | 953 | 794 | 811 | 85 | 94 | 126 | 144 | 15 | 17 | 413 | 435 | 71 | 57 | 6 | 3670 |
experimental | 0 | 3237 | 3386 | 2387 | 2502 | 383 | 439 | 812 | 904 | 106 | 133 | 1659 | 1778 | 287 | 338 | 9693 | 9728 |
control | 0 | 3211 | 3363 | 2448 | 2551 | 425 | 499 | 778 | 876 | 86 | 111 | 1671 | 1772 | 287 | 343 | 110 | 9794 |
experimental | 1 | 949 | 979 | 817 | 840 | 93 | 101 | 151 | 165 | 14 | 16 | 430 | 446 | 58 | 52 | 3775 | 3779 |
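To make the column naming concrete before the tests below (my reading, not stated explicitly in the table): the .k columns look like user counts within the first 24 hours, the week_ variants are the same counts over the full week, and n is the bucket size. Here's a quick sketch recomputing the proportions that the tests below compare, with numbers copied from the desktop (via_mobile = 0) rows:

```r
# Numbers copied from the table above (desktop buckets, via_mobile == 0).
# productive.k / n is the share of new editors with a productive edit in 24h;
# week_productive.k / n is the same share over the full week.
productive_24h  <- c(experimental = 1659, control = 1671)
productive_week <- c(experimental = 1778, control = 1772)
n               <- c(experimental = 9728, control = 9794)
round(productive_24h / n, 4)   # ~0.1705 (experimental) vs ~0.1706 (control)
round(productive_week / n, 4)  # ~0.1828 (experimental) vs ~0.1809 (control)
```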
OK. Time to look for significance.
- 24h
```r
> prop.test(c(1659, 1671), c(9728, 9794))
...
X-squared = 0, df = 1, p-value = 1
```
Well, that's about as insignificant as it can get.
- Full week
```r
> prop.test(c(1778, 1772), c(9728, 9794))
...
X-squared = 0.0995, df = 1, p-value = 0.7524
```
Same. If there is any difference, it's too small to measure with 20k observations.
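As a rough way to put "too small to measure" into numbers, here is a minimal sensitivity sketch that is not part of the original analysis: it uses R's power.prop.test with a baseline 24h productive-editor rate of about 17% (from the table above) and roughly 9,700 users per desktop bucket; the 80% power target and 0.05 significance level are my own assumptions.

```r
# Hypothetical sensitivity check (not from the study): what alternative rate p2
# could a two-sided two-sample proportion test detect at alpha = 0.05 with 80%
# power, given ~9,700 users per bucket and a baseline rate of ~17%?
power.prop.test(n = 9700, p1 = 0.17, power = 0.8)
# p2 is solved for and comes out around 0.185, i.e. a minimum detectable
# difference on the order of 1.5 percentage points.
```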
Before we move on, let's measure the total number of productive edits and see if there's a difference there.
```r
> with(
+   user_metrics,
+   wilcox.test(
+     day_productive_edits[bucket == "control"],
+     day_productive_edits[bucket == "experimental"]
+   )
+ )

        Wilcoxon rank sum test with continuity correction

data: day_productive_edits[bucket == "control"] and day_productive_edits[bucket == "experimental"]
W = 91088984, p-value = 0.6909
alternative hypothesis: true location shift is not equal to 0

> with(
+   user_metrics,
+   wilcox.test(
+     week_productive_edits[bucket == "control"],
+     week_productive_edits[bucket == "experimental"]
+   )
+ )

        Wilcoxon rank sum test with continuity correction

data: week_productive_edits[bucket == "control"] and week_productive_edits[bucket == "experimental"]
W = 91023213, p-value = 0.8194
alternative hypothesis: true location shift is not equal to 0
```
With p-values around .69 and .82, we're not seeing any difference we can say is real.
- Survival
```r
> prop.test(c(287, 287), c(9728, 9794))
...
X-squared = 0.0016, df = 1, p-value = 0.9682
```
No significant difference there either. --Halfak (WMF) (talk) 19:27, 12 June 2015 (UTC)
Burden
Now to look into changes in burden.
bucket | via_mobile | blocked.k | blocked.p | reverted.k | reverted.p | blocked_for_damage.k | blocked_for_damage.p | n |
---|---|---|---|---|---|---|---|---|
control | 1 | 92 | 0.02506812 | 475 | 0.1294278 | 57 | 0.01553134 | 3670 |
experimental | 0 | 406 | 0.0417352 | 919 | 0.09446957 | 259 | 0.02662418 | 9728 |
control | 0 | 415 | 0.04237288 | 1001 | 0.1022054 | 290 | 0.02960997 | 9794 |
experimental | 1 | 81 | 0.02143424 | 474 | 0.12543 | 66 | 0.01746494 | 3779 |
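As a sanity check on how to read this table, the .p columns appear to be the corresponding .k counts divided by n; a quick sketch (numbers copied from the desktop rows above):

```r
# Reading the burden table: the .p columns look like the .k counts divided by n.
# Desktop (via_mobile == 0) rows, values copied from the table above.
blocked_k <- c(experimental = 406, control = 415)
n         <- c(experimental = 9728, control = 9794)
blocked_k / n
# experimental ~0.0417, control ~0.0424 -- matching the blocked.p column.
```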
- Reverts
First, let's look at reverted edits per editor. We'll use the Wilcoxon test again.
```r
> with(
+   user_metrics,
+   wilcox.test(
+     day_reverted_main_revisions[bucket == "control"],
+     day_reverted_main_revisions[bucket == "experimental"]
+   )
+ )

        Wilcoxon rank sum test with continuity correction

data: day_reverted_main_revisions[bucket == "control"] and day_reverted_main_revisions[bucket == "experimental"]
W = 91583600, p-value = 0.05568
alternative hypothesis: true location shift is not equal to 0
```
Wow. Marginal significance here. Rigor tells us we can't believe this result: either there is a real, but very small, effect, or none at all. However, we can still talk about the potential implications. Let's say that this result is real. That would mean that current Wikipedians need to revert slightly fewer revisions when VE is enabled than when it is not.
Oops! I forgot to filter out editors who registered via mobile.
```r
> with(
+   user_block_metrics,
+   wilcox.test(
+     day_reverted_main_revisions[!via_mobile & bucket == "control"],
+     day_reverted_main_revisions[!via_mobile & bucket == "experimental"]
+   )
+ )

        Wilcoxon rank sum test with continuity correction

data: day_reverted_main_revisions[!via_mobile & bucket == "control"] and day_reverted_main_revisions[!via_mobile & bucket == "experimental"]
W = 48045048, p-value = 0.04534
alternative hypothesis: true location shift is not equal to 0
```
So, it looks like we have crossed the significance threshold, but even if this is, in fact, a real effect, it's very, very small.
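For a rough sense of how small, here is a back-of-the-envelope comparison using a related but simpler measure from the burden table (assuming, per my reading, that reverted.k counts editors with at least one reverted main-namespace edit):

```r
# Share of desktop editors with at least one reverted edit (my reading of
# reverted.k), values copied from the burden table above.
reverted_p <- c(experimental = 919 / 9728, control = 1001 / 9794)
diff(reverted_p)  # control minus experimental: ~0.0077, under one percentage point
```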
- Block rate
Next, we're going to look at the proportion of users who are blocked for spam/vandalism.
```r
> prop.test(c(259, 290), c(9728, 9794))

        2-sample test for equality of proportions with continuity correction

data: c(259, 290) out of c(9728, 9794)
X-squared = 1.4845, df = 1, p-value = 0.2231
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.007725427  0.001753852
sample estimates:
    prop 1     prop 2
0.02662418 0.02960997
```
VE-enabled users get slightly fewer spam/vandalism blocks, but the difference isn't significant. --Halfak (WMF) (talk) 20:12, 12 June 2015 (UTC)