Research talk:STiki 1 million reverts review/Work log/2016-11-02
Wednesday, November 2, 2016
Hey folks! Just sitting down with this dataset. *cracks knuckles* Let's have a quick look!
> sum(mr$reverts)
[1] 895344
Hmm... Looks like I'm missing about 100k reverts. I wonder why that is. Maybe my regex has some imperfections. I'll have to consult West.andrew.g.
- Took a look at your SQL query. Seems like you are trying to glean STiki usage purely from edit comments? I'm sure most of the missing reverts are from the early days before we converged on the current format. The code wasn't even under version control in those early days. I can figure out some of the old strings, though, by joining the STiki feedback table against a metadata one to get some of those older comments. I imagine it will be of most benefit to put critical STiki tables onto WMF infrastructure to do quick joins on RIDs. West.andrew.g (talk) 04:00, 2 November 2016 (UTC)
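To make the comment-matching problem concrete, here is a minimal Python sketch. It is not the SQL query or regex actually used for this dataset: the pattern and the example summaries are assumptions for illustration, and the "older format" string is invented. It shows how a pattern tuned to one summary format silently skips anything written in another, and it works out the gap against the 1 million milestone from the count above.

```python
import re

# Assumed pattern: current-style summaries end with "using [[WP:STiki|STiki]]".
STIKI_RE = re.compile(r"using \[\[WP:STiki\|STiki\]\]")


def is_stiki_revert(comment):
    """Return True if the edit summary matches the assumed STiki format."""
    return bool(STIKI_RE.search(comment))


# Made-up example summaries (not taken from the dataset).
comments = [
    "Reverted 1 edit by [[Special:Contributions/192.0.2.1|192.0.2.1]] "
    "identified as test/vandalism using [[WP:STiki|STiki]]",
    "Reverted good faith edits by Example using [[WP:STiki|STiki]]",
    "Reverted vandalism (STiki beta)",  # hypothetical older format, missed
    "Fixed a typo",
]
matched = [c for c in comments if is_stiki_revert(c)]

# Gap between the 1 million milestone and the reverts counted above.
shortfall = 1_000_000 - 895_344
```

Joining on RIDs, as suggested above, avoids this problem because it does not depend on the summary text at all.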
Anyway, forward!
So, first I want to look at how STiki's usage has been changing over time. There's some yearly periodicity here, so I'll break down the graphs by year and month.
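For anyone who wants to reproduce the year-by-month breakdown, a rough Python sketch follows. The rows are made up, and the `month` column name and its `YYYY-MM` format are assumptions; only the `reverts` column is known from the session above. The real TSV may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; "month" and its YYYY-MM format are assumptions.
SAMPLE_TSV = (
    "month\treverts\n"
    "2015-05\t100\n"
    "2015-06\t80\n"
    "2016-05\t120\n"
    "2016-06\t125\n"
)


def reverts_by_year_and_month(tsv_text):
    """Return {year: {month_number: reverts}}, one inner dict per year."""
    table = defaultdict(dict)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        year_text, month_text = row["month"].split("-")[:2]
        year, month = int(year_text), int(month_text)
        # Sum in case a month appears on more than one row.
        table[year][month] = table[year].get(month, 0) + int(row["reverts"])
    return {year: dict(months) for year, months in table.items()}


by_year = reverts_by_year_and_month(SAMPLE_TSV)
```

Each inner dict can then be drawn as its own line, which makes seasonal dips easy to compare across years.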
We can definitely see a dip in May/June, which is probably all the vandals going on summer vacation ;) I'm surprised not to see the same pattern in 2016 though. OK, next I want to look at the proportion of reverts that were of anons' edits.
Here, we can see a general decline in the overall proportion of anons who were reverted (compared to registered editors). When STiki first came out, it was 100% anons, then it dropped to about 90% anons in 2011. Then midway through 2013, we see it drop again to the low 80%s. I don't know what to think of the two dips in 2014 and the beginning of 2015. It's more than one month that shows that dip, so it seems like it might be real.
- I'll pull the table description text and "STiki timeline" I started into this work log. I think if I curate that timeline well enough, it will begin to answer these questions and additional variables we might want to isolate. I know my very first interface didn't process edits by registered users (thus 100% anon). At some point the interface shifted its default queue from my metadata model to that of CBNG. We might assume their model penalizes anons less and relies more on edit language. The queue an RID was chosen from is annotated in STiki's tables. West.andrew.g (talk) 04:27, 2 November 2016 (UTC)
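One way to test the queue explanation is to split the anon share by the queue each revert came from. The Python sketch below uses made-up records and hypothetical queue labels (`metadata`, `cbng`); the real annotation in STiki's tables may use different names and would need the join described above.

```python
from collections import Counter

# Made-up (queue, is_anon) records; the queue labels are hypothetical.
reverts = [
    ("metadata", True), ("metadata", True),
    ("metadata", True), ("metadata", False),
    ("cbng", True), ("cbng", False),
    ("cbng", True), ("cbng", False),
]


def anon_share_by_queue(records):
    """Return {queue: fraction of that queue's reverts that hit anon edits}."""
    totals, anons = Counter(), Counter()
    for queue, is_anon in records:
        totals[queue] += 1
        anons[queue] += int(is_anon)
    return {queue: anons[queue] / totals[queue] for queue in totals}


shares = anon_share_by_queue(reverts)
```

If the overall decline is mostly a queue effect, the share within each queue should stay roughly flat while the mix of queues changes.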
OK, next I was curious about the rate at which reverts were flagged as "good-faith".
It's interesting that registered editor reverts get flagged as "good-faith" at about a 10% higher rate and that remains consistent over time. It's also really interesting that the proportions track pretty closely.
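A gap of "about 10%" can be read either as percentage points or as a relative difference, and the two can be far apart. The Python sketch below computes both; the rates passed in are made up for illustration and are not taken from the dataset.

```python
def rate_gap(registered_rate, anon_rate):
    """Return (percentage-point gap, relative gap in percent), rounded."""
    difference = registered_rate - anon_rate
    percentage_points = round(difference * 100, 1)
    relative_percent = round(difference / anon_rate * 100, 1)
    return percentage_points, relative_percent


# Made-up good-faith rates: 30% for registered editors, 20% for anons.
points, relative = rate_gap(0.30, 0.20)
```

With these made-up rates the gap is 10 percentage points but a 50% relative difference, so it is worth stating which one a plot shows.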
OK. One last thing. Let me link to the dataset! https://github.com/halfak/STiki-revert-analysis/blob/master/datasets/enwiki.monthly_stiki_reverts.tsv
That's all for today. I'll have a good think about all this and regroup with some new analysis soon. --EpochFail (talk) 01:03, 2 November 2016 (UTC)