Research talk:Automated classification of article importance/Work log/2017-06-30
Friday, June 30, 2017
Today I'll wrap up some documentation work, tie up loose ends, and do a bit of additional gap analysis.
WikiProjects Quality/Importance Analysis
Just as we created confusion matrices of predicted versus actual importance, we can build matrices of predicted article quality against both predicted and actual importance for each of the WikiProjects we have studied. We use the Objective Revision Evaluation Service (ORES) to predict article quality, scoring the revision of each article that was current when that WikiProject's dataset was gathered.
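A minimal sketch of how such a matrix could be tabulated from per-article labels. The function name `cross_tabulate` and the toy `sample` records are hypothetical and only illustrate the layout used in the tables that follow (rows are predicted quality, columns are importance); this is not the code used to produce the numbers on this page.

```python
from collections import Counter

# Class orderings matching the row and column order of the tables on this page.
QUALITY_CLASSES = ["Stub", "Start", "C", "B", "GA", "FA"]
IMPORTANCE_CLASSES = ["Low", "Mid", "High", "Top"]


def cross_tabulate(articles):
    """Count articles for each (predicted quality, importance) pair.

    `articles` is an iterable of (quality_label, importance_label) tuples.
    Returns a dict mapping each quality class to a list of counts ordered
    like IMPORTANCE_CLASSES.
    """
    counts = Counter(articles)
    return {
        quality: [counts[(quality, importance)] for importance in IMPORTANCE_CLASSES]
        for quality in QUALITY_CLASSES
    }


# Hypothetical toy data: one (quality, importance) pair per article.
sample = [
    ("Stub", "Low"),
    ("Stub", "Low"),
    ("Stub", "Mid"),
    ("Start", "Mid"),
    ("C", "Top"),
    ("FA", "High"),
]

matrix = cross_tabulate(sample)
for quality in QUALITY_CLASSES:
    print(quality, matrix[quality])
```

Passing true importance labels gives the first kind of table for a project, and passing predicted importance labels gives the second.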
WikiProject Africa
In the first table below, columns are true importance ratings and rows are predicted article quality.
Predicted quality | Low | Mid | High | Top |
---|---|---|---|---|
Stub | 14,697 | 1,140 | 214 | 136 |
Start | 7,198 | 1,412 | 419 | 436 |
C | 2,913 | 970 | 362 | 947 |
B | 473 | 240 | 123 | 394 |
GA | 675 | 250 | 93 | 232 |
FA | 195 | 92 | 53 | 120 |
In the second table, columns are predicted importance ratings and rows are predicted article quality.
Predicted quality | Low | Mid | High | Top |
---|---|---|---|---|
Stub | 10,757 | 3,574 | 1,552 | 304 |
Start | 3,659 | 2,800 | 2,204 | 802 |
C | 931 | 1,347 | 1,448 | 1,466 |
B | 156 | 258 | 337 | 479 |
GA | 186 | 305 | 347 | 412 |
FA | 32 | 85 | 134 | 209 |
WikiProject China
In the first table below, columns are true importance ratings and rows are predicted article quality.
Predicted quality | Low | Mid | High | Top |
---|---|---|---|---|
Stub | 5,176 | 2,864 | 160 | 1 |
Start | 4,805 | 3,146 | 421 | 19 |
C | 2,576 | 2,409 | 633 | 191 |
B | 599 | 684 | 266 | 108 |
GA | 386 | 368 | 102 | 45 |
FA | 111 | 156 | 104 | 48 |
In the second table, columns are predicted importance ratings and rows are predicted article quality.
Predicted quality | Low | Mid | High | Top |
---|---|---|---|---|
Stub | 4,446 | 3,117 | 634 | 4 |
Start | 3,690 | 2,993 | 1,624 | 84 |
C | 1,590 | 1,633 | 2,074 | 512 |
B | 394 | 469 | 609 | 185 |
GA | 236 | 216 | 338 | 111 |
FA | 62 | 72 | 198 | 87 |
WikiProject Judaism
In the first table below, columns are true importance ratings and rows are predicted article quality.
Predicted quality | Low | Mid | High | Top |
---|---|---|---|---|
Stub | 795 | 172 | 6 | 0 |
Start | 1,587 | 516 | 68 | 17 |
C | 1,099 | 523 | 224 | 96 |
B | 243 | 218 | 142 | 69 |
GA | 232 | 88 | 30 | 34 |
FA | 40 | 81 | 27 | 18 |
In the second table, columns are predicted importance ratings and rows are predicted article quality.
Predicted quality | Low | Mid | High | Top |
---|---|---|---|---|
Stub | 657 | 272 | 38 | 6 |
Start | 1,269 | 658 | 217 | 44 |
C | 706 | 575 | 473 | 188 |
B | 162 | 178 | 223 | 109 |
GA | 141 | 105 | 80 | 58 |
FA | 32 | 63 | 38 | 33 |
WikiProject Medicine
In the first table below, columns are true importance ratings and rows are predicted article quality.
Predicted quality | Low | Mid | High | Top |
---|---|---|---|---|
Stub | 5,480 | 1,880 | 14 | 0 |
Start | 6,718 | 1,967 | 68 | 0 |
C | 4,849 | 2,721 | 313 | 2 |
B | 1,175 | 1,016 | 246 | 22 |
GA | 1,237 | 1,004 | 192 | 32 |
FA | 261 | 285 | 149 | 36 |
In the second table, columns are predicted importance ratings and rows are predicted article quality.
Predicted quality | Low | Mid | High | Top |
---|---|---|---|---|
Stub | 5,154 | 2,094 | 126 | 0 |
Start | 6,090 | 2,143 | 519 | 1 |
C | 4,156 | 2,214 | 1,486 | 29 |
B | 993 | 650 | 733 | 83 |
GA | 987 | 731 | 692 | 55 |
FA | 220 | 136 | 300 | 75 |
WikiProject National Football League
In the first table below, columns are true importance ratings and rows are predicted article quality.
Predicted quality | Low | Mid | High | Top |
---|---|---|---|---|
Stub | 1,584 | 350 | 14 | 0 |
Start | 1,504 | 1,311 | 98 | 41 |
C | 1,032 | 881 | 248 | 170 |
B | 83 | 214 | 59 | 42 |
GA | 255 | 232 | 86 | 90 |
FA | 22 | 47 | 16 | 16 |
In the second table, columns are predicted importance ratings and rows are predicted article quality.
Predicted quality | Low | Mid | High | Top |
---|---|---|---|---|
Stub | 1,581 | 343 | 24 | 0 |
Start | 1,383 | 1,349 | 179 | 43 |
C | 821 | 829 | 492 | 189 |
B | 74 | 159 | 114 | 51 |
GA | 189 | 188 | 184 | 102 |
FA | 26 | 29 | 28 | 18 |
WikiProject Politics
In the first table below, columns are true importance ratings and rows are predicted article quality.
Predicted quality | Low | Mid | High | Top |
---|---|---|---|---|
Stub | 7,208 | 202 | 80 | 1 |
Start | 5,942 | 761 | 177 | 9 |
C | 3,156 | 1,465 | 398 | 32 |
B | 980 | 794 | 256 | 42 |
GA | 1,114 | 498 | 108 | 11 |
FA | 518 | 381 | 99 | 16 |
In the second table, columns are predicted importance ratings and rows are predicted article quality.
Predicted quality | Low | Mid | High | Top |
---|---|---|---|---|
Stub | 6,368 | 565 | 551 | 7 |
Start | 4,772 | 1,365 | 691 | 61 |
C | 1,531 | 2,065 | 1,284 | 171 |
B | 432 | 798 | 710 | 132 |
GA | 483 | 707 | 474 | 67 |
FA | 151 | 471 | 315 | 77 |
Correlations between quality and importance
Both ORES and our importance prediction model provide per-class probabilities, which we can use to understand the correlation between quality and importance. We apply an approach similar to that used by Aaron Halfaker for studying quality dynamics in Wikipedia, and by Sage Ross in FixMeBot and in the Wiki Education dashboard. For simplicity, we adopt Halfaker's approach and calculate an importance score as a weighted sum of the predicted class probabilities.
We then calculate the correlation coefficient between quality and importance for each of the WikiProjects, with the following results:
Project name | Correlation |
---|---|
Africa | 0.561 |
China | 0.456 |
Judaism | 0.410 |
Medicine | 0.395 |
National Football League | 0.487 |
Politics | 0.519 |
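A minimal sketch of how a correlation like those above could be computed, assuming each score is a weighted sum of per-class probabilities in the style of Halfaker's approach. The integer weights (0 to 5 for quality, 0 to 3 for importance), the dictionary layout of the probabilities, the choice of Pearson's coefficient, and all names below are assumptions made for illustration; the source only says "correlation coefficient" and does not list the weights.

```python
import math

# Assumed ordinal weights; the source does not state the actual values.
QUALITY_WEIGHTS = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "FA": 5}
IMPORTANCE_WEIGHTS = {"Low": 0, "Mid": 1, "High": 2, "Top": 3}


def weighted_score(probabilities, weights):
    """Collapse per-class probabilities into one score: sum of weight * probability."""
    return sum(weights[label] * p for label, p in probabilities.items())


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predictions for three articles (not real model output).
quality_probs = [
    {"Stub": 0.7, "Start": 0.2, "C": 0.1, "B": 0.0, "GA": 0.0, "FA": 0.0},
    {"Stub": 0.1, "Start": 0.3, "C": 0.4, "B": 0.1, "GA": 0.1, "FA": 0.0},
    {"Stub": 0.0, "Start": 0.0, "C": 0.1, "B": 0.2, "GA": 0.3, "FA": 0.4},
]
importance_probs = [
    {"Low": 0.8, "Mid": 0.2, "High": 0.0, "Top": 0.0},
    {"Low": 0.3, "Mid": 0.4, "High": 0.2, "Top": 0.1},
    {"Low": 0.1, "Mid": 0.2, "High": 0.3, "Top": 0.4},
]

quality_scores = [weighted_score(p, QUALITY_WEIGHTS) for p in quality_probs]
importance_scores = [weighted_score(p, IMPORTANCE_WEIGHTS) for p in importance_probs]
print(round(pearson(quality_scores, importance_scores), 3))
```

Collapsing the probabilities into a single continuous score keeps more information than comparing the most likely class labels, which is why a per-article score is convenient for this kind of correlation.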