Research talk:Revision scoring as a service/Work log/2015-07-22
Wednesday, July 22, 2015
White Cat asked me to check how many tasks lacked any labels for our ongoing Wiki labels campaigns.
wikilabels=> SELECT campaign.id, wiki, COUNT(*)
             FROM campaign
             INNER JOIN task ON campaign_id = campaign.id
             LEFT JOIN label ON task_id = task.id
             WHERE task_id IS NULL
             GROUP BY campaign.id, wiki;
 id |  wiki  | count
----+--------+-------
  4 | enwiki |   602
  9 | frwiki | 19949
  6 | fawiki |   261
  3 | ptwiki |     4
  5 | trwiki |  1546
  8 | azwiki | 20000
  7 | ptwiki |  1058
(7 rows)
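The `LEFT JOIN ... WHERE task_id IS NULL` pattern in this query is an anti-join: it keeps only the tasks that have no matching row in `label`. Here is a minimal sketch of the same query on toy data, using Python's `sqlite3` as a stand-in for the PostgreSQL database. The table layout is inferred from the column names in the query; the `user_id` column, the helper names, and all rows are made up for illustration.

```python
import sqlite3


def make_toy_db():
    """Build an in-memory database with a hypothetical, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE campaign (id INTEGER PRIMARY KEY, wiki TEXT);
        CREATE TABLE task (id INTEGER PRIMARY KEY, campaign_id INTEGER);
        -- user_id is an assumed column; only task_id appears in the log.
        CREATE TABLE label (task_id INTEGER, user_id INTEGER);
        INSERT INTO campaign VALUES (1, 'enwiki'), (2, 'frwiki');
        INSERT INTO task VALUES (10, 1), (11, 1), (12, 1), (20, 2), (21, 2);
        -- Task 10 has two labels, task 11 has one, the rest have none.
        INSERT INTO label VALUES (10, 100), (10, 101), (11, 100);
    """)
    return conn


def unlabeled_task_counts(conn):
    """Count tasks with no label at all, per campaign (anti-join)."""
    return conn.execute("""
        SELECT campaign.id, wiki, COUNT(*)
        FROM campaign
        INNER JOIN task ON campaign_id = campaign.id
        LEFT JOIN label ON task_id = task.id
        WHERE task_id IS NULL          -- keep only tasks with no label row
        GROUP BY campaign.id, wiki
        ORDER BY campaign.id
    """).fetchall()


toy_conn = make_toy_db()
unlabeled = unlabeled_task_counts(toy_conn)
print(unlabeled)
```

Tasks that already have several labels do not inflate this count, because every joined row with a non-NULL `task_id` is filtered out before grouping.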
It looks like we have a lot of duplicate labels for enwiki due to running the auto-labeling after people got started with labeling.
Let's check how much energy we wasted (this will also let us check the autolabeling).
wikilabels=> SELECT wiki, SUM(CAST(labels > 1 AS INT))
             FROM (SELECT wiki, task_id, COUNT(label.*) AS labels
                   FROM campaign
                   INNER JOIN task ON campaign_id = campaign.id
                   LEFT JOIN label ON task_id = task.id
                   WHERE task_id IS NOT NULL
                   GROUP BY wiki, task_id) AS foo
             GROUP BY wiki;
  wiki  | sum
--------+-----
 enwiki | 702
 fawiki | 187
 frwiki |   0
 ptwiki | 285
 trwiki |  57
(5 rows)
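Note that `SUM(CAST(labels > 1 AS INT))` counts tasks that have more than one label, which equals the number of surplus labels only if no task received three or more. The sketch below runs the same query on toy data and adds a `SUM(labels - 1)` column for comparison. It uses Python's `sqlite3` as a stand-in for PostgreSQL, so `COUNT(label.*)` is written as `COUNT(label.task_id)`; the schema is inferred from the query, and the `user_id` column, the function name, and all rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaign (id INTEGER PRIMARY KEY, wiki TEXT);
    CREATE TABLE task (id INTEGER PRIMARY KEY, campaign_id INTEGER);
    -- user_id is an assumed column; only task_id appears in the log.
    CREATE TABLE label (task_id INTEGER, user_id INTEGER);
    INSERT INTO campaign VALUES (1, 'enwiki'), (2, 'frwiki');
    INSERT INTO task VALUES (10, 1), (11, 1), (12, 1), (20, 2), (21, 2);
    -- Task 10 has three labels; tasks 11 and 20 have one each.
    INSERT INTO label VALUES (10, 100), (10, 101), (10, 102),
                             (11, 100), (20, 100);
""")


def multi_label_stats(conn):
    """Per wiki: tasks with more than one label, and labels beyond the first."""
    return conn.execute("""
        SELECT wiki,
               SUM(CAST(labels > 1 AS INT)) AS tasks_with_extra,
               SUM(labels - 1) AS extra_labels
        FROM (SELECT wiki, task_id, COUNT(label.task_id) AS labels
              FROM campaign
              INNER JOIN task ON campaign_id = campaign.id
              LEFT JOIN label ON task_id = task.id
              WHERE task_id IS NOT NULL   -- drops tasks with no labels
              GROUP BY wiki, task_id) AS foo
        GROUP BY wiki
        ORDER BY wiki
    """).fetchall()


stats = multi_label_stats(conn)
print(stats)
```

On this toy data the two columns differ for enwiki because one task carries three labels. Whether they differ on the real data cannot be told from the output in this log.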
So in enwiki, we got 702 human labels that we didn't need to finish the campaign due to autolabeling. --EpochFail (talk) 14:52, 22 July 2015 (UTC)