Research talk:Automated classification of draft quality/Work log/2016-09-28
Latest comment: 8 years ago by EpochFail in topic Wednesday, September 28, 2016
Wednesday, September 28, 2016
editOK looks like my last queries have failed, so I'm going to run them directly on labsDB in PAWS.
https://quarry.wmflabs.org/query/12795 -- Surviving articles https://quarry.wmflabs.org/query/12796 -- Deleted articles
--EpochFail (talk) 17:26, 28 September 2016 (UTC)
Here's what I've got:
SELECT
page_title,
rev_id AS creation_rev_id,
rev_timestamp AS creation_timestamp,
FALSE AS archived,
"OK" AS draft_quality
FROM revision
INNER JOIN page ON
rev_page = page_id
WHERE
rev_timestamp BETWEEN "20150827" AND "20160827" AND
rev_parent_id = 0 AND
page_namespace = 0
UNION ALL
SELECT
ar_title,
ar_rev_id AS creation_rev_id,
ar_timestamp AS creation_timestamp,
True AS archived,
IF(log_comment REGEXP "WP:CSD#G3\\|", "vandalism",
IF(log_comment REGEXP "WP:CSD#G10\\|", "attack",
IF(log_comment REGEXP "WP:CSD#G11\\|", "spam",
IF(log_comment REGEXP "WP:CSD#A11\\|", "hoax", "OK")))) AS draft_quality
FROM archive
LEFT JOIN logging speedy_delete ON
log_namespace = ar_namespace AND
log_title = ar_title AND
log_type = "delete" AND
log_action = "delete" AND
log_comment LIKE "[[WP:CSD#%" AND
log_comment REGEXP "WP:CSD#(G3|G10|G11|A11)\\|" AND
log_timestamp > ar_timestamp
WHERE
ar_timestamp BETWEEN "20150827" AND "20160827" AND
log_timestamp BETWEEN "20150827" AND "20160827" AND
ar_parent_id = 0 AND
ar_namespace = 0;
It should theoretically run on labsDB, but I think it might have more of a chance to finish, so I'll be running on both labs and analytics databases. --EpochFail (talk) 17:34, 28 September 2016 (UTC)
Well, that's still running. So I guess I'll just let it go overnight and see what we've got in the morning. --EpochFail (talk) 22:33, 28 September 2016 (UTC)