Research talk:Wikipedia article creation/Work log/Friday, November 15th
Friday, November 15th
editI don't have much time to hack on this today. I've been waiting for one of my queries to finish. This is the one that gathers all of the pages that *have not* been deleted along with the rev_id of their first edit. Instead of waiting on that to finish, I decided to split up the work.
First, I'll get all of the page stats from the revision table:
CREATE TABLE halfak.nov13_page_revision_stats
SELECT
rev_page AS page_id,
MIN(rev_id) AS first_rev_id,
COUNT(*) AS revisions,
MIN(rev_timestamp) AS first_edit,
MAX(rev_timestamp) AS last_edit
FROM revision
WHERE rev_timestamp < "20131105000000"
GROUP BY 1
Then I'll join it with page to get non-archived pages with stats:
SELECT
page_id,
page_namespace,
page_title,
revisions,
first_edit,
last_edit,
False AS archived,
first_rev_id
FROM halfak.page_revision_stats
INNER JOIN page USING (page_id)
WHERE rev_timestamp < "20131105000000"
GROUP BY 1,2,3;
I've kicked the query off so, hopefully, it will be done by the end of the day.