Research talk:Wikipedia article creation/Work log/Friday, November 15th

Friday, November 15th

edit

I don't have much time to hack on this today. I've been waiting for one of my queries to finish. This is the one that gathers all of the pages that *have not* been deleted along with the rev_id of their first edit. Instead of waiting on that to finish, I decided to split up the work.

First, I'll get all of the page stats from the revision table:

CREATE TABLE halfak.nov13_page_revision_stats
SELECT
    rev_page AS page_id,
    MIN(rev_id) AS first_rev_id,
    COUNT(*) AS revisions,
    MIN(rev_timestamp) AS first_edit,
    MAX(rev_timestamp) AS last_edit
FROM revision
WHERE rev_timestamp < "20131105000000"
GROUP BY 1

Then I'll join it with page to get non-archived pages with stats:

SELECT
    page_id,
    page_namespace,
    page_title,
    revisions,
    first_edit,
    last_edit,
    False AS archived,
    first_rev_id
FROM halfak.page_revision_stats
INNER JOIN page USING (page_id)
WHERE rev_timestamp < "20131105000000"
GROUP BY 1,2,3;

I've kicked the query off so, hopefully, it will be done by the end of the day.

Return to "Wikipedia article creation/Work log/Friday, November 15th" page.