Research talk:Wikipedia article creation/Work log/Tuesday, December 10th
Latest comment: 11 years ago by Halfak (WMF) in topic Tuesday, December 10th
Tuesday, December 10th
editI'm rushing to capture my progress today. I was working on metrics standardization stuff this morning, so I got a late start today.
First, I want to know when page moves started to be recorded in revision comments.
> select left(timestamp, 4) AS year, count(*) FROM nov13_move group by 1; +------+----------+ | year | count(*) | +------+----------+ | 2005 | 15907 | | 2006 | 517014 | | 2007 | 554246 | | 2008 | 565643 | | 2009 | 585088 | | 2010 | 534849 | | 2011 | 551479 | | 2012 | 85404 | +------+----------+ 8 rows in set (7.43 sec)
It looks like it started before 2008 which is all I really care about. Next up, I want to use this move table to track where pages originally started.
SELECT
page.page_id,
page.page_namespace,
page.page_title,
IFNULL(first_move.from_namespace, page.page_namespace) AS original_namespace,
IFNULL(first_move.from_title, page.page_title) AS original_title
FROM halfak.nov13_page AS page
LEFT JOIN (
SELECT
page_id,
page_namespace,
page_title,
MIN(rev_id) AS rev_id
FROM halfak.nov13_move
GROUP BY 1,2,3) AS first_move_id USING (page_id, page_namespace, page_title)
LEFT JOIN halfak.nov13_move first_move USING (rev_id);
Yuck. I just figured out that I wasn't properly extracting page namespace from the move titles. I just restarted that. Hopefully, I'll have my original titles and namespaces ready within an hour. --Halfak (WMF) (talk) 23:02, 10 December 2013 (UTC)