Research talk:Anonymous editor acquisition/Signup CTA experiment/Work log/2014-06-18
Wednesday, June 18th
editToday I get to work on the funnel. Woo. I think I'll start by looking at this figure:
To help me think about this, I'm going to talk through the diagram. All flows start with clicking edit. In the pre-edit case, we expect an impression and potentially a button click. Next we might see either an account creation impression/creation or a revision saved. Now this is where the post-edit case does it's interesting stuff. We expect that an impression will happen shortly after the saved edit and potentially a button click. This button click might lead to an account creation impression/creation and that's the end.
OK. So I think that my plan is to identify a set of "flows" that start with edit link clicks and show the proportion of those flows that travel the various paths through the diagram above. One potential problem with assuming that every edit lick click is a flow start is that an edit link click is an acceptable action when presented with the CTA. I'm going to have to set some threshold for when I'll consider an edit link click to be the start of a flow. --Halfak (WMF) (talk) 15:01, 18 June 2014 (UTC)
CREATE TEMPORARY TABLE staging.token_edit_clicks
SELECT
wiki,
event_token AS token,
timestamp
FROM log.SignupExpPageLinkClick_8101692
WHERE
event_link LIKE "edit%" AND
timestamp BETWEEN "20140519180800" AND "20140526180800";
ALTER TABLE staging.token_edit_clicks MODIFY wiki VARCHAR(50);
CREATE INDEX wiki_token ON staging.token_edit_clicks (wiki, token);
CREATE TEMPORARY TABLE staging.token_edit_clicks_copy
SELECT * FROM staging.token_edit_clicks;
CREATE TEMPORARY TABLE staging.token_flow_start_pre
SELECT
tec.wiki,
tec.token,
tec.timestamp
FROM staging.token_edit_clicks_copy tec
LEFT JOIN staging.token_edit_clicks recent_clicks ON
tec.wiki = recent_clicks.wiki AND
tec.token = recent_clicks.token AND
recent_clicks.timestamp BETWEEN
DATE_FORMAT(DATE_SUB(tec.timestamp, INTERVAL 5 MINUTE), "%Y%m%d%H%i%S") AND
DATE_FORMAT(DATE_SUB(tec.timestamp, INTERVAL 1 SECOND), "%Y%m%d%H%i%S")
WHERE recent_clicks.token IS NULL;
CREATE INDEX wiki_token_pre ON staging.token_flow_start_pre (wiki, token);
CREATE TEMPORARY TABLE staging.token_flow_start_pre_copy
SELECT * FROM staging.token_flow_start_pre;
CREATE TABLE staging.token_flow_start
SELECT
tfs.wiki,
tfs.token,
tfs.timestamp,
MIN(next_flow.timestamp) AS next_flow_start
FROM staging.token_flow_start_pre_copy tfs
LEFT JOIN staging.token_flow_start_pre next_flow ON
tfs.wiki = next_flow.wiki AND
tfs.token = next_flow.token AND
next_flow.timestamp > tfs.timestamp
GROUP BY 1,2,3;
CREATE INDEX wiki_token ON staging.token_flow_start (wiki, token);
> select count(*) from staging.token_flow_start; +----------+ | count(*) | +----------+ | 791866 | +----------+ 1 row in set (0.00 sec)
There. Now I have a table with 791,866 clicks on edit that started a flow. Now, all I need to do is get a count of unique events for each token flow and I can start plotting. --Halfak (WMF) (talk) 15:56, 18 June 2014 (UTC)
Now the query to gather all of the first timestamps of each event type within a flow.
CREATE TEMPORARY TABLE staging.experimental_wiki_token_events
SELECT
wiki,
event_token AS token,
timestamp,
"revision" AS event
FROM log.TrackedPageContentSaveComplete_8535426
WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
AND event_token IS NOT NULL
UNION ALL
SELECT
wiki,
event_token AS token,
timestamp,
"revision" AS event
FROM log.TrackedPageContentSaveComplete_7872558
WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
AND event_token IS NOT NULL
UNION ALL
SELECT
wiki,
event_token AS token,
timestamp,
"creation complete" AS event
FROM log.SignupExpAccountCreationComplete_8539421
WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
AND event_token IS NOT NULL
UNION ALL
SELECT
wiki,
event_token AS token,
timestamp,
"creation impression" AS event
FROM log.SignupExpAccountCreationImpression_8539445
WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
AND event_token IS NOT NULL
UNION ALL
SELECT
wiki,
event_token AS token,
timestamp,
CONCAT("button click ", event_button) AS event
FROM log.SignupExpCTAButtonClick_8102619
WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
AND event_token IS NOT NULL
UNION ALL
SELECT
wiki,
event_token AS token,
timestamp,
CONCAT("CTA impression ", event_cta) AS event
FROM log.SignupExpCTAImpression_8101716
WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
AND event_token IS NOT NULL
UNION ALL
SELECT
wiki,
event_token AS token,
timestamp,
CONCAT("link click ", event_link) AS event
FROM log.SignupExpPageLinkClick_8101692
WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
AND event_token IS NOT NULL;
ALTER TABLE staging.experimental_wiki_token_events MODIFY wiki VARCHAR(50);
CREATE INDEX wiki_token_time ON staging.experimental_wiki_token_events (wiki, token, timestamp);
SELECT
tfs.wiki,
tfs.token,
tfs.timestamp AS flow_start,
MAX(ewte.timestamp) AS last_event,
MIN(IF(ewte.event = "link click edit section", ewte.timestamp, NULL)) AS first_edit_section_click,
SUM(ewte.event = "link click edit section") AS edit_section_clicks,
MIN(IF(ewte.event = "link click edit page", ewte.timestamp, NULL)) AS first_edit_page_click,
SUM(ewte.event = "link click edit page") AS edit_page_clicks,
MIN(IF(ewte.event = "link click create account", ewte.timestamp, NULL)) AS first_create_account_click,
SUM(ewte.event = "link click create account") AS create_account_clicks,
MIN(IF(ewte.event = "CTA impression pre-edit", ewte.timestamp, NULL)) AS first_pre_edit_cta_impression,
SUM(ewte.event = "CTA impression pre-edit") AS pre_edit_cta_impressions,
MIN(IF(ewte.event = "CTA impression post-edit", ewte.timestamp, NULL)) AS first_post_edit_cta_impression,
SUM(ewte.event = "CTA impression post-edit") AS post_edit_cta_impressions,
MIN(IF(ewte.event = "button click edit", ewte.timestamp, NULL)) AS first_edit_button_click,
SUM(ewte.event = "button click edit") AS edit_button_clicks,
MIN(IF(ewte.event = "button click signup", ewte.timestamp, NULL)) AS first_signup_button_click,
SUM(ewte.event = "button click signup") AS signup_button_clicks,
MIN(IF(ewte.event = "button click dismiss", ewte.timestamp, NULL)) AS first_dismiss_button_click,
SUM(ewte.event = "button click dismiss") AS dismiss_button_clicks,
MIN(IF(ewte.event = "creation impression", ewte.timestamp, NULL)) AS first_account_creation_impression,
SUM(ewte.event = "creation impression") AS account_creation_impressions,
MIN(IF(ewte.event = "creation complete", ewte.timestamp, NULL)) AS first_account_creation_complete,
SUM(ewte.event = "creation complete") AS account_creation_completes,
MIN(IF(ewte.event = "revision", ewte.timestamp, NULL)) AS first_revision_saved,
SUM(ewte.event = "revision") AS revisions_saved
FROM staging.token_flow_start tfs
LEFT JOIN staging.experimental_wiki_token_events ewte ON
tfs.wiki = ewte.wiki AND
tfs.token = ewte.token AND
ewte.timestamp BETWEEN
tfs.timestamp AND
IFNULL(next_flow_start, DATE_FORMAT(DATE_ADD(ewte.timestamp, INTERVAL 10 MINUTE), "%Y%m%d%H%i%S"))
GROUP BY 1,2,3;
--Halfak (WMF) (talk) 18:52, 18 June 2014 (UTC)
Got it. I'll be picking this up tomorrow:
wiki token flow_start dewiki:107935 cDhV7L1uEBDZkN8lzisckNgGopYwm6KI: 194 Min. :2.014e+13 enwiki:539982 cnkoRoKZHR1xutv11c4jkE1ozNMy9QwU: 155 1st Qu.:2.014e+13 frwiki: 85553 b9hJUP4de7uGJvDJxveDgx456ruaSolo: 142 Median :2.014e+13 itwiki: 58392 m0sxhkCF1X4MXdH4aS2kKYyAj8SFG3qs: 129 Mean :2.014e+13 wiki : 4 8iXmZJIUV0BdGMlPro4b6h4e8qLgLGQ5: 127 3rd Qu.:2.014e+13 SqkExHP8DGdGfC6xq8EvEC29liI54ttv: 108 Max. :2.014e+13 (Other) :791011 last_event first_edit_section_click edit_section_clicks Min. :2.014e+13 Min. :2.014e+13 Min. : 0.0000 1st Qu.:2.014e+13 1st Qu.:2.014e+13 1st Qu.: 0.0000 Median :2.014e+13 Median :2.014e+13 Median : 1.0000 Mean :2.014e+13 Mean :2.014e+13 Mean : 0.8334 3rd Qu.:2.014e+13 3rd Qu.:2.014e+13 3rd Qu.: 1.0000 Max. :2.014e+13 Max. :2.014e+13 Max. :2061.0000 NA's :4 NA's :263461 NA's :4 first_edit_page_click edit_page_clicks first_create_account_click Min. :2.014e+13 Min. : 0.0000 Min. :2.014e+13 1st Qu.:2.014e+13 1st Qu.: 0.0000 1st Qu.:2.014e+13 Median :2.014e+13 Median : 0.0000 Median :2.014e+13 Mean :2.014e+13 Mean : 0.5213 Mean :2.014e+13 3rd Qu.:2.014e+13 3rd Qu.: 1.0000 3rd Qu.:2.014e+13 Max. :2.014e+13 Max. :351.0000 Max. :2.014e+13 NA's :495800 NA's :4 NA's :784576 create_account_clicks first_pre_edit_cta_impression pre_edit_cta_impressions Min. : 0.00000 Min. :2.014e+13 Min. : 0.0000 1st Qu.: 0.00000 1st Qu.:2.014e+13 1st Qu.: 0.0000 Median : 0.00000 Median :2.014e+13 Median : 0.0000 Mean : 0.01129 Mean :2.014e+13 Mean : 0.3871 3rd Qu.: 0.00000 3rd Qu.:2.014e+13 3rd Qu.: 1.0000 Max. :70.00000 Max. :2.014e+13 Max. :112.0000 NA's :4 NA's :539853 NA's :4 first_post_edit_cta_impression post_edit_cta_impressions Min. :2.014e+13 Min. : 0.00000 1st Qu.:2.014e+13 1st Qu.: 0.00000 Median :2.014e+13 Median : 0.00000 Mean :2.014e+13 Mean : 0.08129 3rd Qu.:2.014e+13 3rd Qu.: 0.00000 Max. :2.014e+13 Max. :123.00000 NA's :751294 NA's :4 first_edit_button_click edit_button_clicks first_signup_button_click Min. :2.014e+13 Min. : 0.00000 Min. :2.014e+13 1st Qu.:2.014e+13 1st Qu.: 0.00000 1st Qu.:2.014e+13 Median :2.014e+13 Median : 0.00000 Median :2.014e+13 Mean :2.014e+13 Mean : 0.07747 Mean :2.014e+13 3rd Qu.:2.014e+13 3rd Qu.: 0.00000 3rd Qu.:2.014e+13 Max. :2.014e+13 Max. :75.00000 Max. :2.014e+13 NA's :746664 NA's :4 NA's :770233 signup_button_clicks first_dismiss_button_click dismiss_button_clicks Min. : 0.00000 Min. :2.014e+13 Min. : 0.0000 1st Qu.: 0.00000 1st Qu.:2.014e+13 1st Qu.: 0.0000 Median : 0.00000 Median :2.014e+13 Median : 0.0000 Mean : 0.02885 Mean :2.014e+13 Mean : 0.2064 3rd Qu.: 0.00000 3rd Qu.:2.014e+13 3rd Qu.: 0.0000 Max. :14.00000 Max. :2.014e+13 Max. :728.0000 NA's :4 NA's :644257 NA's :4 first_account_creation_impression account_creation_impressions Min. :2.014e+13 Min. : 0.0000 1st Qu.:2.014e+13 1st Qu.: 0.0000 Median :2.014e+13 Median : 0.0000 Mean :2.014e+13 Mean : 0.0763 3rd Qu.:2.014e+13 3rd Qu.: 0.0000 Max. :2.014e+13 Max. :345.0000 NA's :758648 NA's :4 first_account_creation_complete account_creation_completes Min. :2.014e+13 Min. : 0.00000 1st Qu.:2.014e+13 1st Qu.: 0.00000 Median :2.014e+13 Median : 0.00000 Mean :2.014e+13 Mean : 0.01617 3rd Qu.:2.014e+13 3rd Qu.: 0.00000 Max. :2.014e+13 Max. :73.00000 NA's :779901 NA's :4 first_revision_saved revisions_saved Min. :2.014e+13 Min. : 0.000 1st Qu.:2.014e+13 1st Qu.: 0.000 Median :2.014e+13 Median : 0.000 Mean :2.014e+13 Mean : 0.956 3rd Qu.:2.014e+13 3rd Qu.: 0.000 Max. :2.014e+13 Max. :4122.000 NA's :641494 NA's :4