Talk:Wiki labels/2015


Coder server behavior


I spent some time fleshing out the coder server behavior. Generally, I'm thinking about this data model hierarchically. A wiki has campaigns; campaigns have worksets; and worksets contain revisions.

  • [wiki] > [campaign] > [workset] > [revision]

Users can request to be assigned a workset. A workset is a small random sample drawn from the campaign's larger sample. While a workset is active, its revisions are *owned* by the user who claimed them until that user abandons the workset or it expires.
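
The ownership rule described above can be sketched in Python. This is only an illustration under stated assumptions, not the coder server's implementation: the `Workset` class, its field names, and the one-day lifetime are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Workset:
    """A user's claim on a small sample of revisions (hypothetical shape)."""
    user_id: int
    rev_ids: list
    expires: datetime
    abandoned: bool = False

    def owner_at(self, now: datetime) -> Optional[int]:
        """Return the owning user while the claim is active, else None."""
        if self.abandoned or now >= self.expires:
            return None
        return self.user_id


# Assumed lifetime of one day; the real expiration policy is not specified here.
claimed_at = datetime(2015, 2, 21, 13, 45, 56, tzinfo=timezone.utc)
ws = Workset(user_id=467890, rev_ids=[3456780, 3456781],
             expires=claimed_at + timedelta(days=1))
```

Once `owner_at` returns `None`, the revisions would be free to appear in someone else's workset.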

See my notes on the REST interface below.

coder/
Lists out wikis with campaigns.
example response
{
    "wikis": ["enwiki", "ptwiki"]
}
coder/enwiki
Lists out active campaigns for English Wikipedia
example response
{
    "campaigns": ["Quality -- 10k sample 2014", "Edit type -- 10k sample 2014"]
}


coder/enwiki/?expand=true
Lists out active campaigns for English Wikipedia with metadata expanded.
example response
{
    "campaigns": {
        "Quality -- 10k sample 2014": {
            "form": "damaging_and_good-faith",
            "view": "diff_to_previous",
            "progress": {
                "completed": 150,
                "assigned": 275,
                "available": 750
            }
        },
        "Edit type -- 10k sample 2014": {
            "form": "damaging_and_good-faith",
            "view": "diff_to_previous",
            "progress": {
                "completed": 73,
                "assigned": 253,
                "available": 575
            }
        }
    }
}


coder/enwiki/Quality_--_10k_sample_2014
Gathers metadata for a particular campaign
example response
{
    "Quality -- 10k sample 2014": {
        "form": "damaging_and_good-faith",
        "view": "diff_to_previous",
        "progress": {
            "completed": 150,
            "assigned": 275,
            "available": 750
        }
    }
}
coder/enwiki/Quality_--_10k_sample_2014?assign=workset
Requests that a new workset be assigned to the current user
coder/enwiki/Quality_--_10k_sample_2014/345
Gathers metadata for a particular workset
example response
{
    "workset": {
        "id": 345,
        "assignee": {
            "global_id": 467890,
            "username": "EpochFail"
        },
        "expiration": "2015-02-22T13:45:56Z",
        "revisions": [
            {"rev_id": 3456780},
            {"rev_id": 3456781},
            {"rev_id": 3456782},
            {"rev_id": 3456783},
            {"rev_id": 3456784},
            {"rev_id": 3456785},
            ...
        ]
    }
}
coder/enwiki/Quality_--_10k_sample_2014/345?submit=label&rev_id=3456780&label={...}
Submits a label (JSON blob) for a rev_id within a workset.
example response
{
    "success": true
}
coder/enwiki/Quality_--_10k_sample_2014/345?abandon=workset
Deletes a workset and frees the revisions to be labeled by others
example response
{
    "success": true
}
coder/login
Will forward to meta.wikimedia.org to attempt an OAuth handshake.
coder/logout
Logs the user out of the coding system
example response
{
    "success": true
}
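
To make the URL scheme concrete, here is a rough Python sketch of how a client might build these paths. The helper names are hypothetical, and the space-to-underscore rule is inferred from the example paths rather than confirmed server behavior.

```python
import json
from urllib.parse import urlencode


def campaign_path(wiki: str, campaign: str) -> str:
    """Build a campaign path, replacing spaces with underscores (assumed rule)."""
    return "coder/{0}/{1}".format(wiki, campaign.replace(" ", "_"))


def submit_label_path(wiki: str, campaign: str, workset_id: int,
                      rev_id: int, label: dict) -> str:
    """Build the label-submission path, sending the label as a JSON blob."""
    query = urlencode({
        "submit": "label",
        "rev_id": rev_id,
        "label": json.dumps(label, sort_keys=True),
    })
    return "{0}/{1}?{2}".format(campaign_path(wiki, campaign), workset_id, query)


path = submit_label_path("enwiki", "Quality -- 10k sample 2014", 345, 3456780,
                         {"damaging": False, "good-faith": True})
```

The JSON blob is percent-encoded by `urlencode`, so the label can safely contain quotes and braces.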

That's all I've got for now. --Halfak (WMF) (talk) 22:31, 18 February 2015 (UTC)

Shouldn't "coder/enwiki" and "coder/enwiki/?expand=true" return something in common? E.g. the simplified version is returning the name "quality_2014" which doesn't appear anywhere in the expanded version. Maybe both should return an object, and the expanded one should be an "extension" of the simplified one? Helder 13:27, 19 February 2015 (UTC)
Yup. I've fixed it. --Halfak (WMF) (talk) 15:18, 19 February 2015 (UTC)
When you say "Deletes a workset and frees the revisions to be labeled by others", do you mean that the workset will still exist but that other users will be allowed to claim it? If so, this doesn't look like a "deletion" to me. It is just an un-assignment... Helder 13:31, 19 February 2015 (UTC)
Ahh. I see a workset as a "claim" on some tasks. When that claim is deleted, the tasks are freed to be gathered into a new workset. Now that I think about it, this deviates from the strict hierarchy that I suggested above -- a campaign wouldn't contain worksets so much as worksets would be generated from within a campaign on demand. Regardless, it seems that we're imagining the same thing. --Halfak (WMF) (talk) 15:18, 19 February 2015 (UTC)
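
The "claim" model discussed here can be sketched roughly as follows. This is an illustration only: the `Campaign` class and its method names are hypothetical, and it ignores expiration and already-labeled tasks.

```python
import random


class Campaign:
    """Tracks which tasks are currently claimed (hypothetical helper)."""

    def __init__(self, task_ids):
        self.unclaimed = set(task_ids)
        self.worksets = {}  # workset id -> set of claimed task ids
        self._next_id = 1

    def assign_workset(self, size, rng=random):
        """Claim up to `size` random unclaimed tasks as a new workset."""
        count = min(size, len(self.unclaimed))
        picked = set(rng.sample(sorted(self.unclaimed), count))
        self.unclaimed -= picked
        workset_id = self._next_id
        self._next_id += 1
        self.worksets[workset_id] = picked
        return workset_id

    def abandon_workset(self, workset_id):
        """Delete the claim and free its tasks for future worksets."""
        self.unclaimed |= self.worksets.pop(workset_id)
```

Abandoning removes the workset itself, and its tasks simply return to the pool that later worksets are sampled from.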

Proposal: Integration of revision coder service


I've been thinking about where the "revcoder home" should actually live. I realized that in this mock, the revcoder stuff would push down the edits in Special:Contributions -- which would be frustrating and might convince editors to disable the gadget.

Instead, I thought that we might simply create a page in project space (e.g. en:Wikipedia:Revision_coder) and load the list of active campaigns there. It also occurred to me that we could load the revcoder in the same space as a single-page application. Here are a couple of mockups that represent what I have in mind.

 
Integrated revcoder mock (before gadget install). A mock-up of the revcoder gadget (home and form) is presented on top of an en:Wikipedia:Revision scoring page -- when no gadget is installed, a button is displayed to take the user to a set of instructions.
 
Integrated revcoder mock. A mock-up of the revcoder gadget (home and form) is presented on top of an en:Wikipedia:Revision scoring page.

Having the revcoder operate as a single-page application might require a bit more work from us, but it will dramatically reduce the number of requests that the revcoder gadget (and therefore the user's browser) needs to make to the revcoder service. It will also allow us to pre-load the next revision/diff to improve performance. --Halfak (WMF) (talk) 15:35, 19 February 2015 (UTC)

I'd like to add my two cents on the matter. After Halfak's explanation yesterday, I find his approach quite sound. I think there is great benefit in having a gadget with several campaigns, each divided into tasks that expire if people sit on them. I think this approach suits our crowd-sourcing culture at Wikimedia projects better. After all, crowd-sourcing itself is a divide-and-conquer strategy to begin with. -- とある白い猫 chi? 18:20, 21 February 2015 (UTC)

Defining where the gadget will be executed


I think there are (at least) these two options:

  1. When a page is loaded, the loader part of the gadget checks whether the page is associated with a given Wikidata item (to be created once we translate the page to pt or another language), and if it is, loads the rest of the gadget code.
  2. When the page is loaded, check its HTML to see if it has an element with id="foo-bar", and load the rest of the gadget in case there is such an element in the page.

Helder 22:43, 10 March 2015 (UTC)

I implemented option 2. Helder 20:10, 15 March 2015 (UTC)
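
Option 2 can be illustrated with a short sketch. The gadget itself would perform this check in the browser; the Python below only models the decision, and the function name and `HTMLParser`-based approach are assumptions made for the example.

```python
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Records whether any start tag carries the target id attribute."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.target_id:
            self.found = True


def should_load_gadget(page_html, marker_id="foo-bar"):
    """Return True when the page contains an element with the marker id."""
    finder = _IdFinder(marker_id)
    finder.feed(page_html)
    finder.close()
    return finder.found
```

The appeal of this option is that any page can opt in by including the marker element, with no dependency on Wikidata.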

Schema proposal


Hey folks,

I worked up a schema file to propose the structure of the revision coder database. Note that both "task.meta" and "label.data" are schemaless JSON fields. This will enable us to describe arbitrary task types (not just review of a revision) and arbitrary label data (not just two boolean fields). I've also included some sample queries that will be used within the API.

CREATE TABLE user (
  id INT,
  created TIMESTAMP,
  touched TIMESTAMP,
  PRIMARY KEY(id)
);
/*
INSERT INTO user (id, created, touched) VALUES (608542, NOW(), NOW());
*/

CREATE TABLE campaign (
  id SERIAL,
  name VARCHAR(255),
  form VARCHAR(255),
  view VARCHAR(255),
  created TIMESTAMP,
  PRIMARY KEY(id)
);
/*
-- Inserts a new campaign
INSERT INTO campaign (name, form, view, created)
VALUES ('Edit quality -- 2015 sample', 'damaging_and_good-faith', 'diff_to_previous', NOW());

-- Gathers summary statistics and metadata for a campaign
SELECT
  campaign.name AS campaign_name,
  campaign.created AS campaign_created,
  COUNT(DISTINCT task.id) AS tasks,
  COUNT(label.task_id) AS labels
FROM campaign
LEFT JOIN task ON task.campaign_id = campaign.id
LEFT JOIN label ON label.task_id = task.id
WHERE campaign.id = 345
GROUP BY campaign.id;
*/

CREATE TABLE task (
  id SERIAL,
  campaign_id INT,
  created TIMESTAMP,
  meta JSONB,
  PRIMARY KEY(id),
  KEY(campaign_id)
);
/*
-- Inserts a new task
INSERT INTO task (campaign_id, created, meta)
VALUES (345, NOW(), '{"rev_id": 506725001}');

-- Gets all tasks and labels for a particular campaign
SELECT
  task.id AS task_id,
  campaign_id,
  label.user_id AS label_user,
  label.timestamp AS label_timestamp,
  task.meta AS task_meta,
  label.data AS label_data
FROM task
LEFT JOIN label ON label.task_id = task.id
WHERE task.campaign_id = 345;
*/

CREATE TABLE label (
  task_id INT,
  user_id INT,
  timestamp TIMESTAMP,
  data JSONB,
  PRIMARY KEY(task_id, user_id),
  KEY(user_id)
);
/*
-- Inserts a new label
INSERT INTO label (task_id, user_id, timestamp, data)
VALUES (12, 608542, NOW(), '{"damaging": false, "good-faith": true}');

-- Gathers the labels for a particular task
SELECT
  label.task_id,
  label.user_id,
  label.timestamp,
  label.data
FROM labelBridget Sundell
WHERE task_id = 12;
*/

CREATE TABLE workset (
  id SERIAL,
  user_id INT,
  created TIMESTAMP,
  expires TIMESTAMP,
  PRIMARY KEY(id),
  KEY(user_id)
);
/*
-- Inserts a new workset (but doesn't assign tasks yet)
INSERT INTO workset (user_id, created, expires)
VALUES (608542, NOW(), NOW() + INTERVAL '1 DAY');

-- Gathers the task and label data for a workset
SELECT
  workset.id AS workset_id,
  task.id AS task_id,
  task.meta AS task_meta,
  label.data AS label_data
FROM workset
LEFT JOIN workset_task ON workset_task.workset_id = workset.id
LEFT JOIN task ON workset_task.task_id = task.id
LEFT JOIN label ON label.task_id = task.id
WHERE workset.id = 345;
*/

CREATE TABLE workset_task (
  workset_id INT,
  task_id INT,
  KEY(workset_id, task_id),
  KEY(task_id)
);
/*
-- Assigns a task to a workset
INSERT INTO workset_task (workset_id, task_id)
VALUES (345, 12);
*/

--Halfak (WMF) (talk) 16:38, 21 March 2015 (UTC)
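
The schemaless `task.meta` and `label.data` fields can be tried out with a small runnable sketch. It uses Python's built-in `sqlite3` with JSON stored as text, which is an assumption made so the example is self-contained; the proposal itself targets JSONB columns. Only a subset of the columns is included, and the second task's `rev_id` is a made-up value.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task (
  id INTEGER PRIMARY KEY,
  campaign_id INT,
  meta TEXT              -- JSON as text here; the proposal uses JSONB
);
CREATE TABLE label (
  task_id INT,
  user_id INT,
  data TEXT,             -- JSON as text here; the proposal uses JSONB
  PRIMARY KEY (task_id, user_id)
);
""")

# Two tasks in campaign 345; only task 12 has a label so far.
conn.execute("INSERT INTO task (id, campaign_id, meta) VALUES (?, ?, ?)",
             (12, 345, json.dumps({"rev_id": 506725001})))
conn.execute("INSERT INTO task (id, campaign_id, meta) VALUES (?, ?, ?)",
             (13, 345, json.dumps({"rev_id": 506725002})))  # hypothetical rev_id
conn.execute("INSERT INTO label (task_id, user_id, data) VALUES (?, ?, ?)",
             (12, 608542, json.dumps({"damaging": False, "good-faith": True})))

# Same shape as the "all tasks and labels for a campaign" query.
rows = conn.execute("""
    SELECT task.id, task.meta, label.user_id, label.data
    FROM task
    LEFT JOIN label ON label.task_id = task.id
    WHERE task.campaign_id = ?
    ORDER BY task.id
""", (345,)).fetchall()

results = [
    (task_id, json.loads(meta), user_id, json.loads(data) if data else None)
    for task_id, meta, user_id, data in rows
]
```

Because of the `LEFT JOIN`, unlabeled tasks still appear in the results, with the label columns empty.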

@Halfak (WMF): is there a typo on line 82?
Looks good otherwise (as far as I can understand SQL). Helder 21:11, 24 March 2015 (UTC)
Indeed that is a typo! Apparently I was in the middle of googling my sister (http://www.sundelleye.com/doctors/) when I was writing that code and lost track of where my cursor was! I'm experimenting with implementing this schema today. Still no word on access to a shared Postgres instance, but we can always set up a local instance in our VM if we need to. --Halfak (WMF) (talk) 21:22, 24 March 2015 (UTC)