Research:Comparing most read and trending edits for Top Articles feature
This project's data is available for download and reuse.
The Android and iOS Wikipedia apps have an Explore feed that contains a card listing 5 trending articles. Currently this Trending card contains the top 5 most read articles within the past 24-48 hours. The mobile team is considering replacing this list with 5 articles based on trending edits.
The current study intends to evaluate whether potential Wikipedia app users prefer a feed based on recent pageviews or trending edits.
Overview
The Wikimedia apps team wants to make the content that appears on the Trending card as recent and relevant as possible. Currently, pageviews for articles are not available in real time: the pageview API can only provide top read articles for the previous day. As a result, the content of the Trending card is always a little bit out of date.
The Trending edit service provides a list of articles that are experiencing a significant uptick in edits, in nearly real time. This means that this feed is potentially a better source of recent and relevant content than the pageview API.
However, we don't know if trending edits represent a more interesting set of reading recommendations than top pageviews (even day-old pageviews). The articles that are being heavily edited at a given time are not necessarily the same ones that are being heavily read.
Furthermore, the audience for the Wikipedia app (global readers) is significantly broader than the English Wikipedia editor base, which tends to skew heavily towards North America and Western Europe.[citation needed] For example, many people in India read English Wikipedia. We do not know if the proportion of India-based editors matches the proportion of India-based readers. If there is a higher proportion of readers in India compared to editors, a pageview-based feed would tend to show more content that is of likely interest to India-based readers than a trending edits-based feed.
Other variables, such as time of day, may also affect the perceived relevance of the two feed sources. The current, pageview-based feed is updated every day (at around 3:00 UTC) with the previous day's top read articles. The list then becomes progressively more out of date over the course of the day and hence potentially less relevant to a reader interested in "trending" content. The trending edits feed is much more current, but also noisier: there are many reasons that an article could have received a sudden spike in editing activity, and not all of them are equally relevant to a Wikipedia reader.
Goals
Overall, these two feeds are likely to display somewhat different sets of articles at any given time, and it is not clear which set of articles is, on average, more interesting to readers looking for recommendations of trending articles to read next.
- Considerations
- When considering what the 'optimum' set of top articles should be, the product team should consider the following general features of the list:
- whether people are familiar with the topics of the articles listed in the feed
- whether people consider these articles to be 'timely'
- the degree to which people are interested in learning more about the articles in the feed
- whether the previews of the articles listed in the feed contain contextual metadata (images, item descriptions) that make the list more visually engaging and invite curiosity
We address these considerations in the research questions below.
In addition, the product team should consider whether the article lists generated by the 'top read' and 'trending' feeds are equally relevant and interesting to all readers, or whether the 'trending edits' approach yields articles that are less familiar to readers whose geographies and cultural backgrounds differ from those of the English Wikipedia editing community. We address these considerations in the Hypotheses section below.
- Research questions
- In this study, we want to know:
- RQ1: overall, do people prefer lists that are based on recent page views, or trending edits?
- RQ2: which list contains more articles around topics that people are familiar with?
- RQ3: which lists tend to contain more articles related to topics that people have heard/read about elsewhere (for example, on a news website) in the past 24 hours?
- RQ4: are people more likely to consider using the 'top articles' feature after viewing articles in a 'top read'-based list vs. a 'trending edits'-based list?
- RQ5: how often do people read Wikipedia on a mobile device?
- Hypotheses
- Reader familiarity with Explore feed content, by country of origin. Because editing-based activity metrics reflect the interests of the editing community, in which US- and EU-based editors are over-represented relative to their share of global readership, we hypothesize that:
- H1: India-based readers will be less familiar with content that appears in the 'trending' list than the 'top read' list,
- H2: India-based readers will be less likely to have heard about topics that appear in the 'trending' list than the 'top read' list through off-wiki information sources (news websites, social media, blogs, etc.) within the past 24 hours, and
- H3: India-based readers will be more interested in reading topics that appear in the 'top read' list than the 'trending' list.
Methods
Timeline
- April 2017: run study on Amazon Mechanical Turk
- May 2017: analyze and report results (first round)
- July 2017: run second round
- August 2017: analyze and report results
Study design
This study involves asking paid crowdworkers ("turkers" from Amazon Mechanical Turk) to provide basic information about their mobile internet browsing habits, followed by a task that involves analyzing a prototype interface of the Trending card in the Explore feed and filling out a short questionnaire. The questionnaire was offered through Qualtrics. We showed these article lists to US-based turkers and India-based turkers, in roughly similar proportions, to measure geographically mediated differences (a rough approximation for cultural differences) related to item familiarity and interest.
We released rating tasks to turkers over the course of 15 days during May 2017 and 7 days in July 2017, using the current set of top read and trending articles at each point. We released rating tasks at different times of day, to correct for time differences between the US and India, and to vary the relative 'timeliness' of the items in the top-read lists.
- Article list examples
- Survey questions
- How often do you read Wikipedia articles on a smartphone or other mobile device?
- How many articles in this list are CLEARLY RELATED to topics that you are familiar with?
- How many articles in this list are CLEARLY RELATED to topics that you have seen or read about on other websites (not Wikipedia) within the past 24 hours?
- How many articles in this list would you be interested in reading right now?
- If there was a list of trending articles LIKE THE ONES IN THIS LIST on the home screen of a Wikipedia app for mobile devices, how often would you use it to look for new articles to read?
- Why would you (choice from question #5) use a list that contained articles like these to find new articles to read?
- Policy, Ethics and Human Subjects Research
- Data collection and analysis will be conducted in compliance with Wikimedia's data retention guidelines for survey research. The design of the study will be informed by the Guidelines for Academic Requesters[1] developed by members of the Mechanical Turk worker community.
Results
RQ1
Do people prefer lists that are based on recent page views, or trending edits?
On average, raters reported that they would be more interested in reading the articles in the 'top read' list than the 'trending' list. The results were consistent across groups, and (marginally) significant for India-based raters.
t-test: interest in reading
India and US
Interested in reading - Overall
top read observations: 83
top read average: 2.1686746988
top read std: 1.17010222093
trending observations: 92
trending average: 1.90217391304
trending std: 1.1800864247
t-statistic = 1.489 pvalue = 0.1383
US only
Interested in reading - USA
top read observations: 46
top read average: 1.86956521739
top read std: 1.01314610415
trending observations: 44
trending average: 1.72727272727
trending std: 1.09469041625
t-statistic = 0.633 pvalue = 0.5283
India only
Interested in reading - India
top read observations: 37
top read average: 2.54054054054
top read std: 1.24324324324
trending observations: 48
trending average: 2.0625
trending std: 1.23163593782
t-statistic = 1.746 pvalue = 0.0845
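The t-statistics above can be re-derived from the summary values alone. The following is a minimal sketch, not the original analysis script: it assumes a pooled-variance (Student's) two-sample t-test and assumes the reported "std" values are population standard deviations (divisor n rather than n - 1). Under those two assumptions the reported statistics are reproduced to three decimals. The function name `pooled_t_from_summary` is illustrative only.

```python
import math


def pooled_t_from_summary(n1, mean1, std1, n2, mean2, std2):
    """Student's two-sample t statistic from summary statistics.

    Assumption: std1 and std2 are population standard deviations
    (divisor n), so n * std**2 recovers each group's sum of squares.
    """
    sum_sq1 = n1 * std1 ** 2
    sum_sq2 = n2 * std2 ** 2
    df = n1 + n2 - 2
    pooled_var = (sum_sq1 + sum_sq2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


# (n, mean, std) for 'top read', then for 'trending', copied from the report.
groups = {
    "overall": (83, 2.1686746988, 1.17010222093, 92, 1.90217391304, 1.1800864247),
    "us": (46, 1.86956521739, 1.01314610415, 44, 1.72727272727, 1.09469041625),
    "india": (37, 2.54054054054, 1.24324324324, 48, 2.0625, 1.23163593782),
}

results = {name: pooled_t_from_summary(*values) for name, values in groups.items()}

for name, (t_stat, df) in results.items():
    print(f"{name}: t = {t_stat:.3f}, df = {df}")
```

The same function can be applied to the summary values reported for RQ2 and RQ3. The p-values are not recomputed here because the Python standard library has no Student's t distribution.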
RQ2
editwhich list contains more articles around topics that people are familiar with?
On average, all raters were more familiar with the articles displayed in the 'top read' condition than articles displayed in the 'trending edits' condition, although the result for the US only group (n=90) was only marginally significant (p = 0.0543).
t-test: familiarity - general
India and US
Familiarity - Overall
top read observations: 83
top read average: 2.4578313253
top read std: 1.34702890211
trending observations: 92
trending average: 1.71739130435
trending std: 1.01431157803
t-statistic = 4.108 pvalue = 0.0001
US only
Familiarity - USA
top read observations: 46
top read average: 2.04347826087
top read std: 1.10249759418
trending observations: 44
trending average: 1.59090909091
trending std: 1.07276579284
t-statistic = 1.950 pvalue = 0.0543
India only
Familiarity - India
top read observations: 37
top read average: 2.97297297297
top read std: 1.44234206099
trending observations: 48
trending average: 1.83333333333
trending std: 0.942809041582
t-statistic = 4.339 pvalue = 0.0000
RQ3
editwhich lists tend to contain more articles related to topics that people have heard/read about elsewhere (for example, on a news website) in the past 24 hours?
On average, India-based raters found that more of the articles in the 'top read' condition were related to topics they had heard about recently through other media. For US-based raters the difference was in the same direction but not statistically significant (p = 0.1438); the combined sample nonetheless yields a significant overall result (p = 0.0104).
t-test: familiarity - from recent news
India and US
Familiarity from news - Overall
top read observations: 83
top read average: 1.92771084337
top read std: 1.24941559862
trending observations: 92
trending average: 1.48913043478
trending std: 0.972385739542
t-statistic = 2.589 pvalue = 0.0104
US only
Familiarity from news - USA
top read observations: 46
top read average: 1.84782608696
top read std: 1.28481755268
trending observations: 44
trending average: 1.47727272727
trending std: 1.0550449444
t-statistic = 1.475 pvalue = 0.1438
India only
Familiarity from news - India
top read observations: 37
top read average: 2.02702702703
top read std: 1.19653749304
trending observations: 48
trending average: 1.5
trending std: 0.889756521003
t-statistic = 2.301 pvalue = 0.0239
RQ4
editare people more likely to consider using the 'top articles' feature after viewing articles in a 'top read'-based list vs. a 'trending edits'-based list?
Differences between US and India-based raters are significant, with India-based raters reporting that they would use the 'top articles' feature more frequently than US-based raters (χ2=5.99, p=0.047).
response | in | us |
frequently | 35 | 23 |
occasionally | 36 | 46 |
never | 6 | 13 |
RQ5
edithow often do you read Wikipedia articles on a smartphone or other mobile device?
Differences between US and India-based raters are significant, with India-based raters reporting that they use Wikipedia on a mobile device much more frequently than US-based raters (χ2=9.49, p=0.04).
response | in | us | row_total |
At least once a day | 29 | 13 | 42 |
At least once a month | 9 | 15 | 24 |
At least once a week | 36 | 45 | 81 |
Less than once a month | 6 | 10 | 16 |
I never read Wikipedia articles on a mobile device | 5 | 7 | 12 |
col_total | 85 | 90 | 175 |
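The χ2 tests of independence for RQ4 and RQ5 can be re-derived from the two contingency tables above. The sketch below is an illustration using only the Python standard library, not the original analysis code. The helper names are hypothetical, and the p-value helper uses the closed-form survival function that only holds for an even number of degrees of freedom (which covers both tables: 2 and 4).

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def chi2_p_value_even_df(chi2, df):
    """Upper-tail probability of the chi-square distribution.

    Valid only for even df, where it reduces to
    exp(-x/2) * sum_{k < df/2} (x/2)**k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = chi2 / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


# Columns are (India, US); rows follow the tables in the report.
rq4_table = [[35, 23], [36, 46], [6, 13]]
rq5_table = [[29, 13], [9, 15], [36, 45], [6, 10], [5, 7]]

rq4_chi2, rq4_df = chi_square_independence(rq4_table)
rq4_p = chi2_p_value_even_df(rq4_chi2, rq4_df)
rq5_chi2, rq5_df = chi_square_independence(rq5_table)
rq5_p = chi2_p_value_even_df(rq5_chi2, rq5_df)

print(f"RQ4: chi2 = {rq4_chi2:.2f}, df = {rq4_df}, p = {rq4_p:.3f}")
print(f"RQ5: chi2 = {rq5_chi2:.2f}, df = {rq5_df}, p = {rq5_p:.3f}")
```

Run on the counts as tabulated, this sketch gives a statistic of about 6.13 for RQ4 and about 9.79 for RQ5, with p-values of about 0.047 and 0.044. The p-values agree with those reported. The reported statistics (5.99 and 9.49) are close to the 0.05 critical values for 2 and 4 degrees of freedom, so they may be worth re-checking against the raw data.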
Hypotheses
- H1
- India-based readers will be less familiar with content that appears in the 'trending' list than the 'top read' list.
Supported. India-based raters were familiar with significantly more of the topics featured on the 'top read' list than the 'trending' list.
- H2
- India-based readers will be less likely to have heard about topics that appear in the 'trending' list than the 'top read' list through off-wiki information sources (news websites, social media, blogs, etc.) within the past 24 hours.
Supported. India-based raters had heard about significantly more of the topics featured on the 'top read' list than the 'trending' list from other news sources within the past 24 hours.
- H3
- India-based readers will be more interested in reading topics that appear in the 'top read' list than the 'trending' list.
Partially supported. India-based raters were marginally significantly more interested in the topics featured on the 'top read' list than the 'trending' list.
Discussion
Limitations
- Mechanical Turk workers' expressions of interest or personal preference may not necessarily reflect the interests or preferences of Wikipedia readers generally, or mobile app readers specifically. This limitation pertains specifically to RQ1, RQ4, and H3. Survey question #1 (RQ5), "How often do you read Wikipedia articles on a smartphone or other mobile device?", is intended partially as a 'sanity check' for this source of error; at the very least, we know that over two thirds of respondents (123 of 175) are regular readers (daily or weekly) of Wikipedia on mobile devices.
- The context of providing a rating/evaluation (for pay) is different than the context of browsing an app in your free time. What you say you like may be different from what you actually click on when no one is watching you or asking you questions. This limitation pertains specifically to RQ1, RQ4, and H3.
Conclusion
- Overall, there was a strong preference for 'top read' based recommendations, and all raters were also more likely to be familiar with the topics of the articles presented in those lists. Both of these differences were more pronounced for India-based raters than for US-based raters.
- Furthermore, switching to a 'trending edits'-based feed is likely to have a more pronounced negative impact on some app users than others. Specifically, India-based readers are likely to find the articles in the new feed even less relevant and 'timely' than US-based readers do, because the feed will contain fewer topics they already know about.
- Will that mean that India-based readers (or readers from other countries outside the US/Europe) will be less engaged by the feature or will consciously perceive it to be biased towards 'Western' topics? Will they be less interested in these topics? An A/B test could provide more evidence, though possibly at the cost of making people who are exposed to the 'trending edits' condition less inclined to trust the feature in the future [2].
- Based on these findings, usage of the 'Top articles' feature is likely to drop if the app team switches to a feed based on the current trending edits algorithm instead of the 'top read' algorithm. This is despite the fact that the 'top read' data is always slightly out of date, whereas the 'trending edits' data is nearly live.
- If the goal of the feed is to engage all readers, irrespective of their culture or country of origin, then the team should consider that the proposed change would likely implicitly prioritize the interests of one group over another.
Next steps
One possible way forward is to build a 'mixed model' that includes measures of both pageviews and editing velocity in the final ranking. Depending on how these different metrics are weighted in the model, this approach may be able to strike a good balance between general appeal, timeliness, local relevance, and serendipity [3].
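As a rough illustration of the mixed-model idea, the sketch below ranks candidate articles by a weighted sum of normalized pageviews and normalized edit velocity. Everything in it is an assumption made for the example: the `Candidate` fields, the min-max normalization, the linear combination, the default weight of 0.7, and the placeholder article data. It is not the app team's design, and a real model would need to be tuned and evaluated with readers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    pageviews: int        # assumed input, e.g. previous-day views
    edit_velocity: float  # assumed input, e.g. recent edits per hour


def _min_max(values):
    """Scale values to [0, 1]; all zeros if every value is identical."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0] * len(values)
    return [(value - low) / (high - low) for value in values]


def rank_mixed(candidates, pageview_weight=0.7, top_n=5):
    """Return up to top_n titles ordered by a weighted mix of both signals.

    pageview_weight is a hypothetical tuning knob: 1.0 ranks purely by
    pageviews, 0.0 purely by edit velocity.
    """
    if not 0.0 <= pageview_weight <= 1.0:
        raise ValueError("pageview_weight must be between 0 and 1")
    if not candidates:
        return []
    views = _min_max([c.pageviews for c in candidates])
    edits = _min_max([c.edit_velocity for c in candidates])
    scored = [
        (pageview_weight * v + (1.0 - pageview_weight) * e, c.title)
        for c, v, e in zip(candidates, views, edits)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_n]]


# Placeholder data, invented purely for the example.
sample = [
    Candidate("Article A", pageviews=900_000, edit_velocity=2.0),
    Candidate("Article B", pageviews=50_000, edit_velocity=40.0),
    Candidate("Article C", pageviews=400_000, edit_velocity=15.0),
]

print(rank_mixed(sample, pageview_weight=1.0))
print(rank_mixed(sample, pageview_weight=0.0))
```

Varying `pageview_weight` between 0 and 1 shows how the ranking shifts from an edit-driven list toward a readership-driven one, which is the trade-off between timeliness and general appeal described above.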
References
- ↑ "Guidelines for Academic Requesters - WeAreDynamo Wiki". wiki.wearedynamo.org. Retrieved 2017-04-27.
- ↑ McNee, S. M., Kapoor, N., & Konstan, J. A. (2006, November). Don't look stupid: avoiding pitfalls when recommending research papers. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (pp. 171-180). ACM.
- ↑ McNee, S. M., Riedl, J., & Konstan, J. A. (2006, April). Making recommendations better: an analytic model for human-recommender interaction. In CHI'06 extended abstracts on Human factors in computing systems (pp. 1103-1108). ACM. (https://doi.org/10.1145/1125451.1125660)