Wikisource reader app

This page is currently a draft. More information pertaining to this may be available on the talk page.

Community conversation hour about the Wikisource reader app

Videoconference: 2 February 2025, 12:00 UTC

https://meet.google.com/khd-qvfy-nsr

Wikisource is a free and open digital library: an online repository of previously published, copyright-free texts and media in many of the world's languages. Like Wikipedia, Wikimedia Commons, and other Wikimedia open-knowledge projects, it strives to make a vast collection of public domain content accessible to the global community. Its content includes books, scholarly articles, speeches, newspapers, periodicals, and manuscripts, along with large collections of audio recordings and videos that are not subject to copyright restrictions. It serves as a valuable resource for students, educators, researchers, and anyone seeking access to public domain materials.

The collaborative nature of Wikisource encourages active participation from the global community. Anyone with an interest in preserving and sharing public domain content can contribute to the site. Contributors can add new texts, meticulously proofread existing content, or engage in translating texts into various languages. This collective effort fosters a vibrant community of volunteers dedicated to enriching and maintaining the quality of Wikisource.

Need for a mobile reading app

English Wikisource was visited more than 138 million times in 2024, and around 40% of those visits came from the mobile web. The share is around 50% for major Indian-language Wikisource projects such as Tamil and Bangla. So even without a dedicated reading app, mobile users already account for a significant proportion of Wikisource traffic, which is itself a good reason to develop one.

Selection of books for the app

Before selecting books, we need to understand the Wikisource workflow. That is difficult in itself, because different language versions follow different workflows: some steps overlap, while others do not.

To avoid this difficulty, a workflow for printed materials such as books, periodicals, and newspapers is outlined below as comprehensively as possible; an ideal Wikisource language community can adopt it if it wishes. For convenience, workflows for audio and video content, which are not yet well developed on Wikisource, are left out.

The common workflow that almost all Wikisource language communities adopt for transcribing content is as follows:

1. Identification - The first step is to identify printed materials that fall within the scope of Wikisource, considering their copyright status, publication status, etc.
2. Digitization - The next step is to scan these materials, either afresh or by collecting existing scans from digital and physical sources where they have already been digitised.
3. Upload - Once the digitised materials are available, they are uploaded to Wikimedia Commons.
4. Indexing - The uploaded materials are then brought into Wikisource as pages in the Index namespace and checked for missing, duplicate, or bad scans, so that indexed pagelists and tables of contents can be created.
5. Proofreading - Wikisource community volunteers then run OCR on and proofread every page of the material.
6. Validation - A second group of volunteers checks the proofread pages again and validates them.
7. Transclusion - After proofreading and/or validation is done, the content is transcluded into the main namespace, divided into chapters etc. according to the table of contents. This step makes the work ready for readers.

Note: This is the basic Wikisource workflow, which all communities are expected to follow. Unfortunately, some communities skip critical steps, such as creating tables of contents or the entire transclusion step, for various reasons.

Apart from the basic workflow above for transcribing and creating an e-book, communities differ in how they add metadata for the materials, such as the names of authors, publishers, and publication years. Three scenarios occur in practice, along with combinations of them.

1. No metadata - Volunteers sometimes add no metadata at all, or only partial metadata, on Wikisource index pages or transcluded pages. This is the worst scenario and needs to be avoided.
2. Metadata stored locally - The majority of language communities store metadata locally: in the file description on Wikimedia Commons, in designated fields of the Wikisource index page, and/or in the header of transcluded pages. This can lead to duplicated effort, a higher chance of error, data redundancy, etc.
3. Metadata stored on Wikidata - Very few Wikisource language communities store metadata centrally as Wikidata items and round-trip it back to the file description on Wikimedia Commons, to designated fields of the Wikisource index page, and/or to the header of transcluded pages. This is the ideal scenario, as it makes it possible to fully leverage the power of Wikidata.

For a Wikisource mobile reading app, the content and its metadata are equally important, so that readers can not only read the content but also navigate and search it easily. Storing metadata in a central database like Wikidata is therefore preferable, since it makes the metadata easy to query and reuse.

Keeping steps 1 to 7 of the transcription workflow and scenario 3 (metadata stored on Wikidata) in mind, suitable criteria for making a Wikisource work available to readers can be drafted.

The material needs to:

be digitised and uploaded on Wikimedia sites
have an index page
be completely proofread (at least, if not validated)
be completely transcluded, divided into chapters where applicable
have metadata stored centrally and accurately, following the Wikidata Books data model, with the following statements and links on the respective Wikidata item:
1. title in native language (mandatory)
2. language of work (mandatory)
3. author(s), editor(s) (if any), translator(s) (if any)
4. date of publication, publisher, place of publication
5. Wikisource index page URL (mandatory)
6. Wikisource sitelink of the transcluded page, with a proofread or validated badge (mandatory)
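
The mandatory items in this list can be read as a simple check on a work's metadata. The following is a minimal Python sketch of that idea, not the project's actual code. The record is assumed to be a plain dictionary, and the key names (`title_native_language`, `languages`, `wikisource_index_url`, `ws_url`, `badge`) and the badge values are hypothetical names chosen for illustration.

```python
# Hypothetical field names for the mandatory items in the criteria list.
MANDATORY_FIELDS = (
    "title_native_language",   # 1. title in native language
    "languages",               # 2. language of work
    "wikisource_index_url",    # 5. Wikisource index page URL
    "ws_url",                  # 6. sitelink of the transcluded page
)

# Hypothetical badge values; the sitelink needs one of them.
ACCEPTED_BADGES = {"proofread", "validated"}


def missing_requirements(record):
    """Return the names of mandatory items that a record does not satisfy."""
    problems = [name for name in MANDATORY_FIELDS if not record.get(name)]
    if record.get("badge") not in ACCEPTED_BADGES:
        problems.append("badge")
    return problems


def is_eligible(record):
    """A work qualifies only when nothing mandatory is missing."""
    return not missing_requirements(record)
```

Returning the list of missing items, rather than only a yes/no answer, makes it easier to tell a community which statement still has to be added on Wikidata.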

Let’s get such a list for Bangla Wikisource with this SPARQL query - https://w.wiki/BN3z

# Works on Bangla Wikisource whose sitelink carries a proofread or validated badge
SELECT DISTINCT ?sitelink ?itemLabel WHERE {
  ?sitelink schema:isPartOf <https://bn.wikisource.org/>; schema:about ?item; schema:name ?ws.
  { ?sitelink wikibase:badge wd:Q20748092. }  # proofread badge
  UNION
  { ?sitelink wikibase:badge wd:Q20748093. }  # validated badge
  ?item wdt:P1957 [].  # the item has a Wikisource index page URL
  SERVICE wikibase:label { bd:serviceParam wikibase:language "bn". }
}
ORDER BY (?itemLabel)

The data will be fetched through the Wikidata API.
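
As an illustration of how such a query could be run from code, here is a minimal Python sketch using only the standard library. It assumes the public Wikidata Query Service endpoint at `https://query.wikidata.org/sparql` and the standard SPARQL JSON results layout. The function names and the User-Agent string are hypothetical, and this is not the code used by the project.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above, with the site URL and label language as parameters.
QUERY_TEMPLATE = """
SELECT DISTINCT ?sitelink ?itemLabel WHERE {
  ?sitelink schema:isPartOf <%(site)s>; schema:about ?item.
  { ?sitelink wikibase:badge wd:Q20748092. }
  UNION
  { ?sitelink wikibase:badge wd:Q20748093. }
  ?item wdt:P1957 [].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "%(lang)s". }
}
ORDER BY (?itemLabel)
"""


def build_badge_query(site_url, label_lang):
    """Fill the template for one Wikisource language edition."""
    return QUERY_TEMPLATE % {"site": site_url, "lang": label_lang}


def parse_bindings(payload):
    """Flatten SPARQL JSON results into a list of small dictionaries."""
    rows = []
    for binding in payload["results"]["bindings"]:
        rows.append({
            "sitelink": binding["sitelink"]["value"],
            "label": binding.get("itemLabel", {}).get("value", ""),
        })
    return rows


def fetch_badged_works(site_url, label_lang, timeout=30):
    """Run the query over HTTP (needs network access)."""
    query = build_badge_query(site_url, label_lang)
    url = WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    # Placeholder User-Agent; a real client should identify itself properly.
    request = Request(url, headers={"User-Agent": "wikisource-reader-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

For example, `fetch_badged_works("https://bn.wikisource.org/", "bn")` would request the Bangla list; changing the two arguments targets another language edition.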

Development

The planned components are:

Books metadata API
EPUB generator
The client mobile app

The API

An API has been developed that serves a catalogue (index) of books that follow the data model described above. It currently contains works in English, French, and Bangla, since those projects already follow the data model. Support for other languages can be added.

The API is built with Django and deployed on Toolforge. It periodically runs a set of SPARQL queries to retrieve data, processes that data, and updates the database.

Link: wsindex.toolforge.org
Repo: codeberg.org/ph4ni/wsindex

Data that can be fetched from the metadata API
Key	Description	Sourced from
wikidata_qid	QID of the book	Wikidata
title	Title in English	Wikidata label
title_native_language	Title in the native language	Wikidata label
languages	List of languages the book is in	Wikidata
date_of_publication	Date published	Wikidata
authors	List with Author label and Wikidata QID	Wikidata
editors	List with Editor label and Wikidata QID	Wikidata
translators	List with Translator label and Wikidata QID	Wikidata
genre	List of genres of the book	Wikidata
type_of_work	Form of creative work	Wikidata
ws_url	Link to the Wikisource page	Site link
thumbnail_url	Link to the thumbnail version of cover page	Commons
epub_url	Link to the epub file	ws-export
wikisource_index_url	Link to index page	Wikidata
view_count	Number of views of the ws_url page in the last year	Page views API
subjects	Subjects of the work	Wikidata

Querying the API
URL	Description
https://wsindex.toolforge.org/books/	Base URL, which returns 32 results with pagination
https://wsindex.toolforge.org/books/?page=2	Example of requesting a specific page of results
https://wsindex.toolforge.org/books/Q51543972/	Get book by the QID. Also works without the prefix 'Q'
https://wsindex.toolforge.org/books/?languages=fr	Get books by language code
https://wsindex.toolforge.org/books/?search=India	Search books by title and author names.
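
The URL patterns in this table can be built programmatically. The following Python sketch only constructs the URLs shown above; the helper names are hypothetical, and whether the API accepts several query parameters in a single request is an assumption that should be checked against the service.

```python
from urllib.parse import urlencode

BASE_URL = "https://wsindex.toolforge.org/books/"


def books_url(page=None, languages=None, search=None):
    """Build a catalogue URL with optional page, language and search filters."""
    params = {}
    if page is not None:
        params["page"] = page
    if languages:
        params["languages"] = languages
    if search:
        params["search"] = search
    return BASE_URL + ("?" + urlencode(params) if params else "")


def book_url(qid):
    """Build the URL of a single book; the 'Q' prefix is optional on input."""
    qid = str(qid).strip().upper()
    if not qid.startswith("Q"):
        qid = "Q" + qid
    return f"{BASE_URL}{qid}/"
```

For instance, `books_url(languages="fr")` gives the French catalogue URL from the table, and `book_url(51543972)` gives the URL of the item Q51543972.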

EPUB generator

EPUB files for the app are sourced using the ws-export tool.

Link: ws-export.wmcloud.org/
Repo: github.com/wikimedia/ws-export
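
A download link for ws-export can be assembled from a language code and a page title. The Python sketch below assumes that the tool accepts `format`, `lang` and `page` as query parameters; these names are an assumption and should be verified against the ws-export documentation. The helper name is hypothetical.

```python
from urllib.parse import quote, urlencode

WS_EXPORT_BASE = "https://ws-export.wmcloud.org/"


def epub_export_url(lang, page_title):
    """Build an EPUB export URL (parameter names are assumed, not confirmed)."""
    params = {"format": "epub", "lang": lang, "page": page_title}
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return WS_EXPORT_BASE + "?" + urlencode(params, quote_via=quote)
```

The language code selects the Wikisource edition and the page title is the name of the transcluded work in the main namespace.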

Client app

Existing free and open-source book-reading apps were examined as possible bases for the reader app, and the Myne Android app was forked to build the reader app for Wikisource.

Screen recording of the app under development

Features currently functional:

Viewing list of all books
Exploring books by language
Exploring books by genre
Searching by title or author
Downloading books to the local device for offline access
Sharing links to the Wikisource page of each book
Managing the local library: read, delete, completion %
Changing font size and looking up words in a dictionary
Dark and Light mode
Localisation support

The app is built with Jetpack Compose and was modified to work with the books metadata API and to support content from Wikimedia projects.

Repo: github.com/cis-india/Wikisource-Reader

Frequently Asked Questions

Why is a specific book not available?
The Wikidata item of the work must have the statements described in the data model above, and its sitelink must be marked with a proofread or validated badge. If all of these criteria are met, the app will list the work within 24 hours.

Why is a book download failing?
The app relies on the ws-export tool for downloads, and the tool is sometimes unavailable. The app attempts the download again as soon as the ws-export tool is back online.
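
One common way to cope with a temporarily unavailable service is to retry with an increasing delay. The following Python sketch illustrates that general technique only; it is not the app's actual download logic (the app itself is an Android app), and the function and parameter names are hypothetical.

```python
import time


def download_with_retry(download, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call download() until it succeeds, doubling the wait after each failure.

    download is any zero-argument callable that raises OSError (which
    includes urllib's URLError) when the service cannot be reached.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return download()
        except OSError as error:
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.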

Can I read a work which is not fully proofread or validated in the app?
Yes, but you have to manually download the EPUB file from the Wikisource website and then add it to the app from the Library section.

How frequently is the catalogue in the app updated?
The catalogue is updated daily.