Grants:IEG/Easy Micro Contributions for Wiki Source

status: not selected
Easy Micro Contributions for Wiki Source
summary: Make the process of typing books for Wikisource very easy.
target: Tamil Wikisource
strategic priority: increase participation
theme: tools
amount: 11700 USD
grantee: Tshrinivasan
contact: tshrinivasan@gmail.com
this project needs: volunteer
created on: 18:52, 29 September 2014 (UTC)

Project idea

What is the problem you're trying to solve?

Many old books have been scanned for preservation. We could type them all and publish them on Wikisource, but because there are so many scanned books, this may take years with current systems. We do not even have good OCR for most languages.

What is your solution?

  1. Build an application to split the scanned pages into small chunks of single words, i.e., one small image per word.
  2. Store all the images with proper names/numbering.
  3. Create a web application that shows the words one by one, so that users can type them easily. Users type one word at a time.
  4. Make the web application mobile friendly, so that users can type from mobile devices too.
  5. Score users based on their contributions.
  6. Save the text as the users type it.
  7. Show all the text on a separate page as the users type it.
  8. Once all the single-word images are typed, publish the entire text as a page, so that users can copy it and publish it on Wikisource.
  9. To improve quality, show the same word to two users, collect both inputs, and compare them. Reshare the image until we get the most accurate typing.
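
The comparison in step 9 can be sketched as a small majority vote. This is a minimal Python sketch under stated assumptions, not the project's implementation: the function names `normalize` and `consensus` and the `min_agree` threshold are hypothetical. Entries are normalized to Unicode NFC first, so that the same word typed with different code point sequences still compares equal.

```python
from collections import Counter
import unicodedata


def normalize(text):
    """Trim whitespace and apply NFC so equivalent Unicode forms compare equal."""
    return unicodedata.normalize("NFC", text.strip())


def consensus(entries, min_agree=2):
    """Return the agreed transcription, or None if the image must be reshared.

    entries   -- strings typed by different users for one word image
    min_agree -- assumed threshold: identical entries needed to accept a word
    """
    counts = Counter(normalize(e) for e in entries if e.strip())
    if not counts:
        return None
    ranked = counts.most_common(2)
    word, votes = ranked[0]
    # A tie between the two most frequent entries is not a consensus.
    tied = len(ranked) > 1 and ranked[1][1] == votes
    if votes >= min_agree and not tied:
        return word
    return None
```

With the default threshold, two matching entries are accepted, while two different entries return `None`, which signals that the image should be shown to more users.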

Project goals

The goal is to encourage more users to contribute and to get more books onto Wikisource. When contributing is easy and simple, more users contribute.

Project plan

Activities

  1. Build an application to split scanned images into small images, one per word.
  2. Build a web app to show the images and get them typed.
  3. Publish each page after all the images in that page are typed completely.
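
One way the splitting activity could work is projection-profile segmentation: find rows that contain ink to separate text lines, then find columns that contain ink within each line to separate words. The sketch below is only an illustration in plain Python. It assumes the page is already binarized into rows of 0/1 values (1 = ink); the names `runs`, `merge_close`, `split_words`, the `min_gap` parameter, and the file naming scheme are all hypothetical. Real scans would also need an image library, deskewing, and noise removal, and marks written above or below the line may need extra handling.

```python
def runs(flags):
    """Return (start, end) pairs for each run of true values; end is exclusive."""
    out, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(flags)))
    return out


def merge_close(spans, min_gap):
    """Merge spans separated by fewer than min_gap blank columns."""
    merged = []
    for start, end in spans:
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def split_words(page, page_no=1, min_gap=2):
    """Yield (name, top, bottom, left, right) boxes, one per word.

    page    -- binarized page as a list of equal-length rows, 1 = ink
    min_gap -- assumed minimum number of blank columns between two words
    """
    row_has_ink = [any(row) for row in page]
    for line_no, (top, bottom) in enumerate(runs(row_has_ink), start=1):
        width = len(page[top])
        col_has_ink = [
            any(page[y][x] for y in range(top, bottom)) for x in range(width)
        ]
        words = merge_close(runs(col_has_ink), min_gap)
        for word_no, (left, right) in enumerate(words, start=1):
            # Hypothetical naming scheme: page, line, and word numbers.
            name = f"p{page_no:04d}_l{line_no:03d}_w{word_no:03d}.png"
            yield name, top, bottom, left, right
```

Because each name encodes the page, line, and word number, the typed words can later be sorted by name to reassemble the page text in reading order.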

Budget

Project manager, web app developer, tester

Cost: 10 hrs/week at 15 USD/hr, 3 resources, 6 months (26 weeks)

Total: 10 × 15 × 3 × 26 = 11700 USD

Hosting

The web application will be developed in Python with the Django web framework. It is a standalone application and does not need integration with MediaWiki core.


We need a dedicated server to run the web application. Initially, we can host the application on our own VPS, for development and to collect the first contributions.

Later, we can request hosting on Wikimedia Labs for scalability and continuous support.


Scope

The web application will be developed so that desktop users can contribute easily. REST API support and a mobile theme will be added to the web application, so that mobile users can access the site with a mobile browser and contribute. If required, independent mobile applications can be developed in the future.

Community engagement

We will ask Wikisource contributors for input on the project, the user interface, the mobile user interface design, and the scoring system.

We will ask them to test the application every week, so that we can correct issues at an early stage.


Sustainability

Initially, the system will be developed for Tamil Wikisource. After the project is completed, it can be used for any language. It can even be used to build CAPTCHA systems with non-English characters.

It can also be integrated with existing OCR systems, both to train the OCR and to compare typed text against OCRed characters.

Measures of success

Contributions to Wikisource increase by 30-40%, as the new system makes contributing much simpler. 3-4 books are completed in the 6 months after the project is completed and released.

Get involved

Participants

Tshrinivasan - I am involved in publishing Tamil ebooks under Creative Commons licenses. 110 ebooks have been released so far at http://FreeTamilEbooks.com

I have been a Python programmer for years and created a MediaWiki uploader in Python for bulk uploading images to Commons: https://code.google.com/p/mediawiki-uploader/

Community notification

Notified in Tamil Wiki Village Pump.

Links for the discussions.

Endorsements

Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).

  1. Probably the first of its kind on a mobile platform.
    Declaring CoI: I know Shrinivasan personally. Commons sibi (talk) 06:37, 30 September 2014 (UTC)
  2. It is an important tool.--சஞ்சீவி சிவகுமார் (talk) 10:38, 30 September 2014 (UTC)
  3. Very innovative solution--Sodabottle (talk) 09:50, 4 October 2014 (UTC)
  4. This is a good idea. It can be used to train the Tesseract OCR as well. balavignesh (talk) 18:45, 4 October 2014 (UTC)
  5. To preserve ancient Tamil literature, this unique tool has to be created. --உழவன் +உரை.. 07:54, 5 October 2014 (UTC)
  6. It is a very important initiative to preserve books and manuscripts in the Tamil language. There are still many books, found in Google NGRAMS and dating back to the 1890s, that can be converted using this tool. This will help to preserve the language's heritage and culture. Seesiva (talk) 06:26, 24 October 2014 (UTC)
  • This would help in growing the language and would help the original manuscripts reach a large number of people in electronic form. This will foster further research and help to retain the history and tradition of an age-old language. Seesiva (talk) 06:27, 24 October 2014 (UTC)
  • Support. A good idea to tailor CAPTCHA to the Indian language context. Much needed to reach the millions in India who are getting onto the internet via mobile, bypassing the desktop/laptop. I would like to personally track the progress of this project if it comes through. Best wishes! --Visdaviva (talk) 12:49, 27 October 2014 (UTC)