Grants:IEG/Web Application to automate frequent tasks for Tamil Wikisource and Tamil Wiktionary
Project idea
What is the problem you're trying to solve?
I recently developed a tool to integrate Google OCR with the Indic Wikisource projects: https://github.com/tshrinivasan/OCR4wikisource, based on the request at https://phabricator.wikimedia.org/T120788. Using this tool, the Indic Wikisource projects have added around 400,000 (4 lakh) pages.
As these pages are from OCR, the text requires repeated replacements, cleanups, markup additions etc.
In the Wiktionary and Wikisource projects for the Tamil language, we have to make these changes across multiple pages, or across all pages in a given category. Windows users use the AutoWikiBrowser tool for this purpose. GNU/Linux users have Pywikibot-based command-line tools. But many beginners find command-line tools difficult to work with, and Pywikibot has a steep learning curve for automation.
What is your solution?
I am planning to write a web application in Python to help the Tamil Wikisource and Tamil Wiktionary communities.
As it is a web application, any user of Windows, Mac OSX, GNU/Linux or any OS can use it easily.
Project goals
To provide a tool that automates repetitive edit tasks, so that contributors can spend their time on work that truly needs manual effort.
The following are the frequent requests from the Tamil Wikisource and Tamil Wiktionary communities. Currently most of these tasks are done manually, and they take a huge number of person-hours.
- Add text on top or bottom of the pages
- Find and replace the set of words provided as csv files
- Create wiktionary pages with CSV files
- Change proofread quality status for specific set of pages
- Find and report broken links on pages
- Add message or template for lonely pages, short pages.
- Auto welcome new users
- Daily/Weekly/Monthly reports on contributor stats
- User stats for the efforts on special events such as edit-a-thons
- Send bulk messages to notify users of events or announcements.
Pages can be given individually, taken from given categories, or selected by URL pattern matching.
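The pattern-based page selection described above could look roughly like the following. This is only a minimal sketch: the function name `select_pages` and the sample page titles are hypothetical, and it assumes shell-style wildcards (`*`, `?`) are an acceptable pattern syntax, which the proposal does not specify.

```python
from fnmatch import fnmatchcase


def select_pages(titles, pattern):
    """Return the titles that match a shell-style pattern (* and ? wildcards)."""
    return [title for title in titles if fnmatchcase(title, pattern)]


# Hypothetical page titles, for illustration only.
titles = [
    "Page:Book1.pdf/1",
    "Page:Book1.pdf/2",
    "Page:Book2.pdf/1",
    "Index:Book1.pdf",
]

# Select every scanned page of one book.
selected = select_pages(titles, "Page:Book1.pdf/*")
```

In a real deployment the list of titles would come from the wiki itself (for example from a category listing) rather than from a hard-coded list.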
The tool will provide a web interface through which users can design their own solutions for automating repeated tasks.
This application can be hosted on Tool Labs so that anyone can access it online, easily.
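As an illustration of the "find and replace the set of words provided as CSV files" request, the core text transformation might be sketched as below. The two-column CSV layout (text to find, replacement) and the function names `load_replacements` and `apply_replacements` are assumptions made for this example; the actual file format would be agreed with the community.

```python
import csv
import io


def load_replacements(csv_text):
    """Parse 'find,replace' rows into a list of pairs, skipping incomplete rows."""
    reader = csv.reader(io.StringIO(csv_text))
    return [(row[0], row[1]) for row in reader if len(row) >= 2 and row[0]]


def apply_replacements(text, pairs):
    """Apply each replacement in order; return the new text and a hit count."""
    total = 0
    for old, new in pairs:
        total += text.count(old)
        text = text.replace(old, new)
    return text, total


# Hypothetical CSV content and page text, for illustration only.
pairs = load_replacements("teh,the\nrecieve,receive\n")
new_text, hits = apply_replacements("teh cat will recieve teh prize", pairs)
```

Returning the hit count lets the web interface show a preview of how many changes a job would make before anything is saved to the wiki.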
Project plan
Activities
Phase 1: Build this web application to do frequent wiki edit tasks.
Add the following features.
- Add text on top or bottom of the pages
- Find and replace the set of words provided as csv files
- Change proofread quality status for specific set of pages
- Create wiktionary pages with CSV files
Test and Deploy.
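The "add text on top or bottom of the pages" feature from Phase 1 could be sketched as a small pure function. The name `add_text` and the rule of skipping pages that already contain the text are assumptions for this example, chosen so that re-running a job does not add the same text twice.

```python
def add_text(page_text, addition, position="top"):
    """Add `addition` at the top or bottom of a page unless it is already there."""
    if position not in ("top", "bottom"):
        raise ValueError("position must be 'top' or 'bottom'")
    if addition in page_text:
        # Already present, so running the task again changes nothing.
        return page_text
    if position == "top":
        return addition + "\n" + page_text
    return page_text.rstrip("\n") + "\n" + addition
```

For example, `add_text(text, "[[Category:X]]", "bottom")` would append a category line, where the category name is purely illustrative.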
Phase 2:
Add the following features.
- Find and report broken links on pages
- Add message or template for lonely pages, short pages.
- Auto welcome new users
Test and Deploy.
Phase 3:
Add the following features.
- Daily/Weekly/Monthly reports on contributor stats
- User stats for the efforts on special events such as edit-a-thons
- Send bulk messages to notify users of events or announcements.
Test and Deploy.
Budget
- Project Manager : 600 USD/Month
- Developer : 400 USD/Month
- Tester : 400 USD/Month
- Packager : 300 USD/Month
- Total : 1700 USD/Month
Project will be completed in 5 months.
Total budget is 1700 x 5 = 8500 USD
Community engagement
Will get input from the Tamil Wiktionary and Wikisource communities on the requirements, UI design and usage patterns before developing the application.
Will engage them as community testers to provide feedback. Improvements will be communicated to them via the village pumps every week. Based on the feedback, new features and bug fixes will be added.
Sustainability
The tool will be available for the community to develop further. We will provide support and maintenance for one more year, and will try to bring in more developers by training new contributors.
Measures of success
Increase the number of automation tool users in the Tamil Wiktionary and Wikisource communities by 50%, since most contributors currently make repeated changes manually.
Get involved
Participants
tshrinivasan - A Python developer for 3 years and the developer of OCR4WikiSource, a tool to integrate Google OCR with the Indic Wikisource projects. https://github.com/tshrinivasan/OCR4wikisource
Community notification
Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions.
Endorsements
Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).