Grants:IEG/Public Domain Textbook Import
status: withdrawn
project:
project contact:
taosubmarinesmail.ru
participants:
grantees:
User:Herper.gr
summary:
While working on https://en.wikibooks.org/w/index.php?title=Snakes_of_Europe, I began to wonder how to work with a single scanned PDF page: how to import the OCR text from archive.org, extract plates, tables, or diagrams as images, and keep the layout approximately the same in a heavily modified PDF/ODT output (printed ebook).
2014 round 1
Project idea
What is the problem you're trying to solve?
The book The Snakes of Europe is a 100-year-old public domain text. There is no comparable modern open-access or open-source field guide, reference, or textbook on the subject that has photos to assist species identification. My idea is to enable easy cross-referencing of external scans (PDF, image, and so on, for example from archive.org) and of external OCR sources, and to look at ways to improve the import of OCR text. For example, https://en.wikibooks.org/wiki/Snakes_of_Europe/Definition_and_Classification incorporates various tables that did not paste legibly, whereas https://en.wikibooks.org/wiki/Snakes_of_Europe/Habits is a fairly simple copy and paste that is fully legible even without correct formatting.
What is your solution?
I would like to develop an extension for Firefox and Chrome that shows an external page (a PDF page from archive.org) in an upper side panel and the OCR text in a lower side panel, so that I can import (copy) a piece of the PDF, such as an image or a table of interest. With this concept I would be able to paste selected tables into the 'Definition_and_Classification' article above and have them appear as images. I could also easily copy in image plates and taxobox-type features for individual species. I could also look at incorporating additional utilities such as GOCR into the extension, so that they work within this extension alone.
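As a rough illustration of what the OCR text panel might do with a copied table, here is a minimal sketch in Python. It is not the proposed extension, and it produces wiki table markup rather than an image. The function name `ocr_rows_to_wikitable` is hypothetical, and it assumes that columns in the OCR text are separated by a tab or by two or more spaces, which real OCR output may not respect.

```python
import re


def ocr_rows_to_wikitable(ocr_text: str, has_header: bool = True) -> str:
    """Convert whitespace-aligned OCR table text into MediaWiki table markup.

    Assumption (not from the proposal): columns are separated by a tab
    or by a run of two or more whitespace characters.
    """
    rows = []
    for line in ocr_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        rows.append(re.split(r"\s{2,}|\t", line))

    if not rows:
        return ""

    # Pad short rows so every row has the same number of cells.
    width = max(len(cells) for cells in rows)

    out = ['{| class="wikitable"']
    for index, cells in enumerate(rows):
        cells = cells + [""] * (width - len(cells))
        out.append("|-")
        # "!" marks a header cell, "|" marks a data cell.
        marker = "!" if (has_header and index == 0) else "|"
        for cell in cells:
            out.append(f"{marker} {cell}".rstrip())
    out.append("|}")
    return "\n".join(out)


if __name__ == "__main__":
    # Made-up sample text, not taken from the book.
    sample = "Name    Count\nalpha   1\nbeta    2"
    print(ocr_rows_to_wikitable(sample))
```

The sketch does not escape cells that contain a `|` character and cannot recover columns that the OCR has merged, so its output would still need proofreading against the scan in the upper panel.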
Project goals
- Have an easier template to work with external material.
- Be able to work with multiple revisions of the same material. https://en.wikibooks.org/wiki/Snakes_of_Europe/Index#THE_SNAKES_OF_EUROPE has the nomenclature used by Boulenger in 1913, while https://en.wikibooks.org/wiki/Snakes_of_Europe/Species is current, copied from https://en.wikipedia.org/wiki/List_of_reptiles_of_Europe. A revision system would allow forking to produce both a current, up-to-date ebook and a formatted OCR of the original material that is better than the one at archive.org.
Part 2: The Project Plan
Project plan
Scope
Activities
Budget
Total amount requested
Budget breakdown
Intended impact
Target audience
Community engagement
Fit with strategy
Sustainability
Measures of success
Participant(s)
Discussion
Community Notification
Please paste a link below to where the relevant communities have been notified of this proposal, and to any other relevant community discussions.
Endorsements
Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project in the list below. Other feedback, questions or concerns from community members are also highly valued, but please post them on the talk page of this proposal.
- Community member: add your name and rationale here.