Proposed solution: Provide a mechanism (a gadget, or a feature of Wikimedia OCR) that lets users mark out the columns and other specific areas of a page that need to be OCRed together, and store these areas against the Index page so that they do not need to be marked out again for every page.
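To make the idea concrete, here is a minimal sketch of how such areas might be stored once per Index page and reused on each page. Everything in it is an assumption for illustration: the `Region` class, the in-memory `index_regions` store, and the `save_regions` and `crop_boxes` helpers are hypothetical names, not part of Wikimedia OCR or any existing gadget. Areas are kept as fractions of the page size so that the same layout can be applied to page scans of different pixel dimensions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Region:
    """A rectangle in fractional page coordinates (0.0 to 1.0). Hypothetical type."""
    left: float
    top: float
    width: float
    height: float

    def to_pixels(self, page_width: int, page_height: int) -> Tuple[int, int, int, int]:
        """Scale the region to a pixel box (x, y, w, h) for one page image."""
        return (
            round(self.left * page_width),
            round(self.top * page_height),
            round(self.width * page_width),
            round(self.height * page_height),
        )


# Hypothetical store: Index page title -> regions, in reading order.
# A real tool would persist this somewhere; a dict stands in for that here.
index_regions: Dict[str, List[Region]] = {}


def save_regions(index_title: str, regions: List[Region]) -> None:
    """Record the regions once for a whole Index page."""
    index_regions[index_title] = list(regions)


def crop_boxes(index_title: str, page_width: int, page_height: int) -> List[Tuple[int, int, int, int]]:
    """Return pixel boxes to OCR, in reading order, for one page of the Index."""
    return [
        region.to_pixels(page_width, page_height)
        for region in index_regions.get(index_title, [])
    ]


# Example: a two-column layout, left column read before the right one.
save_regions(
    "Index:Example.djvu",  # made-up title
    [Region(0.0, 0.0, 0.5, 1.0), Region(0.5, 0.0, 0.5, 1.0)],
)
boxes = crop_boxes("Index:Example.djvu", page_width=1000, page_height=1600)
```

Each box in `boxes` could then be cropped and sent to the OCR engine separately, with the results joined in list order. Storing fractions rather than pixels is one possible design choice; a real implementation might also need per-page overrides for pages whose layout differs from the rest of the work.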