Wikimedia Digitization User Group

This page aims to collect the available information on digitization projects within the Wikimedia movement: the tools they use, the hardware and software they need, best practices, and learning patterns, among other topics. Its creation was decided at the first meeting of the Wikimedia Digitization User Group.

The purpose of this page is not to provide in-depth technical information but to help non-experts understand the basics of digitization, how to do it, and what kinds of decisions they need to make if they plan to set up a digitization program. Anyone who wants the deeper technical details can consult the Wikipedia pages on each concept or read the suggested bibliography.

It is also important to note that there are multiple ways to run a digitization program. You can do it yourself, inside your institution, making all the decisions. Alternatively, you can outsource it to a company or partner with another institution that has the right equipment. Such a partner may do the job for you (this is, for example, the model adopted at the Boston Public Library), charge you for it (like the Internet Archive), or provide you with all the equipment and make the crucial decisions for you (again, like the Internet Archive, or like several Wikimedia chapters that are carrying out digitization projects).

What you digitize and how you run your digitization program is entirely up to you, your institution or community, and your own policies. If you decide to partner with another institution, you probably won't need any of this information. If you decide to set up your own digitization program, you will find most of it useful in one way or another. Much of this information already exists, scattered around the web and across Wikipedia pages; this page is an effort to systematize it.

Capture

Two-dimensional objects

This section covers the main factors to consider whenever you make a digital image of maps, books, photographs, two-dimensional artworks, negatives, microfiche, or microfilm. These general principles apply regardless of the format, size, or state of preservation of the analog material. Considerations specific to each type of material are outlined separately, but the general principles still apply.

Audio

Audiovisual material and moving images

Planning a digitization project

  • Selection and preparation of materials
  • Provision of access to digital files
  • Long-term sustainability

One day scan-a-thon

Things to consider when visiting a local institution for a single day to scan its material.

Short-term scan-a-thon

Things to consider when traveling to a distant location to scan material.

Long-term digitization project

Processing

Information extraction

If scans are uploaded to Commons and set up with Index pages on Wikisource, the Wikisource:Google OCR tool can be used to extract text from images.

Sharing and availability

Wikimedia Digitization Projects

Grants for digitization

The following is a list of available grants, with a short explanation of each.

  • Arcadia
  • Endangered Archives Programme

Glossaries

Additional Resources

The resources below are organized by category.

Comprehensive resources for the whole digitization process

Funny resources

Technical guidelines

ISO Standards

  • ISO 12233/FDIS, ISO/TC42. 1999. Photography – Electronic Still Picture Cameras – Resolution Measurements.
  • ISO 12234-2/DIS, ISO/TC42. November 1998. Photography – Electronic Still Picture Imaging – Removable Memory – Part 2: Image Data Format – TIFF/EP.
  • ISO 14524/FDIS, ISO/TC42. 1999 (January). Photography – Electronic Still Picture Cameras – Methods for Measuring Opto-Electronic Conversion Functions (OECFs).
  • ISO 15739/CD, ISO/TC42. 1999 (June). Photography – Electronic Still Picture Cameras – Noise Measurements.
  • ISO 16067-1. 2003. Photography – Spatial resolution measurements of electronic scanners for photographic images – Part 1: Scanners for reflective media.
  • ISO 16067-2. 2004. Photography – Electronic scanners for photographic images – Spatial resolution measurements – Part 2: Film scanners.
  • ISO 17321?
  • ISO 18937. 2014. Imaging materials – Photographic reflection prints – Methods for measuring indoor light stability.
  • ISO 18937-4. (In progress). Imaging materials – Photographic reflection prints – Methods for measuring indoor light stability – Part 4: LED Illumination.

Selection of materials for digitization

Understanding the science behind imaging technologies
