Wikimaps proposal for maps georeferencing architecture in Wikimedia projects
Specification for a data model to store georectification data
In a nutshell
A proposal was made to create a Wikidata property for maps georeferencing data. Due to inactivity, the proposal was turned down. This page is dedicated to creating a unanimous proposal to bring forward for Wikidata and the Wikimedia ecosystem.
Old proposal at Wikidata: https://www.wikidata.org/wiki/Wikidata:Property_proposal/georeferencing_data
Please collect all related and relevant aspects of the proposal, and of the problem it is supposed to solve, on this page. Let's work on this page as we would on a Wikipedia article, editing the text based on references to discussion on the talk page, in the old proposal or in the Wikimaps Telegram group.
Background
Georeferencing (georectification, warping) is a type of coordinate transformation: a process that aligns scanned maps with a spatial reference system, allowing the map image to be displayed as a tiled web map. Georeferencing is done by pairing ground control points (GCPs) on the scanned (raster) map with coordinate points in a digital map or aerial image that is already georeferenced. With this information, the georeferencing algorithm distorts (warps, rectifies) the raster map to match the spatial reference system's geometry.
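As a minimal sketch of the idea, the simplest case is an affine transformation fitted exactly to three ground control points. The function names and sample coordinates below are made up for illustration. Real georeferencing tools typically use more control points, least-squares fitting and other transformation types.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m * x = v with Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    result = []
    for col in range(3):
        # Replace one column of m with v and take the determinant ratio.
        replaced = [row[:col] + [v[i]] + row[col + 1:] for i, row in enumerate(m)]
        result.append(det(replaced) / d)
    return result


def fit_affine(gcps):
    """gcps: exactly three ((px, py), (lon, lat)) pairs.

    Returns (a, b, c, d, e, f) such that
    lon = a*px + b*py + c and lat = d*px + e*py + f.
    """
    if len(gcps) != 3:
        raise ValueError("this sketch needs exactly three control points")
    m = [[float(px), float(py), 1.0] for (px, py), _ in gcps]
    a, b, c = solve3(m, [lon for _, (lon, _) in gcps])
    d, e, f = solve3(m, [lat for _, (_, lat) in gcps])
    return a, b, c, d, e, f


def pixel_to_geo(coeffs, px, py):
    """Apply the fitted affine transformation to one pixel position."""
    a, b, c, d, e, f = coeffs
    return a * px + b * py + c, d * px + e * py + f


# Hypothetical control points: pixel position on the scan -> (lon, lat).
gcps = [((0, 0), (24.0, 61.0)), ((1000, 0), (25.0, 61.0)), ((0, 1000), (24.0, 60.0))]
coeffs = fit_affine(gcps)
lon, lat = pixel_to_geo(coeffs, 500, 500)
```

With these sample points the centre of the scan maps to the centre of the covered area, which is a quick sanity check that the pairs were entered in the right order.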
Wikimaps Warper is a georeferencing app that georeferences old maps. It was adapted for the Wikimedia environment based on MapWarper, originally created for the New York Public Library.
Other software that performs similar operations includes Klokan Technologies' Georeferencer, used by the British Library maps, and Allmaps. Desktop GIS software such as QGIS also has georeferencing capability.
What is this proposal about?
Wikimaps Warper stores data in its own database. This data, as well as the data produced by other georeferencing tools, could be made available to developers of more lightweight tools if it were stored in the Wikimedia projects. However, the community has not yet reached a consensus about the format and scope of the data.
Proposed features
Describe the proposed features. Record the arguments, counter-arguments and options. Refer to the discussion elsewhere, do not debate on the page.
Create a combined Wikidata property and a single dataset instead of 3 separate properties and datasets
Description
The old proposal suggested creating three Wikidata properties to hold all georeferencing data. The values would be Wikimedia Commons tabular data, i.e. data table files, which are JSON files under the hood.
- Dataset for the georeferencing control point data. This would include control point pairs for the scanned map and the coordinate system.
- Dataset for the georeferencing mask geoshape. This would include the bounding box coordinate points of the scanned map's coverage in the coordinate system.
- Dataset for the georeferencing pixel mask data. This would include the coordinates of points on the raster map image, representing a mask that covers the map sheet beyond the map image.
Argumentation & open questions
Some of the following concerns should be moved to the discussion about the GeoJSON format once the creation of a single Wikidata property referencing a GeoJSON file has been agreed on.
🌟 Multichill proposes instead a combined map data (GeoJSON) file on Wikimedia Commons with distinct features for gcp, mask, and pixelmask.[1] His proposal and example.
🌟 Bert Spaan notes that GeoJSON only supports WGS 84, while the GDAL transformation can use any coordinate reference system (projection).[2]
💬 TuukkaH suggests that we still can support additional coordinate systems if we specify so in our spec: However, where all involved parties have a prior arrangement, alternative coordinate reference systems can be used without risk of data being misinterpreted (GeoJSON spec).[3]
🌟 Bert Spaan further notes that a GeoJSON polygon may contain holes, while a georeferencing mask does not contain holes and georeferencing software does not support holes [- typically?].[2]
🌟 Would the single file approach cover the case when a scanned map can consist of multiple maps (e.g. map sheets, inset maps)?
💬 Bert: This could be done by allowing multiple georectified maps in a single JSON maps object (refers to the original JSON Schema proposal).[2]
💬 Jheald notes that qualifiers may be needed to express complex cases, e.g. with multiple sets of data.[1]
💬 Multichill counter-proposes that any single FeatureCollection can only use a single image mask and set of control points.[4]
💬 TuukkaH proposes solutions to cater for multiple maps per image:
- We can crop and split the original image into multiple source images.
- We can link multiple GeoJSON files to a single source image.
- We can allow multiple FeatureCollections in one GeoJSON file (with a top-level FeatureCollection wrapping the others).
💬 In IIIF Georeference Extension the issue is resolved so that a resource can be georeferenced by using multiple Georeference Annotations, each with their own SVG Selector and GCPs.[5]
💬 Susannaanas notes that information about the chosen principle is needed for the constraints in the Wikidata property.
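To make the single-file option more concrete, the following sketch builds one FeatureCollection with a mask Polygon and GCP Points, then checks two of the constraints raised above: a single mask per FeatureCollection, and no holes in the mask. The property names (`type`, `file`, `cutline`, `controlpoint`), the `ImageMask` and `ControlPoint` labels, the file name and all coordinates are illustrative assumptions, not an agreed format.

```python
def build_feature_collection(image_file, mask_ring, cutline, gcps):
    """mask_ring: closed ring of [lon, lat] positions.
    cutline: matching pixel positions on the scanned image.
    gcps: list of ((lon, lat), (px, py)) pairs."""
    mask = {
        "type": "Feature",
        "properties": {"type": "ImageMask", "file": image_file, "cutline": cutline},
        "geometry": {"type": "Polygon", "coordinates": [mask_ring]},
    }
    points = [
        {
            "type": "Feature",
            "properties": {"type": "ControlPoint", "controlpoint": [px, py]},
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
        }
        for (lon, lat), (px, py) in gcps
    ]
    return {"type": "FeatureCollection", "features": [mask] + points}


def check_mask_constraints(fc):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    masks = [f for f in fc["features"] if f["geometry"]["type"] == "Polygon"]
    if len(masks) != 1:
        problems.append(f"expected exactly one mask, found {len(masks)}")
    for mask in masks:
        # In GeoJSON, every linear ring after the first one is a hole.
        if len(mask["geometry"]["coordinates"]) > 1:
            problems.append("mask polygon contains holes")
    return problems


ring = [[24.0, 60.0], [25.0, 60.0], [25.0, 61.0], [24.0, 61.0], [24.0, 60.0]]
cutline = [[0, 1000], [1000, 1000], [1000, 0], [0, 0], [0, 1000]]
fc = build_feature_collection("Example map.jpg", ring, cutline, [((24.5, 60.5), (500, 500))])
problems = check_mask_constraints(fc)
```

Under this reading, an image with several maps would need either several files or a convention that relaxes the single-mask check, which is exactly the open question in this section.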
Proposed conclusion
Add here the updated Wikidata property proposal, or the information for creating it. All properties are debatable until agreed on.
Name of the property | Georeferencing data |
---|---|
Description | Format for rectifying images, specifically maps. The format is backwards compatible with GeoJSON. |
Represents | |
Data type | geo-shape |
Domain | Commons image |
Allowed values | Data:.*\.map |
Example 1 | Data:Georectification.example2.geojson.map Multichill's original proposal |
Example 2 | |
Example 3 | |
Source | Wikimaps Warper, external tools and sites such as NYPL MapWarper, British Library Georeferencer, David Rumsey maps, other sites with georeferencing. User input and upload manually and through batch upload and edit tools. |
Planned use | Transferring data from Wikimaps Warper and external tools to Wikimedia projects. Make the data available for microservices on Wikimedia projects. |
See also | Wikidata:Property_proposal/external_georeferencer_URL |
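The "Allowed values" pattern in the table can be tried out with a few lines of Python. This is only a sketch: it assumes the pattern has to match the whole value, and the second page title is a made-up counter-example.

```python
import re

# Pattern copied from the "Allowed values" row of the table.
ALLOWED_VALUE = re.compile(r"Data:.*\.map")


def is_allowed_value(page_title):
    """True if the page title matches the proposed pattern in full."""
    return ALLOWED_VALUE.fullmatch(page_title) is not None


matches_example = is_allowed_value("Data:Georectification.example2.geojson.map")
matches_tabular = is_allowed_value("Data:Example.tab")  # hypothetical title
```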
Terminology
Proposals for using specific terminology in the Wikimedia proposal
Agreed | Alternatives |
---|---|
georeferencing, georectifying, georectification, warping, georegistration | |
imagemask, mask, crop, cutline | |
Relation to Allmaps and IIIF
🌟 Jheald suggests it would be nice to have a WMF-maintained IIIF service that supports tiles. He is not sure whether the current Commons offering supports tiles, or whether Allmaps can be made to work with a map from Commons.
💬 Susannaanas notes that WMF plans to support a IIIF service have not been continued. Abbe98 notes that Allmaps' approach of rendering/warping client-side is more important to Wikimedia use than its usage of IIIF.
Additional metadata not included in the scope of this proposal
🌟 Jheald notes that:
- The georeferencing apps may hold side data we might want to store alongside the GCPs, e.g. additional things that MapWarper stores.
- The pixel dimension of a Wikimedia Commons image may change, and then related image coordinates may change as well. For this reason it would be useful to make sure that GCPs relate to the same revision of a file (or at least one with the same dimensions) as the version being served.
🌟 TuukkaH notes a concern with GeoJSON: where to put extra metadata. He proposes simply extending the (top-level) FeatureCollection with these metadata fields as GeoJSON "foreign members".
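A small sketch of how the two notes above could combine: extra metadata is attached to the top-level FeatureCollection as foreign members, and the stored image dimensions are compared with those of the file revision being served. The member names `image_width`, `image_height` and `source_tool` are hypothetical and not part of any agreed specification.

```python
RESERVED_MEMBERS = {"type", "features", "bbox"}


def with_foreign_members(fc, **extra):
    """Return a copy of a FeatureCollection extended with foreign members."""
    clash = RESERVED_MEMBERS.intersection(extra)
    if clash:
        raise ValueError(f"cannot override GeoJSON members: {sorted(clash)}")
    return {**fc, **extra}


def dimensions_match(fc, width, height):
    """True if the GCPs were recorded against an image of this size."""
    return fc.get("image_width") == width and fc.get("image_height") == height


fc = {"type": "FeatureCollection", "features": []}
fc = with_foreign_members(fc, image_width=4000, image_height=3000,
                          source_tool="Wikimaps Warper")
same_size = dimensions_match(fc, 4000, 3000)
resized = dimensions_match(fc, 2000, 1500)
```

Generic GeoJSON tools should ignore members they do not recognise, so a file extended this way would still display in them.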
Proposed GeoJSON schema
Description
🌟 Bert Spaan notes that this schema could be made using JSON Schema.
🌟 Original proposal by the team at Wikimania 2019 hackathon.[6]
🌟 Initial text for the Specification drafted by The DJ:
A GeoRectify GeoJSON is a format for image rectification information, specifically for maps, that is backwards compatible with GeoJSON. As such, the geographic parts of it can be displayed with any GeoJSON tool. A GeoRectify GeoJSON is specified as:
- At least one FeatureCollection
- This FeatureCollection has at least one Polygon feature and 0 or more Point features.
- The Polygon feature describes the image that has been rectified.
- The properties of the Polygon feature must contain a file attribute and a transformation attribute. It MAY contain the attributes: sha1, commons_entity
  - type: ImageMask (do we need this?)
  - file: simple filename of the image file belonging to this georectification? Or should this be a URL?
  - transformation: We need to get clear why we need this info and what it means exactly
  - sha1: sha1sum of the file. Use this to make sure that the file for which this georectify information was authored is exactly the same as the one used later on. If sha1 later becomes so broken that another format is needed, add "sha256" as an attribute.
  - commons_entity: Commons data media entity
  - unit: pixel. The default unit is pixels of the image before rectification. Alternative units are currently not supported.
  - There is an array attribute named "cutline". This array has, for each coordinate in the polygon geometry, a corresponding point (array of 2 numbers) in the image. This defines which parts of the image are cropped out (unclear, to be confirmed).
- The Points describe the ground control points of the rectification.
  - Each point has a geometry coordinate. For each point there is a corresponding point in the image, specified in an attribute named "controlpoint", which is an array of 2 numbers.
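The rules in the draft text above can be turned into a small checker. The sketch below is one possible reading of the draft, not a reference implementation: it assumes that `cutline` and `controlpoint` live inside each feature's `properties`, and the function names and the sample file name are hypothetical.

```python
import hashlib


def validate_georectify(fc):
    """Return a list of problems; an empty list means the draft rules hold."""
    if fc.get("type") != "FeatureCollection":
        return ["top-level object is not a FeatureCollection"]
    problems = []
    features = fc.get("features", [])
    polygons = [f for f in features if f["geometry"]["type"] == "Polygon"]
    points = [f for f in features if f["geometry"]["type"] == "Point"]
    if not polygons:
        problems.append("no Polygon feature")
    for poly in polygons:
        props = poly.get("properties") or {}
        for key in ("file", "transformation"):
            if key not in props:
                problems.append(f"Polygon is missing '{key}'")
        ring = poly["geometry"]["coordinates"][0]
        cutline = props.get("cutline")
        # One image point per coordinate of the polygon geometry.
        if cutline is not None and len(cutline) != len(ring):
            problems.append("cutline length differs from polygon ring length")
    for point in points:
        cp = (point.get("properties") or {}).get("controlpoint")
        if not (isinstance(cp, list) and len(cp) == 2):
            problems.append("Point lacks a 2-number controlpoint")
    return problems


def sha1_matches(props, image_bytes):
    """Compare the optional sha1 attribute with the actual file content."""
    expected = props.get("sha1")
    return expected is None or hashlib.sha1(image_bytes).hexdigest() == expected


ring = [[24.0, 60.0], [25.0, 60.0], [25.0, 61.0], [24.0, 60.0]]
example = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"file": "Example map.jpg",
                        "transformation": {"type": "affine"},
                        "cutline": [[0, 1000], [1000, 1000], [1000, 0], [0, 1000]]},
         "geometry": {"type": "Polygon", "coordinates": [ring]}},
        {"type": "Feature",
         "properties": {"controlpoint": [500, 500]},
         "geometry": {"type": "Point", "coordinates": [24.5, 60.5]}},
    ],
}
problems = validate_georectify(example)
```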
🌟 TheDJ suggests adding 'sha1' or 'sha256' field to pixelmask, to identify and track the exact version of an image that the mapping was made from.[1]
🌟 Multichill proposes (and TheDJ agrees) that the mask and the pixel mask can be merged into a single Feature in the GeoJSON format.
🌟 How to deal with image proportions when making control points and mask points to the raster map? Use pixels or relative values to document extents?
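To illustrate the trade-off in the last question, this sketch converts a pixel position to relative values (fractions of the image width and height) and back. Relative values survive a proportional resize of the image, whereas pixel values have to be rescaled. The function names and numbers are examples only, and neither option has been agreed on.

```python
def to_relative(point, width, height):
    """Pixel position -> fractions of the image width and height."""
    x, y = point
    return [x / width, y / height]


def to_pixels(relative, width, height):
    """Relative position -> nearest pixel position for a given image size."""
    rx, ry = relative
    return [round(rx * width), round(ry * height)]


# A control point recorded on a 2000 x 1000 pixel scan...
relative = to_relative([500, 250], 2000, 1000)
# ...and located again on a 4000 x 2000 pixel version of the same scan.
rescaled = to_pixels(relative, 4000, 2000)
```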
Proposed conclusion
Prepare the proposed schema as JSON Schema! Something like this, but with up-to-date content. The initial schema was created with ChatGPT from Multichill's example:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "license": { "type": "string" },
    "description": {
      "type": "object",
      "properties": { "en": { "type": "string" } },
      "required": ["en"]
    },
    "sources": { "type": "string" },
    "zoom": { "type": "integer" },
    "latitude": { "type": "number" },
    "longitude": { "type": "number" },
    "data": {
      "type": "object",
      "properties": {
        "type": { "type": "string", "enum": ["FeatureCollection"] },
        "features": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "type": { "type": "string", "enum": ["Feature"] },
              "properties": {
                "type": "object",
                "properties": {
                  "type": { "type": "string", "enum": ["Imagemask"] },
                  "file": { "type": "string" },
                  "commons_entity": { "type": "string" },
                  "sha1": { "type": "string" },
                  "sha256": { "type": ["string", "null"] },
                  "transformation": {
                    "type": "object",
                    "properties": {
                      "type": { "type": "string", "enum": ["affine"] }
                    },
                    "required": ["type"]
                  }
                },
                "required": ["type", "file", "commons_entity", "sha1", "transformation"]
              },
              "geometry": {
                "type": "object",
                "properties": {
                  "type": { "type": "string", "enum": ["Polygon", "Point"] },
                  "coordinates": {
                    "type": "array",
                    "items": {
                      "type": "array",
                      "items": {
                        "type": "array",
                        "items": { "type": "number" },
                        "minItems": 2,
                        "maxItems": 2
                      }
                    }
                  }
                },
                "required": ["type", "coordinates"]
              },
              "cutline": {
                "type": ["array", "null"],
                "items": {
                  "type": "array",
                  "items": { "type": "number" },
                  "minItems": 2,
                  "maxItems": 2
                }
              },
              "controlpoint": {
                "type": ["array", "null"],
                "items": { "type": "integer" },
                "minItems": 2,
                "maxItems": 2
              }
            },
            "required": ["type", "geometry"]
          }
        }
      },
      "required": ["type", "features"]
    }
  },
  "required": ["license", "description", "sources", "zoom", "latitude", "longitude", "data"]
}
Related work
Add a section that describes a related project or workflow in a nutshell. Explain why it is important for this proposal to learn about it. Use template Wiki Loves Living Heritage/Box to add the information.
Maps! Maps! Maps!
This Wikimania Hackathon 2019 project by Bert Spaan & team created a proposal for a metadata standard for map georectification.
See also
- Presentation on YouTube
- Phabricator ticket for the hackathon project phab:T227036
- The presentation in ObservableHQ.
Draft specification for JSON data model for map georectification
Maps! Maps! Maps! continued:
There is a need for a single way to describe georectified (historical) maps.
This data model should specify the following properties:
- the mapping between pixels on the scanned map and geospatial coordinates,
- the masking/clipping polygon to remove non-cartographic material,
- the source (or multiple sources) of the map image,
- the transformation type
Web service and Docker container
Maps! Maps! Maps! continued:
Web service and Docker container to compute GeoJSON masks and GeoTIFFs from images with georectify-json-spec
Allmaps
Bert Spaan has created a suite of browser-based online tools for manipulating maps. The suite is built around the use of IIIF. There are currently no plans to make IIIF available on Wikimedia projects.
Georeference Annotations
Allmaps and IIIF have specified a format for creating web annotations that include the data necessary for georeferencing.
Status & version
Working document / Draft / Under discussion / Proposed to...
Invite to comment and contribute
- Bert Spaan
- Chippyy
Contributors
- Susanna Ånäs (Susannaanas) 🦜 10:38, 31 August 2024 (UTC)
- Yug (talk) 14:01, 31 August 2024 (UTC)
- TuukkaH (talk) 22:27, 1 September 2024 (UTC)
Watching
- Susanna Ånäs (Susannaanas) 🦜 10:38, 31 August 2024 (UTC)
- Yug (talk) 14:00, 31 August 2024 (UTC)
- TuukkaH (talk) 22:27, 1 September 2024 (UTC)
References
1. "Wikidata:Property proposal/georeferencing data - Wikidata". www.wikidata.org. Retrieved 2024-09-01.
2. Spaan, Bert (2019-07-01). "Proposal for Wikimania 2019 Hackathon". Observable. Retrieved 2024-09-01.
3. Butler, H.; Daly, M.; Doyle, A.; Gillies, Sean; Schaub, T.; Hagen, Stefan (2016-08-01). "The GeoJSON Format".
4. "User talk:Multichill/Map warping format - Wikimedia Commons". commons.wikimedia.org. Retrieved 2024-09-01.
5. "Georeference Extension". iiif.io. Retrieved 2024-09-02.
6. bertspaan/georectify-json-spec, 2019-11-19, retrieved 2024-09-02