Glottography Data Tutorials

From a language map image to a Glottography dataset in six tutorials


Georeferencing a language map

This tutorial walks you through the process of georeferencing a language map provided as a raster image in QGIS, an open-source geographic information system (GIS). Georeferencing assigns geographic coordinates to the map, enabling it to be accurately located and displayed in a GIS.

Requirements

Software: QGIS is a free and open-source GIS. This tutorial uses QGIS version 3.34.4-Prizren.

Data: A map in raster image format (e.g., JPG, TIFF, or PNG). In this tutorial, we will georeference a map of the Alor-Pantar languages in Indonesia from Schapper (2020), Introduction to The Papuan Languages of Timor, Alor and Pantar. The map is available as a JPG raster image here.

Figure: Map of the Alor-Pantar languages from Schapper (2020).

Adding a basemap

Before we can georeference the Alor-Pantar map, we first open a basemap in QGIS. The basemap serves as a reference for accurately aligning the language map. To do this, go to HCMGIS > Basemaps > Vector tiles > Carto Basic to add the Carto Basic basemap from the HCMGIS plugin. Carto Basic is a clean, visually neutral basemap that uses data from OpenStreetMap.

Figure: Adding the Carto Basic map as basemap.

The HCMGIS plugin provides quick access to a wide range of ready-to-use basemaps, including OpenStreetMap, Google Maps, and satellite imagery. If the plugin is not already installed, install it first via the Plugins > Manage and Install Plugins... menu.

The Georeferencer plugin

Now we can use the Georeferencer plugin to align the Alor-Pantar map with the basemap. To open the plugin, go to Layer > Georeferencer.... The Georeferencer window will appear.

Figure: Open the Georeferencer plugin.

The Georeferencer window is divided into two main tiles. The map tile at the top shows the raster image to be georeferenced. In this tile, you’ll add the ground control points (GCPs) that align the language map with the basemap. The bottom tile shows the GCP Table, which lists all ground control points you’ve added, including their pixel locations, geographic coordinates, and an estimate of their accuracy. To add the language map to the top tile, click the Open Raster... icon and navigate to the raster image file on your computer.

Figure: Open the language map in the Georeferencer.

Setting the ground control points

The top tile now shows the Alor-Pantar map. To begin georeferencing, click the Add Point icon.

Figure: Add a ground control point.

Now locate a distinct geographic feature on the language map that is also visible on the basemap. This feature will act as a ground control point (GCP) to help align the two maps. For example, you could use the tip of a coastline, as we do here. Other suitable control points might include river bends, confluences, mountain peaks, road intersections, or any other recognisable geographic feature visible on both maps. Click on the selected feature on the language map, and the Enter Map Coordinates window will appear.

Figure: The Enter Map Coordinates window.

The window prompts you to enter the geographic coordinates of the GCP in the X / East and Y / North fields. If you happen to know the coordinates of the feature — kudos, you are a mighty geographer. Usually, though, you won’t. Instead, click the From Map Canvas icon at the bottom of the window. You will be returned to the basemap. Locate the same feature on the basemap — in our case, the tip of the coastline — and click on it. The point you just clicked will appear as a green dot on the basemap.

Figure: The green dot marks the ground control point on the basemap.

Return to the Enter Map Coordinates window. The coordinates of the GCP will now appear in the X / East and Y / North fields, using the coordinate reference system (CRS) of the basemap, which in our case is EPSG:3857 (WGS 84 / Pseudo-Mercator).

Figure: The Enter Map Coordinates window shows the numeric coordinates of the first GCP.
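The X / East and Y / North values in EPSG:3857 are metres on the Pseudo-Mercator plane, so they do not look like familiar longitudes and latitudes. As a sanity check, they can be converted back to degrees. The following is a minimal sketch using the standard spherical Mercator formulas; the function names and the sample point near Alor are illustrative assumptions, not values from the tutorial.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857 (WGS 84 semi-major axis)

def lonlat_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to EPSG:3857 metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y

def web_mercator_to_lonlat(x, y):
    """Convert EPSG:3857 metres back to longitude/latitude in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat

# Hypothetical point near Alor (about 124.5 E, 8.2 S), chosen for illustration only.
x, y = lonlat_to_web_mercator(124.5, -8.2)
lon, lat = web_mercator_to_lonlat(x, y)
print(f"x={x:.1f} m, y={y:.1f} m -> lon={lon:.4f}, lat={lat:.4f}")
```

If a GCP converts to a position far from Alor and Pantar, the click on the basemap probably landed in the wrong place.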

 

Click OK to add the GCP. You will be returned to the Georeferencer window, where the GCP appears as a red dot on the language map. Its map and image coordinates are also listed in the GCP Table.

Figure: The ground control point on the language map.

Rinse and repeat to add more GCPs, distributed evenly across the map image. For this relatively simple map, around seven control points along the coastline of the Alor and Pantar islands should be sufficient. You can always add more points later if the georeferenced map shows distortions or doesn’t align properly.

Figure: Seven GCPs on the language map.
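How many GCPs count as sufficient depends on the transformation type chosen in the Transformation Settings. As a rough guide, a polynomial transformation of order n has (n + 1)(n + 2) / 2 coefficients per axis and needs at least that many GCPs. The sketch below only evaluates this formula; the function name `min_gcps` is ours and not part of QGIS.

```python
def min_gcps(order):
    """Number of coefficients per axis of a 2D polynomial of the given order."""
    return (order + 1) * (order + 2) // 2

for order in (1, 2, 3):
    print(f"Polynomial {order}: at least {min_gcps(order)} GCPs")
```

With seven points, a second-degree polynomial has one point to spare, which is what allows the fit to report non-trivial residuals.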

 

Perform the georeferencing

Once you have placed a sufficient number of ground control points to accurately anchor your language map to the real-world coordinates of the basemap, you can proceed to run the georeferencing. Click on the Transformation Settings icon in the toolbar.

Figure: Open the Transformation Settings window.

A dialog window opens where you can configure the georeferencing process to properly align your map with the basemap.

Figure: The Transformation Settings window.

In the Transformation Parameters section, define the technical settings for the transformation:

For the Alor-Pantar map, the projection of the source map is unknown, but only minor distortions are expected because the islands lie close to the equator. For this reason, we use a second-degree polynomial transformation (Polynomial 2). Treat any recommendation for a specific transformation type as a general guideline rather than a strict rule: no single transformation type is guaranteed to work in all situations. It’s often best to try different methods and compare the results to see which one gives the most accurate alignment.
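To make the idea of a second-degree polynomial transformation more concrete, here is a minimal sketch of the underlying least-squares fit in plain Python. It is not how QGIS implements the transformation; it only illustrates the principle. The GCP pixel positions, the coefficients used to generate the map coordinates, and the `SCALE` constant are all hypothetical.

```python
import math

SCALE = 1000.0  # assumed image size in pixels; scaling keeps the system well conditioned

def basis(col, row):
    """Terms of a second-degree polynomial in the scaled pixel coordinates."""
    c, r = col / SCALE, row / SCALE
    return [1.0, c, r, c * c, c * r, r * r]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit(pixels, values):
    """Least-squares coefficients from the normal equations (A^T A) c = A^T v."""
    rows = [basis(c, r) for c, r in pixels]
    k = len(rows[0])
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    atv = [sum(row[i] * v for row, v in zip(rows, values)) for i in range(k)]
    return solve(ata, atv)

def predict(coeffs, col, row):
    """Evaluate the fitted polynomial at a pixel position."""
    return sum(c * t for c, t in zip(coeffs, basis(col, row)))

# Hypothetical GCPs: seven pixel positions (column, row) on the scanned map.
pixels = [(100, 100), (500, 100), (900, 100), (100, 500),
          (500, 500), (100, 900), (900, 900)]
# Made-up coefficients that generate synthetic EPSG:3857 map coordinates in metres.
TRUE_X = [13_750_000.0, 150_000.0, 2_000.0, 500.0, -300.0, 200.0]
TRUE_Y = [-880_000.0, -1_500.0, -150_000.0, 100.0, 250.0, -400.0]
map_x = [predict(TRUE_X, c, r) for c, r in pixels]
map_y = [predict(TRUE_Y, c, r) for c, r in pixels]

coeff_x = fit(pixels, map_x)
coeff_y = fit(pixels, map_y)

# Residual per GCP: distance between the fitted and the entered map position.
residuals = [math.hypot(predict(coeff_x, c, r) - x, predict(coeff_y, c, r) - y)
             for (c, r), x, y in zip(pixels, map_x, map_y)]
rms = math.sqrt(sum(d * d for d in residuals) / len(residuals))
print(f"RMS residual: {rms:.6f} m")
```

Because the synthetic map coordinates come from an exact second-degree polynomial, the residuals here are close to zero. With real GCPs, clicking errors and map distortions produce larger residuals, and a single point with an unusually large residual is a good candidate for re-checking.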

In the Output Settings section, specify the output details:

Start the georeferencing

Finally, click the Start Georeferencing icon to begin aligning the language map with the basemap.

Figure: Start the georeferencing.

The georeferenced map is added as a new layer. By making the layer semi-transparent, you can check that it aligns well with the basemap.

Figure: The georeferenced map.

We can now proceed to digitise the language polygons from the georeferenced map. This process will be covered in the next tutorial.

Output

A georeferenced map in GeoTIFF format: GeoTIFF is a standard raster format that stores both the image and its geographic reference information. The Alor-Pantar languages map, georeferenced in this tutorial, can be downloaded here.
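The geographic reference in a GeoTIFF is commonly expressed as a six-element affine geotransform, in the ordering used by GDAL, that maps pixel positions to map coordinates. The sketch below applies such a geotransform by hand. The numbers are hypothetical and do not describe the downloadable Alor-Pantar file.

```python
def pixel_to_world(geotransform, col, row):
    """Apply a GDAL-style six-element affine geotransform to a pixel position."""
    x0, dx, rx, y0, ry, dy = geotransform
    x = x0 + col * dx + row * rx
    y = y0 + col * ry + row * dy
    return x, y

# Hypothetical north-up raster in EPSG:3857: 100 m pixels, origin at the top-left corner.
# dy is negative because row numbers increase downwards while northing increases upwards.
gt = (13_750_000.0, 100.0, 0.0, -880_000.0, 0.0, -100.0)

print(pixel_to_world(gt, 0, 0))      # top-left corner of the image
print(pixel_to_world(gt, 250, 400))  # 250 columns to the right, 400 rows down
```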

References

Schapper, Antoinette. 2020. Introduction to The Papuan Languages of Timor, Alor and Pantar. In Antoinette Schapper (ed.), The Papuan Languages of Timor, Alor and Pantar: Volume 3, pp. 1–52. Berlin, Boston: De Gruyter Mouton. https://doi.org/10.1515/9781501511158-001