Glottography Data Tutorials

From a language map image to a Glottography dataset in six tutorials

This tutorial series walks you through the six main steps to create language speaker area polygons from a language map image, ready for upload to Glottography*.

*Glottography is an open-source initiative to collect and share the geographic areas of the world’s languages as digital open data.

What you need

The only essential input is a digital raster image of a language map (e.g., a PNG, JPG, or TIFF file) from a citable scientific publication. In the tutorials, we will georeference a map of the Alor-Pantar languages in Indonesia from Schapper (2020), Introduction to the Papuan Languages of Timor, Alor, and Pantar. For software, you need QGIS, an open-source geographic information system (GIS), and Python 3, a free and open-source programming language.

Output

A set of fully digitised language polygons in Cross-Linguistic Data Format (CLDF), including attributes and metadata, ready for upload to Glottography.

Tutorials

  1. Georeferencing – Assign geographic coordinates to the language map image so it can be accurately placed and displayed in a GIS.

  2. Digitising – Trace language areas on the georeferenced map and convert them into digital polygons.

  3. Adding Attributes and Metadata – Record language attributes and information from the source publication.

  4. Glottocodes – Programmatically add Glottocodes (unique language identifiers maintained by Glottolog) to language polygons when they are missing from the source map.

  5. Data Curation – Combine the digitised language polygons with their attributes and metadata to create a CLDF dataset ready for upload to Glottography.

  6. Error Correction – Correct geometric and attribute errors in the dataset, such as missing polygons or incorrect Glottocodes.
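
To give a feel for the kind of checks behind steps 4 and 6, here is a minimal Python sketch that flags polygons with a missing or malformed Glottocode or an unclosed outline. It is an illustration under stated assumptions, not the code used in the tutorials: the GeoJSON-style feature layout, the `glottocode` attribute name, the helper functions, and all sample names, codes, and coordinates are hypothetical. It relies only on the documented Glottocode shape (four lowercase alphanumeric characters followed by four digits) and checks the format only, not whether a code actually exists in Glottolog.

```python
import re

# Glottocodes are four lowercase alphanumeric characters followed by four digits.
GLOTTOCODE_PATTERN = re.compile(r"[a-z0-9]{4}[0-9]{4}")


def ring_is_closed(ring):
    """A GeoJSON linear ring needs at least four positions, with first equal to last."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def check_feature(feature):
    """Return a list of human-readable problems found in one polygon feature."""
    problems = []

    # Attribute check: "glottocode" is a hypothetical attribute name.
    props = feature.get("properties") or {}
    code = props.get("glottocode")
    if not code:
        problems.append("missing Glottocode")
    elif not GLOTTOCODE_PATTERN.fullmatch(code):
        problems.append(f"malformed Glottocode: {code!r}")

    # Geometry check: this sketch only handles simple Polygon geometries.
    geometry = feature.get("geometry") or {}
    if geometry.get("type") != "Polygon":
        problems.append("geometry is not a Polygon")
    elif not geometry.get("coordinates"):
        problems.append("polygon has no rings")
    elif not all(ring_is_closed(ring) for ring in geometry["coordinates"]):
        problems.append("polygon ring is not closed")

    return problems


# Placeholder data: names, codes, and coordinates are invented for illustration.
square = [[124.0, -8.0], [124.1, -8.0], [124.1, -8.1], [124.0, -8.1], [124.0, -8.0]]
features = [
    {"type": "Feature",
     "properties": {"name": "Language A", "glottocode": "abcd1234"},
     "geometry": {"type": "Polygon", "coordinates": [square]}},
    {"type": "Feature",
     "properties": {"name": "Language B", "glottocode": None},
     "geometry": {"type": "Polygon", "coordinates": [square[:-1]]}},  # ring left open
]

report = {f["properties"]["name"]: check_feature(f) for f in features}
for name, problems in report.items():
    print(name, "->", problems or "ok")
```

In this sketch, "Language A" passes both checks, while "Language B" is flagged for a missing Glottocode and an unclosed ring. A real workflow would also need to handle multi-part areas and look codes up against Glottolog itself.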