From a language map image to a Glottography dataset in six tutorials
Overview
This tutorial series walks you through the six main steps to create language speaker area polygons from a language map image, ready for upload to Glottography*.
*Glottography is an open-source initiative to collect and share the geographic areas of the world’s languages as digital open data.
The only essential input is a digital raster image of a language map (e.g., a PNG, JPG, or TIFF file) from a citable scientific publication. In the tutorials, we will georeference a map of the Alor-Pantar languages in Indonesia from Schapper (2020), Introduction to the Papuan Languages of Timor, Alor, and Pantar. For software, you need QGIS, an open-source geographic information system (GIS), and Python 3, a free and open-source programming language.
The result is a set of fully digitised language polygons in Cross-Linguistic Data Format (CLDF), including attributes and metadata, ready for upload to Glottography.
Georeferencing – Assign geographic coordinates to the language map image so it can be accurately placed and displayed in a GIS.
Digitising – Trace language areas on the georeferenced map and convert them into digital polygons.
Adding Attributes and Metadata – Record language attributes and information from the source publications.
Glottocodes – Programmatically add Glottocodes (unique language identifiers maintained by Glottolog) to language polygons when they are missing from the source map.
Data Curation – Combine the digitised language polygons with their attributes and metadata to create a CLDF dataset ready for upload to Glottography.
Error Correction – Correct geometric and attribute errors in the dataset, such as missing polygons or incorrect Glottocodes.
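To give a feel for the programmatic parts of the workflow (the Glottocodes and Error Correction steps), here is a minimal Python sketch. It is not the script used in the tutorials. The feature layout, the `name` and `glottocode` property names, the `add_glottocodes` helper, and the one-entry lookup table are all assumptions made for illustration. In practice the lookup would come from Glottolog, and every assigned code should be checked there. The sketch fills in missing Glottocodes by language name and reports polygons that still lack a well-formed code (four alphanumeric characters followed by four digits).

```python
import re

# A Glottocode is four lowercase alphanumeric characters plus four digits.
GLOTTOCODE_PATTERN = re.compile(r"^[a-z0-9]{4}[0-9]{4}$")

# Hypothetical lookup table (assumption): language name -> Glottocode.
# A real table would be built from Glottolog and verified by hand.
NAME_TO_GLOTTOCODE = {"abui": "abui1241"}


def add_glottocodes(features, lookup):
    """Fill in missing Glottocodes by name; return names left unresolved."""
    unresolved = []
    for feature in features:
        props = feature["properties"]
        code = props.get("glottocode")
        if not code:
            # Normalise the name before looking it up.
            code = lookup.get(props["name"].strip().lower())
            props["glottocode"] = code
        # Flag polygons with no code or a malformed one for manual correction.
        if code is None or not GLOTTOCODE_PATTERN.match(code):
            unresolved.append(props["name"])
    return unresolved


# GeoJSON-like features; geometries are omitted to keep the sketch short.
features = [
    {"type": "Feature", "properties": {"name": "Abui"}, "geometry": None},
    {"type": "Feature", "properties": {"name": "Unmapped variety"}, "geometry": None},
]

unresolved = add_glottocodes(features, NAME_TO_GLOTTOCODE)
print("Polygons still needing a Glottocode:", unresolved)
```

Name matching of this kind is fragile, because language names vary in spelling across publications, so the list of unresolved polygons is best treated as a to-do list for manual checking rather than as a final result.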