The Corpus

We have chosen to follow Pekka Sammallahti’s description of the Saami languages and their areal variation (The Saami Languages: An Introduction, Davvi Girji OS, 1998).

The corpus consists of audio material, ELAN files, metadata, and a list of annotation symbols and abbreviations. Each ELAN file includes approximately 15 minutes of speech in Saami, a time-aligned transcription, morphological and syntactic analysis, and translations into Finnish and English. For each dialect there is a sample from both a female and a male speaker.
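As a rough illustration of how a time-aligned transcription can be read from an ELAN file, here is a minimal sketch that uses only the Python standard library. It relies on the general ELAN annotation format (EAF), which is XML. The tier name "transcription", the helper name read_aligned_tier and the sample content are assumptions made for this example and are not taken from the corpus itself.

```python
import xml.etree.ElementTree as ET


def read_aligned_tier(eaf_xml: str, tier_id: str):
    """Return (start_ms, end_ms, text) tuples for one time-aligned tier."""
    root = ET.fromstring(eaf_xml)

    # Map each time-slot ID to its offset in milliseconds.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }

    rows = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        # Only annotations aligned directly to the audio carry time-slot refs.
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, text))
    return rows


# Hypothetical, hand-written sample; real corpus files will have other
# tier names and many more annotations.
SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcription">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>placeholder utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

for start, end, text in read_aligned_tier(SAMPLE, "transcription"):
    print(start, end, text)
```

The sketch reads only tiers that are aligned directly to the audio. In ELAN, dependent tiers such as translations or morphological analysis are often stored as reference annotations that point to a parent annotation, and they would need separate handling.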

At present, the Giellagas Corpus of Spoken Saami Languages consists of the following material:

North Saami

1. Finnmark Saami (sN-Fsd):

  • Eastern dialect group (15 + 15 minutes)
  • Western dialect group (15 + 15 minutes)

2. Torne Saami (sN-Tsd):

  • The Finnish Wedge dialect (15 + 15 minutes)
  • Gárasavvon dialect (15 + 15 minutes)
  • Čohkkiras dialect (15 + 15 minutes)

3. Sea Saami:

  • Eastern dialect (15 + 15 minutes)

Skolt Saami

1. Northern group

  • Paččjok dialect (15 minutes)
  • Peäccam dialect (15 minutes)

2. Southern group

  • Suõ´nn'jel dialect (15 + 15 minutes)

Inari Saami

  • Eastern dialect (15 + 15 minutes)

The corpus will be supplemented with new language material as work continues.



Marko Jouste (marko.jouste(at)

Last updated: 10.5.2016