The Giellagas Corpus of Spoken Saami Languages

The aim of the Giellagas Corpus of Spoken Saami Languages is to provide easily accessible research material covering all Saami languages and their areal variation. The work began in spring 2014 with funding from FIN-CLARIN-CONTENT, and it has continued with funding from the Giellagas Institute of the University of Oulu and the Saami Culture Archive.

The Project

The Corpus was founded for the following reasons:

1. The study of Saami languages has lacked comparative research material.

2. The collection of the Saami Culture Archive contains a considerable amount of material from different Saami languages and their areal varieties. However, it has not been used efficiently, because most of the material has been in “raw form”.

3. Present-day language research relies heavily on language technology. We founded the Corpus and began cooperating with the FIN-CLARIN consortium in order to make the archive's collections more accessible and to encourage users to benefit from the Saami cultural heritage.

Distribution of the material

The corpus will be made available through Kielipankki, the Language Bank of Finland, which is organised by the FIN-CLARIN consortium and the Department of Modern Languages at the University of Helsinki. The FIN-CLARIN consortium is part of the international CLARIN infrastructure, and it offers researchers in Finland easy access to all European CLARIN-compatible language resources.

The Corpus

We have chosen to follow Pekka Sammallahti’s description of Saami languages and their areal variation (The Saami Languages: an introduction, Davvi Girji OS, 1998).

The corpus consists of audio material, ELAN files, metadata, and a list of annotation symbols and abbreviations. Each ELAN file covers approximately 15 minutes of Saami speech and includes a time-aligned transcription, morphological and syntactic analysis, and translations into Finnish and English. For each dialect there is a sample from both a female and a male speaker.

At present, the Giellagas Corpus of Spoken Saami Languages consists of the following material:

North Saami

1. Finnmark Saami (sN-Fsd):

  • Eastern dialect group (15 + 15 minutes)
  • Western dialect group (15 + 15 minutes)

2. Torne Saami (sN-Tsd):

  • The Finnish Wedge dialect (15 + 15 minutes)
  • Gárasavvon dialect (15 + 15 minutes)
  • Čohkkiras dialect (15 + 15 minutes)

3. Sea Saami:

  • Eastern dialect (15 + 15 minutes)

Skolt Saami

1. Northern group

  • Paččjok dialect (15 minutes)
  • Peäccam dialect (15 minutes)

2. Southern group

  • Suõʹnnʹjel dialect (15 + 15 minutes)

Inari Saami

  • Eastern dialect (15 + 15 minutes)

The Corpus will be supplemented with new language material as the work continues.