jarmo.jantunen(at)oulu.fi
Finnish as a Second and Foreign Language
P.O. Box 1000
90014 University of Oulu

University of Oulu
CORPUS RESEARCH ON THE LANGUAGE-SPECIFIC AND UNIVERSAL FEATURES OF LEARNER LANGUAGE

General description of the corpus

The International Corpus of Learner Finnish (ICLFI) is a corpus of written learner language. It consists of raw texts, which means that the data are not linguistically annotated or error-tagged. The texts were written by students of Finnish as a foreign language from various language backgrounds and have been collected with the help of Finnish language teachers around the world.

The corpus contains texts written by beginning, intermediate and advanced learners of Finnish as a foreign language. The texts are assigned to these three proficiency levels according to the number of contact hours of Finnish instruction the learner has received at university level (beginners: < 200 hours; intermediate learners: 200-400 hours; advanced learners: > 400 hours).
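The level boundaries amount to a simple threshold rule. The sketch below is illustrative only: the function name `proficiency_level` and the level labels are hypothetical, not part of the ICLFI distribution, and it assumes that exactly 200 and exactly 400 hours both count as intermediate, which the stated range "200-400 hours" suggests but does not spell out.

```python
# Hypothetical helper illustrating the ICLFI proficiency thresholds
# (beginners: < 200 contact hours; intermediate: 200-400; advanced: > 400).
# Assumption: both boundary values (200 and 400) fall in the intermediate level.
def proficiency_level(contact_hours: float) -> str:
    """Map university-level Finnish contact hours to a proficiency level label."""
    if contact_hours < 0:
        raise ValueError("contact_hours must be non-negative")
    if contact_hours < 200:
        return "beginner"
    if contact_hours <= 400:
        return "intermediate"
    return "advanced"


if __name__ == "__main__":
    # Print the level assigned to a few sample contact-hour values.
    for hours in (150, 200, 400, 520):
        print(hours, proficiency_level(hours))
```

Run as a script, this prints `150 beginner`, `200 intermediate`, `400 intermediate` and `520 advanced`. If the corpus documentation treats either boundary differently, only the comparison operators need to change.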

The ICLFI comprises a wide variety of text types (essays, narratives, argumentative texts, letters, newspaper articles and postcards, among others). In addition, the corpus provides information on a large number of variables concerning the learner's linguistic background, the learning task, the learning context, etc.

More information about the project, corpus typology, and variables can be found here.

Detailed information on the size of the corpus (in tokens), with statistics on the learners' L1 backgrounds, proficiency levels and text types, can be found here.

The ICLFI is available for free download after applying for a user licence.

Further work on the corpus includes collecting and digitizing more texts, as well as annotation, error tagging, lemmatization, part-of-speech (POS) tagging and syntactic parsing.