General description of the corpus

The International Corpus of Learner Finnish (ICLFI) is a corpus of written learner language. It consists of 4650 morphologically annotated texts (approximately 1 000 000 tokens), of which about 25 percent (1184 texts) have already been error-tagged. The texts were written by students of Finnish as a foreign language from various language backgrounds and were collected with the help of Finnish language teachers around the world.

The texts are divided into six proficiency levels (A1–C2) on the basis of the Common European Framework of Reference for Languages (CEFR). The ICLFI comprises a wide variety of text types (essays, narratives, argumentative texts, letters, newspaper articles and postcards, among others). In addition, the corpus provides information on a large number of variables concerning the learner's linguistic background, the learning task, the learning context, etc.


More information about the project can be found on Meta-Share.

The ICLFI has been added to the Finnish Language Bank and is available free of charge through the Korp platform after applying for a user license.

We have developed a network-based text collection system for the ICLFI. Further information will be added once the process is complete.

Last updated: 8 February 2017