Spanish Learner Language Oral Corpus (SPLLOC)

Search the SPLLOC Corpus

The SPLLOC datasets consist of digitally recorded sound files of learner Spanish, together with transcripts in CHAT and XML formats and, in some cases, tagged files.

The tagged files are an additional set of transcripts which have been tagged using the automatic morphosyntactic parser (MOR). They therefore contain an additional level of coding (%MOR; see CHILDES for further details). At present these are available for the "Photos + Interview" task only.
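As a rough illustration of what the extra coding level looks like in practice, the sketch below pairs each speaker line in a CHAT transcript with the morphosyntactic tier that follows it. It relies only on general CHAT layout conventions (header lines start with `@`, speaker lines with `*`, dependent tiers with `%`, and continuation lines with a tab). The function name `read_chat_utterances`, the speaker codes, and the sample tags are invented for this example and are not taken from the SPLLOC files.

```python
def read_chat_utterances(text):
    """Pair each speaker line with its %mor tier, if one is present."""
    utterances = []
    current = None  # which field a tab-indented continuation line extends
    for raw in text.splitlines():
        if raw.startswith("\t") and utterances and current:
            # Continuation of the previous speaker line or %mor tier.
            utterances[-1][current] += " " + raw.strip()
        elif raw.startswith("*"):
            # Speaker line, e.g. "*P01:<tab>words ..."
            speaker, _, words = raw[1:].partition(":")
            utterances.append(
                {"speaker": speaker, "main": words.strip(), "mor": ""}
            )
            current = "main"
        elif raw.lower().startswith("%mor:") and utterances:
            # Morphosyntactic tier attached to the preceding speaker line.
            utterances[-1]["mor"] = raw[5:].strip()
            current = "mor"
        else:
            # Headers (@...) and other dependent tiers are skipped.
            current = None
    return utterances


# Made-up sample for illustration; the tags are not real MOR output.
SAMPLE = (
    "@Begin\n"
    "*P01:\tme gusta la playa .\n"
    "%mor:\tpro|me v|gustar det|la n|playa .\n"
    "*INV:\tmuy bien .\n"
    "@End\n"
)

for utt in read_chat_utterances(SAMPLE):
    print(utt["speaker"], "|", utt["main"], "|", utt["mor"] or "(no tags)")
```

A transcript without tags simply yields an empty `mor` field for every utterance, so the same function can be used on both the plain and the tagged files.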

For each task in the corpora there are five folders: sound files in WAV format, sound files in MP3 format, transcripts in CHAT format, transcripts with morphosyntactic tags, and transcripts in XML format. Each folder is in turn subdivided into learner groups and native speakers. To view or download the data, choose a task.

You can also extract subsets from the SPLLOC corpora using the search criteria given below: