Multi-CAST: Multilingual Corpus of Annotated Spoken Texts


Multi-CAST (Multilingual Corpus of Annotated Spoken Texts) is a web-accessible archive of annotated spoken-language corpora from typologically diverse languages. The archive is hosted at the Language Archive Cologne (LAC) and can be accessed here.

Multi-CAST was developed by Geoffrey Haig and Stefan Schnell.

An overview of the corpus (background, architecture, composition) is available here (635.9 KB).