Does morphosyntactic alignment shape discourse? Implementing a corpus-based approach to linguistic typology

This project is a proof-of-concept study for corpus-based approaches to typology. We address the question of whether typological differences in the morphosyntax of individual languages are reflected in the organization of spontaneous spoken discourse in those languages, with a special focus on so-called ergative languages. While claims of a co-dependence between grammar and discourse have regularly been made in the literature (Hopper 1983, Du Bois 2003, Durie 2003), the issue has never been investigated systematically on a representative language sample.

The project builds on an existing language archive architecture (Multi-CAST, The Multilingual Corpus of Annotated Spoken Texts, online here), and implements an expanded version of the syntactic annotation system GRAID (Grammatical Relations and Animacy in Discourse, Haig & Schnell 2014, manual here). The existing language sample in Multi-CAST is being extended by the inclusion of ergative languages from the Nakh-Daghestanian language family and from Australia, and of data from Philippine-type languages. All corpora are subjected to a standardized annotation procedure, and the resulting data feed into quantitative cross-corpus analysis in order to identify statistically significant patterns in connected discourse, for example:

  • the distribution of referential expressions across syntactic functions,
  • the density of zero-anaphora,
  • patterns of new-referent introduction,
  • the division of labour between pronouns and lexical expressions,
  • the impact of animacy on syntactic configurations.
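Measures of the kind listed above can be illustrated with a minimal sketch. It assumes simplified glosses of the shape `form[.animacy]:function`, modelled only loosely on GRAID; the toy data, the function names, and the reduced gloss format are hypothetical and are not the project's actual annotation scheme or tooling.

```python
from collections import Counter

# Hypothetical, simplified glosses of the shape "<form>[.<animacy>]:<function>".
# They are modelled loosely on GRAID but do NOT reproduce the full scheme,
# and the data below are invented purely for illustration.
TOY_GLOSSES = [
    "np.h:s", "0.h:a", "np:p", "pro.h:s", "0.h:s",
    "np.h:a", "np:p", "0.h:a", "pro.h:p", "np:s",
]


def parse_gloss(gloss):
    """Split a simplified gloss into (form, animacy, function)."""
    form_part, function = gloss.split(":")
    form, _, animacy = form_part.partition(".")
    return form, animacy or "unspecified", function


def form_by_function(glosses):
    """Count how often each referential form occurs in each syntactic function."""
    counts = Counter()
    for gloss in glosses:
        form, _, function = parse_gloss(gloss)
        counts[(form, function)] += 1
    return counts


def zero_rate(glosses, function):
    """Share of zero forms ("0") among all arguments in the given function."""
    forms = [parse_gloss(g)[0] for g in glosses if parse_gloss(g)[2] == function]
    return forms.count("0") / len(forms) if forms else 0.0


if __name__ == "__main__":
    # Print the form-by-function table, then the zero rate per function.
    for (form, function), n in sorted(form_by_function(TOY_GLOSSES).items()):
        print(f"{form:>4} in {function}: {n}")
    for function in ("s", "a", "p"):
        print(f"zero rate in {function}: {zero_rate(TOY_GLOSSES, function):.2f}")
```

Comparing such per-function rates across corpora from languages with different alignment types is one simple way the listed patterns could be set side by side; a real analysis would also need significance testing and the full annotation detail.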

The resulting dataset, the first of its kind worldwide, aids the detection of possible correlations between morphosyntactic alignment and probabilistic patterns in the organization of connected spoken language.

The project is being coordinated by Geoffrey Haig, Stefan Schnell, and Nils Schiborr at the University of Bamberg, and runs in collaboration with researchers from the Centre of Excellence for Dynamics of Language, Canberra and Melbourne (Nick Thieberger), and the University of Jena (Diana Forker).

The project is supported by a DFG grant (project number 323627599) for an initial period of 2017–2020.