Our work is organized into three substrands with a total of nine work packages.

  • substrand RT: “Diachronic change, phyla and reconstruction models”

This substrand comprises the following three work packages:

  • RT1 “Reconstruction, internal classification and grammatical description in the world’s two biggest phyla: Niger-Congo and Austronesian” (coordinated by I. Bril, A. François, V. Vydrin): CRLAO, LLACAN, LACITO
  • RT2 “The Central Sudanic languages: genetic unit or affinity group?” (coordinated by P. Boyeldieu): LLACAN
  • RT3 “Long term typological changes in languages” (coordinated by C. Pilot-Raichoor): LACITO, HTL, MII

This substrand will critically reconsider Greenberg’s (1963) classification of the languages of Africa, using the traditional comparative method as well as new quantitative methods, which we will apply to the new descriptive data on African languages that have become available in recent decades. This critical reconsideration should allow us to provide a more solid answer to the question of the genetic unity of the Central Sudanic languages on the one hand, and of the Niger-Congo macro-phylum and its alleged lower branches on the other. As a result, we hope to be able to put forward new and empirically better-founded proposals for genetic classification and to achieve significant advances in the reconstruction of several proto-languages. The same methods will be used to identify possible outside connections of the Austronesian family and to improve the internal classification of the languages of Taiwan. We will study significant typological variation within a number of language families for which we have the necessary expertise, viz. Austronesian, Afro-Asiatic, Tibeto-Burman, Dravidian and Iranian, and propose paths and models of typological change. We will also engage in documentary and descriptive work on previously undescribed languages of Africa, Asia and Oceania in order to broaden the empirical foundations of the respective historical and typological hypotheses. The use of sophisticated quantitative techniques and machine learning on sizable databases for classification and reconstruction will be a major goal and a privileged domain of collaboration inside and outside the Labex teams.

  • substrand LC: “Modelling language contact and linguistic areas”

This substrand comprises the following three work packages:

  • LC1 “Multifactorial analysis of language changes” (coordinated by I. Leglise): SEDYL, LACITO, LLACAN, LLF
  • LC2 “Areal phenomena in Northern sub-Saharan Africa” (coordinated by D. Idiatov & M. Van de Velde): LLACAN, LPP-P3
  • LC3 “Caucasus-Iran-Anatolia belt” (coordinated by A. Donabedian, P. Samvelian): SEDYL, MII, LLF

Besides vertical transmission of linguistic features through descent from a common ancestor language to its daughter languages, horizontal transmission through language contact has long been known to contribute to the process of language change. However, despite the important development of contact linguistics in the last 15 years, much remains to be done on the precise modelling of contact-induced linguistic change. The primary goal of work package LC1 will be to further develop, and extend to other linguistic areas, the outcomes of the ongoing work on the creation and processing of various corpora for multifactorial studies of the languages spoken in French Guiana. In order to take the relevant linguistic features fully into account, this research requires the creation of dedicated tools and of new annotated corpora (in collaboration with strand 6). The question of the Sprachbund will be examined in two areas where the very notion is subject to debate. Work packages LC2 and LC3 will critically analyse and elaborate on some of the existing areality hypotheses that have been proposed in the literature for the Macro-Sudan area and the Caucasus-Iran-Anatolia area, respectively. Both will combine a quantitative approach (inventory of features, their distribution in space, lexical databases as in the ANR program RefLex) with qualitative studies (fine-grained grammatical description, studies of the categories or features involved and of their impact on the grammatical system as a whole and on language variation). The collaboration with strand 4 on Creole formation as second language acquisition will tackle the question of language emergence. These work packages will yield a substantial amount of new data and contribute to the development of more adequate morphological and syntactic analyses and models.

  • substrand GD: “Grammar and discourse: modelling interfaces”

This substrand comprises the following three work packages:

  • GD1 “Typology and annotation of information structure and grammatical relations” (coordinated by A. Mettouchi, M. Vanhove): LLACAN, SEDYL, LACITO, CRLAO
  • GD3 “A cross-linguistic approach of discourse markers” (coordinated by C. Bonnot): SEDYL, LLF
  • GD4 “A typological, historical and quantitative approach to the interrelation between tense, aspect and modality” (coordinated by P. Caudal): LLF, LACITO, LIPN, SEDYL

Typologists and field linguists, especially those working with primarily or exclusively spoken data, have become increasingly aware of the relevance of various discourse-related factors for linguistic analysis. A number of phenomena usually referred to jointly as information structure, and sometimes considered a separate level of linguistic structure, play a crucial role in the grammar of many languages. Not only do they determine word order, but they also influence the choice between different types of verb or argument marking. Work package GD1 studies these grammatical markers in several corpora of various sizes, which are first annotated for information structure. Such annotation is innovative from both a theoretical and a methodological point of view, since it aims at allowing typological comparability. It draws on the results of the ESF project CorpAfroAs and extends them to other languages studied by Labex members. Research on the grammar-discourse interface will also make it possible to better account for some phenomena which tend to be neglected in most language descriptions, such as information structure. GD3 studies discourse markers from a cross-linguistic perspective in large annotated corpora (to allow access to extended contexts as described in Strand 6), some already available (for example, the RNC for Russian) and others yet to be annotated (for example, for Armenian). GD4 focuses on verbal markers pertaining to the expression of tense, aspect, modality and evidentiality (TAME). From a cross-linguistic, diachronic and synchronic point of view, these functional domains tend to interact extensively with each other, as is evident for instance in the case of the category of perfect in many languages. The markers expressing these functional domains are also well known for their highly context-sensitive interpretative uses.
In interaction with Strands 2 and 6, the use of quantitative measures will allow the current project to make substantial progress in the study of TAME categories.