Developing a Community Resource: Russian Acquisition Corpus
The purpose of this project is to create an online, freely available database of language produced by children acquiring Russian in monolingual and bilingual contexts. Though the language of immigrant communities is often stigmatized and deprecated (even by its speakers), it is of central importance to the cultural identity and practices of these communities, and its study is crucial to understanding the fundamental properties of linguistic knowledge, language acquisition and maintenance.
So far, we have collected over 115 hours of recordings and have fully transcribed, checked, annotated for disfluencies, and pseudonymised an estimated 35,000 words. These transcripts are being used in annotation experiments to develop guidelines for parsing this data, that is, adding grammatical information to it.
Our corpus will enable replicable results; statistical comparisons among émigré adults, heritage children, and monolingual children and adults; and investigations of frequency effects. This will allow a new level of insight into grammar development in heritage and émigré speakers and thereby into the fundamental properties of language knowledge, acquisition, and attrition. The corpus will also supply the necessary information for educators developing language materials for heritage learners, for parents raising bilingual children, and for policy makers drafting appropriate rules and procedures.