Over the last 50 years, the field of early language development has amassed considerable quantities of experimental data on infants' perception and corpus data capturing infants' spoken input. In this talk, I present results from two replicable approaches that build on these public data. One line of work re-uses CHILDES corpora to study how proposed word segmentation algorithms perform when faced with varied languages (including bilingual input), and whether their performance predicts infants' word comprehension across languages as recorded in WordBank. The other tests predictions from bottom-up (phonology first) and top-down (lexicon first) theories of early acquisition using meta-analytic data from MetaLab. Together, these studies show how far we can go in constraining language acquisition theories by standing on the shoulders of giants.
Alejandrina (Alex) Cristia is the Research Director of the Laboratoire de Sciences Cognitives et Psycholinguistique at the Centre National de la Recherche Scientifique in Paris, France. She studies the linguistic representations of infants and adults: how they develop and how they shape the world’s languages, with a focus on moving beyond the well-researched WEIRD (Western, Educated, Industrialized, Rich, Democratic) countries. In her research, she combines a range of methodological approaches, including analyses of spoken corpora, behavioral studies, neuroimaging (NIRS), and computational modeling. Cristia advocates big-data approaches involving daylong recordings of children’s language environments (as in the DARCLE network) and the use of data-sharing platforms such as HomeBank and the Open Science Framework (OSF).