Language learning has come to be a central theme in both cognitive science and artificial intelligence. Cognitive scientists have long studied the nature of language learning, and machine learning now dominates natural language processing (NLP) in modern AI, to tremendous benefit. However, the learning systems developed with these methods often fail to achieve the efficiency and robustness of human language acquisition. Insights from language acquisition have the potential to help address this problem. But there are two critical challenges in exploring this possibility: (1) identifying the innate learning biases that enable fast, robust language learning in humans, and (2) determining how to translate theoretical insights about these biases into effective implementation for learning in NLP systems. This project will tackle both of these issues, making use of insights from special linguistic populations.

Our first challenge –– identifying innate language learning predispositions –– is driven by the fact that most children are exposed to linguistic input from birth, making it difficult to disentangle innate characteristics from characteristics that are rapidly learned from input. The rare cases in which children do not have usable linguistic input can help here by allowing us to make important headway in identifying these predispositions. Congenitally deaf children who cannot learn the spoken language that surrounds them, and who have not been exposed to sign language by their hearing families, are in the unique situation of being without language input early in life. These children use their hands to communicate –– they gesture –– and those gestures (called “homesigns”) take on many, but not all, of the forms and functions of languages that have been handed down from generation to generation. The properties of these naturally-arising gestures provide evidence for the nature of linguistic predispositions independent of input.

Drawing candidate biases from homesign, we will then tackle the second challenge –– incorporating biases into machine learning systems –– by systematically testing models against real-world child language acquisition data. The goal of this phase will be to identify effective means of instantiating proposed human biases, and to test whether models incorporating these biases successfully simulate the learning trajectories exhibited by children. Models with the proposed biases will be compared against minimally-different baseline models lacking the biases; a stronger fit to human data will be taken as support that the biases are actual human predispositions. An important priority of this phase will be to balance scientific and engineering needs –– to maintain transparency of the models’ cognitive implications and to simulate human learning patterns as closely as possible, but also to use models that will interface smoothly with modern NLP systems, with promise to scale to larger datasets and broader domains.
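The comparison logic described above can be sketched in miniature. The toy data, the candidate bias, and the scoring function below are all hypothetical illustrations (not the project's actual models or datasets), assuming a simple setup where each model assigns probabilities to competing linguistic forms and fit is measured by log-likelihood of observed child productions:

```python
import math

def log_likelihood(model_probs, observed_counts):
    """Log-likelihood of observed child productions under a model's
    predicted probabilities for each candidate form."""
    return sum(count * math.log(model_probs[form])
               for form, count in observed_counts.items())

# Toy child-production data: counts of two competing word orders.
child_counts = {"SOV": 80, "SVO": 20}

# A model encoding a proposed bias (here, a prior favoring one order)
# versus a minimally-different baseline with a uniform prior.
bias_model = {"SOV": 0.75, "SVO": 0.25}
baseline_model = {"SOV": 0.50, "SVO": 0.50}

ll_bias = log_likelihood(bias_model, child_counts)
ll_baseline = log_likelihood(baseline_model, child_counts)

# A higher log-likelihood on the child data is taken as evidence
# that the proposed bias reflects an actual human predisposition.
print(ll_bias > ll_baseline)
```

In practice the comparison would involve full learning models and developmental trajectories rather than static probability tables, but the evaluation principle is the same: the bias model and baseline differ minimally, and fit to human data adjudicates between them.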

Mentor: Allyson Ettinger, Assistant Professor, Department of Linguistics

Dr. Allyson Ettinger’s research is focused on language processing in humans and in artificial intelligence systems, motivated by a combination of scientific and engineering goals. For studying humans, her research uses computational methods to model and test hypotheses about mechanisms underlying the brain’s processing of language in real time. In the engineering domain, her research uses insights and methods from cognitive science, linguistics, and neuroscience in order to analyze, evaluate, and improve natural language understanding capacities in artificial intelligence systems. In both of these threads of research, the primary focus is on the processing and representation of linguistic meaning.