McGill Cognitive Computing Laboratory
Projects
Experience-Driven Cognitive Modeling
A central component of the research in the MCCL involves the development of distributional models of lexical semantics. This class of model learns the meaning of words through statistical regularities contained in large natural language corpora. Our focus in distributional modeling is not just on lexical semantics, but also on how the representations these models produce can serve as underlying representations to explain data across multiple cognitive systems.
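To make the general approach concrete, the sketch below builds word representations from simple co-occurrence counts and compares them with a cosine measure. It is a minimal, illustrative example of this model class, not one of the specific models developed in the MCCL; the toy corpus and the window size are arbitrary choices.

```python
# A minimal, illustrative count-based distributional model (not one of the
# specific MCCL models): word meaning is a vector of co-occurrence counts
# within a small window, and similarity is measured with the cosine.
from collections import defaultdict
import math

def build_vectors(sentences, window=2):
    vectors = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1.0  # count each nearby word
    return vectors

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in set(u) & set(v))
    norm = lambda x: math.sqrt(sum(val * val for val in x.values()))
    return dot / ((norm(u) * norm(v)) or 1.0)

# toy corpus; real models are trained on large natural language corpora
corpus = [["the", "dog", "chased", "the", "cat"],
          ["the", "cat", "chased", "the", "mouse"],
          ["the", "dog", "ate", "the", "bone"]]
vecs = build_vectors(corpus)
print(cosine(vecs["dog"], vecs["cat"]))  # words in similar contexts score higher
```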
Classic accounts of lexical organization assume that humans are sensitive to frequency information during learning: the more times a word is repeated, the stronger it is in memory. However, recent research suggests that it is not frequency but rather the diversity of a word’s contexts that is important (Adelman et al., 2006).
A main line of our research uses distributional models to better understand contextual diversity. Empirical results have demonstrated that the semantic content of the contexts in which a word occurs is key to explaining word learning and lexical retrieval times (Recchia, Johns, & Jones, 2008; Johns, Dye, & Jones, 2016; Jones, Dye, & Johns, 2017). These experiments led to the development of a distributional model that encodes word strength as a function of semantic diversity (e.g., Jones, Johns, & Recchia, 2012; Johns, Dye, & Jones, 2014, 2016, in press), which provides better predictions of a variety of lexical effects in both visual (Jones et al., 2012) and spoken word recognition (Johns et al., 2012), as well as of patterns of lexical processing in aging and bilingualism (Johns, Sheppard, Jones, & Taler, 2016; Qiu & Johns, in press). This work has prompted a shift in our understanding of lexical experience: each encounter with a word is not independent of past encounters, which suggests that word frequency is an incorrect measure of lexical strength.
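The sketch below illustrates the contrast at the heart of this work: raw frequency counts every occurrence of a word, contextual diversity counts the number of contexts it appears in, and a semantic-diversity-style measure only credits contexts that add new semantic information. The 0.5 threshold, the bag-of-words context vectors, and the toy contexts are illustrative assumptions, not the published semantic distinctiveness model.

```python
# Illustrative contrast between three measures of lexical strength: raw
# frequency, contextual diversity (number of contexts a word occurs in), and
# a semantic-diversity-style count that only credits a context if it adds new
# semantic information.

def context_vector(context, vocab):
    return [1.0 if w in context else 0.0 for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: sum(a * a for a in x) ** 0.5
    return dot / ((norm(u) * norm(v)) or 1.0)

def lexical_strengths(contexts, threshold=0.5):
    vocab = sorted({w for c in contexts for w in c})
    freq, cd, sd_traces = {}, {}, {}
    for context in contexts:
        cvec = context_vector(context, vocab)
        for word in set(context):
            freq[word] = freq.get(word, 0) + context.count(word)
            cd[word] = cd.get(word, 0) + 1
            traces = sd_traces.setdefault(word, [])
            # only store contexts sufficiently dissimilar from those
            # already associated with the word
            if all(cosine(cvec, t) < threshold for t in traces):
                traces.append(cvec)
    sd = {w: len(t) for w, t in sd_traces.items()}
    return freq, cd, sd

contexts = [["dog", "barked", "loudly"],
            ["dog", "barked", "again"],
            ["dog", "slept", "on", "the", "couch"]]
print(lexical_strengths(contexts))  # redundant contexts raise freq and cd, not sd
```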
Verbal fluency is a central data type used to examine semantic memory and memory search. In a semantic verbal fluency task, participants are asked to name as many items as they can from a given category, typically animals. Analyzing the pattern of output with distributional models yields a deeper understanding of these data. Johns et al. (2013, 2018) demonstrated that such models can track changes in semantic memory in individuals who are developing a memory disorder. This approach has also been applied to understanding differences between bilinguals and monolinguals in memory search (Taler, Johns, Young, Sheppard, & Jones, 2013). In Taler, Johns, & Jones (in press), we applied these tools to data from the Canadian Longitudinal Study of Aging, providing insight into the changes in semantic memory and language production that occur across the lifespan for over 10,000 individuals. This work demonstrates the usefulness of distributional modeling, as it allows for the development of cognitive technologies that can be used to better understand a variety of lexical behaviors.
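One simple way such an analysis can proceed is sketched below: consecutive responses in a fluency list are scored with the cosine similarity of their semantic vectors, so that drops in similarity mark switches between semantic clusters. The hand-coded vectors are placeholders for representations learned by a distributional model, and this is an illustrative analysis rather than the exact procedure used in the cited studies.

```python
# Sketch of scoring a verbal fluency protocol with distributional
# representations: the semantic similarity of consecutive responses indexes
# clustering and switching during memory search. The tiny hand-coded vectors
# below stand in for vectors learned from a corpus.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# placeholder semantic vectors (in practice, learned by a distributional model)
vectors = {
    "dog":   [0.9, 0.1, 0.0],
    "cat":   [0.8, 0.2, 0.1],
    "wolf":  [0.7, 0.1, 0.3],
    "shark": [0.1, 0.9, 0.2],
    "whale": [0.2, 0.8, 0.3],
}

def consecutive_similarities(responses):
    return [cosine(vectors[a], vectors[b])
            for a, b in zip(responses, responses[1:])]

fluency_list = ["dog", "cat", "wolf", "shark", "whale"]
print(consecutive_similarities(fluency_list))  # a dip marks a cluster switch
```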
Experience-Driven Model Optimization
Related to the development of the experience-driven cognitive models described above is the question of how to optimize these models to maximize their performance. In cognitive modeling, it is standard practice to fit a model to a set of data using statistical or machine learning techniques. This practice rests on the premise that there are parameters describing how a cognitive process is being used, and that fitting those parameters offers the “best shot” for the model to account for the desired behavior. However, it ignores a basic aspect of human existence: the differential experience that people have had with the world. This is especially true for language, as lexical experience depends on culture, geography, educational system, and so forth.
In Johns, Jones, & Mewhort (2016, 2019) we developed a new type of data fitting methodology, termed experiential optimization (EO), that maps the likely knowledge that a subject or group of subjects has in order to maximize a cognitive model’s performance. The method assesses which parts of the linguistic environment map onto a subject’s usage of language, and it allowed multiple language models to achieve benchmark performance across many different linguistic tasks. Johns and Jamieson (2019) further validated this approach by demonstrating that EO is sensitive to the country and time in which a set of data was collected. EO demonstrates the power of using differential experience to account for variance in behavior, and it has the potential to form the basis of a new machine learning paradigm.
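In spirit, EO searches for the sample of linguistic experience that best explains a subject’s behavior. The sketch below shows a heavily simplified, greedy version of that idea, in which corpus sections are added one at a time whenever they improve a model’s fit to behavioral data; the fit_model callable, the coverage-based toy fit, and the stopping rule are stand-ins, not the published procedure.

```python
# Heavily simplified, greedy sketch of the experiential-optimization idea:
# choose which sections of a text base a model learns from so that the
# model's predictions best fit a behavioral data set. The fit_model callable
# is a stand-in for training a model and scoring its fit to behavior.

def experiential_optimization(corpus_sections, behavioral_data, fit_model,
                              max_sections=10):
    selected, best_fit = [], float("-inf")
    remaining = list(corpus_sections)
    while remaining and len(selected) < max_sections:
        # evaluate the fit obtained by adding each remaining section
        gains = [(fit_model(selected + [s], behavioral_data), s) for s in remaining]
        fit, section = max(gains, key=lambda g: g[0])
        if fit <= best_fit:
            break  # no remaining section improves the fit
        best_fit = fit
        selected.append(section)
        remaining.remove(section)
    return selected, best_fit

# toy usage: sections are word lists and the "fit" is vocabulary coverage
sections = [["dog", "cat"], ["cat", "mouse"], ["car", "road"]]
behavior = {"dog", "cat", "mouse"}
coverage = lambda sel, beh: len(beh & {w for s in sel for w in s})
print(experiential_optimization(sections, behavior, coverage, max_sections=2))
```

The sections retained by such a procedure can then be interpreted as an estimate of the linguistic experience of the subject or group whose data are being fit.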
Determining the Topography of the Natural Language Environment
Experiential optimization offers a systematic and objective examination of the interaction between the structure of the linguistic environment that a person lives in and their corresponding linguistic behavior. The basis of the method is the collection of large corpora spanning multiple text types. It is worth considering what corpora are: large collections of individual behavior. In essence, each corpus represents a particular body of language written by a subset of authors, often in particular styles to meet a certain goal. To better understand individual differences in written language, Johns and Jamieson (2018) conducted a large-scale analysis of fiction books. The results demonstrated that most of the variance contained in natural language lies at the individual level: people use language in different fashions, likely based on their individual experiences with linguistic stimuli. This finding raises the question of what causes this underlying variability in language usage.
To this point, we have demonstrated that gender (Johns & Dye, 2019) and the time and place in which one grew up (Johns & Jamieson, 2019; Johns, Dye, & Jones, in press) have substantial effects on language usage. This work further demonstrates the usefulness of distributional models, and corpus modeling in general, as they enable a quantification of the lexical environment, which provides vital information for understanding language processing.
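As a simple illustration of how differences in language usage between individuals or groups can be quantified, the sketch below compares two toy word-frequency distributions with the Jensen-Shannon divergence. This is a generic corpus statistic offered for illustration only; it is not necessarily the measure used in the analyses cited above.

```python
# Generic illustration of quantifying differences in word usage: the
# Jensen-Shannon divergence between two authors' word-frequency
# distributions. The two "authors" here are single toy sentences.
import math
from collections import Counter

def distribution(tokens, vocab):
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) or 1
    return [counts[w] / total for w in vocab]

def js_divergence(p, q):
    m = [(a + b) / 2 for a, b in zip(p, q)]
    kl = lambda x, y: sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)  # 0 = identical usage, 1 = disjoint

author_a = "the detective walked slowly through the rain".split()
author_b = "she laughed and the garden bloomed in spring".split()
vocab = sorted(set(author_a) | set(author_b))
print(js_divergence(distribution(author_a, vocab), distribution(author_b, vocab)))
```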
Retrieval-Based Models of Language Processing
A wide variety of research in developmental psychology has demonstrated that language is largely item-based. That is, children latch onto utterances that are uniquely informative (e.g., “give me that”) and repeatedly use the structure of these utterances in their own language productions. In Johns & Jones (2015) we developed a model that explains this process, based on advances in memory modeling. The model stores exemplars of sentences and processes new sentences by using the linguistic input as a retrieval cue to generate expectations; it was shown to account for a variety of sentence-processing experiments. This work suggests that much of the complexity in language may be due to the storage and retrieval of the raw experiences that people have with language. The model has recently been extended to account for how people produce sentences (Johns, Jamieson, Crump, Jones, & Mewhort, 2016, under review) and as a general model of semantic memory (Jamieson, Avery, Johns, & Jones, 2018).
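The retrieval logic can be conveyed with a generic instance-based memory sketch (in the style of MINERVA 2-like architectures): stored sentences are encoded as exemplar vectors, a partial utterance acts as a retrieval cue, and the similarity-weighted “echo” of memory yields graded expectations about upcoming words. The encoding scheme, the cubed activation function, and the toy sentences are illustrative assumptions rather than the published model.

```python
# Generic instance-based retrieval sketch: sentences are stored as exemplar
# vectors over the vocabulary, and a new fragment acts as a retrieval cue
# whose "echo" gives graded expectations about likely continuations.

def encode(words, vocab):
    return [1.0 if w in words else 0.0 for w in vocab]

def similarity(cue, trace):
    matched = [i for i, c in enumerate(cue) if c != 0.0]
    if not matched:
        return 0.0
    return sum(trace[i] for i in matched) / len(matched)

def echo(cue, traces):
    # activation is similarity raised to a power, so close exemplars dominate
    activations = [similarity(cue, t) ** 3 for t in traces]
    return [sum(a * t[i] for a, t in zip(activations, traces))
            for i in range(len(cue))]

memory_sentences = [["give", "me", "that", "ball"],
                    ["give", "me", "that", "book"],
                    ["the", "dog", "ate", "dinner"]]
vocab = sorted({w for s in memory_sentences for w in s})
traces = [encode(s, vocab) for s in memory_sentences]

cue = encode(["give", "me"], vocab)  # a partial utterance as retrieval cue
expectations = echo(cue, traces)
print(sorted(zip(vocab, expectations), key=lambda x: -x[1])[:5])
```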
Delineating the Success of Deep Learning Models of Natural Language
Recently, a new class of distributional semantic models has been developed, termed neural embedding models, which have been shown to provide excellent fits to lexical semantic behavior (Mikolov et al., 2013). The mechanism underlying these models is prediction: a neural network is trained to predict the words that surround a word in a given context (e.g., a sentence), and an error signal is used to update the word’s representation through backpropagation. Johns, Mewhort, & Jones (2019) recently delineated the success of this model type and demonstrated that the power of the approach comes not from its underlying neural representation or use of prediction, but from its large parameter space. To enable other model types to exploit the same information, we developed multiple parameter-free analytical transformations that can be applied to simpler representation types, which allowed these models to meet or exceed the performance of neural embedding models with a fraction of the computational requirements. This work demonstrates that as the cognitive sciences continue to adopt ever more complex machine learning methods, it will become increasingly necessary to understand the reasons for their success.
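As one concrete example of the kind of parameter-free analytical transformation referred to here, the sketch below applies positive pointwise mutual information (PPMI) to a raw co-occurrence matrix, a standard re-weighting that strengthens informative word-context associations. PPMI is offered only as an illustration of the general strategy; it is not necessarily one of the specific transformations developed in Johns, Mewhort, & Jones (2019).

```python
# One standard, parameter-free transformation of a raw co-occurrence matrix:
# positive pointwise mutual information (PPMI). Shown as an example of an
# analytical transformation applied to a simple count-based representation.
import numpy as np

def ppmi(counts):
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)   # word marginals
    col = counts.sum(axis=0, keepdims=True)   # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2((counts * total) / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0              # zero counts get zero weight
    return np.maximum(pmi, 0.0)               # keep only positive associations

co_occurrence = np.array([[10, 0, 2],
                          [0, 8, 1],
                          [3, 1, 5]])
print(ppmi(co_occurrence))
```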
The structure of lexical semantic memory and recognition memory
Computational models of memory tend to ignore the role that language plays in memory tasks, and instead rely on assumed similarity distributions among items. To determine whether this assumption is warranted, we conducted a systematic analysis of the differences between semantic and random representations of words (Johns & Jones, 2010), constructing similarity distributions across a number of semantic models. The results demonstrated that random representations misrepresent lexical memory: randomly selected word pairs are much more dissimilar than these assumptions predict. This work leaves an open question: how to integrate realistic processing mechanisms and semantic representations within a memory model. To solve this problem, we developed a model that combines the formalism of information accumulation, the cortical communication method of neural synchronization, and a co-occurrence representation of semantics. This model explains a variety of effects in episodic memory (Johns, Jones, & Mewhort, 2012), demonstrating the usefulness of integrating process and representation, and it has been adapted to account for recollection-based behaviors (Johns, Jones, & Mewhort, 2014).
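The first analysis above can be conveyed with a short sketch: the distribution of pairwise cosine similarities is computed for random Gaussian word vectors (the standard random representation assumption) and for vectors with correlated structure standing in for a corpus-derived semantic space. The structured vectors here are a synthetic placeholder; in the actual analyses the comparison used representations learned from text by several semantic models.

```python
# Sketch of the similarity-distribution comparison: pairwise cosine
# similarities for (a) random Gaussian word vectors, as assumed by many
# memory models, and (b) vectors with correlated dimensions standing in for
# a corpus-derived semantic space.
import numpy as np

def pairwise_cosines(matrix):
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = matrix / np.where(norms == 0, 1.0, norms)
    sims = unit @ unit.T
    return sims[np.triu_indices(len(matrix), k=1)]  # upper triangle = all pairs

rng = np.random.default_rng(0)
random_vectors = rng.standard_normal((200, 50))
structured_vectors = rng.standard_normal((200, 50)) @ rng.standard_normal((50, 50))

for name, m in [("random", random_vectors), ("structured", structured_vectors)]:
    sims = pairwise_cosines(m)
    print(name, round(sims.mean(), 3), round(sims.std(), 3))  # shapes differ
```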
Integrating lexical and perceptual information in memory
Many recent influential theories propose that perceptual information underlies all symbolic thought, suggesting that integrating models of semantics based on language with those based on perception is an important direction for computational models of knowledge. Johns & Jones (2012) propose that an ungrounded word’s perceptual representation can be inferred from its associative connections to already grounded words. The model operates by storing a co-occurrence representation for words and attaching perceptual representations to a limited number of them. Perceptual representations for the remaining words are then inferred through a process in which the amount that each stored perceptual trace contributes to the inference is determined by the semantic similarity between the words.
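The sketch below illustrates this inference step: the perceptual representation of an ungrounded word is a blend of the stored perceptual traces of grounded words, with each trace weighted by its semantic similarity to the target. The toy vectors and the exponent used to sharpen the weighting are illustrative choices, not fitted values from the published model.

```python
# Sketch of the perceptual-inference idea: a word without a stored perceptual
# representation borrows one from perceptually grounded words, with each
# grounded word's contribution weighted by its semantic similarity to the
# target (raised to a power so that highly similar words dominate).
import numpy as np

def infer_percept(target_sem, grounded_sem, grounded_percepts, power=3):
    sims = np.array([
        sem @ target_sem / (np.linalg.norm(sem) * np.linalg.norm(target_sem))
        for sem in grounded_sem
    ])
    weights = np.clip(sims, 0, None) ** power         # similar words dominate
    weights = weights / weights.sum()
    return weights @ np.asarray(grounded_percepts)    # weighted blend of traces

# toy semantic vectors (would come from a distributional model)
sem = {"dog": np.array([0.9, 0.1, 0.0]),
       "wolf": np.array([0.8, 0.2, 0.1]),
       "banana": np.array([0.0, 0.1, 0.9])}
# toy perceptual feature vectors for the grounded words
percepts = {"dog": np.array([1.0, 0.0, 1.0, 0.0]),
            "banana": np.array([0.0, 1.0, 0.0, 1.0])}

grounded = ["dog", "banana"]
# "wolf" has no perceptual trace; its inferred percept resembles the dog's
print(infer_percept(sem["wolf"], [sem[w] for w in grounded],
                    [percepts[w] for w in grounded]))
```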