Tutorial: Building models to reveal how natural speech is represented in the brain

Sunday 27 April, afternoon

Presenter: Alexander Huth

Tutorial contents

In cognitive neuroscience we study how brain activity represents information about cognitive processes. One way to measure brain activity is functional magnetic resonance imaging (fMRI). This method uses magnetic fields to detect changes in blood flow and oxygenation in small volume elements called voxels (volumetric pixels). Because active neurons consume more oxygen, the blood-oxygen-level-dependent (BOLD) signal measured by fMRI gives us a rough estimate of the total neural activity at each location in the brain.

We are interested in using fMRI to study speech comprehension. However, speech comprehension is not a single process. Rather, it is likely to involve a hierarchy of different cognitive processes: sounds are parsed into phonemes, phonemes are parsed into words, words are parsed into sentences, and sentences are parsed into a narrative. Using fMRI, we can determine which areas of the brain represent information about each level of the speech processing hierarchy. In traditional fMRI experiments this is done by comparing responses to two types of stimuli, such as grammatical and ungrammatical sentences or speech and non-speech sounds. However, this method requires that the stimuli be constructed according to a specific hypothesis, potentially missing effects the hypothesis did not anticipate.

A more general way to study how the brain represents speech is to measure fMRI responses to natural speech stimuli, which do not embody any specific hypothesis. After the data are collected, we can extract auditory and linguistic features from the stimuli using a variety of computational models. Linear regression is then used to estimate how the BOLD signal in each voxel is influenced by each of the auditory and linguistic features. Visualizing these linear models can then show how the features are represented across the brain. Finally, we can test the validity of the linear model by comparing its predictions with actual brain activity on a separate dataset.
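As a rough illustration of this fit-then-predict procedure (not the tutorial's actual code), the sketch below fits one regularized linear model per voxel and then correlates its predictions with held-out responses. All arrays here are simulated, and the ridge penalty value is an arbitrary assumption; in practice the features would come from the stimulus and the responses from the scanner.

```python
import numpy as np

# Hypothetical dimensions: T training timepoints, F stimulus features, V voxels.
rng = np.random.default_rng(0)
T, T_test, F, V = 300, 100, 20, 50
X_train = rng.standard_normal((T, F))       # stimulus features, training set
X_test = rng.standard_normal((T_test, F))   # stimulus features, held-out set
true_w = rng.standard_normal((F, V))        # simulated "true" voxel weights
Y_train = X_train @ true_w + 0.5 * rng.standard_normal((T, V))       # simulated BOLD
Y_test = X_test @ true_w + 0.5 * rng.standard_normal((T_test, V))

# Fit a ridge regression for every voxel at once via the normal equations.
alpha = 1.0  # ridge penalty (assumed value; normally chosen by cross-validation)
w = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(F), X_train.T @ Y_train)

# Validate: correlate predicted and actual responses on the held-out data.
pred = X_test @ w
r = np.array([np.corrcoef(pred[:, v], Y_test[:, v])[0, 1] for v in range(V)])
```

Voxels whose held-out correlation `r` is reliably above zero are the ones the feature space can account for; the fitted weights `w` describe the tuning that the visualization step maps across cortex.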

In this voxel-wise modeling approach each feature space represents a different hypothesis about how speech is represented in the brain. This allows the same fMRI data to be used for testing many different hypotheses. Furthermore, we can determine the relative importance of the different hypotheses by comparing how much variance each explains in a separate prediction dataset.
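To make the comparison between hypotheses concrete, here is a minimal sketch (with simulated data and a hypothetical `heldout_r2` helper) of scoring two candidate feature spaces by how much variance each explains on held-out data. In this toy example the responses are constructed to depend on feature space A, so A should outperform B.

```python
import numpy as np

def heldout_r2(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Fit ridge regression on training data; return per-voxel R^2 on test data."""
    F = X_train.shape[1]
    w = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(F), X_train.T @ Y_train)
    resid = Y_test - X_test @ w
    return 1 - resid.var(axis=0) / Y_test.var(axis=0)

# Simulated example: responses are driven by feature space A, not B.
rng = np.random.default_rng(1)
T, T_test, F, V = 400, 100, 10, 30
A_tr, A_te = rng.standard_normal((T, F)), rng.standard_normal((T_test, F))
B_tr, B_te = rng.standard_normal((T, F)), rng.standard_normal((T_test, F))
w_true = rng.standard_normal((F, V))
Y_tr = A_tr @ w_true + rng.standard_normal((T, V))
Y_te = A_te @ w_true + rng.standard_normal((T_test, V))

r2_A = heldout_r2(A_tr, Y_tr, A_te, Y_te)
r2_B = heldout_r2(B_tr, Y_tr, B_te, Y_te)
# Feature space A explains more held-out variance, so hypothesis A wins here.
```

Because both models are scored on the same held-out responses, the per-voxel difference in R² directly ranks the competing hypotheses at each brain location.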

In this tutorial we will step through a voxel-wise modeling analysis. Attendees will use computational models to extract phonemic and semantic features from a natural speech stimulus. These features will then be used to build linear models of fMRI data, and model weights and prediction performance will be visualized. Attendees will access and run the tutorial materials in IPython notebooks through their web browsers.


  1. (30 minutes) A lecture will describe the fMRI experiment and its results, and then outline the tutorial.
  2. (45 minutes) Attendees will use simple computational models to extract phonemic and semantic features from a natural speech stimulus.
  3. (15 minutes) Coffee break.
  4. (60 minutes) Attendees will use their extracted features to construct a linear regression model of fMRI responses.
  5. (30 minutes) Attendees will visualize modeling results.
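As a taste of the feature-extraction step, the sketch below builds a simple phoneme indicator matrix from a time-aligned transcript: one row per fMRI timepoint, one column per phoneme, with a 1 wherever that phoneme overlaps the timepoint's window. The transcript, sampling interval, and sizes are all hypothetical examples, not the tutorial's actual stimulus.

```python
import numpy as np

# Hypothetical time-aligned phoneme transcript: (phoneme, onset_s, offset_s).
transcript = [("DH", 0.00, 0.08), ("AH", 0.08, 0.15), ("K", 0.20, 0.27),
              ("AE", 0.27, 0.38), ("T", 0.38, 0.45)]
phonemes = sorted({p for p, _, _ in transcript})  # feature-space vocabulary
tr = 0.1        # sampling interval in seconds (assumed; real fMRI TRs are ~1-2 s)
n_trs = 5       # number of timepoints covered by this toy transcript

# Binary indicator matrix: rows are timepoints, columns are phonemes.
X = np.zeros((n_trs, len(phonemes)))
for p, on, off in transcript:
    j = phonemes.index(p)
    for t in range(n_trs):
        # Mark the phoneme present if its interval overlaps this time window.
        if on < (t + 1) * tr and off > t * tr:
            X[t, j] = 1
```

A matrix like `X` (and an analogous one built from word-level semantic features) is exactly the kind of stimulus description that the regression step then relates to each voxel's BOLD signal.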


Alex Huth is a postdoc in Jack Gallant's lab at UC Berkeley. He uses computational techniques to model how the semantic content of narrative linguistic stimuli and visual movies is represented in brain activity. He's interested in fMRI technology, linear regression, and computational linguistics.