Mauchand, M. & Pell, M.D. (2022). Listen to my feelings! How prosody and accent drive the empathic relevance of complaining speech. Neuropsychologia, 175 (10), 108356.

Interpersonal communication often involves sharing our feelings with others; complaining, for example, aims to elicit empathy in listeners by vocally expressing a speaker’s suffering. Despite the growing neuroscientific interest in the phenomenon of empathy, few have investigated how it is elicited in real time by vocal signals (prosody), and how this might be affected by interpersonal factors, such as a speaker’s cultural background (based on their accent). To investigate the neural processes at play when hearing spoken complaints, twenty-six French participants listened to complaining and neutral utterances produced by in-group French and out-group Québécois (i.e., French-Canadian) speakers. Participants rated how hurt the speaker felt while their cerebral activity was monitored with electroencephalography (EEG). Principal Component Analysis of Event-Related Potentials (ERPs) taken at utterance onset showed culture-dependent time courses of emotive prosody processing. The high motivational relevance of ingroup complaints increased the P200 response compared to all other utterance types; in contrast, outgroup complaints selectively elicited an early posterior negativity in the same time window, followed by an increased N400 (due to ongoing effort to derive affective meaning from outgroup voices). Ingroup neutral utterances evoked a late negativity which may reflect re-analysis of emotively less salient, but culturally relevant ingroup speech. Results highlight the time-course of neurocognitive responses that contribute to emotive speech processing for complaints, establishing the critical role of prosody as well as social-relational factors (i.e., cultural identity) on how listeners are likely to “empathize” with a speaker.

Zhang, S. & Pell, M.D. (2022). Cultural differences in vocal expression analysis: effects of task, language, and stimulus-related factors. PLoS ONE, 17(10): e0275915.

Cultural context shapes the way that emotions are expressed and socially interpreted. Building on previous research looking at cultural differences in judgements of facial expressions, we examined how listeners recognize speech-embedded emotional expressions and make inferences about a speaker’s feelings in relation to their vocal display. Canadian and Chinese participants categorized vocal expressions of emotions (anger, fear, happiness, sadness) expressed at different intensity levels in three languages (English, Mandarin, Hindi). In two additional tasks, participants rated the intensity of each emotional expression and the intensity of the speaker’s feelings from the same stimuli. Each group was more accurate at recognizing emotions produced in their native language (in-group advantage). However, Canadian and Chinese participants both judged the speaker’s feelings to be equivalent or more intense than their actual display (especially for highly aroused, negative emotions), suggesting that similar inference rules were applied to vocal expressions by the two cultures in this task. Our results provide new insights on how people categorize and interpret speech-embedded vocal expressions versus facial expressions and what cultural factors are at play.

Caballero, J., Auclair-Ouellet, N., Phillips, N. & Pell, M.D. (2022). Social decision-making in Parkinson’s Disease. Journal of Clinical and Experimental Neuropsychology, 44(4), 302-315. doi: 10.1080/13803395.2022.2112554.

Introduction: Parkinson’s Disease (PD) commonly affects cognition and communicative functions, including the ability to perceive socially meaningful cues from nonverbal behavior and spoken language (e.g., a speaker’s tone of voice). However, we know little about how people with PD use social information to make decisions in daily interactions (e.g., decisions to trust another person) and whether this ability rests on intact cognitive functions and executive/decision-making abilities in nonsocial domains. Method: Non-demented adults with and without PD were presented utterances that conveyed differences in speaker confidence or politeness based on the way that speakers formulated their statement and their tone of voice. Participants had to use these speech-related cues to make trust-related decisions about interaction partners while playing the Trust Game. Explicit measures of social perception, nonsocial decision-making, and related cognitive abilities were collected. Results: Individuals with PD displayed significant differences from control participants in social decision-making; for example, they showed greater trust in game partners whose voice sounded confident and who explicitly stated that they would cooperate with the participant. The PD patients displayed relatively intact social perception (speaker confidence or politeness ratings) and were unimpaired on a nonsocial decision-making task (the Dice game). No obvious relationship emerged between measures of social perception, social decision-making, or cognitive functioning in the PD sample. Conclusions: Results provide evidence of alterations in decision-making restricted to social contexts in PD individuals with relatively preserved cognition and minimal changes in social perception. Researchers and practitioners interested in how PD affects social perception and cognition should include assessments that emulate social interactions, as non-interactive tasks may fail to detect the full impact of the disease on those affected.


Mauchand, M. & Pell, M.D. (2021). Emotivity in the voice: prosodic, lexical, and cultural appraisal of complaining speech. Frontiers in Psychology, 11: 619222. doi: 10.3389/fpsyg.2020.619222.

Emotive speech is a social act in which a speaker displays emotional signals with a specific intention; in the case of third-party complaints, this intention is to elicit empathy in the listener. The present study assessed how the emotivity of complaints was perceived in various conditions. Participants listened to short statements describing painful or neutral situations, spoken with a complaining or neutral prosody, and evaluated how complaining the speaker sounded. In addition to manipulating features of the message, social-affiliative factors which could influence complaint perception were varied by adopting a cross-cultural design: participants were either Québécois (French Canadian) or French and listened to utterances expressed by both cultural groups. The presence of a complaining tone of voice had the largest effect on participant evaluations, while the nature of statements had a significant, but smaller influence. Marginal effects of culture on explicit evaluation of complaints were found. A multiple mediation analysis suggested that mean fundamental frequency was the main prosodic signal that participants relied on to detect complaints, though most of the prosody effect could not be linearly explained by acoustic parameters. These results highlight a tacit agreement between speaker and listener: what characterizes a complaint is how it is said (i.e., the tone of voice), more than what it is about or who produces it. More generally, the study emphasizes the central importance of prosody in expressive speech acts such as complaints, which are designed to strengthen social bonds and supportive responses in interactive behavior. This intentional and interpersonal aspect in the communication of emotions needs to be further considered in research on affect and communication.

Pell, M.D. & Kotz, S.A. (2021). Comment: The next frontier: prosody research gets interpersonal. Emotion Review, 13 (1), 51-56.

Neurocognitive models (e.g., Schirmer & Kotz, 2006) have helped to characterize how listeners incrementally derive meaning from vocal expressions of emotion in spoken language, what neural mechanisms are involved at different processing stages, and their relative time course. But how can these insights be applied to communicative situations in which prosody serves a predominantly interpersonal function? This comment examines recent data highlighting the dynamic interplay of prosody and language, when vocal attributes serve the sociopragmatic goals of the speaker or reveal interpersonal information that listeners use to construct a mental representation of what is being communicated. Our comment serves as a beacon to researchers interested in how the neurocognitive system "makes sense" of socioemotive aspects of prosody.

Mauchand, M., Caballero, J., Jiang, X. & Pell, M.D. (2021). Immediate on-line use of prosody reveals a speaker’s ironic intentions: neurophysiological evidence. Cognitive, Affective, & Behavioral Neuroscience. doi: 10.3758/s13415-020-00849-7.

In social interactions, speakers often use their tone of voice (“prosody”) to communicate their interpersonal stance and to pragmatically mark an ironic intention (e.g., sarcasm). The neurocognitive effects of prosody as listeners process ironic statements in real time are still poorly understood. In this study, 30 participants judged the friendliness of literal and ironic criticisms and compliments in the absence of context while their electrical brain activity was recorded. Event-related potentials reflecting the uptake of prosodic information were tracked at two time points in the utterance. Prosody robustly modulated P200 and late positivity amplitudes from utterance onset. These early neural responses registered both the speaker's stance (positive/negative) and their intention (literal/ironic). At a later time point (You are such a great/horrible cook), P200, N400, and P600 amplitudes were all greater when the critical word valence was congruent with the speaker’s vocal stance, suggesting that irony was contextually facilitated by early effects from prosody. Our results exemplify that rapid uptake of salient prosodic features allows listeners to make online predictions about the speaker’s ironic intent. This process can constrain their representation of an utterance to uncover nonliteral meanings without violating contextual expectations held about the speaker, as described by parallel-constraint satisfaction models.

Liu, P., Rigoulot, S., Jiang, X., Zhang, S. & Pell, M.D. (2020). Unattended Emotional Prosody Affects Visual Processing of Facial Expressions in Mandarin-Speaking Chinese: A Comparison With English-Speaking Canadians. Journal of Cross-Cultural Psychology. doi: 10.1177/0022022121990897.

Emotional cues from different modalities have to be integrated during communication, a process that can be shaped by an individual’s cultural background. We explored this issue in 25 Chinese participants by examining how listening to emotional prosody in Mandarin influenced participants’ gazes at emotional faces in a modified visual search task. We also conducted a cross-cultural comparison between data of this study and that of our previous work in English-speaking Canadians using analogous methodology. In both studies, eye movements were recorded as participants scanned an array of four faces portraying fearful, angry, happy, and neutral expressions, while passively listening to a pseudo-utterance expressing one of the four emotions (Mandarin utterance in this study; English utterance in our previous study). The frequency and duration of fixations to each face were analyzed during the 5 seconds after the onset of the faces, both during the presence of the speech (early time window) and after the utterance ended (late time window). During the late window, Chinese participants looked more frequently and longer at faces conveying emotions congruent with the speech, consistent with findings from English-speaking Canadians. Cross-cultural comparison further showed that Chinese participants, but not Canadians, looked more frequently and longer at angry faces, which may signal potential conflicts and social threats. We hypothesize that the socio-cultural norms related to harmony maintenance in the Eastern culture promoted Chinese participants’ heightened sensitivity to, and deeper processing of, angry cues, highlighting culture-specific patterns in how individuals scan their social environment during emotion processing.


Caballero, J. & Pell, M.D. (2020). Implicit effects of speaker accents and vocally expressed confidence on decisions to trust. Decision, 7 (4), 314-331.

People often evaluate speakers with nonstandard accents as being less competent or trustworthy, which is often attributed to in-group favoritism. However, speakers can also modulate social impressions in the listener through their vocal expression (e.g., by speaking in a confident vs. a doubtful tone of voice). Here, we addressed how both accents and vocally-expressed confidence affect social outcomes in an interaction setting using the Trust Game, which operationalizes interpersonal trust using a monetary exchange situation. In a first study, 30 English Canadians interacted with partners speaking English with a Canadian, Australian, or foreign (French) accent. Speakers with each accent vocally expressed themselves in different ways (confident, doubtful, or neutral voice). Results show that trust decisions were significantly modulated by a speaker’s accent (fewer tokens were given to foreign-accented speakers) and by vocally-expressed confidence (fewer tokens were given to doubtful-sounding speakers). Using the same paradigm, a second study then tested whether manipulating the social identity of the speaker-listener led to similar trust decisions in participants who spoke English as a foreign language (EFL; 60 native speakers of French or Spanish). Again, EFL participants placed less trust in partners who spoke in a doubtful manner and in those with a foreign accent, regardless of the participants’ linguistic background. Taken together, results suggest that in social-interactive settings, listeners implicitly use different sources of vocal cues to derive social impressions and to guide trust-related decisions, effects not solely driven by shared group membership. The influence of voice information on trust decisions was very similar for native and non-native listeners.

Vergis, N., Jiang, X., & Pell, M.D. (2020). Neural responses to interpersonal requests: Effects of imposition and vocally-expressed stance. Brain Research, 1740, 146855.

The way that speakers communicate their stance towards the listener is often vital for understanding the interpersonal relevance of speech acts, such as basic requests. To establish how interpersonal dimensions of an utterance affect neurocognitive processing, we compared event-related potentials elicited by requests that linguistically varied in how much they imposed on listeners (e.g., Lend me a nickel vs. hundred) and in the speaker's vocally-expressed stance towards the listener (polite or rude tone of voice). From utterance onset, effects of vocal stance were robustly differentiated by an early anterior positivity (P200) which increased for rude versus polite voices. At the utterance-final noun that marked the 'cost' of the request (nickel vs. hundred), there was an increased negativity between 300 and 500ms in response to high-imposition requests accompanied by rude stance compared to the rest of the conditions. This N400 effect was followed by interactions of stance and imposition that continued to inform several effects in the late positivity time window (500-800ms post-onset of the critical noun), some of which correlated significantly with prosody-related changes in the P200 response from utterance onset. Results point to rapid neural differentiation of voice-related information conveying stance (around 200ms post-onset of speech) and exemplify the interplay of different sources of interpersonal meaning (stance, imposition) as listeners evaluate social implications of a request. Data show that representations of speaker meaning are actively shaped by vocal and verbal cues that encode interpersonal features of an utterance, promoting attempts to reanalyze and infer the pragmatic significance of speech acts in the 500-800ms time window.

Rigoulot, S., Vergis, N. & Pell, M.D. (2020). Neurophysiological correlates of sexually evocative speech. Biological Psychology, 154, 107909.

Speakers modulate their voice (prosody) to communicate non-literal meanings, such as sexual innuendo (She inspected his package this morning, where “package” could refer to a man’s penis). Here, we analyzed event-related potentials to illuminate how listeners use prosody to interpret sexual innuendo and what neurocognitive processes are involved. Participants listened to third-party statements with literal or ‘sexual’ interpretations, uttered in an unmarked or sexually evocative tone. Analyses revealed: (1) rapid neural differentiation of neutral vs. sexual prosody from utterance onset; (2) an N400-like response differentiating contextually constrained vs. unconstrained utterances following the critical word (reflecting integration of prosody and word meaning); and (3) a selectively increased negativity in response to sexual innuendo around 600 ms after the critical word. Findings show that the brain quickly integrates prosodic and lexical-semantic information to form an impression of what the speaker is communicating, triggering a unique response to sexual innuendos, consistent with their high social relevance.

Jiang, X., Gossack-Keenan, K. & Pell, M.D. (2020). To believe or not to believe: How voice and accent information in speech alter listener impressions of trust. Quarterly Journal of Experimental Psychology, 73(1), 55-79.

Our decision to believe what another person says can be influenced by vocally expressed confidence in speech and by whether the speaker and listener are members of the same social group. The dynamic effects of these two information sources on neurocognitive processes that promote believability impressions from vocal cues are unclear. Here, English Canadian listeners were presented personal statements (She has access to the building) produced in a confident or doubtful voice by speakers of their own dialect (in-group) or speakers from two different “out-groups” (regional or foreign-accented English). Participants rated how believable the speaker was for each statement, and event-related potentials (ERPs) were analysed from utterance onset. Believability decisions were modulated by both the speaker’s vocal confidence level and their perceived in-group status. For in-group speakers, ERP effects revealed an early differentiation of vocally expressed confidence (i.e., N100, P200), highlighting the motivational significance of doubtful voices for drawing believability inferences. These early effects on vocal confidence perception were qualitatively different or absent when speakers had an accent; evaluating out-group voices was associated with increased demands on contextual integration and re-analysis of a non-native representation of believability (i.e., increased N400, late negativity response). Accent intelligibility and experience with particular out-group accents each influenced how vocal confidence was processed for out-group speakers. The N100 amplitude was sensitive to out-group attitudes and predicted actual believability decisions for certain out-group speakers. We propose a neurocognitive model in which vocal identity information (social categorization) dynamically influences how vocal expressions are decoded and used to derive social inferences during person perception.

Mauchand, M., Vergis, N., & Pell, M.D. (2020). Irony, prosody, and social impressions of affective stance. Discourse Processes, 57(2), 141-157.

In spoken discourse, understanding irony requires the apprehension of subtle cues, such as the speaker’s tone of voice (prosody), which often reveal the speaker’s affective stance toward the listener in the context of the utterance. To shed light on the interplay of linguistic content and prosody on impressions of spoken criticisms and compliments (both literal and ironic), 40 participants rated the friendliness of the speaker in three separate conditions of attentional focus (No focus, Prosody focus, and Content focus). When the linguistic content was positive (“You are such an awesome driver!”), the perceived critical or friendly stance of the speaker was influenced predominantly by prosody. However, when the linguistic content was negative (“You are such a lousy driver!”), the speaker was always perceived as less friendly, even for ironic compliments that were meant to be teasing (i.e., positive stance). Our results highlight important asymmetries in how listeners use prosody and attend to different speech-related channels to form impressions of interpersonal stance for ironic criticisms (e.g., sarcasm) versus ironic compliments (e.g., teasing).

Vergis, N. & Pell, M.D. (2020). Factors in the perception of speaker politeness: the effect of linguistic structure, imposition and prosody. Journal of Politeness Research, 16(1), 45-84.

Although linguistic politeness has been studied and theorized about extensively, the role of prosody in the perception of im/polite attitudes has been somewhat neglected. In the present study, we used experimental methods to investigate the interaction of linguistic form, imposition and prosody in the perception of im/polite requests. A written task established a baseline for the level of politeness associated with certain linguistic structures. Then stimuli were recorded in polite and rude prosodic conditions and in a perceptual experiment they were judged for politeness. Results revealed that, although both linguistic structure and prosody had a significant effect on politeness ratings, the effect of prosody was much more robust. In fact, rude prosody led in some cases to the neutralization of (extra)linguistic distinctions. The important contribution of prosody to im/politeness inferences was also revealed by a comparison of the written and auditory tasks. These findings have important implications for models of im/politeness and more generally for theories of affective speech. Implications for the generation of Particularized Conversational Implicatures (PCIs) of im/politeness are also discussed.


Mori, Y. & Pell, M.D. (2019). The look of (un)confidence: Visual markers for inferring speaker confidence in speech. Frontiers in Communication, 4:63. doi: 10.3389/fcomm.2019.00063.

Evidence suggests that observers can accurately perceive a speaker's static confidence level, related to their personality and social status, by only assessing their visual cues. However, less is known about the visual cues that speakers produce to signal their transient confidence level in the content of their speech. Moreover, it is unclear what visual cues observers use to accurately perceive a speaker's confidence level. Observers are hypothesized to use visual cues in their social evaluations based on the cue's level of perceptual salience and/or their beliefs about the cues that speakers with a given mental state produce. We elicited high and low levels of confidence in the speech content by having a group of speakers answer general knowledge questions ranging in difficulty while their face and upper body were video recorded. A group of observers watched muted videos of these recordings to rate the speaker's confidence and report the face/body area(s) they used to assess the speaker's confidence. Observers accurately perceived a speaker's confidence level relative to the speakers' subjective confidence, and broadly differentiated speakers as having low compared to high confidence by using speakers' eyes, facial expressions, and head movements. Our results argue that observers use a speaker's facial region to implicitly decode a speaker's transient confidence level in a situation of low-stakes social evaluation, although the use of these cues differs across speakers. The effect of situational factors on speakers' visual cue production and observers' utilization of these visual cues are discussed, with implications for improving how observers in real world contexts assess a speaker's confidence in their speech content.

Giles, R., Rothermich, K. & Pell, M.D. (2019). Differences in the evaluation of prosocial lies: A cross-cultural study of Canadian, Chinese and German adults. Frontiers in Communication, 4:38. doi: 10.3389/fcomm.2019.00038.

In daily life, humans often tell lies to make another person feel better, to be polite, or to remain socially appropriate in situations where telling the blunt truth would be perceived as inappropriate. Prosocial lies are a form of non-literal communication used cross-culturally, but how they are evaluated depends on socio-moral values and communication strategies. We examined how prosocial lies are evaluated by Canadian, Chinese, and German adults. Participants watched videos and rated politeness, appropriateness, and predicted frequency of use of prosocial lies and blunt truths. A two-way intention × culture interaction was observed for appropriateness and predicted frequency of use. These results suggest that the evaluation of prosocial lies is influenced by an interplay of intercultural communication strategies depending on cultural group membership.


Jiang, X., Sanford, R. & Pell, M.D. (2018). Neural architecture underlying person perception from in-group and out-group voices. NeuroImage, 181, 582-597.

In spoken language, verbal cues (what we say) and vocal cues (how we say it) contribute to person perception, the process for interpreting information and making inferences about other people. When someone has an accent, forming impressions from the speaker's voice may be influenced by social categorization processes (i.e., activating stereotypical traits of members of a perceived ‘out-group’) and by processes which differentiate the speaker based on their individual attributes (e.g., registering the vocal confidence level of the speaker in order to make a trust decision). The neural systems for using vocal cues that refer to the speaker's identity and to qualities of their vocal expression to generate inferences about others are not known. Here, we used functional magnetic resonance imaging (fMRI) to investigate how speaker categorization influences brain activity as Canadian-English listeners judged whether they believe statements produced by in-group (native) and out-group (regional, foreign) speakers. Each statement was expressed in a confident, doubtful, and neutral tone of voice. In-group speakers were perceived as more believable than speakers with out-group accents overall, confirming social categorization of speakers based on their accent. Superior parietal and middle temporal regions were uniquely activated when listening to out-group compared to in-group speakers, suggesting that they may be involved in extracting the attributes of speaker believability from the lower-level acoustic variations. Basal ganglia, left cuneus and right fusiform gyrus were activated by confident expressions produced by out-group speakers. These regions appear to participate in abstracting more ambiguous believability attributes from accented speakers (where a conflict arises between the tendency to disbelieve an out-group speaker and the tendency to believe a confident voice).
For out-group speakers, stronger impressions of believability selectively modulated activity in the bilateral superior and middle temporal regions. Moreover, the right superior temporal gyrus, a region that was associated with perceived speaker confidence, was found to be functionally connected to the left lingual gyrus and right middle temporal gyrus when out-group speakers were judged as more believable. These findings suggest that identity-related voice characteristics and associated biases may influence underlying neural activities for making social attributions about out-group speakers, affecting decisions about believability and trust. Specifically, inferences about out-group speakers seem to be mediated to a greater extent by stimulus-related features (i.e., vocal confidence cues) than for in-group speakers. Our approach highlights how the voice can be studied to advance models of person perception.

Garrido-Vásquez, P., Pell, M.D., Paulmann, S., & Kotz, S.A. (2018). Dynamic facial expressions prime the processing of emotional prosody. Frontiers in Human Neuroscience, 12. doi: 10.3389/fnhum.2018.00244.

Evidence suggests that emotion is represented supramodally in the human brain. Emotional facial expressions, which often precede vocally expressed emotion in real life, can modulate event-related potentials (N100 and P200) during emotional prosody processing. To investigate these cross-modal emotional interactions, two lines of research have been put forward: cross-modal integration and cross-modal priming. In cross-modal integration studies, visual and auditory channels are temporally aligned, while in priming studies they are presented consecutively. Here we used cross-modal emotional priming to study the interaction of dynamic visual and auditory emotional information. Specifically, we presented dynamic facial expressions (angry, happy, neutral) as primes and emotionally-intoned pseudo-speech sentences (angry, happy) as targets. We were interested in how prime-target congruency would affect early auditory event-related potentials, i.e., N100 and P200, in order to shed more light on how dynamic facial information is used in cross-modal emotional prediction. Results showed enhanced N100 amplitudes for incongruently primed compared to congruently and neutrally primed emotional prosody, while the latter two conditions did not significantly differ. However, N100 peak latency was significantly delayed in the neutral condition compared to the other two conditions. Source reconstruction revealed that the right parahippocampal gyrus was activated in incongruent compared to congruent trials in the N100 time window. No significant ERP effects were observed in the P200 range. Our results indicate that dynamic facial expressions influence vocal emotion processing at an early point in time, and that an emotional mismatch between a facial expression and its ensuing vocal emotional signal induces additional processing costs in the brain, potentially because the cross-modal emotional prediction mechanism is violated in case of emotional prime-target incongruency.

Caballero, J., Vergis, N., Jiang, X. & Pell, M.D. (2018). The sound of im/politeness. Speech Communication, 102, 39-53.

Until recently, research on im/politeness has primarily focused on the role of linguistic strategies while neglecting the contributions of prosody and acoustic cues for communicating politeness. Here, we analyzed a large set of recordings — verbal requests spoken in a direct manner (Lend me a nickel), preceded by the word “Please” or in a conventionally-indirect manner (Can you) — which were known to convey polite or rude impressions on the listener. The pragmatic imposition of the request was also manipulated (Lend me a nickel vs. hundred). Fundamental frequency (f0: mean, range, contour shape), duration, and voice quality (harmonics-to-noise ratio) were measured over the whole utterance and for key constituents within the utterance. Differences in perceived politeness corresponded with systematic differences in continuous utterance measures as well as local acoustic adjustments, defined by both categorical and graded vocal contrasts. Compared to polite utterances, rude requests displayed a slower speech rate, lower pitch, and tended to fall in pitch (or rise less markedly in the context of yes-no questions). The high versus low imposition of a request separately influenced the acoustic structure of requests, with evidence of these effects right at utterance-onset. Results are consistent with theoretical proposals about how prosody functions to convey speaker politeness as one facet of emotive communication. It is suggested that while a specific “prosody of politeness” may not exist, prosodic cues routinely and potently interact with other sources of information to allow listeners to generate inferences about im/politeness.

Chronaki, G., Wigelsworth, M., Pell, M.D., & Kotz, S.A. (2018). The development of cross-cultural recognition of vocal emotion during childhood and adolescence. Scientific Reports. DOI: 10.1038/s41598-018-26889-1.

Humans have an innate set of emotions recognised universally. However, emotion recognition also depends on socio-cultural rules. Although adults recognise vocal emotions universally, they identify emotions more accurately in their native language. We examined developmental trajectories of universal vocal emotion recognition in children. Eighty native English speakers completed a vocal emotion recognition task in their native language (English) and foreign languages (Spanish, Chinese, and Arabic) expressing anger, happiness, sadness, fear, and neutrality. Emotion recognition was compared across 8-to-10-year-olds, 11-to-13-year-olds, and adults. Measures of behavioural and emotional problems were also taken. Results showed that although emotion recognition was above chance for all languages, native English-speaking children were more accurate in recognising vocal emotions in their native language. There was a larger improvement in recognising vocal emotion from the native language during adolescence. Vocal anger recognition did not improve with age for the non-native languages. This is the first study to demonstrate universality of vocal emotion recognition in children whilst supporting an "in-group advantage" for more accurate recognition in the native language. Findings highlight the role of experience in emotion recognition, have implications for child development in modern multicultural societies and address important theoretical questions about the nature of emotions.

Truesdale, D. & Pell, M.D. (2018). The sound of passion and indifference. Speech Communication, 99, 124-134.

Extending affective speech communication research in the context of authentic, spontaneous utterances, the present study investigates two signals of affect defined by extreme levels of physiological arousal—Passion and Indifference. Exemplars were mined from podcasts conducted in informal, unstructured contexts to examine communication at extreme levels of perceived hyper- and hypo-arousal. Utterances from twenty native speakers of Canadian/American English were submitted for perceptual validation for judgments of affective meaning (Passion, Indifference, or Neutrality) and level of arousal (“Not At All” to “Very Much”). Arousal ratings, acoustic patterns, and linguistic cues (affect/emotion words and expletives) were analyzed. In comparison to neutral utterances, Passion was communicated with the highest maximum pitch and pitch range, and highest maximum and mean amplitude, while Indifference was communicated via decreases in these measures in comparison to neutral affect. Interestingly, Passion and Neutrality were expressed with comparable absolute ranges of amplitude, while the minimum amplitudes of both Passion and Indifference were greater than those of Neutral expressions. Linguistically, Indifference was marked by significantly greater use of explicit expressions of affect (e.g. I don't care…), suggesting a linguistic encoding preference in this context. Passion was expressed with greater use of expletives; yet, their presence was not necessary to facilitate perception of a speaker's level of arousal. These findings shed new light upon the paralinguistic and linguistic features of spontaneous expressions at the extremes of the arousal continuum, highlighting key distinctions between Indifference and Neutrality with implications for vocal communication research in healthy and clinical populations.

Schwartz, R., Rothermich, K., Kotz, S.A. & Pell, M.D. (2018). Unaltered emotional experience in Parkinson’s disease: Pupillometry and behavioral evidence. Journal of Clinical and Experimental Neuropsychology, 40 (3), 303-316.

Introduction: Recognizing emotions in others is a pivotal part of socioemotional functioning and plays a central role in social interactions. It has been shown that individuals suffering from Parkinson’s disease (PD) are less accurate at identifying basic emotions such as fear, sadness, and happiness; however, previous studies have predominantly assessed emotion processing using unimodal stimuli (e.g., pictures) that do not reflect the complexity of real-world processing demands. Dynamic, naturalistic stimuli (e.g., movies) have been shown to elicit stronger subjective emotional experiences than unimodal stimuli and can facilitate emotion recognition. Method: In this experiment, pupil measurements of PD patients and matched healthy controls (HC) were recorded while they watched short film clips. Participants’ task was to identify the emotion elicited by each clip and rate the intensity of their emotional response. We explored (a) how PD affects subjective emotional experience in response to dynamic, ecologically valid film stimuli, and (b) whether there are PD-related changes in pupillary response, which may contribute to the differences in emotion processing reported in the literature. Results: Behavioral results showed that identification of the felt emotion as well as perceived intensity varied by emotion, but no significant group effect was found. Pupil measurements revealed differences in dilation depending on the emotion evoked by the film clips (happy, tender, sadness, fear, and neutral) for both groups. Conclusions: Our results suggest that differences in emotional response may be negligible when PD patients and healthy controls are presented with dynamic, ecologically valid emotional stimuli.
Given the limited data available on pupil response in PD, this study provides new evidence to suggest that the PD-related deficits in emotion processing reported in the literature may not translate to real-world differences in physiological or subjective emotion processing in early-stage PD patients.

Fish, K., Rothermich, K., & Pell, M.D. (2017). The sound of (in)sincerity. Journal of Pragmatics, 121, 147-161.

In social life, humans do not always communicate their sincere feelings, and speakers often tell ‘prosocial lies’ to prevent others from being hurt by negative truths. Data illuminating how a speaker's voice carries sincere or insincere attitudes in speech, and how social context shapes the expression and perception of (in)sincere utterances, are scarce. Here, we studied the communication of social, other-oriented lies occurring in short dialogues. We recorded paired questions (So, what do you think of my new hairdo?) and responses (I think it looks really amazing!) using a paradigm that elicited compliments which reflected the true positive opinion of the speaker (sincere) or were meant to hide their negative opinion (insincere/prosocial lie). These Question–Response pairs were then presented to 30 listeners, who rated the sincerity of the person uttering the compliment on a 5-point scale. Results showed that participants could successfully differentiate sincere compliments from prosocial lies based largely on vocal speech cues. Moreover, sincerity impressions were biased by how the preceding question was phrased (confident or uncertain). Acoustic analyses on a subset of utterances that promoted strong impressions of sincerity versus insincerity revealed that compliments perceived as being sincere were spoken faster and began with a higher pitch than those that sounded insincere, while compliments rated as insincere tended to get louder as the utterance unfolded. These data supply new evidence of the importance of vocal cues in evaluating sincerity, while emphasizing that motivations of both the speaker and hearer contribute to impressions of speaker sincerity.

Jiang, X., Sanford, R. & Pell, M.D. (2017). Neural systems for evaluating speaker (un)believability. Human Brain Mapping, 38, 3732-3749.

Our voice provides salient cues about how confident we sound, which promotes inferences about how believable we are. However, the neural mechanisms involved in these social inferences are largely unknown. Employing functional magnetic resonance imaging, we examined the brain networks and individual differences underlying the evaluation of speaker believability from vocal expressions. Participants (n = 26) listened to statements produced in a confident, unconfident, or “prosodically unmarked” (neutral) voice, and judged how believable the speaker was on a 4-point scale. We found frontal–temporal networks were activated for different levels of confidence, with the left superior and inferior frontal gyrus more activated for confident statements, the right superior temporal gyrus for unconfident expressions, and bilateral cerebellum for statements in a neutral voice. Based on listeners’ believability judgments, we observed increased activation in the right superior parietal lobule (SPL) associated with higher believability, while increased activation in the left postcentral gyrus (PoCG) was associated with lower believability. A psychophysiological interaction analysis found that the anterior cingulate cortex and bilateral caudate were connected to the right SPL when higher believability judgments were made, while the supplementary motor area was connected with the left PoCG when lower believability judgments were made. Personal characteristics, such as interpersonal reactivity and the individual tendency to trust others, modulated the brain activations and the functional connectivity when making believability judgments. In sum, our data pinpoint neural mechanisms that are involved when inferring one's believability from a speaker's voice and establish ways that these mechanisms are modulated by individual characteristics of a listener.

Liu, P., Rigoulot, S., & Pell, M.D. (2017). Cultural immersion alters emotion perception: Neurophysiological evidence from Chinese immigrants to Canada. Social Neuroscience, 12 (6), 685-700.

To explore how cultural immersion modulates emotion processing, this study examined how Chinese immigrants to Canada process multisensory emotional expressions, which were compared to existing data from two groups, Chinese and North Americans. Stroop and Oddball paradigms were employed to examine different stages of emotion processing. The Stroop task presented face-voice pairs expressing congruent/incongruent emotions and participants actively judged the emotion of one modality while ignoring the other. A significant effect of cultural immersion was observed in the immigrants' behavioral performance, which showed greater interference from to-be-ignored faces, comparable with what was observed in North Americans. However, this effect was absent in their N400 data, which retained the same pattern as the Chinese. In the Oddball task, where immigrants passively viewed facial expressions with/without simultaneous vocal emotions, they exhibited a larger visual MMN for faces accompanied by voices, again mirroring patterns observed in Chinese. Correlation analyses indicated that the immigrants' living duration in Canada was associated with neural patterns (N400 and vMMN) more closely resembling North Americans. Our data suggest that in multisensory emotion processing, adapting to a new culture first leads to behavioral accommodation followed by alterations in brain activities, providing new evidence on humans' neurocognitive plasticity in communication.

Jiang, X. & Pell, M.D. (2017). The sound of confidence and doubt. Speech Communication, 88, 106-126.

Feeling of knowing (or expressed confidence) reflects a speaker's certainty or commitment to a statement and can be associated with one's trustworthiness or persuasiveness in social interaction. We investigated the perceptual-acoustic correlates of expressed confidence and doubt in spoken language, with a focus on both linguistic and vocal speech cues. In Experiment 1, utterances subserving different communicative functions (e.g., stating facts, making judgments) produced in a confident, close-to-confident, unconfident, and neutral-intending voice by six speakers, were then rated for perceived confidence by 72 native listeners. As expected, speaker confidence ratings increased with the intended level of expressed confidence; neutral-intending statements were frequently judged as relatively high in confidence. The communicative function of the statement, and the presence vs. absence of an utterance-initial probability phrase (e.g., Maybe, I'm sure), further modulated speaker confidence ratings. In Experiment 2, acoustic analysis of perceptually valid tokens rated in Expt. 1 revealed distinct patterns of pitch, intensity and temporal features according to perceived confidence levels; confident expressions were highest in fundamental frequency (f0) range, mean amplitude, and amplitude range, whereas unconfident expressions were highest in mean f0, slowest in speaking rate, with more frequent pauses. Dynamic analyses of f0 and intensity changes across the utterance uncovered distinctive patterns in expression as a function of confidence level at different positions of the utterance. Our findings provide new information on how metacognitive states such as confidence and doubt are communicated by vocal and linguistic cues which permit listeners to arrive at graded impressions of a speaker's feeling of (un)knowing.

Schwartz, R. & Pell, M.D. (2017). When emotion and expression diverge: the social costs of Parkinson’s disease. Journal of Clinical and Experimental Neuropsychology, 39 (3), 211-230.

Introduction: Patients with Parkinson's disease (PD) are perceived more negatively than their healthy peers, yet it remains unclear what factors contribute to this negative social perception. Method: Based on a cohort of 17 PD patients and 20 healthy controls, we assessed how naïve raters judge the emotion and emotional intensity displayed in dynamic facial expressions as adults with and without PD watched emotionally evocative films (Experiment 1), and how age-matched peers naïve to patients' disease status judge their social desirability along various dimensions from audiovisual stimuli (interview excerpts) recorded after certain films (Experiment 2). Results: In Experiment 1, participants with PD were rated as significantly more facially expressive than healthy controls; moreover, ratings demonstrated that PD patients were routinely mistaken for experiencing a negative emotion, whereas controls were rated as displaying a more positive emotion than they reported feeling. In Experiment 2, results showed that age-peers rated PD patients as significantly less socially desirable than control participants. Specifically, PD patients were rated as less involved, interested, friendly, intelligent, optimistic, attentive, and physically attractive than healthy controls. Conclusions: Taken together, our results point to a disconnect between how PD patients report feeling and attributions that others make about their emotions and social characteristics, underlining significant social challenges of the disease. In particular, changes in the ability to modulate the expression of negative emotions may contribute to the negative social impressions that many PD patients face.

Jiang, X., & Pell, M.D. (2016). The Feeling of Another's Knowing: How "Mixed Messages" in Speech Are Reconciled. Journal of Experimental Psychology: Human Perception and Performance.

Listeners often encounter conflicting verbal and vocal cues about the speaker's feeling of knowing; these "mixed messages" can reflect online shifts in one's mental state as they utter a statement, or serve different social-pragmatic goals of the speaker. Using a cross-splicing paradigm, we investigated how conflicting cues about a speaker's feeling of (un)knowing change one's perception. Listeners rated the confidence of speakers of utterances containing an initial verbal phrase congruent or incongruent with vocal cues in a subsequent statement, while their brain potentials were tracked. Different forms of conflicts modulated the perceived confidence of the speaker, an effect that was stronger for female listeners. A confident phrase followed by an unconfident voice enlarged an anteriorly maximized negativity for female listeners and a late positivity for male listeners, suggesting that mental representations of another's feeling of knowing in the face of this conflict were hampered by increased demands of integration for females and increased demands on updating for males. An unconfident phrase followed by a confident voice elicited a delayed sustained positivity (from 900 ms) in female participants only, suggesting females generated inferences to moderate the conflicting message about speaker knowledge. We highlight ways that verbal and vocal cues are integrated in real time to access a speaker's feeling of (un)knowing, while arguing that females are more sensitive to the social relevance of conflicting speaker cues.

Garrido-Vásquez, P., Pell, M.D., Paulmann, S., Sehm, B., & Kotz, S.A. (2016). Impaired Neural Processing of Dynamic Faces in Left-Onset Parkinson's Disease. Neuropsychologia, 82, 123-133.

Parkinson's disease (PD) affects patients beyond the motor domain. According to previous evidence, one mechanism that may be impaired in the disease is face processing. However, few studies have investigated this process at the neural level in PD. Moreover, research using dynamic facial displays rather than static pictures is scarce, but highly warranted due to the higher ecological validity of dynamic stimuli. In the present study we aimed to investigate how PD patients process emotional and non-emotional dynamic face stimuli at the neural level using event-related potentials. Since the literature has revealed a predominantly right-lateralized network for dynamic face processing, we divided the group into patients with left (LPD) and right (RPD) motor symptom onset (right versus left cerebral hemisphere predominantly affected, respectively). Participants watched short video clips of happy, angry, and neutral expressions and engaged in a shallow gender decision task in order to avoid confounds of task difficulty in the data. In line with our expectations, the LPD group showed significant face processing deficits compared to controls. While there were no group differences in early, sensory-driven processing (fronto-central N1 and posterior P1), the vertex positive potential, which is considered the fronto-central counterpart of the face-specific posterior N170 component, had a reduced amplitude and delayed latency in the LPD group. This may indicate disturbances of structural face processing in LPD. Furthermore, the effect was independent of the emotional content of the videos. In contrast, static facial identity recognition performance in LPD was not significantly different from controls, and comprehensive testing of cognitive functions did not reveal any deficits in this group. 
We therefore conclude that PD, and more specifically the predominant right-hemispheric affection in left-onset PD, is associated with impaired processing of dynamic facial expressions, which could be one of the mechanisms behind the often reported problems of PD patients in their social lives.


Jiang, X., & Pell, M.D. (2016). Neural responses towards a speaker's feeling of (un)knowing. Neuropsychologia, 81, 79-93.

During interpersonal communication, listeners must rapidly evaluate verbal and vocal cues to arrive at an integrated meaning about the utterance and about the speaker, including a representation of the speaker's ‘feeling of knowing’ (i.e., how confident they are in relation to the utterance). In this study, we investigated the time course and neural responses underlying a listener's ability to evaluate speaker confidence from combined verbal and vocal cues. We recorded real-time brain responses as listeners judged statements conveying three levels of confidence with the speaker's voice (confident, close-to-confident, unconfident), which were preceded by meaning-congruent lexical phrases (e.g. I am positive, Most likely, Perhaps). Event-related potentials to utterances with combined lexical and vocal cues about speaker confidence were compared to responses elicited by utterances without the verbal phrase in a previous study (Jiang and Pell, 2015). Utterances with combined cues about speaker confidence elicited reduced N1, P2, and N400 responses when compared to corresponding utterances without the phrase. When compared to confident statements, close-to-confident and unconfident expressions elicited reduced N1 and P2 responses and a late positivity from 900 to 1250 ms; unconfident and close-to-confident expressions were differentiated later in the 1250–1600 ms time window. The effect of lexical phrases on confidence processing differed for male and female participants, with evidence that female listeners incorporated information from the verbal and vocal channels in a distinct manner. Individual differences in trait empathy and trait anxiety also moderated neural responses during confidence processing. Our findings showcase the cognitive processing mechanisms and individual factors governing how we infer a speaker's mental (knowledge) state from the speech signal.


Pell, M.D., Rothermich, K., Liu, P., Paulmann, S., Sethi, S., & Rigoulot, S. (2015). Preferential decoding of emotion from human non-linguistic vocalizations versus speech prosody. Biological Psychology, 111, 14-25.

This study used event-related brain potentials (ERPs) to compare the time course of emotion processing from non-linguistic vocalizations versus speech prosody, to test whether vocalizations are treated preferentially by the neurocognitive system. Participants passively listened to vocalizations or pseudo-utterances conveying anger, sadness, or happiness as the EEG was recorded. Simultaneous effects of vocal expression type and emotion were analyzed for three ERP components (N100, P200, late positive component). Emotional vocalizations and speech were differentiated very early (N100) and vocalizations elicited stronger, earlier, and more differentiated P200 responses than speech. At later stages (450–700 ms), anger vocalizations evoked a stronger late positivity (LPC) than other vocal expressions, which was similar but delayed for angry speech. Individuals with high trait anxiety exhibited early, heightened sensitivity to vocal emotions (particularly vocalizations). These data provide new neurophysiological evidence that vocalizations, as evolutionarily primitive signals, are accorded precedence over speech-embedded emotions in the human voice.


Rothermich, K., & Pell, M.D. (2015). Introducing RISC: A New Video Inventory for Testing Social Perception. PLoS ONE, 10 (7), 1-24.

Indirect forms of speech, such as sarcasm, jocularity (joking), and ‘white lies’ told to spare another’s feelings, occur frequently in daily life and are a problem for many clinical populations. During social interactions, information about the literal or nonliteral meaning of a speaker unfolds simultaneously in several communication channels (e.g., linguistic, facial, vocal, and body cues); however, to date many studies have employed uni-modal stimuli, for example focusing only on the visual modality, limiting the generalizability of these results to everyday communication. Much of this research also neglects key factors for interpreting speaker intentions, such as verbal context and the relationship of social partners. Relational Inference in Social Communication (RISC) is a newly developed (English-language) database composed of short video vignettes depicting sincere, jocular, sarcastic, and white lie social exchanges between two people. Stimuli carefully manipulated the social relationship between communication partners (e.g., boss/employee, couple) and the availability of contextual cues (e.g. preceding conversations, physical objects) while controlling for major differences in the linguistic content of matched items. Here, we present initial perceptual validation data (N = 31) on a corpus of 920 items. Overall accuracy for identifying speaker intentions was above 80% correct and our results show that both relationship type and verbal context influence the categorization of literal and nonliteral interactions, underscoring the importance of these factors in research on speaker intentions. We believe that RISC will prove highly constructive as a tool in future research on social cognition, interpersonal communication, and the interpretation of speaker intentions in both healthy adults and clinical populations.


Liu, P., Rigoulot, S., & Pell, M.D. (2015). Cultural Differences in on-line sensitivity to emotional voices: comparing East and West. Frontiers in Human Neuroscience, 9, 1-12.

Evidence that culture modulates on-line neural responses to the emotional meanings encoded by vocal and facial expressions was demonstrated recently in a study comparing English North Americans and Chinese (Liu et al., 2015). Here, we compared how individuals from these two cultures passively respond to emotional cues from faces and voices using an Oddball task. Participants viewed in-group emotional faces, with or without simultaneous vocal expressions, while performing a face-irrelevant visual task as the EEG was recorded. A significantly larger visual Mismatch Negativity (vMMN) was observed for Chinese vs. English participants when faces were accompanied by voices, suggesting that Chinese were influenced to a larger extent by task-irrelevant vocal cues. These data highlight further differences in how adults from East Asian vs. Western cultures process socio-emotional cues, arguing that distinct cultural practices in communication (e.g., display rules) shape neurocognitive activity associated with the early perception and integration of multisensory emotional cues.


Jiang, X., Paulmann, S., Robin, J., & Pell, M.D. (2015). More than accuracy: Nonverbal dialects modulate the time course of vocal emotion recognition across cultures. Journal of Experimental Psychology: Human Perception and Performance, 41 (3), 597-612.

Using a gating paradigm, this study investigated the nature of the in-group advantage in vocal emotion recognition by comparing 2 distinct cultures. Pseudoutterances conveying 4 basic emotions, expressed in English and Hindi, were presented to English and Hindi listeners. In addition to hearing full utterances, each stimulus was gated from its onset to construct 5 processing intervals to pinpoint when the in-group advantage emerges, and whether this differs when listening to a foreign language (English participants judging Hindi) or a second language (Hindi participants judging English). An index of the mean emotion identification point for each group and unbiased measures of accuracy at each time point were calculated. Results showed that in each language condition, native listeners were faster and more accurate than non-native listeners to recognize emotions. The in-group advantage emerged in both conditions after processing 400 ms to 500 ms of acoustic information. In the bilingual Hindi group, greater oral proficiency in English predicted faster and more accurate recognition of English emotional expressions. Consistent with dialect theory, our findings provide new evidence that nonverbal dialects impede both the accuracy and the efficiency of vocal emotion processing in cross-cultural settings, even when individuals are highly proficient in the out-group target language.

Liu, P., Rigoulot, S., & Pell, M.D. (2015). Culture modulates the brain response to human expressions of emotion: Electrophysiological evidence. Neuropsychologia, 67, 1-13.

To understand how culture modulates on-line neural responses to social information, this study compared how individuals from two distinct cultural groups, English-speaking North Americans and Chinese, process emotional meanings of multi-sensory stimuli as indexed by both behaviour (accuracy) and event-related potential (N400) measures. In an emotional Stroop-like task, participants were presented face-voice pairs expressing congruent or incongruent emotions in conditions where they judged the emotion of one modality while ignoring the other (face or voice focus task). Results indicated that while both groups were sensitive to emotional differences between channels (with lower accuracy and higher N400 amplitudes for incongruent face-voice pairs), there were marked group differences in how intruding facial or vocal cues affected accuracy and N400 amplitudes, with English participants showing greater interference from irrelevant faces than Chinese. Our data illuminate distinct biases in how adults from East Asian versus Western cultures process socio-emotional cues, supplying new evidence that cultural learning modulates not only behaviour, but the neurocognitive response to different features of multi-channel emotion expressions.


Rigoulot, S., Pell, M.D., & Armony, J.L. (2015). Time course of the influence of musical expertise on the processing of vocal and musical sounds. Neuroscience, 290, 175-184.

Previous fMRI studies have suggested that different cerebral regions preferentially process human voice and music. Yet, little is known about the temporal course of the brain processes that decode the category of sounds and how expertise in one sound category can impact these processes. To address this question, we recorded the electroencephalogram (EEG) of 16 musicians and 18 non-musicians while they were listening to short musical excerpts (piano and violin) and vocal stimuli (speech and non-linguistic vocalizations). The task of the participants was to detect noise targets embedded within the stream of sounds. Event-related potentials revealed an early differentiation of sound category, within the first 100 milliseconds after the onset of the sound, with mostly increased responses to musical sounds. Importantly, this effect was modulated by the musical background of participants, as musicians were more responsive to music sounds than non-musicians, consistent with the notion that musical training increases sensitivity to music. In late temporal windows, brain responses were enhanced in response to vocal stimuli, but musicians were still more responsive to music. These results shed new light on the temporal course of neural dynamics of auditory processing and reveal how it is impacted by the stimulus category and the expertise of participants.


Jiang, X., & Pell, M.D. (2015). On how the brain decodes vocal cues about speaker confidence. Cortex, 66, 9-34.

In speech communication, listeners must accurately decode vocal cues that refer to the speaker’s mental state, such as their confidence or ‘feeling of knowing’. However, the time course and neural mechanisms associated with online inferences about speaker confidence are unclear. Here, we used event-related potentials (ERPs) to examine the temporal neural dynamics underlying a listener’s ability to infer speaker confidence from vocal cues during speech processing. We recorded listeners’ real-time brain responses while they evaluated statements wherein the speaker’s tone of voice conveyed one of three levels of confidence (confident, close-to-confident, unconfident) or were spoken in a neutral manner. Neural responses time-locked to event onset show that the perceived level of speaker confidence could be differentiated at distinct time points during speech processing: unconfident expressions elicited a weaker P2 than all other expressions of confidence (or neutral-intending utterances), whereas close-to-confident expressions elicited a reduced negative response in the 330–500 ms and 550–740 ms time windows. Neutral-intending expressions, which were also perceived as relatively confident, elicited a more delayed, larger sustained positivity than all other expressions in the 980–1270 ms window for this task. These findings provide the first piece of evidence of how quickly the brain responds to vocal cues signifying the extent of a speaker’s confidence during online speech comprehension; first, a rough dissociation between unconfident and confident voices occurs as early as 200 ms after speech onset. At a later stage, further differentiation of the exact level of speaker confidence (i.e., close-to-confident, very confident) is evaluated via an inferential system to determine the speaker’s meaning under current task settings.
These findings extend three-stage models of how vocal emotion cues are processed in speech comprehension (e.g., Schirmer and Kotz, 2006) by revealing how a speaker’s mental state (i.e., feeling of knowing) is simultaneously inferred from vocal expressions.

PDF icon jiang_pell_2015.pdf


Pell, M. D., Monetta, L., Rothermich, K., Kotz, S. A., Cheang, H. S., & McDonald, S. (2014). Social perception in adults with Parkinson’s disease. Neuropsychology, 28(6), 905-916.

Objective: Our study assessed how nondemented patients with Parkinson's disease (PD) interpret the affective and mental states of others from spoken language (adopt a "theory of mind") in ecologically valid social contexts. A secondary goal was to examine the relationship between emotion processing, mentalizing, and executive functions in PD during interpersonal communication.
Method: Fifteen adults with PD and 16 healthy adults completed The Awareness of Social Inference Test, a standardized tool comprised of videotaped vignettes of everyday social interactions (McDonald, Flanagan, Rollins, & Kinch, 2003). Individual subtests assessed participants' ability to recognize basic emotions and to infer speaker intentions (sincerity, lies, sarcasm) from verbal and nonverbal cues, and to judge speaker knowledge, beliefs, and feelings. A comprehensive neuropsychological evaluation was also conducted.
Results: Patients with mild-moderate PD were impaired in the ability to infer "enriched" social intentions, such as sarcasm or lies, from nonliteral remarks; in contrast, adults with and without PD showed a similar capacity to recognize emotions and social intentions meant to be literal. In the PD group, difficulties using theory of mind to draw complex social inferences were significantly correlated with limitations in working memory and executive functioning.
Conclusions: In early PD, functional compromise of the frontal-striatal-dorsal system yields impairments in social perception and understanding nonliteral speaker intentions that draw upon cognitive theory of mind. Deficits in social perception in PD are exacerbated by a decline in executive resources, which could hamper the strategic deployment of attention to multiple information sources necessary to infer social intentions.

PDF icon pelletal_neuropsychology_2014.pdf

Rigoulot, S., Fish, K., & Pell, M. D. (2014). Neural correlates of inferring speaker sincerity from white lies: an event-related potential source localization study. Brain Research, 1565, 48-62.

During social interactions, listeners weigh the importance of linguistic and extra-linguistic speech cues (prosody) to infer the true intentions of the speaker in reference to what is actually said. In this study, we investigated what brain processes allow listeners to detect when a spoken compliment is meant to be sincere (true compliment) or not (“white lie”). Electroencephalograms of 29 participants were recorded while they listened to Question–Response pairs, where the response was expressed in either a sincere or insincere tone (e.g., “So, what did you think of my presentation?”/“I found it really interesting.”). Participants judged whether the response was sincere or not. Behavioral results showed that prosody could be effectively used to discern the intended sincerity of compliments. Analysis of temporal and spatial characteristics of event-related potentials (P200, N400, P600) uncovered significant effects of prosody on P600 amplitudes, which were greater in response to sincere versus insincere compliments. Using low resolution brain electromagnetic tomography (LORETA), we determined that the anatomical sources of this activity were likely located in the (left) insula, consistent with previous reports of insular activity in the perception of lies and concealments. These data extend knowledge of the neurocognitive mechanisms that permit context-appropriate inferences about speaker feelings and intentions during interpersonal communication.

PDF icon Rigoulot Fish Pell 2014

Rigoulot, R., & Pell, M.D. (2014). Emotion in the voice influences the way we scan emotional faces.  Speech Communication, 65, 36-49.

Previous eye-tracking studies have found that listening to emotionally-inflected utterances guides visual behavior towards an emotionally congruent face (e.g., Rigoulot and Pell, 2012). Here, we investigated in more detail whether emotional speech prosody influences how participants scan and fixate specific features of an emotional face that is congruent or incongruent with the prosody. Twenty-one participants viewed individual faces expressing fear, sadness, disgust, or happiness while listening to an emotionally-inflected pseudo-utterance spoken in a congruent or incongruent prosody. Participants judged whether the emotional meaning of the face and voice were the same or different (match/mismatch). Results confirm that there were significant effects of prosody congruency on eye movements when participants scanned a face, although these varied by emotion type; a matching prosody promoted more frequent looks to the upper part of fear and sad facial expressions, whereas visual attention to upper and lower regions of happy (and to some extent disgust) faces was more evenly distributed. These data suggest ways that vocal emotion cues guide how humans process facial expressions in a way that could facilitate recognition of salient visual cues, to arrive at a holistic impression of intended meanings during interpersonal events.

PDF icon rigoulot_pell_2014.pdf


Garrido-Vásquez, P., Pell, M. D., Paulmann, S., Strecker, K., Schwarz, J., & Kotz, S. A. (2013). An ERP study of vocal emotion processing in asymmetric Parkinson’s disease. Social Cognitive and Affective Neuroscience, 8, 918-927.

Parkinson’s disease (PD) has been related to impaired processing of emotional speech intonation (emotional prosody). One distinctive feature of idiopathic PD is motor symptom asymmetry, with striatal dysfunction being strongest in the hemisphere contralateral to the most affected body side. It is still unclear whether this asymmetry may affect vocal emotion perception. Here, we tested 22 PD patients (10 with predominantly left-sided [LPD] and 12 with predominantly right-sided [RPD] motor symptoms) and 22 healthy controls in an event-related potential study. Sentences conveying different emotional intonations were presented in lexical and pseudo-speech versions. Task varied between an explicit and an implicit instruction. Of specific interest was emotional salience detection from prosody, reflected in the P200 component. We predicted that patients with predominantly right-striatal dysfunction (LPD) would exhibit P200 alterations. Our results support this assumption. LPD patients showed enhanced P200 amplitudes, and specific deficits were observed for disgust prosody, explicit anger processing, and implicit processing of happy prosody. Lexical speech was predominantly affected while the processing of pseudo-speech was largely intact. P200 amplitude in patients correlated significantly with left motor scores and asymmetry indices. The data suggest that emotional salience detection from prosody is affected by asymmetric neuronal degeneration in PD.

PDF icon garrido-vasquez_etal_2012.pdf

Rigoulot, S., Wassiliwizky, E., & Pell, M. D. (2013). Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition. Frontiers in Psychology, 4, 1-14.

Recent studies suggest that the time course for recognizing vocal expressions of basic emotion in speech varies significantly by emotion type, implying that listeners uncover acoustic evidence about emotions at different rates in speech (e.g., fear is recognized most quickly whereas happiness and disgust are recognized relatively slowly; Pell and Kotz, 2011). To investigate whether vocal emotion recognition is largely dictated by the amount of time listeners are exposed to speech or the position of critical emotional cues in the utterance, 40 English participants judged the meaning of emotionally-inflected pseudo-utterances presented in a gating paradigm, where utterances were gated as a function of their syllable structure in segments of increasing duration from the end of the utterance (i.e., gated syllable-by-syllable from the offset rather than the onset of the stimulus). Accuracy for detecting six target emotions in each gate condition and the mean identification point for each emotion in milliseconds were analyzed and compared to results from Pell and Kotz (2011). We again found significant emotion-specific differences in the time needed to accurately recognize emotions from speech prosody, and new evidence that utterance-final syllables tended to facilitate listeners' accuracy in many conditions when compared to utterance-initial syllables. The time needed to recognize fear, anger, sadness, and neutral from speech cues was not influenced by how utterances were gated, although happiness and disgust were recognized significantly faster when listeners heard the end of utterances first. Our data provide new clues about the relative time course for recognizing vocally-expressed emotions within the 400–1200 ms time window, while highlighting that emotion recognition from prosody can be shaped by the temporal properties of speech.



Liu, P., & Pell, M.D. (2012). Recognizing vocal emotions in Mandarin Chinese: A validated database of Chinese vocal stimuli. Behavior Research Methods, 44, 1042-1051.

To establish a valid database of vocal emotional stimuli in Mandarin Chinese, a set of Chinese pseudosentences (i.e., semantically meaningless sentences that resembled real Chinese) were produced by four native Mandarin speakers to express seven emotional meanings: anger, disgust, fear, sadness, happiness, pleasant surprise, and neutrality. These expressions were identified by a group of native Mandarin listeners in a seven-alternative forced choice task, and items reaching a recognition rate of at least three times chance performance in the seven-choice task were selected as a valid database and then subjected to acoustic analysis. The results demonstrated expected variations in both perceptual and acoustic patterns of the seven vocal emotions in Mandarin. For instance, fear, anger, sadness, and neutrality were associated with relatively high recognition, whereas happiness, disgust, and pleasant surprise were recognized less accurately. Acoustically, anger and pleasant surprise exhibited relatively high mean f0 values and large variation in f0 and amplitude; in contrast, sadness, disgust, fear, and neutrality exhibited relatively low mean f0 values and small amplitude variations, and happiness exhibited a moderate mean f0 value and f0 variation. Emotional expressions varied systematically in speech rate and harmonics-to-noise ratio values as well. This validated database is available to the research community and will contribute to future studies of emotional prosody for a number of purposes.


Schwartz, R., & Pell, M.D. (2012). Emotional speech processing at the intersection of prosody and semantics. PLoS ONE, 7 (10): e47279.

The ability to accurately perceive emotions is crucial for effective social interaction. Many questions remain regarding how different sources of emotional cues in speech (e.g., prosody, semantic information) are processed during emotional communication. Using a cross-modal emotional priming paradigm (Facial affect decision task), we compared the relative contributions of processing utterances with single-channel (prosody-only) versus multi-channel (prosody and semantic) cues on the perception of happy, sad, and angry emotional expressions. Our data show that emotional speech cues produce robust congruency effects on decisions about an emotionally related face target, although no processing advantage occurred when prime stimuli contained multi-channel as opposed to single-channel speech cues. Our data suggest that utterances with prosodic cues alone and utterances with combined prosody and semantic cues both activate knowledge that leads to emotional congruency (priming) effects, but that the convergence of these two information sources does not always heighten access to this knowledge during emotional speech processing.


Rigoulot, S., & Pell, M.D. (2012). Seeing emotion with your ears: Emotional prosody implicitly guides visual attention to faces. PLoS ONE, 7 (1), e30740.

Interpersonal communication involves the processing of multimodal emotional cues, particularly facial expressions (visual modality) and emotional speech prosody (auditory modality) which can interact during information processing. Here, we investigated whether the implicit processing of emotional prosody systematically influences gaze behavior to facial expressions of emotion. We analyzed the eye movements of 31 participants as they scanned a visual array of four emotional faces portraying fear, anger, happiness, and neutrality, while listening to an emotionally-inflected pseudo-utterance (Someone migged the pazing) uttered in a congruent or incongruent tone. Participants heard the emotional utterance during the first 1250 milliseconds of a five-second visual array and then performed an immediate recall decision about the face they had just seen. The frequency and duration of first saccades and of total looks in three temporal windows ([0–1250 ms], [1250–2500 ms], [2500–5000 ms]) were analyzed according to the emotional content of faces and voices. Results showed that participants looked longer and more frequently at faces that matched the prosody in all three time windows (emotion congruency effect), although this effect was often emotion-specific (with greatest effects for fear). Effects of prosody on visual attention to faces persisted over time and could be detected long after the auditory information was no longer present. These data imply that emotional prosody is processed automatically during communication and that these cues play a critical role in how humans respond to related visual cues in the environment, such as facial expressions.


Paulmann, S., Titone, D., & Pell, M.D. (2012). How emotional prosody guides your way: evidence from eye movements. Speech Communication, 54, 92-107.

This study investigated cross-modal effects of emotional voice tone (prosody) on face processing during instructed visual search. Specifically, we evaluated whether emotional prosodic cues in speech have a rapid, mandatory influence on eye movements to an emotionally-related face, and whether these effects persist as semantic information unfolds. Participants viewed an array of six emotional faces while listening to instructions spoken in an emotionally congruent or incongruent prosody (e.g., “Click on the happy face” spoken in a happy or angry voice). The duration and frequency of eye fixations were analyzed when only prosodic cues were emotionally meaningful (pre-emotional label window:“Click on the/. . .”), and after emotional semantic information was available (post-emotional label window:“. . ./happy face”). In the pre-emotional label window, results showed that participants made immediate use of emotional prosody, as reflected in significantly longer frequent fixations to emotionally congruent versus incongruent faces. However, when explicit semantic information in the instructions became available (post-emotional label window), the influence of prosody on measures of eye gaze was relatively minimal. Our data show that emotional prosody has a rapid impact on gaze behavior during social information processing, but that prosodic meanings can be overridden by semantic cues when linguistic information is task relevant.


Jaywant, A. & Pell, M.D. (2012). Categorical processing of negative emotions from speech prosody. Speech Communication, 54, 1-10.

Everyday communication involves processing nonverbal emotional cues from auditory and visual stimuli. To characterize whether emotional meanings are processed with category-specificity from speech prosody and facial expressions, we employed a cross-modal priming task (the Facial Affect Decision Task; Pell, 2005a) using emotional stimuli with the same valence but that differed by emotion category. After listening to angry, sad, disgusted, or neutral vocal primes, subjects rendered a facial affect decision about an emotionally congruent or incongruent face target. Our results revealed that participants made fewer errors when judging face targets that conveyed the same emotion as the vocal prime, and responded significantly faster for most emotions (anger and sadness). Surprisingly, participants responded slower when the prime and target both conveyed disgust, perhaps due to attention biases for disgust-related stimuli. Our findings suggest that vocal emotional expressions with similar valence are processed with category specificity, and that discrete emotion knowledge implicitly affects the processing of emotional faces between sensory modalities.



Cheang, H. S., & Pell, M.D. (2011). Recognizing sarcasm without language: A cross-linguistic study of English and Cantonese. Pragmatics & Cognition, 19, 203-223.


Jesso, S., Morlog, D., Ross, S., Pell, M.D., Pasternak, S., Mitchell, D., Kertesz, A., & Finger, E. (2011). The effects of oxytocin on social cognition and behaviour in frontotemporal dementia. Brain, 134, 2493-2501.

Patients with behavioural variant frontotemporal dementia demonstrate abnormalities in behaviour and social cognition, including deficits in emotion recognition. Recent studies suggest that the neuropeptide oxytocin is an important mediator of social behaviour, enhancing prosocial behaviours and some aspects of emotion recognition across species. The objective of this study was to assess the effects of a single dose of intranasal oxytocin on neuropsychiatric behaviours and emotion processing in patients with behavioural variant frontotemporal dementia. In a double-blind, placebo-controlled, randomized cross-over design, 20 patients with behavioural variant frontotemporal dementia received one dose of 24 IU of intranasal oxytocin or placebo and then completed emotion recognition tasks known to be affected by frontotemporal dementia and by oxytocin. Caregivers completed validated behavioural ratings at 8 h and 1 week following drug administrations. A significant improvement in scores on the Neuropsychiatric Inventory was observed on the evening of oxytocin administration compared with placebo and compared with baseline ratings. Oxytocin was also associated with reduced recognition of angry facial expressions by patients with behavioural variant frontotemporal dementia. Together these findings suggest that oxytocin is a potentially promising, novel symptomatic treatment candidate for patients with behavioural variant frontotemporal dementia and that further study of this neuropeptide in frontotemporal dementia is warranted.

PDF icon jesso-et-al-2011.pdf

Pell, M. D., & Kotz, S. A. (2011). On the time course of vocal emotion recognition. PLoS ONE, 6 (11): e27256.

How quickly do listeners recognize emotions from a speaker's voice, and does the time course for recognition vary by emotion type? To address these questions, we adapted the auditory gating paradigm to estimate how much vocal information is needed for listeners to categorize five basic emotions (anger, disgust, fear, sadness, happiness) and neutral utterances produced by male and female speakers of English. Semantically-anomalous pseudo-utterances (e.g., The rivix jolled the silling) conveying each emotion were divided into seven gate intervals according to the number of syllables that listeners heard from sentence onset. Participants (n = 48) judged the emotional meaning of stimuli presented at each gate duration interval, in a successive, blocked presentation format. Analyses looked at how recognition of each emotion evolves as an utterance unfolds and estimated the “identification point” for each emotion. Results showed that anger, sadness, fear, and neutral expressions are recognized more accurately at short gate intervals than happiness, and particularly disgust; however, as speech unfolds, recognition of happiness improves significantly towards the end of the utterance (and fear is recognized more accurately than other emotions). When the gate associated with the emotion identification point of each stimulus was calculated, data indicated that fear (M = 517 ms), sadness (M = 576 ms), and neutral (M = 510 ms) expressions were identified from shorter acoustic events than the other emotions. These data reveal differences in the underlying time course for conscious recognition of basic emotions from vocal expressions, which should be accounted for in studies of emotional speech processing.


Pell, M.D., Jaywant, A., Monetta, L., & Kotz, S.A. (2011). Emotional speech processing: disentangling the effects of prosody and semantic cues. Cognition & Emotion, 25 (5), 834-853.

To inform how emotions in speech are implicitly processed and registered in memory, we compared how emotional prosody, emotional semantics, and both cues in tandem prime decisions about conjoined emotional faces. Fifty-two participants rendered facial affect decisions (Pell, 2005a), indicating whether a target face represented an emotion (happiness or sadness) or not (a facial grimace), after passively listening to happy, sad, or neutral prime utterances. Emotional information from primes was conveyed by: (1) prosody only; (2) semantic cues only; or (3) combined prosody and semantic cues. Results indicated that prosody, semantics, and combined prosody–semantic cues facilitate emotional decisions about target faces in an emotion-congruent manner. However, the magnitude of priming did not vary across tasks. Our findings highlight that emotional meanings of prosody and semantic cues are systematically registered during speech processing, but with similar effects on associative knowledge about emotions, which is presumably shared by prosody, semantics, and faces.


Paulmann, S. & Pell, M.D. (2011). Is there an advantage for recognizing multi-modal emotional stimuli? Motivation and Emotion, 35, 192-201.

Emotions can be recognized whether conveyed by facial expressions, linguistic cues (semantics), or prosody (voice tone). However, few studies have empirically documented the extent to which multi-modal emotion perception differs from uni-modal emotion perception. Here, we tested whether emotion recognition is more accurate for multi-modal stimuli by presenting stimuli with different combinations of facial, semantic, and prosodic cues. Participants judged the emotion conveyed by short utterances in six channel conditions. Results indicated that emotion recognition is significantly better in response to multi-modal versus uni-modal stimuli. When stimuli contained only one emotional channel, recognition tended to be higher in the visual modality (i.e., facial expressions, semantic information conveyed by text) than in the auditory modality (prosody), although this pattern was not uniform across emotion categories. The advantage for multi-modal recognition may reflect the automatic integration of congruent emotional information across channels which enhances the accessibility of emotion-related knowledge in memory.



Paulmann, S. & Pell, M.D. (2010). Dynamic emotion processing in Parkinson’s disease as a function of channel availability. Journal of Clinical and Experimental Neuropsychology, 32 (8), 822-835.

Parkinson's disease (PD) is linked to impairments for recognizing emotional expressions, although the extent and nature of these communication deficits are uncertain. Here, we compared how adults with and without PD recognize dynamic expressions of emotion in three channels, involving lexical–semantic, prosody, and/or facial cues (each channel was investigated individually and in combination). Results indicated that while emotion recognition increased with channel availability in the PD group, patients performed significantly worse than healthy participants in all conditions. Difficulties processing dynamic emotional stimuli in PD could be linked to striatal dysfunction, which reduces efficient binding of sequential information in the disease.


Paulmann, S. & Pell, M.D. (2010). Contextual influences of emotional speech prosody on face processing: how much is enough? Cognitive, Affective, and Behavioral Neuroscience, 10 (2), 230-242.

The influence of emotional prosody on the evaluation of emotional facial expressions was investigated in an event-related brain potential (ERP) study using a priming paradigm, the facial affective decision task. Emotional prosodic fragments of short (200-msec) and medium (400-msec) duration were presented as primes, followed by an emotionally related or unrelated facial expression (or facial grimace, which does not resemble an emotion). Participants judged whether or not the facial expression represented an emotion. ERP results revealed an N400-like differentiation for emotionally related prime-target pairs when compared with unrelated prime-target pairs. Faces preceded by prosodic primes of medium length led to a normal priming effect (larger negativity for unrelated than for related prime-target pairs), but the reverse ERP pattern (larger negativity for related than for unrelated prime-target pairs) was observed for faces preceded by short prosodic primes. These results demonstrate that brief exposure to prosodic cues can establish a meaningful emotional context that influences related facial processing; however, this context does not always lead to a processing advantage when prosodic information is very short in duration.


Dimoska, A., McDonald, S., Pell, M.D., Tate, R. & James, C. (2010). Recognising vocal expressions of emotion in patients with social skills deficits following traumatic brain injury. Journal of the International Neuropsychological Society, 16, 369-382.

Perception of emotion in voice is impaired following traumatic brain injury (TBI). This study examined whether an inability to concurrently process semantic information (the "what") and emotional prosody (the "how") of spoken speech contributes to impaired recognition of emotional prosody and whether impairment is ameliorated when little or no semantic information is provided. Eighteen individuals with moderate-to-severe TBI showing social skills deficits during inpatient rehabilitation were compared with 18 demographically matched controls. Participants completed two discrimination tasks using spoken sentences that varied in the amount of semantic information: that is, (1) well-formed English, (2) a nonsense language, and (3) low-pass filtered speech producing "muffled" voices. Reducing semantic processing demands did not improve perception of emotional prosody. The TBI group were significantly less accurate than controls. Impairment was greater within the TBI group when accessing semantic memory to label the emotion of sentences, compared with simply making "same/different" judgments. Findings suggest an impairment of processing emotional prosody itself rather than semantic processing demands which leads to an over-reliance on the "what" rather than the "how" in conversational remarks. Emotional recognition accuracy was significantly related to the ability to inhibit prepotent responses, consistent with neuroanatomical research suggesting similar ventrofrontal systems subserve both functions.


Jaywant, A. & Pell, M.D. (2010). Listener impressions of speakers with Parkinson’s disease. Journal of the International Neuropsychological Society, 16, 49-57.

Parkinson’s disease (PD) has several negative effects on speech production and communication. However, few studies have looked at how speech patterns in PD contribute to linguistic and social impressions formed about PD patients from the perspective of listeners. In this study, discourse recordings elicited from nondemented PD speakers (n = 18) and healthy controls (n = 17) were presented to 30 listeners unaware of the speakers’ disease status. In separate conditions, listeners rated the discourse samples based on their impressions of the speaker or of the linguistic content. Acoustic measures of the speech samples were analyzed for comparison with listeners’ perceptual ratings. Results showed that although listeners rated the content of Parkinsonian discourse as linguistically appropriate (e.g., coherent, well-organized, easy to follow), the PD speakers were perceived as significantly less interested, less involved, less happy, and less friendly than healthy speakers. Negative social impressions demonstrated a relationship to changes in vocal intensity (loudness) and temporal characteristics (dysfluencies) of Parkinsonian speech. Our findings emphasize important psychosocial ramifications of PD that are likely to limit opportunities for communication and social interaction for those affected, because of the negative impressions drawn by listeners based on their speaking voice.



Pell, M.D., Paulmann, S., Dara, C., Alasseri, A., & Kotz, S.A. (2009). Factors in the recognition of vocally expressed emotions: A comparison of four languages. Journal of Phonetics, 37, 417-435.

To understand how language influences the vocal communication of emotion, we investigated how discrete emotions are recognized and acoustically differentiated in four language contexts—English, German, Hindi, and Arabic. Vocal expressions of six emotions (anger, disgust, fear, sadness, happiness, pleasant surprise) and neutral expressions were elicited from four native speakers of each language. Each speaker produced pseudo-utterances (“nonsense speech”) which resembled their native language to express each emotion type, and the recordings were judged for their perceived emotional meaning by a group of native listeners in each language condition. Emotion recognition and acoustic patterns were analyzed within and across languages. Although overall recognition rates varied by language, all emotions could be recognized strictly from vocal cues in each language at levels exceeding chance. Anger, sadness, and fear tended to be recognized most accurately irrespective of language. Acoustic and discriminant function analyses highlighted the importance of speaker fundamental frequency (i.e., relative pitch level and variability) for signalling vocal emotions in all languages. Our data emphasize that while emotional communication is governed by display rules and other social variables, vocal expressions of ‘basic’ emotion in speech exhibit modal tendencies in their acoustic and perceptual attributes which are largely unaffected by language or linguistic similarity.


Paulmann, S., Pell, M.D., & Kotz, S.A. (2009). Comparative processing of emotional prosody and semantics following basal ganglia infarcts: ERP evidence of selective impairments for disgust and fear. Brain Research, 1295, 159-169.

Evidence from neuroimaging and clinical studies functionally links the basal ganglia to emotional speech processes. However, most previous studies administered explicit tasks, so the mechanisms underlying emotional speech processing cannot be separated from possible task-related effects. The current study therefore tested emotional speech processing in an event-related potential (ERP) experiment using an implicit emotional processing task (probe verification). The interactive time course of emotional prosody in the context of emotional semantics was investigated using a cross-splicing method. As previously demonstrated, combined prosodic and semantic expectancy violations elicit N400-like negativities irrespective of emotional categories in healthy listeners. In contrast, basal ganglia patients show this negativity only for the emotions of happiness and anger, but not for fear or disgust. The current data serve as first evidence that lesions within the left basal ganglia affect the comparative online processing of fear and disgust prosody and semantics. Furthermore, the data imply that previously reported emotional speech recognition deficits in basal ganglia patients may be due to misaligned processing of emotional prosody and semantics.


Cheang, H.S. & Pell, M.D. (2009). Acoustic markers of sarcasm in Cantonese and English. Journal of the Acoustical Society of America, 126 (3), 1394-1405.

The goal of this study was to identify acoustic parameters associated with the expression of sarcasm by Cantonese speakers, and to compare the observed features to similar data on English [Cheang, H. S. and Pell, M. D. (2008). Speech Commun. 50, 366-381]. Six native Cantonese speakers produced utterances to express sarcasm, humorous irony, sincerity, and neutrality. Each utterance was analyzed to determine the mean fundamental frequency (F0), F0-range, mean amplitude, amplitude-range, speech rate, and harmonics-to-noise ratio (HNR) (to probe voice quality changes). Results showed that sarcastic utterances in Cantonese were produced with an elevated mean F0, and reductions in amplitude- and F0-range, which differentiated them most from sincere utterances. Sarcasm was also spoken with a slower speech rate and a higher HNR (i.e., less vocal noise) than the other attitudes in certain linguistic contexts. Direct Cantonese-English comparisons revealed one major distinction in the acoustic pattern for communicating sarcasm across the two languages: Cantonese speakers raised mean F0 to mark sarcasm, whereas English speakers lowered mean F0 in this context. These findings emphasize that prosody is instrumental for marking non-literal intentions in speech such as sarcasm in Cantonese as well as in other languages. However, the specific acoustic conventions for communicating sarcasm seem to vary among languages.


Monetta, L., Grindrod, C. & Pell, M.D. (2009). Irony comprehension and theory of mind deficits in patients with Parkinson's disease. Cortex, 45, 972-981. (Special Issue on "Parkinson's disease, Language, and Cognition").

Many individuals with Parkinson's disease (PD) are known to have difficulties in understanding pragmatic aspects of language. In the present study, a group of eleven non-demented PD patients and eleven healthy control (HC) participants were tested on their ability to interpret communicative intentions underlying verbal irony and lies, as well as on their ability to infer first- and second-order mental states (i.e., theory of mind). Following Winner et al. (1998), participants answered different types of questions about the events which unfolded in stories which ended in either an ironic statement or a lie. Results showed that PD patients were significantly less accurate than HC participants in assigning second-order beliefs during the story comprehension task, suggesting that the ability to make a second-order mental state attribution declines in PD. The PD patients were also less able to distinguish whether the final statement of a story should be interpreted as a joke or a lie, suggesting a failure in pragmatic interpretation abilities. The implications of frontal lobe dysfunction in PD as a source of difficulties with working memory, mental state attributions, and pragmatic language deficits are discussed in the context of these findings.


Pell, M.D., Monetta, L., Paulmann, S. & Kotz, S.A. (2009). Recognizing emotions in a foreign language. Journal of Nonverbal Behavior, 33, 107-120.

Expressions of basic emotions (joy, sadness, anger, fear, disgust) can be recognized pan-culturally from the face and it is assumed that these emotions can be recognized from a speaker’s voice, regardless of an individual’s culture or linguistic ability. Here, we compared how monolingual speakers of Argentine Spanish recognize basic emotions from pseudo-utterances (“nonsense speech”) produced in their native language and in three foreign languages (English, German, Arabic). Results indicated that vocal expressions of basic emotions could be decoded in each language condition at accuracy levels exceeding chance, although Spanish listeners performed significantly better overall in their native language (“in-group advantage”). Our findings argue that the ability to understand vocally-expressed emotions in speech is partly independent of linguistic ability and involves universal principles, although this ability is also shaped by linguistic and cultural variables.



Pell, M.D. & Monetta, L. (2008). How Parkinson’s disease affects nonverbal communication and language processing. Language and Linguistics Compass, 2 (5), 739-759.

In addition to difficulties that affect movement, many adults with Parkinson's disease (PD) experience changes that negatively impact on receptive aspects of their communication. For example, some PD patients have difficulties processing non-verbal expressions (facial expressions, voice tone) and many are less sensitive to ‘non-literal’ or pragmatic meanings of language, at least under certain conditions. This review outlines how PD can affect the comprehension of language and non-verbal expressions and considers how these changes are related to concurrent alterations in cognition (e.g., executive functions, working memory) and motor signs associated with the disease. Our summary underscores that the progressive course of PD can interrupt a number of functional systems that support cognition and receptive language, and in different ways, leading to both primary and secondary impairments of the systems that support linguistic and non-verbal communication.


Monetta, L., Cheang, H.S. & Pell, M.D. (2008). Understanding speaker attitudes from prosody by adults with Parkinson's disease. Journal of Neuropsychology, 2, 415-430.

The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease, with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical ‘pseudo-utterances’ were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than healthy control (HC) participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).


Monetta, L., Grindrod, C.M., & Pell, M.D. (2008). Effects of working memory capacity on inference generation during story comprehension in adults with Parkinson’s disease. Journal of Neurolinguistics, 21, 400-417.

A group of non-demented adults with Parkinson's disease (PD) were studied to investigate how PD affects pragmatic-language processing, and, specifically, to test the hypothesis that the ability to draw inferences from discourse in PD is critically tied to the underlying working memory (WM) capacity of individual patients [Monetta, L., & Pell, M. D. (2007). Effects of verbal working memory deficits on metaphor comprehension in patients with Parkinson's disease. Brain and Language, 101, 80–89]. Thirteen PD patients and a matched group of 16 healthy control (HC) participants performed the Discourse Comprehension Test [Brookshire, R. H., & Nicholas, L. E. (1993). Discourse comprehension test. Tucson, AZ: Communication Skill Builders], a standardized test which evaluates the ability to generate inferences based on explicit or implied information relating to main ideas or details presented in short stories. Initial analyses revealed that the PD group as a whole was significantly less accurate than the HC group when comprehension questions pertained to implied as opposed to explicit information in the stories, consistent with previous findings [Murray, L. L., & Stout, J. C. (1999). Discourse comprehension in Huntington's and Parkinson's diseases. American Journal of Speech–Language Pathology, 8, 137–148]. However, subsequent analyses showed that only a subgroup of PD patients with WM deficits, and not PD patients with WM capacity within the control group range, were significantly impaired for drawing inferences (especially predictive inferences about implied details in the stories) when compared to the control group. These results build on a growing body of literature, which demonstrates that compromise of frontal–striatal systems and subsequent reductions in processing/WM capacity in PD are a major source of pragmatic-language deficits in many PD patients.


Paulmann, S., Pell, M.D., & Kotz, S.A. (2008). Functional contributions of the basal ganglia to emotional prosody: evidence from ERPs. Brain Research, 1217, 171-178.

The basal ganglia (BG) have been functionally linked to emotional processing [Pell, M.D., Leonard, C.L., 2003. Processing emotional tone from speech in Parkinson's Disease: a role for the basal ganglia. Cogn. Affect. Behav. Neurosci. 3, 275–288; Pell, M.D., 2006. Cerebral mechanisms for understanding emotional prosody in speech. Brain Lang. 97 (2), 221–234]. However, few studies have tried to specify the precise role of the BG during emotional prosodic processing. Therefore, the current study examined deviance detection in healthy listeners and patients with left focal BG lesions during implicit emotional prosodic processing in an event-related brain potential (ERP) experiment. In order to compare these ERP responses with explicit judgments of emotional prosody, the same participants were tested in a follow-up recognition task. As previously reported [Kotz, S.A., Paulmann, S., 2007. When emotional prosody and semantics dance cheek to cheek: ERP evidence. Brain Res. 1151, 107–118; Paulmann, S. & Kotz, S.A., 2008. An ERP investigation on the temporal dynamics of emotional prosody and emotional semantics in pseudo- and lexical sentence context. Brain Lang. 105, 59–69], deviance of prosodic expectancy elicits a right lateralized positive ERP component in healthy listeners. Here we report a similar positive ERP correlate in BG-patients and healthy controls. In contrast, BG-patients are significantly impaired in explicit recognition of emotional prosody when compared to healthy controls. The current data serve as first evidence that focal lesions in left BG do not necessarily affect implicit emotional prosodic processing but evaluative emotional prosodic processes as demonstrated in the recognition task. The results suggest that the BG may not play a mandatory role in implicit emotional prosodic processing. Rather, executive processes underlying the recognition task may be dysfunctional during emotional prosodic processing.


Pell, M.D., & Skorup, V. (2008). Implicit processing of emotional prosody in a foreign versus native language. Speech Communication, 50, 519-530.

To test ideas about the universality and time course of vocal emotion processing, 50 English listeners performed an emotional priming task to determine whether they implicitly recognize emotional meanings of prosody when exposed to a foreign language. Arabic pseudoutterances produced in a happy, sad, or neutral prosody acted as primes for a happy, sad, or false (i.e., non-emotional) face target and participants judged whether the facial expression represents an emotion. The prosody-face relationship (congruent, incongruent) and the prosody duration (600 or 1000 ms) were independently manipulated in the same experiment. Results indicated that English listeners automatically detect the emotional significance of prosody when expressed in a foreign language, although activating emotional meanings in a foreign language may require greater exposure to prosodic information than listening to one's native language.


Cheang, H.S. & Pell, M.D. (2008). The sound of sarcasm. Speech Communication, 50, 366-381.

The present study was conducted to identify possible acoustic cues of sarcasm. Native English speakers produced a variety of simple utterances to convey four different attitudes: sarcasm, humour, sincerity, and neutrality. Following validation by a separate naïve group of native English speakers, the recorded speech was subjected to acoustic analyses for the following features: mean fundamental frequency (F0), F0 standard deviation, F0 range, mean amplitude, amplitude range, speech rate, harmonics-to-noise ratio (HNR, to probe for voice quality changes), and one-third octave spectral values (to probe resonance changes). The results of analyses indicated that sarcasm was reliably characterized by a number of prosodic cues, although one acoustic feature appeared particularly robust in sarcastic utterances: overall reductions in mean F0 relative to all other target attitudes. Sarcasm was also reliably distinguished from sincerity by overall reductions in HNR and in F0 standard deviation. In certain linguistic contexts, sarcasm could be differentiated from sincerity and humour through changes in resonance and reductions in both speech rate and F0 range. Results also suggested a role of language used by speakers in conveying sarcasm and sincerity. It was concluded that sarcasm in speech can be characterized by a specific pattern of prosodic cues in addition to textual cues, and that these acoustic characteristics can be influenced by language used by the speaker.


Paulmann, S., Pell, M.D. & Kotz, S.A. (2008). How aging affects the recognition of emotional speech. Brain and Language, 104, 262-269.

To successfully infer a speaker’s emotional state, diverse sources of emotional information need to be decoded. The present study explored to what extent emotional speech recognition of ‘basic’ emotions (anger, disgust, fear, happiness, pleasant surprise, sadness) differs between different sex (male/female) and age (young/middle-aged) groups in a behavioural experiment. Participants were asked to identify the emotional prosody of a sentence as accurately as possible. As a secondary goal, the perceptual findings were examined in relation to acoustic properties of the sentences presented. Findings indicate that emotion recognition rates differ between the different categories tested and that these patterns varied significantly as a function of age, but not of sex.


Dara, C., Monetta, L. & Pell, M.D. (2008). Vocal emotion processing in Parkinson’s disease: reduced sensitivity to negative emotions. Brain Research, 1188, 100-111.

To document the impact of Parkinson's disease (PD) on communication and to further clarify the role of the basal ganglia in the processing of emotional speech prosody, this investigation compared how PD patients identify basic emotions from prosody and judge specific affective properties of the same vocal stimuli, such as valence or intensity. Sixteen non-demented adults with PD and 17 healthy control (HC) participants listened to semantically-anomalous pseudo-utterances spoken in seven emotional intonations (anger, disgust, fear, sadness, happiness, pleasant surprise, neutral) and two distinct levels of perceived emotional intensity (high, low). On three separate occasions, participants classified the emotional meaning of the prosody for each utterance (identification task), rated how positive or negative the stimulus sounded (valence rating task), or rated how intense the emotion was expressed by the speaker (intensity rating task). Results indicated that the PD group was significantly impaired relative to the HC group for categorizing emotional prosody and showed a reduced sensitivity to valence, but not intensity, attributes of emotional expressions conveying anger, disgust, and fear. The findings are discussed in light of the possible role of the basal ganglia in the processing of discrete emotions, particularly those associated with negative vigilance, and of how PD may impact on the sequential processing of prosodic expressions.



Berney, A., Panisset, M., Sadikot, A.F., Ptito, A., Dagher, A., Fraraccio, M., Savard, G., Pell, M.D. & Benkelfat, C. (2007). Mood stability during acute stimulator challenge in PD patients under long-term treatment with subthalamic DBS. Movement Disorders, 22 (8), 1093-1096.

Acute and chronic behavioral effects of subthalamic stimulation (STN-DBS) for Parkinson's disease (PD) are reported in the literature. As the technique is relatively new, few systematic studies on the behavioral effects in long-term treated patients are available. To further study the putative effects of STN-DBS on mood and emotional processing, 15 consecutive PD patients under STN-DBS for at least 1 year were tested ON and OFF stimulation while on or off medication, with instruments sensitive to short-term changes in mood and in emotional discrimination. After acute changes in experimental conditions, mood core dimensions (depression, elation, anxiety) and emotion discrimination processing remained remarkably stable, in the face of significant motor changes. Acute stimulator challenge in long-term STN-DBS–treated PD patients does not appear to provoke clinically relevant mood effects.


Cheang, H.S. & Pell, M.D. (2007). An acoustic investigation of Parkinsonian speech in linguistic and emotional contexts. Journal of Neurolinguistics, 20, 221-241.

The speech prosody of a group of patients in the early stages of Parkinson's disease (PD) was compared to that of a group of healthy age- and education-matched controls to quantify possible acoustic changes in speech production secondary to PD. Both groups produced standardized speech samples across a number of prosody conditions: phonemic stress, contrastive stress, and emotional prosody. The amplitude, fundamental frequency, and duration of all tokens were measured. PD speakers produced speech that was of lower amplitude than the tokens of healthy speakers in many conditions across all production tasks. Fundamental frequency distinguished the two speaker groups for contrastive stress and emotional prosody production, and duration differentiated the groups for phonemic stress production. It was concluded that motor impairments in PD lead to adverse and varied acoustic changes which affect a number of prosodic contrasts in speech and that these alterations appear to occur in earlier stages of disease progression than is often presumed by many investigators.


Monetta, L. & Pell, M.D. (2007). Effects of verbal working memory deficits on metaphor comprehension in patients with Parkinson's disease. Brain and Language, 101, 80-89.



Pell, M.D. (2007). Reduced sensitivity to prosodic attitudes in adults with focal right hemisphere brain damage. Brain and Language, 101, 64-79.

Although there is a strong link between the right hemisphere and understanding emotional prosody in speech, there are few data on how the right hemisphere is implicated for understanding the emotive "attitudes" of a speaker from prosody. This report describes two experiments which compared how listeners with and without focal right hemisphere damage (RHD) rate speaker attitudes of "confidence" and "politeness" which are signalled in large part by prosodic features of an utterance. The RHD listeners displayed abnormal sensitivity to both the expressed confidence and politeness of speakers, underscoring a major role for the right hemisphere in the processing of emotions and speaker attitudes from prosody, although the source of these deficits may sometimes vary.



Pell, M. D., Cheang, H. S., & Leonard, C. L. (2006). The impact of Parkinson’s disease on vocal-prosodic communication from the perspective of listeners. Brain and Language, 97 (2), 123-134.

An expressive disturbance of speech prosody has long been associated with idiopathic Parkinson’s disease (PD), but little is known about the impact of dysprosody on vocal-prosodic communication from the perspective of listeners. Recordings of healthy adults (n = 12) and adults with mild to moderate PD (n = 21) were elicited in four speech contexts in which prosody serves a primary function in linguistic or emotive communication (phonemic stress, contrastive stress, sentence mode, and emotional prosody). Twenty independent listeners naive to the disease status of individual speakers then judged the intended meanings conveyed by prosody for tokens recorded in each condition. Findings indicated that PD speakers were less successful at communicating stress distinctions, especially words produced with contrastive stress, which were less identifiable to listeners. Listeners were also significantly less able to detect intended emotional qualities of Parkinsonian speech, especially for anger and disgust. Emotional expressions that were correctly recognized by listeners were consistently rated as less intense for the PD group. Utterances produced by PD speakers were frequently characterized as sounding sad or devoid of emotion entirely (neutral). Results argue that motor limitations on the vocal apparatus in PD produce serious and early negative repercussions on communication through prosody, which diminish the social-linguistic competence of Parkinsonian adults as judged by listeners.


Pell, M.D. (2006). Cerebral mechanisms for understanding emotional prosody in speech. Brain and Language, 97 (2), 221-234.

Hemispheric contributions to the processing of emotional speech prosody were investigated by comparing adults with a focal lesion involving the right (n = 9) or left (n = 11) hemisphere and adults without brain damage (n = 12). Participants listened to semantically anomalous utterances in three conditions (discrimination, identification, and rating) which assessed their recognition of five prosodic emotions under the influence of different task- and response-selection demands. Findings revealed that right- and left-hemispheric lesions were associated with impaired comprehension of prosody, although possibly for distinct reasons: right-hemisphere compromise produced a more pervasive insensitivity to emotive features of prosodic stimuli, whereas left-hemisphere damage yielded greater difficulties interpreting prosodic representations as a code embedded with language content.


Cheang, H.S. & Pell, M.D. (2006). A study of humour and communicative intention following right hemisphere stroke. Clinical Linguistics and Phonetics, 20 (6), 447-462.

This research provides further data regarding non-literal language comprehension following right hemisphere damage (RHD). To assess the impact of RHD on the processing of non-literal language, ten participants presenting with RHD and ten matched healthy control participants were administered tasks tapping humour appreciation and pragmatic interpretation of non-literal language. Although the RHD participants exhibited a relatively intact ability to interpret humour from jokes, their use of pragmatic knowledge about interpersonal relationships in discourse was significantly reduced, leading to abnormalities in their understanding of communicative intentions (CI). Results imply that explicitly detailing CI in discourse facilitates RHD participants' comprehension of non-literal language.



Pell, M.D. (2005). Prosody-face interactions in emotional processing as revealed by the facial affect decision task. Journal of Nonverbal Behavior, 29 (4), 193-215.

Previous research employing the facial affect decision task (FADT) indicates that when listeners are exposed to semantically anomalous utterances produced in different emotional tones (prosody), the emotional meaning of the prosody primes decisions about an emotionally congruent rather than incongruent facial expression (Pell, M. D., Journal of Nonverbal Behavior, 29, 45-73). This study undertook further development of the FADT by investigating the approximate time course of prosody-face interactions in nonverbal emotion processing. Participants executed facial affect decisions about happy and sad face targets after listening to utterance fragments produced in an emotionally related, unrelated, or neutral prosody, cut to 300, 600, or 1000 ms in duration. Results underscored that prosodic information enduring at least 600 ms was necessary, presumably to activate shared emotion knowledge responsible for prosody-face congruity effects.


Pell, M.D. & Leonard, C.L. (2005). Facial expression decoding in early Parkinson’s disease. Cognitive Brain Research, 23 (2-3), 327-340.

The ability to derive emotional and non-emotional information from unfamiliar, static faces was evaluated in 21 adults with idiopathic Parkinson's disease (PD) and 21 healthy control subjects. Participants' sensitivity to emotional expressions was comprehensively assessed in tasks of discrimination, identification, and rating of five basic emotions: happiness, (pleasant) surprise, anger, disgust, and sadness. Subjects also discriminated and identified faces according to underlying phonemic ("facial speech") cues and completed a neuropsychological test battery. Results uncovered limited evidence that the processing of emotional faces differed between the two groups in our various conditions, adding to recent arguments that these skills are frequently intact in non-demented adults with PD [R. Adolphs, R. Schul, D. Tranel, Intact recognition of facial emotion in Parkinson's disease, Neuropsychology 12 (1998) 253-258]. Patients could also accurately interpret facial speech cues and discriminate the identity of unfamiliar faces in a normal manner. There were some indications that basal ganglia pathology in PD contributed to selective difficulties recognizing facial expressions of disgust, consistent with a growing literature on this topic. Collectively, findings argue that abnormalities for face processing are not a consistent or generalized feature of medicated adults with mild-moderate PD, prompting discussion of issues that may be contributing to heterogeneity within this literature. Our results imply a more limited role for the basal ganglia in the processing of emotion from static faces relative to speech prosody, for which the same PD patients exhibited pronounced deficits in a parallel set of tasks [M.D. Pell, C. Leonard, Processing emotional tone from speech in Parkinson's disease: a role for the basal ganglia, Cogn. Affect. Behav. Neurosci. 3 (2003) 275-288]. 
These diverging patterns allow for the possibility that basal ganglia mechanisms are more engaged by temporally-encoded social information derived from cue sequences over time.


Pell, M.D. (2005). Nonverbal emotion priming: evidence from the ‘facial affect decision task’. Journal of Nonverbal Behavior, 29 (1), 45-73.

Affective associations between a speaker’s voice (emotional prosody) and a facial expression were investigated using a new on-line procedure, the Facial Affect Decision Task (FADT). Faces depicting one of four ‘basic’ emotions were paired with utterances conveying an emotionally-related or unrelated prosody, followed by a yes/no judgement of the face as a ‘true’ exemplar of emotion. Results established that prosodic characteristics facilitate the accuracy and speed of decisions about an emotionally congruent target face, supplying empirical support for the idea that information about discrete emotions is shared across major nonverbal channels. The FADT represents a promising tool for future on-line studies of nonverbal processing in both healthy and disordered individuals.



Pell, M.D. & Leonard, C.L. (2003). Processing emotional tone from speech in Parkinson’s disease: a role for the basal ganglia. Cognitive, Affective, & Behavioral Neuroscience, 3 (4), 275-288.

In this study, individuals with Parkinson’s disease were tested as a model for basal ganglia dysfunction to infer how these structures contribute to the processing of emotional speech tone (emotional prosody). Nondemented individuals with and without Parkinson’s disease (n = 21/group) completed neuropsychological tests and tasks that required them to process the meaning of emotional prosody in various ways (discrimination, identification, emotional feature rating). Individuals with basal ganglia disease exhibited abnormally reduced sensitivity to the emotional significance of prosody in a range of contexts, a deficit that could not be attributed to changes in mood, emotional-symbolic processing, or estimated frontal lobe cognitive resource limitations in most conditions. On the basis of these and broader findings in the literature, it is argued that the basal ganglia provide a critical mechanism for reinforcing the behavioral significance of prosodic patterns and other temporal representations derived from cue sequences (Lieberman, 2000), facilitating cortical elaboration of these events.



Pell, M.D. (2002). Evaluation of nonverbal emotion in face and voice: some preliminary findings on a new battery of tests. Brain and Cognition, 48, 499-504.

This report describes some preliminary attributes of stimuli developed for future evaluation of nonverbal emotion in neurological populations with acquired communication impairments. Facial and vocal exemplars of six target emotions were elicited from four male and four female encoders and then prejudged by 10 young decoders to establish the category membership of each item at an acceptable consensus level. Representative stimuli were then presented to 16 additional decoders to gather indices of how category membership and encoder gender influenced recognition accuracy of emotional meanings in each nonverbal channel. Initial findings pointed to greater facility in recognizing target emotions from facial than vocal stimuli overall and revealed significant accuracy differences among the six emotions in both the vocal and facial channels. The gender of the encoder portraying emotional expressions was also a significant factor in how well decoders recognized specific emotions (disgust, neutral), but only in the facial condition.



Baum, S.R., Pell, M.D., Leonard, C. & Gordon, J. (2001). Using prosody to resolve temporary syntactic ambiguities in speech production: acoustic data on brain-damaged speakers. Clinical Linguistics and Phonetics, 15, 441-456.

Left hemisphere brain lesions resulting in aphasia frequently produce impairments in speech production, including the ability to appropriately transmit linguistic distinctions through sentence prosody. The present investigation gathered preliminary data on how focal brain lesions influence one important aspect of prosody that has been largely ignored in the literature: the production of sentence-level syntactic distinctions that rely on prosodic alterations to disambiguate alternate meanings of a sentence. Utterances characterizing three distinct types of syntactic ambiguities (scope, prepositional phrase attachment, and noun phrase/sentential complement attachment) were elicited from individuals with unilateral left hemisphere damage (LHD), right hemisphere damage (RHD), and adults without brain pathology (NC). A written vignette preceding each ambiguous sentence target biased how the utterance was interpreted and produced. Recorded productions were analysed acoustically to examine parameters of duration (word length, pause) and fundamental frequency (F0) for key constituents specific to each of the ambiguity conditions. Results of the duration analyses demonstrated a preservation of many of the temporal cues to syntactic boundaries in both LHD and RHD patients. The two interpretations of sentences containing 'scope' and 'prepositional phrase attachment' ambiguities were differentiated by all speakers (including LHD and RHD patients) through the production of at least one critical temporal parameter that was consistent across the three groups. Temporal markers of sentences containing 'noun phrase/sentential complement attachment' ambiguities were not found to be encoded consistently within any speaker group and may be less amenable to experimental manipulation in this manner.
Results of F0 analyses were far less revealing in characterizing different syntactic assignments of the stimuli, and coupled with other findings in the literature, may carry less weight than temporal parameters in this process. Together, results indicate that the ability to disambiguate sentences using prosodic variables is relatively spared subsequent to both LHD and RHD, although it is noteworthy that LHD patients did exhibit deficits regulating other temporal properties of the utterances, consistent with left hemisphere control of speech timing.


Pell, M.D. (2001). Influence of emotion and focus location on prosody in matched statements and questions. Journal of the Acoustical Society of America, 109 (4), 1668-1680.

Preliminary data were collected on how emotional qualities of the voice (sad, happy, angry) influence the acoustic underpinnings of neutral sentences varying in location of intra-sentential focus (initial, final, no) and utterance "modality" (statement, question). Short (six syllable) and long (ten syllable) utterances exhibiting varying combinations of emotion, focus, and modality characteristics were analyzed for eight elderly speakers following administration of a controlled elicitation paradigm (story completion) and a speaker evaluation procedure. Duration and fundamental frequency (f0) parameters of recordings were scrutinized for "keyword" vowels within each token and for whole utterances. Results generally re-affirmed past accounts of how duration and f0 are encoded on key content words to mark linguistic focus in affectively neutral statements and questions for English. Acoustic data on three "global" parameters of the stimuli (speech rate, mean f0, f0 range) were also largely supportive of previous descriptions of how happy, sad, angry, and neutral utterances are differentiated in the speech signal. Important interactions between emotional and linguistic properties of the utterances emerged which were predominantly (although not exclusively) tied to the modulation of f0; speakers were notably constrained in conditions which required them to manipulate f0 parameters to express emotional and nonemotional intentions conjointly. Sentence length also had a meaningful impact on some of the measures gathered.
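The three "global" parameters analyzed above (speech rate, mean f0, f0 range) can be illustrated with a minimal sketch. This is not the study's actual analysis pipeline; the function name and inputs (per-frame f0 estimates in Hz, with 0 marking unvoiced frames, plus a syllable count and utterance duration) are hypothetical conveniences for illustration:

```python
# Illustrative sketch: computing global prosodic parameters for one
# utterance from per-frame f0 estimates (Hz; 0 = unvoiced frame).
# Names and inputs are hypothetical, not from the published study.

def global_prosody(f0_frames, n_syllables, duration_s):
    """Return speech rate, mean f0, and f0 range for one utterance."""
    voiced = [f for f in f0_frames if f > 0]     # keep voiced frames only
    speech_rate = n_syllables / duration_s       # syllables per second
    mean_f0 = sum(voiced) / len(voiced)          # Hz, over voiced frames
    f0_range = max(voiced) - min(voiced)         # Hz
    return speech_rate, mean_f0, f0_range

rate, mean_f0, f0_range = global_prosody(
    [0, 180, 190, 200, 0, 210, 170], n_syllables=6, duration_s=1.5)
```

In practice the f0 track would come from a pitch-extraction tool (e.g., Praat); the sketch only shows how the three summary measures relate to such a track.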


Leonard, C.L., Baum, S.R., & Pell, M.D. (2001). The effect of compressed speech on the ability of right-hemisphere-damaged patients to use context. Cortex, 37 (3), 327-344.

The ability of RHD patients to use context under conditions of increased processing demands was examined. Subjects monitored for words in auditorily presented sentences of three context types (normal, semantically anomalous, and random) at three rates of speech: normal, 70% compressed (Experiment 1), and 60% compressed (Experiment 2). Effects of semantics and syntax were found for the RHD and normal groups under the normal rate of speech condition. Using compressed rates of speech, the effect of syntax disappeared, but the effect of semantics remained. Importantly, and contrary to expectations, the RHD group was similar to normals in continuing to demonstrate an effect of semantic context under conditions of increased processing demands. Results are discussed relative to contemporary theories of laterality, based on studies with normals, that suggest that the involvement of the left versus right hemisphere in context use may depend upon the type of contextual information being processed.



Pell, M.D. (1999). The temporal organization of affective and non-affective speech in patients with right-hemisphere infarcts. Cortex, 35 (4), 455-477.

To evaluate the right hemisphere's role in encoding speech prosody, an acoustic investigation of timing characteristics was undertaken in speakers with and without focal right-hemisphere damage (RHD) following cerebrovascular accident. Utterances varying along different prosodic dimensions (emphasis, emotion) were elicited from each speaker using a story completion paradigm, and measures of utterance rate and vowel duration were computed. Results demonstrated parallelism in how RHD and healthy individuals encoded the temporal correlates of emphasis in most experimental conditions. Differences in how RHD speakers employed temporal cues to specify some aspects of prosodic meaning (especially emotional content) were observed and corresponded to a reduction in the perceptibility of prosodic meanings when conveyed by the RHD speakers. Findings indicate that RHD individuals are most disturbed when expressing prosodic representations that vary in a graded (rather than categorical) manner in the speech signal (Blonder, Pickering, Heath et al., 1995; Pell, 1999a).


Pell, M.D. (1999). Fundamental frequency encoding of linguistic and emotional prosody by right hemisphere-damaged speakers. Brain and Language, 69 (2), 161-192.

To illuminate the nature of the right hemisphere's involvement in expressive prosodic functions, a story completion task was administered to matched groups of right hemisphere-damaged (RHD) and nonneurological control subjects. Utterances which simultaneously specified three prosodic distinctions (emphatic stress, sentence modality, emotional tone) were elicited from each subject group and then subjected to acoustic analysis to examine various fundamental frequency (F0) attributes of the stimuli. Results indicated that RHD speakers tended to produce F0 patterns that resembled normal productions in overall shape, but with significantly less F0 variation. The RHD patients were also less reliable than normal speakers at transmitting emphasis or emotional contrasts when judged from the listener's perspective. Examination of the results across a wide variety of stimulus types pointed to a deficit in successfully implementing continuous aspects of F0 patterns following right hemisphere insult.



Pell, M.D. (1998). Recognition of prosody following unilateral brain lesion: influence of functional and structural attributes of prosodic contours. Neuropsychologia, 36(8), 701-715.

The perception of prosodic distinctions by adults with unilateral right- (RHD) and left-hemisphere (LHD) damage and subjects without brain injury was assessed through six tasks that varied both functional (i.e. linguistic/emotional) and structural (i.e. acoustic) attributes of a common set of base stimuli. Three tasks explored the subjects' ability to perceive local prosodic markers associated with emphatic stress (Focus Perception condition) and three tasks examined the comprehension of emotional-prosodic meanings by the same listeners (Emotion Perception condition). Within each condition, an initial task measured the subjects' ability to recognize each "type" of prosody when all potential acoustic features (but no semantic features) signalled the target response (Baseline). Two additional tasks investigated the extent to which each group's performance on the Baseline task was influenced by duration (D-Neutral) or fundamental frequency (F-Neutral) parameters of the stimuli within each condition. Results revealed that both RHD and LHD patients were impaired, relative to healthy control subjects, in interpreting the emotional meaning of prosodic contours, but that only LHD patients displayed subnormal capacity to perceive linguistic (emphatic) specifications via prosodic cues. The performance of the RHD and LHD patients was also selectively disturbed when certain acoustic properties of the stimuli were manipulated, suggesting that both functional and structural attributes of prosodic patterns may be determinants of prosody lateralization.



Baum, S.R., Pell, M.D., Leonard, C.L. & Gordon, J.K. (1997). The ability of right- and left-hemisphere-damaged individuals to produce and interpret prosodic cues marking phrasal boundaries. Language and Speech, 40(4), 313-330.

Two experiments were conducted with the purpose of investigating the ability of right- and left-hemisphere-damaged individuals to produce and perceive the acoustic correlates to phrase boundaries. In the production experiment, the utterance pink and black and green was elicited in three different conditions corresponding to different arrangements of colored squares. Acoustic analyses revealed that both left- and right-hemisphere-damaged patients exhibited fewer of the expected acoustic patterns in their productions than did normal control subjects. The reduction in acoustic cues to phrase boundaries in the utterances of both patient groups was perceptually salient to three trained listeners. The perception experiment demonstrated a significant impairment in the ability of both left-hemisphere-damaged and right-hemisphere-damaged individuals to perceive phrasal groupings. Results are discussed in relation to current hypotheses concerning the cerebral lateralization of speech prosody.


Pell, M.D. & Baum, S.R. (1997). Unilateral brain damage, prosodic comprehension deficits, and the acoustic cues to prosody. Brain and Language, 57(2), 195-214.

Stimuli from two previously presented comprehension tasks of affective and linguistic prosody (Pell & Baum, 1997) were analyzed acoustically and subjected to several discriminant function analyses, following Van Lancker and Sidtis (1992). An analysis of the errors made on these tasks by left-hemisphere-damaged (LHD) and right-hemisphere-damaged (RHD) subjects examined whether each clinical group relied on specific (and potentially different) acoustic features in comprehending prosodic stimuli (Van Lancker & Sidtis, 1992). Analyses also indicated whether the brain-damaged patients tested in Pell and Baum (1997) exhibited perceptual impairments in the processing of intonation. Acoustic analyses of the utterances reaffirmed the importance of F0 cues in signaling affective and linguistic prosody. Analyses of subjects' affective misclassifications did not suggest that LHD and RHD patients were biased by different sets of the acoustic features to prosody in judging their meaning, in contrast to Van Lancker and Sidtis (1992). However, qualitative differences were noted in the ability of LHD and RHD patients to identify linguistic prosody, indicating that LHD subjects may be specifically impaired in decoding linguistically defined categorical features of prosodic patterns.


Pell, M.D. & Baum, S.R. (1997). The ability to perceive and comprehend intonation in linguistic and affective contexts by brain-damaged adults. Brain and Language, 57(1), 80-99.

Receptive tasks of linguistic and affective prosody were administered to 9 right-hemisphere-damaged (RHD), 10 left-hemisphere-damaged (LHD), and 10 age-matched control (NC) subjects. Two tasks measured subjects' ability to discriminate utterances based solely on prosodic cues, and six tasks required subjects to identify linguistic or affective intonational meanings. Identification tasks manipulated the degree to which the auditory stimuli were structured linguistically, presenting speech-filtered, nonsensical, and semantically well-formed utterances in different tasks. Neither patient group was impaired relative to normals in discriminating prosodic patterns or recognizing affective tone conveyed suprasegmentally, suggesting that neither the LHD nor the RHD patients displayed a receptive disturbance for emotional prosody. The LHD group, however, was differentially impaired on linguistic rather than emotional tasks and performed significantly worse than the NC group on linguistic tasks even when semantic information biased the target response.


Baum, S.R. & Pell, M.D. (1997). Production of affective and linguistic prosody by brain-damaged patients. Aphasiology, 11, 177-198.

To test a number of hypotheses concerning the functional lateralization of speech prosody, the ability of unilaterally right-hemisphere-damaged (RHD), unilaterally left-hemisphere-damaged (LHD), and age-matched control subjects (NC) to produce linguistic and affective prosodic contrasts at the sentence level was assessed via acoustic analysis. Multiple aspects of suprasegmental processing were explored, including a manipulation of the type of elicitation task employed (repetition vs reading) and the amount of linguistic structure provided in experimental stimuli (stimuli were either speech-filtered, nonsensical, or semantically well formed). In general, the results demonstrated that both RHD and LHD patients were able to appropriately utilize the acoustic parameters examined (duration, fundamental frequency (F0), amplitude) to differentiate both linguistic and affective sentence types in a manner comparable to NC speakers. Some irregularities in the global modulation of F0 and amplitude by RHD speakers were noted, however. Overall, the present findings do not provide support for previous claims that the right hemisphere is specifically engaged in the production of affective prosody. Alternative models of prosodic processing are noted.



Pell, M.D. (1996). On the receptive prosodic loss in Parkinson's disease. Cortex, 32, 693-704.

To comprehensively explore how the processing of linguistic and affective prosodic cues is affected by idiopathic Parkinson's disease (PD), a battery of receptive tests was presented to eleven PD patients without intellectual or language impairment and eleven control subjects (NC) matched for age, gender, and educational attainment. Receptive abilities for both low-level (discrimination) and higher-level (identification) prosodic processing were explored; moreover, the identification of prosodic features was tested at both the lexical level (phonemic stress perception) and over the sentential domain (prosodic pattern identification). The results obtained demonstrated a general reduction in the ability of the PD patients to identify the linguistic- and affective-prosodic meaning of utterances relative to NC subjects, without a concurrent loss in the ability to perceive phonemic stress contrasts or discriminate prosodic patterns. However, the qualitative pattern of the PD and NC groups' performance across the various identification conditions tested was remarkably uniform, indicating that only quantitative differences in comprehension abilities may have characterized the two groups. It is hypothesized that the basal ganglia form part of a functional network dedicated to prosodic processing (Blonder et al., 1989) and that the processes required to map prosodic features onto their communicative representations at the sentence level are rendered less efficient by the degenerative course of PD.




