Publications

Peer-reviewed articles

— 2026 —

Chen, W., Pell, M. D., & Jiang, X. (2026). Does speech prosody shape social perception equally for AI and human voices? A 16-attribute rating study. Computers in Human Behavior Reports, 101142.

AI-generated speech can now replicate humanlike prosodic patterns, yet whether these cues shape social perception equivalently to human voices remains unknown. We used voice cloning to closely match speaker identity and transfer prosodic style across human and AI voices. We had 40 native Mandarin Chinese speakers rate 320 utterances (half human-produced) in two prosodic styles, confident (greater F0 variability, and faster speech rate) and doubtful (higher mean F0), across 16 perceptual attributes spanning acoustic properties and social impressions. Listeners rated human and AI voices differently: human voices received higher ratings on such attributes as humanlikeness, animateness, and emotional richness, whereas AI voices scored higher on speaking rate and nasality. Confident prosody enhanced ratings relative to doubtful prosody across all 15 attributes for both sources, with AI voices showing greater gains across 11 attributes. Principal component analysis revealed two core perceptual dimensions: social appeal (capturing attractiveness, animateness, and pleasantness) and vocal expressiveness (capturing speech rate, loudness, and confidence-related features), with AI voices scoring substantially lower on social appeal. Meanwhile, vocal expressiveness strongly predicted social appeal for human voices, but only weakly for AI voices, and AI-human differences in social appeal widened with increasing expressiveness levels. As participants were not informed of the voice source, our findings suggest that in-group/out-group categorization of human vs. AI voice source implicitly constrains how listeners process short-term prosodic cues in speech. We propose that commercialized AI voices should remain perceptibly identifiable as AI-generated, and that highly humanlike prosody should be used with caution in interpersonal contexts.

Article

Chen, W., Pell, M. D., & Jiang, X. (2026). Human and AI voice identities evoke shared neural signatures during speaker recognition across changes in speech content and prosody. Neuropsychologia, 229, 109493.

Both biologically produced human voices and algorithmically generated AI speech manifest speaker identity. Critically, prosodic variations modulate the acoustic dimensions (e.g., fundamental frequency) that also shape individual speaker identity representations. So far, it remains unclear whether listeners process speaker identities in human and AI voices through neurologically equivalent mechanisms, or how prosodic cues might influence these cognitive processes. We examined event-related potentials during old/new speaker discrimination after name-based identity learning, and further analyzed correctly recognized old speakers, comparing trials where prosody matched vs. mismatched between learning and testing. For old/new discrimination, multivariate pattern analysis (MVPA) revealed three significant late windows (662-1498 ms) with Pz as the primary contributor for AI voices, yet no clusters for human voices. Univariate analyses revealed that human voices showed earlier widespread discrimination (N250: 200-280 ms), while both voice types converged on Pz as the strongest contributor based on effect size rankings for late old/new effects (400-800 ms). These old/new effects emerged across completely different speech content between learning and testing, extending content-independent parietal ERP effects beyond syllabic stimuli. For speaker-specific prosodic expectation effects in the 500-900 ms window, unexpected prosody elicited late positivity for human voices compared to the prosody used during learning, whereas AI voices elicited late negativity. The late positivity resembles P600 components observed for communicative style expectancy violations, while the late negativity likely reflects effortful reprocessing of prosodic violations within atypical synthetic signals, analogous to accented speech processing. These findings advance understanding of voice identity processing and have implications for AI voices in human-computer interaction.

Article

Rothermich, K., Su, Y., Giles, R., & Pell, M. D. (2026). Social perception of English irony: a comparison of L1 and L2 speakers. Applied Linguistics.

Irony plays an important role in social communication, but its perceived politeness and appropriateness can vary across linguistic backgrounds. This study investigated how first-language (L1) English speakers and advanced second-language (L2) English speakers (Chinese L1) evaluate ironic (sarcastic, teasing) versus literal statements in audio-visual conversations. Participants watched videos and rated them on politeness, appropriateness, and their own likelihood of using them. Stimuli differed by communicative intent (positive vs. negative) and delivery style (literal vs. ironic). Both groups judged literal and positive statements as more polite and appropriate than ironic and negative ones. However, L1 and L2 speakers differed in how willing they were to use irony themselves, with L1 speakers preferring literal over ironic remarks more strongly than L2 speakers. Sarcastic remarks were judged less polite than literal negative statements, contrary to predictions of the Tinge Hypothesis that irony softens criticism. Our findings show that paralinguistic cues shape irony evaluation and that L2 speakers interpret irony similarly to L1 speakers but are more cautious about using it themselves. These results advance the understanding of intercultural pragmatics and L2 sociopragmatic development.

Article

Domínguez-Arriola, M. E., & Pell, M. D. (2026). Not worth my time! Understanding factors that make speech socially engaging. Journal of Experimental Psychology: Human Perception and Performance, 52(5), 672–692.

speakers present themselves shape the subjective value of social anecdotes, potentially contributing to perceived interaction quality.

Article

Chen, W., Pell, M. D., & Jiang, X. (2026). Prosodic cues strengthen human-AI voice boundaries: Listeners do not easily perceive human speakers and AI clones as the same person. Computers in Human Behavior: Artificial Humans, 7, 100261.

Previous studies concluded that listeners struggle to discriminate AI from human voices, but these studies used monotone-like speech and did not examine prosodic expressiveness, a key advantage of human over AI speakers. This study explores whether prosodic expressiveness facilitates human-AI voice discrimination. We recorded human prosodic speech with confident and doubtful expressions, trained AI models to replicate these prosodic patterns, had AI models generate new sentences, and then had human speakers produce equivalent prosodic expressions for the same sentences. In Experiment 1, we had 48 listeners rate humanlikeness and perceived confidence in 11,808 audio samples, finding that AI speech was consistently rated as less humanlike regardless of prosody. We selected 768 audio samples (AI × human, confident × doubtful prosody) for Experiment 2, where 80 listeners completed an identity discrimination task, judging whether two sounds came from the same speaker. Bayesian modeling results revealed near-ceiling performance for human-human/AI-AI pairs, with inconsistent prosodies decreasing accuracy by ∼7%, whereas listeners did not easily categorize AI and human as sharing the same identity (∼54% accuracy when prosody matches, dropping to ∼36% when inconsistent). We observed accuracy–reaction time synchronization; in human–AI/AI–human pairs only, however, listeners relied less on distance cues when the two voices’ identities were distant beyond a certain threshold. Overall, we found that listeners perceive AI speech as lower in humanlikeness, and prosodic variation further promotes rejecting AI and human voices as sharing the same identity, indicating that human acceptance of AI voices as equivalent to human voices is limited.

Article  ·  Code

Pell, M. D., Cui, H., Mori, Y., & Jiang, X. (2026). Speak or shout? Nonverbal vocalizations promote rapid detection of emotions in vocal communication. PLoS ONE, 21(1), e0327529.

Human vocal expressions of emotion can be expressed nonverbally, through vocalizations such as shouts or laughter, or speakers can embed emotional meanings in language by modifying their tone of voice ("prosody"). Is there evidence that nonverbal expressions promote "better" (i.e., more accurate, faster) recognition of emotions than speech, and what is the impact of language experience? Our study investigated these questions using a cross-cultural gating paradigm, in which Chinese and Arab listeners (n = 25/group) judged the emotion communicated by acoustic events that varied in duration (200 milliseconds to the full expression) and form (vocalizations or prosody expressed in listeners' native, second or foreign language). Accuracy was higher for vocalizations overall, but listeners were markedly more efficient at forming stable categorical representations of the speaker's emotion from vocalizations (M = 417 ms) than from native prosody (M = 765 ms). Language experience enhanced recognition of emotional prosody expressed by native/ingroup speakers for some listeners (Chinese) but not all (Arab), emphasizing the dynamic interplay of socio-cultural factors and stimulus quality on prosody recognition, which occurs over a more sustained time window. Our data show that vocalizations are functionally suited to build robust, rapid impressions of a speaker's emotion state unconstrained by the listener's linguistic cultural background.

Article

— 2025 —

Gao, L., Pell, M. D., Peng, Z., & Jiang, X. (2025). Perceiving female physical attractiveness and expressive traits from body features and body motion. BMC Psychology, 13(1), 1206.

The perception of female physical attractiveness is known to be predicted by body features (e.g., BMI). However, the role of body motion (e.g., postures) and the relative contribution of each type of cue are unclear. Little research has reported how body cues modulate the perception of female expressive traits (e.g., warmth). We photographed and filmed 15 female posers and recorded their anthropometric data. In picture stimuli, each poser adopted neutral, instructed attractive, or spontaneous attractive and unattractive postures. In video stimuli, posers introduced a place in a neutral or passionate manner. Fifty-four perceivers watched these pictures and silent videos and rated their physical attractiveness and feminine expressive traits on 7-point scales. Lasso regression and proportion of variance explained analyses revealed that body features demonstrated stronger predictive power for physical attractiveness than body motions across both picture and video stimuli. However, for feminine traits, body motions showed greater predictive validity in videos, whereas neither body features nor body motions effectively predicted feminine traits in static images. Body features and body motion play different roles in perceiving different levels of personal characteristics in social perception. Perception of expressive traits appears to rely more substantially on body motions, whereas the judgment of physical attractiveness depends more fundamentally on body features.

Article

Lam, P. C. H., Cui, H., & Pell, M. D. (2025). The influence of speaker accent on the neurocognitive processing of politeness. Brain Research, 1865, 149897.

Does speaking with a foreign accent alter how listeners respond to verbal requests? Cooperative outcomes depend on multiple social factors, such as the politeness of the speaker who makes a request (e.g., their tone of voice). However, little is known about how indexical features derived from the speaker's voice influence neurocognitive operations during politeness communication. In an event-related potential (ERP) study, 31 participants listened to requests ("Please lend me a nickel") that varied in prosodic politeness (polite/rude), imposition level (low/high cost to perform an action), and accent (native/foreign English speaker). In separate rating tasks, participants made a social inference (speaker friendliness) or pragmatic inference (likelihood of request compliance) about each request based on available speech cues. ERPs time-locked to request onset revealed an interaction of speaker accent and prosodic (im)politeness on the P200 component (180-260 ms) and the subsequent late positivity (450-700 ms). Findings pointed to rapid perceptual differentiation of the speaker's linguistic status and stance (P200), after which listeners attended more deeply to attitudinal cues expressed by native ingroup speakers. ERPs evoked by the sentence-final imposition word showed that speaker characteristics influenced how the "cost" of requests was later evaluated, hampering N400 semantic operations (300-500 ms) when speakers acted rudely or had a foreign accent (conditions less frequently associated with cooperative interactions). Selective uptake of particular social cues relevant for drawing social versus pragmatic inferences about requests was also noted. Our data identify a time course and brain mechanisms that integrate speaker and message during politeness communication.

Article

Domínguez-Arriola, M. E., Bazzi, L., Mauchand, M., Foucart, A., & Pell, M. D. (2025). Does criticism in a foreign accent hurt less? Language, Cognition and Neuroscience, 40(10), 1370–1389.

Evaluative statements are routine in interpersonal communication but may evoke different responses depending on the speaker’s identity. Here, thirty participants listened to compliments and criticisms spoken in native or foreign accents and rated the speaker's friendliness as their electroencephalogram was recorded. Event-related potentials (ERPs) were examined for (1) vocal speech cues time-locked to statement onset and (2) emotive semantic attributes at the sentence-final word. Criticisms from native speakers were rated as less friendly than from foreign speakers, whereas compliments did not differ. ERPs revealed listeners rapidly used acoustic information to differentiate speaker identity and evaluative attitude (N100, P200), then selectively monitored vocal attitude of native speakers in the Late Positive Potential (LPP) time window. When the evaluative word was heard, native accents contextually modulated early semantic processing (N400). Words of criticism increased the LPP irrespective of accent. Our data showcase unique neurocognitive and behavioural effects of speaker accent on communication.

Article

Mauchand, M., & Pell, M. D. (2025). The sound of complaints. Frontiers in Communication, 10.

Complaining is a social act in which a speaker often verbally conveys feelings of suffering to gain empathy from listeners. The present study investigated the acoustic profile of complaints to identify which prosodic features are used in this context and to explore differences in their cultural expression in two variants of French. A stimulus set composed of 336 complaints and 336 prosodically neutral utterances produced by two cultural groups, French and Québécois (French-Canadian), was analyzed along 15 acoustic parameters. Utterances were also judged by listeners to determine whether complaints were perceptually associated with particular emotional characteristics. Relative to neutral statements, complaints displayed increases in fundamental frequency (mean, variability, and range), loudness, and high-frequency energy, and several rhythmic modulations. Complaints were also characterized by systematic changes in parameters related to voice quality and increased vocal control (decreased shimmer, increased harmonics-to-noise ratio), which could exemplify the speaker’s strategic use of emotive cues. Perceptually, complaining voices were most associated with sadness, anger, and surprise. Complaints produced by French and Québécois speakers demonstrated shared central tendencies but also differed both acoustically and perceptually. Our results provide new insights into the acoustic and perceptual profiles of emotive “complaining” speech patterns meant to elicit empathy in social interactions.

Article

— 2024 —

Glick, A., Jones, C. L., Martignetti, L., Blanchette, L., Tova, T., Henderson, A. M., Pell, M. D., & Li‐Jessen, N. Y. K. (2024). An integrated empirical and computational study to decipher help-seeking behaviors and vocal stigma. Communications Medicine, 4(1), 228.

Professional voice users often experience stigma associated with voice disorders and are reluctant to seek medical help. This study deployed empirical and computational tools to (1) quantify the experience of vocal stigma and help-seeking behaviors in performers; and (2) predict their modulations with peer influences in social networks. Experience of vocal stigma and information-motivation-behavioral (IMB) skills were prospectively profiled using online surveys from a total of 403 Canadians (200 singers and actors and 203 controls). Data were used to formulate an agent-based network model of social interactions on vocal stigma (self-stigma and social-stigma) and help-seeking behaviors. Network analysis was performed to evaluate the effect of social network structure on the flow of IMB among virtual agents. Larger social networks are more likely to contribute to an increase in vocal stigma. For small social networks, total stigma is reduced with higher total IMB but not much so for large networks. For agents with high social-stigma and risk for voice disorder, their vocal stigma is resistant to large changes in IMB (> 2 standard deviations). Agents with extreme IMB and stigma values are likely to polarize their networks faster in larger social groups. We integrated empirical surveys and computational techniques to contextualize vocal stigma and IMB beyond theory and to quantify the interaction among stigma, health-seeking behavior and influence of social interactions. This work establishes an effective, predictable experimental platform to provide scientific evidence in developing interventions to reduce health stigma in voice disorders and other medical conditions.

Voice professionals such as singers and actors can experience stigma if they have a voice disorder. This stigma can result from their personal experience and knowledge (internalized) or be based on input from their peers, employment, and healthcare providers (externalized). To understand how negative vocal stigma spreads, we surveyed the stigma experience of voice professionals and developed computational models. We find that people tend to have more polarized stigma experiences when they are in larger social groups. Vocal stigma is not changed by a person’s knowledge, beliefs, and tendency to seek help. Our method could be used to study other stigmatized health conditions. Our research could also be used to reduce stigma and promote more equitable health care for vocal professionals with a voice disorder.

Glick et al. investigate the stigma experience and help-seeking behavior in professional singers and actors using de novo data and social simulation. They find that vocal performers experience greater discrimination against their vocal injury, with simulation data also predicting that vocal stigma could be worsened with larger social groups.

Article

Melo, L. E. H., & Pell, M. D. (2024). Acoustic indicators of voice quality in the context of social support. The Journal of the Acoustical Society of America, 155(3_Supplement), A308.

Is social support communicated through the subtle, yet powerful, acoustic variations in speech? This study attempted to answer this question by testing whether acoustic parameters vary when expressing social support. Participants underwent an experiment in which they watched video testimonials of a woman describing either a neutral subject or a sensitive, emotionally charged experience. After this, participants provided voice messages to the person appearing in the testimony. Employing the openSMILE toolkit, we extracted from these speech responses the Geneva Minimalistic Acoustic Parameter Set (GeMAPS), a set of emotion-related acoustic features. Our investigation reveals an acoustic profile characteristic of supportive speech, distinguished by changes in the Alpha ratio, spectral slope, and the Hammarberg index, parameters that represent high-frequency content and spectral balance. These acoustic differences not only help to differentiate supportive utterances but also characterize their voice quality, thereby enhancing the emotional richness of this affective stance. Our research findings have potential applications in therapeutic and communication settings and open avenues for further exploration in speech science.

Article

Larrouy-Maestri, P., Poeppel, D., & Pell, M. D. (2024). The Sound of Emotional Prosody: Nearly 3 Decades of Research and Future Directions. Perspectives on Psychological Science, 20(4), 623–638.

Emotional voices attract considerable attention. A search on any browser using "emotional prosody" as a key phrase leads to more than a million entries. Such interest is evident in the scientific literature as well; readers are reminded in the introductory paragraphs of countless articles of the great importance of prosody and that listeners easily infer the emotional state of speakers through acoustic information. However, despite decades of research on this topic and important achievements, the mapping between acoustics and emotional states is still unclear. In this article, we chart the rich literature on emotional prosody for both newcomers to the field and researchers seeking updates. We also summarize problems revealed by a sample of the literature of the last decades and propose concrete research directions for addressing them, ultimately to satisfy the need for more mechanistic knowledge of emotional prosody.

Article

— 2023 —

Mauchand, M., Armony, J. L., & Pell, M. D. (2023). The vocal side of empathy: neural correlates of pain perception in spoken complaints. Social Cognitive and Affective Neuroscience, 19(1).

In the extensive neuroimaging literature on empathy for pain, few studies have investigated how this phenomenon may relate to everyday social situations such as spoken interactions. The present study used functional Magnetic Resonance Imaging (fMRI) to assess how complaints, as vocal expressions of pain, are empathically processed by listeners and how these empathic responses may vary based on speakers' vocal expression and cultural identity. Twenty-four French participants listened to short utterances describing a painful event, which were either produced in a neutral-sounding or complaining voice by both in-group (French) and out-group (French Canadian) speakers. Results suggest that the perception of suffering from a complaining voice increased activity in the emotional voice areas, composed of voice-sensitive temporal regions interacting with prefrontal cortices and the amygdala. The Salience and Theory of Mind networks, associated with affective and cognitive aspects of empathy, also showed prosody-related activity and specifically correlated with behavioral evaluations of suffering by listeners. Complaints produced by in- vs out-group speakers elicited sensorimotor and default mode activity, respectively, suggesting accent-based changes in empathic perspective. These results, while reaffirming the role of key networks in tasks involving empathy, highlight the importance of vocal expression information and social categorization processes when perceiving another's suffering during social interactions.

Article

Mauchand, M., & Pell, M. D. (2023). Complain like you mean it! How prosody conveys suffering even about innocuous events. Brain and Language, 244, 105305.

When complaining, speakers can use their voice to convey a feeling of pain, even when describing innocuous events. Rapid detection of emotive and identity features of the voice may constrain how the semantic content of complaints is processed, as indexed by N400 and P600 effects evoked by the final, pain-related word. Twenty-six participants listened to statements describing painful and innocuous events expressed in a neutral or complaining voice, produced by ingroup and outgroup accented speakers. Participants evaluated how hurt the speaker felt under EEG monitoring. Principal Component Analysis of Event-Related Potentials from the final word onset demonstrated N400 and P600 increases when complainers described innocuous vs. painful events in a neutral voice, but these effects were altered when utterances were expressed in a complaining voice. Independent of prosody, N400 amplitudes increased for complaints spoken in outgroup vs. ingroup accents. Results demonstrate that prosody and accent constrain the processing of spoken complaints as proposed in a parallel-constraint-satisfaction model.

Article

— 2022 —

Zhang, S., & Pell, M. D. (2022). Cultural differences in vocal expression analysis: Effects of task, language, and stimulus-related factors. PLoS ONE, 17(10), e0275915.

Cultural context shapes the way that emotions are expressed and socially interpreted. Building on previous research looking at cultural differences in judgements of facial expressions, we examined how listeners recognize speech-embedded emotional expressions and make inferences about a speaker's feelings in relation to their vocal display. Canadian and Chinese participants categorized vocal expressions of emotions (anger, fear, happiness, sadness) expressed at different intensity levels in three languages (English, Mandarin, Hindi). In two additional tasks, participants rated the intensity of each emotional expression and the intensity of the speaker's feelings from the same stimuli. Each group was more accurate at recognizing emotions produced in their native language (in-group advantage). However, Canadian and Chinese participants both judged the speaker's feelings to be equivalent or more intense than their actual display (especially for highly aroused, negative emotions), suggesting that similar inference rules were applied to vocal expressions by the two cultures in this task. Our results provide new insights on how people categorize and interpret speech-embedded vocal expressions versus facial expressions and what cultural factors are at play.

Article

Mauchand, M., & Pell, M. D. (2022). Listen to my feelings! How prosody and accent drive the empathic relevance of complaining speech. Neuropsychologia, 175, 108356.

Interpersonal communication often involves sharing our feelings with others; complaining, for example, aims to elicit empathy in listeners by vocally expressing a speaker's suffering. Despite the growing neuroscientific interest in the phenomenon of empathy, few have investigated how it is elicited in real time by vocal signals (prosody), and how this might be affected by interpersonal factors, such as a speaker's cultural background (based on their accent). To investigate the neural processes at play when hearing spoken complaints, twenty-six French participants listened to complaining and neutral utterances produced by in-group French and out-group Québécois (i.e., French-Canadian) speakers. Participants rated how hurt the speaker felt while their cerebral activity was monitored with electroencephalography (EEG). Principal Component Analysis of Event-Related Potentials (ERPs) taken at utterance onset showed culture-dependent time courses of emotive prosody processing. The high motivational relevance of ingroup complaints increased the P200 response compared to all other utterance types; in contrast, outgroup complaints selectively elicited an early posterior negativity in the same time window, followed by an increased N400 (due to ongoing effort to derive affective meaning from outgroup voices). Ingroup neutral utterances evoked a late negativity which may reflect re-analysis of emotively less salient, but culturally relevant ingroup speech. Results highlight the time-course of neurocognitive responses that contribute to emotive speech processing for complaints, establishing the critical role of prosody as well as social-relational factors (i.e., cultural identity) on how listeners are likely to "empathize" with a speaker.

Article

Caballero, J. A., Auclair‐Ouellet, N., Phillips, N. A., & Pell, M. D. (2022). Social decision-making in Parkinson’s disease. Journal of Clinical and Experimental Neuropsychology, 44(4), 302–315.

INTRODUCTION: Parkinson's Disease (PD) commonly affects cognition and communicative functions, including the ability to perceive socially meaningful cues from nonverbal behavior and spoken language (e.g., a speaker's tone of voice). However, we know little about how people with PD use social information to make decisions in daily interactions (e.g., decisions to trust another person) and whether this ability rests on intact cognitive functions and executive/decision-making abilities in nonsocial domains. METHOD: Non-demented adults with and without PD were presented utterances that conveyed differences in speaker confidence or politeness based on the way that speakers formulated their statement and their tone of voice. Participants had to use these speech-related cues to make trust-related decisions about interaction partners while playing the Trust Game. Explicit measures of social perception, nonsocial decision-making, and related cognitive abilities were collected. RESULTS: Individuals with PD displayed significant differences from control participants in social decision-making; for example, they showed greater trust in game partners whose voice sounded confident and who explicitly stated that they would cooperate with the participant. The PD patients displayed relatively intact social perception (speaker confidence or politeness ratings) and were unimpaired on a nonsocial decision-making task (the Dice game). No obvious relationship emerged between measures of social perception, social decision-making, or cognitive functioning in the PD sample. CONCLUSIONS: contexts in PD individuals with relatively preserved cognition with minimal changes in social perception. Researchers and practitioners interested in how PD affects social perception and cognition should include assessments that emulate social interactions, as non-interactive tasks may fail to detect the full impact of the disease on those affected.

Article

Pell, M. D., Sethi, S., Rigoulot, S., Rothermich, K., Liu, P., & Jiang, X. (2022). Emotional voices modulate perception and predictions about an upcoming face. Cortex, 149, 148–164.

When we hear an emotional voice, does this alter how the brain perceives and evaluates a subsequent face? Here, we tested this question by comparing event-related potentials evoked by angry, sad, and happy faces following vocal expressions which varied in form (speech-embedded emotions, non-linguistic vocalizations) and emotional relationship (congruent, incongruent). Participants judged whether face targets were true exemplars of emotion (facial affect decision). Prototypicality decisions were more accurate and faster for congruent vs. incongruent faces and for targets that displayed happiness. Principal component analysis identified vocal context effects on faces in three distinct temporal factors: a posterior P200 (150-250 ms), associated with evaluating face typicality; a slow frontal negativity (200-750 ms) evoked by angry faces, reflecting enhanced attention to threatening targets; and the Late Positive Potential (LPP, 450-1000 ms), reflecting sustained contextual evaluation of intrinsic face meaning (with independent LPP responses in posterior and prefrontal cortex). Incongruent faces and faces primed by speech (compared to vocalizations) tended to increase demands on face perception at stages of structure-building (P200) and meaning integration (posterior LPP). The frontal LPP spatially overlapped with the earlier frontal negativity response; these components were functionally linked to expectancy-based processes directed towards the incoming face, governed by the form of a preceding vocal expression (especially for anger). Our results showcase differences in how vocalizations and speech-embedded emotion expressions modulate cortical operations for predicting (prefrontal) versus integrating (posterior) face meaning in light of contextual details.

Article

Rothermich, K., Ahn, S., Dannhauer, M., & Pell, M. D. (2022). Social appropriateness perception of dynamic interactions. Social Neuroscience, 17(1), 37–57.

The current study explored the judgment of communicative appropriateness while processing a dialogue between two individuals. All stimuli were presented as audio-visual as well as audio-only vignettes and 24 young adults reported their social impression (appropriateness) of literal, blunt, sarcastic, and teasing statements. On average, teasing statements were rated as more appropriate when processing audio-visual statements compared to the audio-only version of a stimulus, while sarcastic statements were judged as less appropriate with additional visual information. These results indicate a rejection of the Tinge Hypothesis for audio-visual vignettes while confirming it for the reduced, audio-only counterparts. We also analyzed time-frequency EEG data of four frequency bands that have been related to language processing: alpha, beta, theta and low gamma. We found desynchronization in the alpha band for literal versus nonliteral items, confirming the assumption that the alpha band reflects stimulus complexity. The analysis also revealed a power increase in the theta, beta and low gamma bands, especially when comparing blunt and nonliteral statements in the audio-only condition. The time-frequency results corroborate the prominent role of the alpha and theta bands in language processing and offer new insights into the neural correlates of communicative appropriateness and social aspects of speech perception.

Article

— 2021 —

Mauchand, M., & Pell, M. D. (2021). French or Québécois? How speaker accents shape implicit and explicit intergroup attitudes among francophones in Montréal. Canadian Journal of Behavioural Science/Revue canadienne des sciences du comportement, 54(1), 1–8.

The perception of accented speech creates a number of biases, but experimental evidence about their nature and the factors at play in these processes is scarce. The present study focused on francophone populations in Montreal, assessing implicit and explicit accent-based attitudes between Québécois (French Canadian) and European French groups. Twenty-seven Québécois and 31 French participants were administered a modified Implicit Association Test using Québécois- and French-accented speech samples and Stereotype Content questionnaires about the perceived warmth and competence of each group. For the implicit measure, French participants showed a significant in-group bias, whereas Québécois participants did not show any preference for either group. In contrast, explicit attitudes were mostly congruent: Québécois people were rated as slightly more warm than competent, whereas French people were rated much less warm than competent by all participants. No correlation was found between implicit and explicit measures. The asymmetry in implicit biases was explained by accent exposure, Québécois people being more familiar with the French accent than French people are with the Québécois accent. Explicit attitudes were partly explained by prestige differences between the two accents but could also have been influenced by moral beliefs held by the French participants as recent immigrants in Quebec. These results emphasize that implicit and explicit biases driven by speaker accents arise from distinct but interrelated mechanisms, involving both global and local situational factors. Our findings have implications for understanding the dynamics of intergroup relationships in Montreal and many other settings where regional accent mixing is the norm.

Article

Caballero, J. A., Mauchand, M., Jiang, X., & Pell, M. D. (2021). Cortical processing of speaker politeness: Tracking the dynamic effects of voice tone and politeness markers. Social Neuroscience, 16(4), 423–438.

Information in the tone of voice alters social impressions and underlying brain activity as listeners evaluate the interpersonal relevance of utterances. Here, we presented requests that expressed politeness distinctions through the voice (polite/rude) and explicit linguistic markers (half of the requests began with Please). Thirty participants performed a social perception task (rating friendliness) while their electroencephalogram was recorded. Behaviorally, vocal politeness strategies had a much stronger influence on the perceived friendliness than the linguistic marker. Event-related potentials revealed rapid effects of (im)polite voices on cortical activity prior to ~300 ms; P200 amplitudes increased for polite versus rude voices, suggesting that the speaker’s polite stance was registered as more salient in our task. At later stages, politeness distinctions encoded by the speaker’s voice and their use of Please interacted, modulating activity in the N400 (300–500 ms) and late positivity (600–800 ms) time windows. Patterns of results suggest that initial attention deployment to politeness cues is rapidly influenced by the motivational significance of a speaker’s voice. At later stages, processes for integrating vocal and lexical information resulted in increased cognitive effort to reevaluate utterances with ambiguous/contradictory cues. The potential influence of social anxiety on the P200 effect is also discussed.

Article

Liu, P., Rigoulot, S., Jiang, X., Zhang, S., & Pell, M. D. (2021). Unattended Emotional Prosody Affects Visual Processing of Facial Expressions in Mandarin-Speaking Chinese: A Comparison With English-Speaking Canadians. Journal of Cross-Cultural Psychology, 52(3), 275–294.

Emotional cues from different modalities have to be integrated during communication, a process that can be shaped by an individual's cultural background. We explored this issue in 25 Chinese participants by examining how listening to emotional prosody in Mandarin influenced participants' gazes at emotional faces in a modified visual search task. We also conducted a cross-cultural comparison between data of this study and that of our previous work in English-speaking Canadians using analogous methodology. In both studies, eye movements were recorded as participants scanned an array of four faces portraying fearful, angry, happy, and neutral expressions, while passively listening to a pseudo-utterance expressing one of the four emotions (Mandarin utterance in this study; English utterance in our previous study). The frequency and duration of fixations to each face were analyzed during 5 seconds after the onset of faces, both during the presence of the speech (early time window) and after the utterance ended (late time window). During the late window, Chinese participants looked more frequently and longer at faces conveying emotions congruent with the speech, consistent with findings from English-speaking Canadians. Cross-cultural comparison further showed that Chinese, but not Canadians, looked more frequently and longer at angry faces, which may signal potential conflicts and social threats. We hypothesize that the socio-cultural norms related to harmony maintenance in the Eastern culture promoted Chinese participants' heightened sensitivity to, and deeper processing of, angry cues, highlighting culture-specific patterns in how individuals scan their social environment during emotion processing.

Article

Mauchand, M., & Pell, M. D. (2021). Emotivity in the Voice: Prosodic, Lexical, and Cultural Appraisal of Complaining Speech. Frontiers in Psychology, 11, 619222–619222.

Emotive speech is a social act in which a speaker displays emotional signals with a specific intention; in the case of third-party complaints, this intention is to elicit empathy in the listener. The present study assessed how the emotivity of complaints was perceived in various conditions. Participants listened to short statements describing painful or neutral situations, spoken with a complaining or neutral prosody, and evaluated how complaining the speaker sounded. In addition to manipulating features of the message, social-affiliative factors which could influence complaint perception were varied by adopting a cross-cultural design: participants were either Québécois (French Canadian) or French and listened to utterances expressed by both cultural groups. The presence of a complaining tone of voice had the largest effect on participant evaluations, while the nature of statements had a significant, but smaller influence. Marginal effects of culture on explicit evaluation of complaints were found. A multiple mediation analysis suggested that mean fundamental frequency was the main prosodic signal that participants relied on to detect complaints, though most of the prosody effect could not be linearly explained by acoustic parameters. These results highlight a tacit agreement between speaker and listener: what characterizes a complaint is how it is said (i.e., the tone of voice), more than what it is about or who produces it. More generally, the study emphasizes the central importance of prosody in expressive speech acts such as complaints, which are designed to strengthen social bonds and supportive responses in interactive behavior. This intentional and interpersonal aspect in the communication of emotions needs to be further considered in research on affect and communication.

Article

Mauchand, M., Caballero, J. A., Jiang, X., & Pell, M. D. (2021). Immediate online use of prosody reveals the ironic intentions of a speaker: neurophysiological evidence. Cognitive Affective & Behavioral Neuroscience, 21(1), 74–92.

In social interactions, speakers often use their tone of voice ("prosody") to communicate their interpersonal stance to pragmatically mark an ironic intention (e.g., sarcasm). The neurocognitive effects of prosody as listeners process ironic statements in real time are still poorly understood. In this study, 30 participants judged the friendliness of literal and ironic criticisms and compliments in the absence of context while their electrical brain activity was recorded. Event-related potentials reflecting the uptake of prosodic information were tracked at two time points in the utterance. Prosody robustly modulated P200 and late positivity amplitudes from utterance onset. These early neural responses registered both the speaker's stance (positive/negative) and their intention (literal/ironic). At a later timepoint (You are such a great/horrible cook), P200, N400, and P600 amplitudes were all greater when the critical word valence was congruent with the speaker's vocal stance, suggesting that irony was contextually facilitated by early effects from prosody. Our results exemplify that rapid uptake of salient prosodic features allows listeners to make online predictions about the speaker's ironic intent. This process can constrain their representation of an utterance to uncover nonliteral meanings without violating contextual expectations held about the speaker, as described by parallel-constraint satisfaction models.

Article

Pell, M. D., & Kotz, S. A. (2021). Comment: The Next Frontier: Prosody Research Gets Interpersonal. Emotion Review, 13(1), 51–56.

Neurocognitive models (e.g., Schirmer & Kotz, 2006) have helped to characterize how listeners incrementally derive meaning from vocal expressions of emotion in spoken language, what neural mechanisms are involved at different processing stages, and their relative time course. But how can these insights be applied to communicative situations in which prosody serves a predominantly interpersonal function? This comment examines recent data highlighting the dynamic interplay of prosody and language, when vocal attributes serve the sociopragmatic goals of the speaker or reveal interpersonal information that listeners use to construct a mental representation of what is being communicated. Our comment serves as a beacon to researchers interested in how the neurocognitive system “makes sense” of socioemotive aspects of prosody.

Article

— 2020 —

Caballero, J. A., & Pell, M. D. (2020). Implicit effects of speaker accents and vocally-expressed confidence on decisions to trust. Decision, 7(4), 314–331.

People often evaluate speakers with nonstandard accents as being less competent or trustworthy, which is often attributed to in-group favoritism. However, speakers can also modulate social impressions in the listener through their vocal expression (e.g., by speaking in a confident vs. a doubtful tone of voice). Here, we addressed how both accents and vocally-expressed confidence affect social outcomes in an interaction setting using the Trust Game, which operationalizes interpersonal trust using a monetary exchange situation. In a first study, 30 English Canadians interacted with partners speaking English with a Canadian, Australian, or foreign (French) accent. Speakers with each accent vocally expressed themselves in different ways (confident, doubtful, or neutral voice). Results show that trust decisions were significantly modulated by a speaker’s accent (fewer tokens were given to foreign-accented speakers) and by vocally-expressed confidence (fewer tokens were given to doubtful-sounding speakers). Using the same paradigm, a second study then tested whether manipulating the social identity of the speaker-listener led to similar trust decisions in participants who spoke English as a foreign language (EFL; 60 native speakers of French or Spanish). Again, EFL participants trusted partners who spoke in a doubtful manner and those with a foreign accent less, regardless of the participants’ linguistic background. Taken together, results suggest that in social-interactive settings, listeners implicitly use different sources of vocal cues to derive social impressions and to guide trust-related decisions, effects not solely driven by shared group membership. The influence of voice information on trust decisions was very similar for native and nonnative listeners.

Article

Rigoulot, S., Jiang, X., Vergis, N., & Pell, M. D. (2020). Neurophysiological correlates of sexually evocative speech. Biological Psychology, 154, 107909–107909.

Speakers modulate their voice (prosody) to communicate non-literal meanings, such as sexual innuendo (She inspected his package this morning, where "package" could refer to a man's penis). Here, we analyzed event-related potentials to illuminate how listeners use prosody to interpret sexual innuendo and what neurocognitive processes are involved. Participants listened to third-party statements with literal or 'sexual' interpretations, uttered in an unmarked or sexually evocative tone. Analyses revealed: (1) rapid neural differentiation of neutral vs. sexual prosody from utterance onset; (2) an N400-like response differentiating contextually constrained vs. unconstrained utterances following the critical word (reflecting integration of prosody and word meaning); and (3) a selective increased negativity response to sexual innuendo around 600 ms after the critical word. Findings show that the brain quickly integrates prosodic and lexical-semantic information to form an impression of what the speaker is communicating, triggering a unique response to sexual innuendos, consistent with their high social relevance.

Article

Vergis, N., Jiang, X., & Pell, M. D. (2020). Neural responses to interpersonal requests: Effects of imposition and vocally-expressed stance. Brain Research, 1740, 146855–146855.

The way that speakers communicate their stance towards the listener is often vital for understanding the interpersonal relevance of speech acts, such as basic requests. To establish how interpersonal dimensions of an utterance affect neurocognitive processing, we compared event-related potentials elicited by requests that linguistically varied in how much they imposed on listeners (e.g., Lend me a nickel vs. hundred) and in the speaker's vocally-expressed stance towards the listener (polite or rude tone of voice). From utterance onset, effects of vocal stance were robustly differentiated by an early anterior positivity (P200) which increased for rude versus polite voices. At the utterance-final noun that marked the 'cost' of the request (nickel vs. hundred), there was an increased negativity between 300 and 500 ms in response to high-imposition requests accompanied by rude stance compared to the rest of the conditions. This N400 effect was followed by interactions of stance and imposition that continued to inform several effects in the late positivity time window (500-800 ms post-onset of the critical noun), some of which correlated significantly with prosody-related changes in the P200 response from utterance onset. Results point to rapid neural differentiation of voice-related information conveying stance (around 200 ms post-onset of speech) and exemplify the interplay of different sources of interpersonal meaning (stance, imposition) as listeners evaluate social implications of a request. Data show that representations of speaker meaning are actively shaped by vocal and verbal cues that encode interpersonal features of an utterance, promoting attempts to reanalyze and infer the pragmatic significance of speech acts in the 500-800 ms time window.

Article

— 2019 —

Vergis, N., & Pell, M. D. (2019). Factors in the perception of speaker politeness: the effect of linguistic structure, imposition and prosody. Journal of Politeness Research, 16(1), 45–84.

Although linguistic politeness has been studied and theorized about extensively, the role of prosody in the perception of (im)polite attitudes has been somewhat neglected. In the present study, we used experimental methods to investigate the interaction of linguistic form, imposition, and prosody in the perception of (im)polite requests. A written task established a baseline for the level of politeness associated with certain linguistic structures. Then stimuli were recorded in polite and rude prosodic conditions and in a perceptual experiment they were judged for politeness. Results revealed that, although both linguistic structure and prosody had a significant effect on politeness ratings, the effect of prosody was much more robust. In fact, rude prosody led in some cases to the neutralization of (extra)linguistic distinctions. The important contribution of prosody to (im)politeness inferences was also revealed by a comparison of the written and auditory tasks. These findings have important implications for models of (im)politeness and more generally for theories of affective speech. Implications for the generation of Particularized Conversational Implicatures (PCIs) of (im)politeness are also discussed.

Article

Mori, Y., & Pell, M. D. (2019). The Look of (Un)confidence: Visual Markers for Inferring Speaker Confidence in Speech. Frontiers in Communication, 4.

Evidence suggests that observers can accurately perceive a speaker’s static confidence level, related to their personality and social status, by only assessing their visual cues. However, less is known about the visual cues that speakers produce to signal their transient confidence level in the content of their speech. Moreover, it is unclear what visual cues observers use to accurately perceive a speaker’s confidence level. Observers are hypothesized to use visual cues in their social evaluations based on the cue’s level of perceptual salience and/or their beliefs about the cues that speakers with a given mental state produce. We elicited high and low levels of confidence in the speech content by having a group of speakers answer general knowledge questions ranging in difficulty while their face and upper body were video recorded. A group of observers watched muted videos of these recordings to rate the speaker’s confidence and report the face/body area(s) they used to assess the speaker’s confidence. Observers accurately perceived a speaker’s confidence level relative to the speakers’ subjective confidence, and broadly differentiated speakers as having low compared to high confidence by using speakers’ eyes, facial expressions, and head movements. Our results argue that observers use a speaker’s facial region to implicitly decode a speaker’s transient confidence level in a situation of low-stakes social evaluation, although the use of these cues differs across speakers. The effect of situational factors on speakers’ visual cue production and observers’ utilisation of these visual cues are discussed, with implications for improving how observers in real world contexts assess a speaker’s confidence in their speech content.

Article

Giles, R., Rothermich, K., & Pell, M. D. (2019). Differences in the Evaluation of Prosocial Lies: A Cross-Cultural Study of Canadian, Chinese, and German Adults. Frontiers in Communication, 4.

In daily life, humans often tell lies to make another person feel better about themselves, or to be polite or socially appropriate in situations when telling the blunt truth would be perceived as inappropriate. Prosocial lies are a form of nonliteral communication used cross-culturally, but how they are evaluated depends on socio-moral values and communication strategies. We examined how prosocial lies are evaluated by Canadian, Chinese, and German adults. Participants watched videos and rated politeness, appropriateness, and predicted frequency of use of prosocial lies and blunt truths. A two-way intention x culture interaction was observed for appropriateness and predicted frequency of use. These results suggest that the evaluation of prosocial lies is influenced by an interplay of intercultural communication strategies depending on cultural group membership.

Article

Jiang, X., Gossack‐Keenan, K., & Pell, M. D. (2019). To believe or not to believe? How voice and accent information in speech alter listener impressions of trust. Quarterly Journal of Experimental Psychology, 73(1), 55–79.

Our decision to believe what another person says can be influenced by vocally expressed confidence in speech and by whether the speaker–listener are members of the same social group. The dynamic effects of these two information sources on neurocognitive processes that promote believability impressions from vocal cues are unclear. Here, English Canadian listeners were presented with personal statements (She has access to the building) produced in a confident or doubtful voice by speakers of their own dialect (in-group) or speakers from two different “out-groups” (regional or foreign-accented English). Participants rated how believable the speaker is for each statement and event-related potentials (ERPs) were analysed from utterance onset. Believability decisions were modulated by both the speaker’s vocal confidence level and their perceived in-group status. For in-group speakers, ERP effects revealed an early differentiation of vocally expressed confidence (i.e., N100, P200), highlighting the motivational significance of doubtful voices for drawing believability inferences. These early effects on vocal confidence perception were qualitatively different or absent when speakers had an accent; evaluating out-group voices was associated with increased demands on contextual integration and re-analysis of a non-native representation of believability (i.e., increased N400, late negativity response). Accent intelligibility and experience with particular out-group accents each influenced how vocal confidence was processed for out-group speakers. The N100 amplitude was sensitive to out-group attitudes and predicted actual believability decisions for certain out-group speakers. We propose a neurocognitive model in which vocal identity information (social categorization) dynamically influences how vocal expressions are decoded and used to derive social inferences during person perception.

Article

Mauchand, M., Vergis, N., & Pell, M. D. (2019). Irony, Prosody, and Social Impressions of Affective Stance. Discourse Processes, 57(2), 141–157.

In spoken discourse, understanding irony requires the apprehension of subtle cues, such as the speaker’s tone of voice (prosody), which often reveal the speaker’s affective stance toward the listener in the context of the utterance. To shed light on the interplay of linguistic content and prosody on impressions of spoken criticisms and compliments (both literal and ironic), 40 participants rated the friendliness of the speaker in three separate conditions of attentional focus (No focus, Prosody focus, and Content focus). When the linguistic content was positive (“You are such an awesome driver!”), the perceived critical or friendly stance of the speaker was influenced predominantly by prosody. However, when the linguistic content was negative (“You are such a lousy driver!”), the speaker was always perceived as less friendly, even for ironic compliments that were meant to be teasing (i.e., positive stance). Our results highlight important asymmetries in how listeners use prosody and attend to different speech-related channels to form impressions of interpersonal stance for ironic criticisms (e.g., sarcasm) versus ironic compliments (e.g., teasing).

Article

— 2018 —

Pell, M. D., Vergis, N., Caballero, J. A., Mauchand, M., & Jiang, X. (2018). Prosody as a window into speaker attitudes and interpersonal stance. The Journal of the Acoustical Society of America, 144(3_Supplement), 1840–1840.

New research is exploring ways that prosody fulfils different social-pragmatic functions in spoken language by revealing the mental or affective state of the speaker, thereby contributing to an understanding of speaker’s meaning. Prosody is often pivotal in signaling speaker attitudes or stance in the interpersonal context of the speaker-hearer; in contrast to the vocal communication of emotions, the significance of prosodic cues serving an emotive or interpersonal function is often dependent on the type of speech act being made and other contextual-situational parameters. Here, we discuss recent acoustic-perceptual studies in our lab demonstrating how prosody marks the interpersonal stance of a speaker and how this information is used by listeners to uncover social intentions that may be non-literal or covert. Focusing on how prosody operates in the communication of politeness, ironic attitudes, and sincerity, our data add to research on the acoustic-perceptual characteristics of prosody in these interpersonal contexts. Future implications for how acoustic cues underlying “prosodic attitudes” affect the interpretive process during on-line speech processing and how they influence behavioral outcomes are considered.

Article

Jiang, X., Sanford, R., & Pell, M. D. (2018). Neural architecture underlying person perception from in-group and out-group voices. NeuroImage, 181, 582–597.

In spoken language, verbal cues (what we say) and vocal cues (how we say it) contribute to person perception, the process for interpreting information and making inferences about other people. When someone has an accent, forming impressions from the speaker's voice may be influenced by social categorization processes (i.e., activating stereotypical traits of members of a perceived ‘out‐group’) and by processes which differentiate the speaker based on their individual attributes (e.g., registering the vocal confidence level of the speaker in order to make a trust decision). The neural systems for using vocal cues that refer to the speaker's identity and to qualities of their vocal expression to generate inferences about others are not known. Here, we used functional magnetic resonance imaging (fMRI) to investigate how speaker categorization influences brain activity as Canadian‐English listeners judged whether they believe statements produced by in‐group (native) and out‐group (regional, foreign) speakers. Each statement was expressed in a confident, doubtful, and neutral tone of voice. In‐group speakers were perceived as more believable than speakers with out‐group accents overall, confirming social categorization of speakers based on their accent. Superior parietal and middle temporal regions were uniquely activated when listening to out‐group compared to in‐group speakers suggesting that they may be involved in extracting the attributes of speaker believability from the lower‐level acoustic variations. Basal ganglia, left cuneus and right fusiform gyrus were activated by confident expressions produced by out‐group speakers. These regions appear to participate in abstracting more ambiguous believability attributes from accented speakers (where a conflict arises between the tendency to disbelieve an out‐group speaker and the tendency to believe a confident voice). 
For out‐group speakers, stronger impressions of believability selectively modulated activity in the bilateral superior and middle temporal regions. Moreover, the right superior temporal gyrus, a region that was associated with perceived speaker confidence, was found to be functionally connected to the left lingual gyrus and right middle temporal gyrus when out‐group speakers were judged as more believable. These findings suggest that identity‐related voice characteristics and associated biases may influence underlying neural activities for making social attributions about out‐group speakers, affecting decisions about believability and trust. Specifically, inferences about out‐group speakers seem to be mediated to a greater extent by stimulus‐related features (i.e., vocal confidence cues) than for in‐group speakers. Our approach highlights how the voice can be studied to advance models of person perception.

Highlights: Neural activations of social inference from “out‐group” voices were examined with fMRI. Basal ganglia, left cuneus and right fusiform gyrus were enhanced by confident voices of a speaker with an accent. Connectivity between the right STG and the left lingual gyrus and right MTG increased when judging believability of an out‐group speaker. Listener attitude and intelligibility perception modulated the brain network associated with speaker believability.

Article

Caballero, J. A., Vergis, N., Jiang, X., & Pell, M. D. (2018). The sound of im/politeness. Speech Communication, 102, 39–53.

Until recently, research on im/politeness has primarily focused on the role of linguistic strategies while neglecting the contributions of prosody and acoustic cues for communicating politeness. Here, we analyzed a large set of recordings — verbal requests spoken in a direct manner (Lend me a nickel), preceded by the word “Please”, or in a conventionally-indirect manner (Can you) — which were known to convey polite or rude impressions on the listener. The pragmatic imposition of the request was also manipulated (Lend me a nickel vs. hundred). Fundamental frequency (f0: mean, range, contour shape), duration, and voice quality (harmonics-to-noise ratio) were measured over the whole utterance and for key constituents within the utterance. Differences in perceived politeness corresponded with systematic differences in continuous utterance measures as well as local acoustic adjustments, defined by both categorical and graded vocal contrasts. Compared to polite utterances, rude requests displayed a slower speech rate, lower pitch, and tended to fall in pitch (or rise less markedly in the context of yes-no questions). The high versus low imposition of a request separately influenced the acoustic structure of requests, with evidence of these effects right at utterance-onset. Results are consistent with theoretical proposals about how prosody functions to convey speaker politeness as one facet of emotive communication. It is suggested that while a specific “prosody of politeness” may not exist, prosodic cues routinely and potently interact with other sources of information to allow listeners to generate inferences about im/politeness.

Article

Garrido‐Vásquez, P., Pell, M. D., Paulmann, S., & Kotz, S. A. (2018). Dynamic Facial Expressions Prime the Processing of Emotional Prosody. Frontiers in Human Neuroscience, 12, 244–244.

Evidence suggests that emotion is represented supramodally in the human brain. Emotional facial expressions, which often precede vocally expressed emotion in real life, can modulate event-related potentials (N100 and P200) during emotional prosody processing. To investigate these cross-modal emotional interactions, two lines of research have been put forward: cross-modal integration and cross-modal priming. In cross-modal integration studies, visual and auditory channels are temporally aligned, while in priming studies they are presented consecutively. Here we used cross-modal emotional priming to study the interaction of dynamic visual and auditory emotional information. Specifically, we presented dynamic facial expressions (angry, happy, neutral) as primes and emotionally-intoned pseudo-speech sentences (angry, happy) as targets. We were interested in how prime-target congruency would affect early auditory event-related potentials, i.e., N100 and P200, in order to shed more light on how dynamic facial information is used in cross-modal emotional prediction. Results showed enhanced N100 amplitudes for incongruently primed compared to congruently and neutrally primed emotional prosody, while the latter two conditions did not significantly differ. However, N100 peak latency was significantly delayed in the neutral condition compared to the other two conditions. Source reconstruction revealed that the right parahippocampal gyrus was activated in incongruent compared to congruent trials in the N100 time window. No significant ERP effects were observed in the P200 range. Our results indicate that dynamic facial expressions influence vocal emotion processing at an early point in time, and that an emotional mismatch between a facial expression and its ensuing vocal emotional signal induces additional processing costs in the brain, potentially because the cross-modal emotional prediction mechanism is violated in case of emotional prime-target incongruency.

Article

Caballero, J. A., Díaz, M. M., Arias‐Trejo, N., Rodríguez, F. L., & Pell, M. D. (2018). How to do things with(out) words? Analyzing the effects of vocal emotional expressions on cooperation behavior.

The importance of prosodic variations in social interaction contexts has been highlighted but their effects on the regulation of specific behaviors are rarely addressed. One of the most widely researched prosodic distinctions in psychology is emotional prosody. In perceptual studies, the capacity for identifying emotions through prosodic variations has been widely addressed, but the relevance of this skill for social interaction has not been tested. However, based on theoretical accounts of emotion and empirical findings of the influence of facial emotional expressions in experiments that address their role in cooperation, it is possible to formulate predictions about the effects of emotional prosody in social interaction behavior. For this objective, in the present work, the effects of emotional prosody on cooperation were addressed, and its interaction with other behavioral intention cues (propositional content) was analyzed. Findings show that emotional prosody influences cooperation behavior but its joint effects with propositional content suggest that a complex inferential process may underlie the integration of contextual and behavioral intention cues to guide behavior. The significance of results and potential for extending future research are discussed.

Article

Jiang, X., & Pell, M. D. (2018). Predicting confidence and doubt in accented speakers: Human perception and machine learning experiments.

Speech prosody provides salient and reliable cues to facilitate social communication. What computational mechanism underlies social judgment towards “out-group” speakers is unclear. This paper focused on Speaker Confidence, a factor affecting one’s trustworthiness, persuasiveness and feeling of (un)knowing, and Speaker Accent, a factor marking one’s identity. We demonstrate that native Canadian-English listeners can recognize confident and doubtful expressions in foreign- and regional-accented speakers. A stronger impression of confidence was shown towards the native speakers. The acoustic analysis demonstrated that speakers systematically varied the mean fundamental frequency to indicate confidence and doubt regardless of accent. The out-group speakers varied more on intensity height and variation to achieve a certain level of confidence. Machine learning experiments showed above-chance accuracies in all accents to classify vocal expression based on global acoustic cues, highlighting the role of acoustic regularities at utterance level in confidence encoding. Moreover, the classification rate was higher when the model trained in native accent was tested on the native than the regional accent, highlighting an in-group bias of predicting novel vocal expression of confidence from acoustic cues. These findings lend support to the dialect theory of vocal expression recognition while demonstrating a computational mechanism underlying inter-cultural/inter-group confidence perception via speech prosody.

Article

Chronaki, G., Wigelsworth, M., Pell, M. D., & Kotz, S. A. (2018). The development of cross-cultural recognition of vocal emotion during childhood and adolescence. Scientific Reports, 8(1), 8659–8659.

Humans have an innate set of emotions recognised universally. However, emotion recognition also depends on socio-cultural rules. Although adults recognise vocal emotions universally, they identify emotions more accurately in their native language. We examined developmental trajectories of universal vocal emotion recognition in children. Eighty native English speakers completed a vocal emotion recognition task in their native language (English) and foreign languages (Spanish, Chinese, and Arabic) expressing anger, happiness, sadness, fear, and neutrality. Emotion recognition was compared across 8-to-10, 11-to-13-year-olds, and adults. Measures of behavioural and emotional problems were also taken. Results showed that although emotion recognition was above chance for all languages, native English speaking children were more accurate in recognising vocal emotions in their native language. There was a larger improvement in recognising vocal emotion from the native language during adolescence. Vocal anger recognition did not improve with age for the non-native languages. This is the first study to demonstrate universality of vocal emotion recognition in children whilst supporting an "in-group advantage" for more accurate recognition in the native language. Findings highlight the role of experience in emotion recognition, have implications for child development in modern multicultural societies and address important theoretical questions about the nature of emotions.

Article

Truesdale, D. M., & Pell, M. D. (2018). The sound of Passion and Indifference. Speech Communication, 99, 124–134.

Extending affective speech communication research in the context of authentic, spontaneous utterances, the present study investigates two signals of affect defined by extreme levels of physiological arousal—Passion and Indifference. Exemplars were mined from podcasts conducted in informal, unstructured contexts to examine communication at extreme levels of perceived hyper- and hypo-arousal. Utterances from twenty native speakers of Canadian/American English were submitted for perceptual validation for judgments of affective meaning (Passion, Indifference, or Neutrality) and level of arousal (“Not At All” to “Very Much”). Arousal ratings, acoustic patterns, and linguistic cues (affect/emotion words and expletives) were analyzed. In comparison to neutral utterances, Passion was communicated with the highest maximum pitch and pitch range, and highest maximum and mean amplitude, while Indifference was communicated via decreases in these measures in comparison to neutral affect. Interestingly, Passion and Neutrality were expressed with comparable absolute ranges of amplitude, while the minimum amplitudes of both Passion and Indifference were greater than those of Neutral expressions. Linguistically, Indifference was marked by significantly greater use of explicit expressions of affect (e.g. I don't care…), suggesting a linguistic encoding preference in this context. Passion was expressed with greater use of expletives; yet, their presence was not necessary to facilitate perception of a speaker's level of arousal. These findings shed new light upon the paralinguistic and linguistic features of spontaneous expressions at the extremes of the arousal continuum, highlighting key distinctions between Indifference and Neutrality with implications for vocal communication research in healthy and clinical populations.

Article

— 2017 —

Fish, K., Rothermich, K., & Pell, M. D. (2017). The sound of (in)sincerity. Journal of Pragmatics, 121, 147–161.

In social life, humans do not always communicate their sincere feelings, and speakers often tell ‘prosocial lies’ to prevent others from being hurt by negative truths. Data illuminating how a speaker's voice carries sincere or insincere attitudes in speech, and how social context shapes the expression and perception of (in)sincere utterances, are scarce. Here, we studied the communication of social, other-oriented lies occurring in short dialogues. We recorded paired questions (So, what do you think of my new hairdo?) and responses (I think it looks really amazing!) using a paradigm that elicited compliments which reflected the true positive opinion of the speaker (sincere) or were meant to hide their negative opinion (insincere/prosocial lie). These Question–Response pairs were then presented to 30 listeners, who rated the sincerity of the person uttering the compliment on a 5-point scale. Results showed that participants could successfully differentiate sincere compliments from prosocial lies based largely on vocal speech cues. Moreover, sincerity impressions were biased by how the preceding question was phrased (confident or uncertain). Acoustic analyses on a subset of utterances that promoted strong impressions of sincerity versus insincerity revealed that compliments perceived as being sincere were spoken faster and began with a higher pitch than those that sounded insincere, while compliments rated as insincere tended to get louder as the utterance unfolded. These data supply new evidence of the importance of vocal cues in evaluating sincerity, while emphasizing that motivations of both the speaker and hearer contribute to impressions of speaker sincerity.

Article

Schwartz, R., Rothermich, K., Kotz, S. A., & Pell, M. D. (2017). Unaltered emotional experience in Parkinson’s disease: Pupillometry and behavioral evidence. Journal of Clinical and Experimental Neuropsychology, 40(3), 303–316.

INTRODUCTION: Recognizing emotions in others is a pivotal part of socioemotional functioning and plays a central role in social interactions. It has been shown that individuals suffering from Parkinson's disease (PD) are less accurate at identifying basic emotions such as fear, sadness, and happiness; however, previous studies have predominantly assessed emotion processing using unimodal stimuli (e.g., pictures) that do not reflect the complexity of real-world processing demands. Dynamic, naturalistic stimuli (e.g., movies) have been shown to elicit stronger subjective emotional experiences than unimodal stimuli and can facilitate emotion recognition. METHOD: In this experiment, pupil measurements of PD patients and matched healthy controls (HC) were recorded while they watched short film clips. Participants' task was to identify the emotion elicited by each clip and rate the intensity of their emotional response. We explored (a) how PD affects subjective emotional experience in response to dynamic, ecologically valid film stimuli, and (b) whether there are PD-related changes in pupillary response, which may contribute to the differences in emotion processing reported in the literature. RESULTS: Behavioral results showed that identification of the felt emotion as well as perceived intensity varies by emotion, but no significant group effect was found. Pupil measurements revealed differences in dilation depending on the emotion evoked by the film clips (happy, tender, sadness, fear, and neutral) for both groups. CONCLUSIONS: Our results suggest that differences in emotional response may be negligible when PD patients and healthy controls are presented with dynamic, ecologically valid emotional stimuli. Given the limited data available on pupil response in PD, this study provides new evidence to suggest that the PD-related deficits in emotion processing reported in the literature may not translate to real-world differences in physiological or subjective emotion processing in early-stage PD patients.

Article

Jiang, X., Sanford, R., & Pell, M. D. (2017). Neural systems for evaluating speaker (Un)believability. Human Brain Mapping, 38(7), 3732–3749.

Our voice provides salient cues about how confident we sound, which promotes inferences about how believable we are. However, the neural mechanisms involved in these social inferences are largely unknown. Employing functional magnetic resonance imaging, we examined the brain networks and individual differences underlying the evaluation of speaker believability from vocal expressions. Participants (n = 26) listened to statements produced in a confident, unconfident, or "prosodically unmarked" (neutral) voice, and judged how believable the speaker was on a 4-point scale. We found frontal-temporal networks were activated for different levels of confidence, with the left superior and inferior frontal gyrus more activated for confident statements, the right superior temporal gyrus for unconfident expressions, and bilateral cerebellum for statements in a neutral voice. Based on listener's believability judgment, we observed increased activation in the right superior parietal lobule (SPL) associated with higher believability, while increased left posterior central gyrus (PoCG) was associated with less believability. A psychophysiological interaction analysis found that the anterior cingulate cortex and bilateral caudate were connected to the right SPL when higher believability judgments were made, while supplementary motor area was connected with the left PoCG when lower believability judgments were made. Personal characteristics, such as interpersonal reactivity and the individual tendency to trust others, modulated the brain activations and the functional connectivity when making believability judgments. In sum, our data pinpoint neural mechanisms that are involved when inferring one's believability from a speaker's voice and establish ways that these mechanisms are modulated by individual characteristics of a listener.

Article

Jiang, X., & Pell, M. D. (2017). The sound of confidence and doubt. Speech Communication, 88, 106–126.

Feeling of knowing (or expressed confidence) reflects a speaker's certainty or commitment to a statement and can be associated with one's trustworthiness or persuasiveness in social interaction. We investigated the perceptual-acoustic correlates of expressed confidence and doubt in spoken language, with a focus on both linguistic and vocal speech cues. In Experiment 1, utterances subserving different communicative functions (e.g., stating facts, making judgments) produced in a confident, close-to-confident, unconfident, and neutral-intending voice by six speakers, were then rated for perceived confidence by 72 native listeners. As expected, speaker confidence ratings increased with the intended level of expressed confidence; neutral-intending statements were frequently judged as relatively high in confidence. The communicative function of the statement, and the presence vs. absence of an utterance-initial probability phrase (e.g., Maybe, I'm sure), further modulated speaker confidence ratings. In Experiment 2, acoustic analysis of perceptually valid tokens rated in Experiment 1 revealed distinct patterns of pitch, intensity and temporal features according to perceived confidence levels; confident expressions were highest in fundamental frequency (f0) range, mean amplitude, and amplitude range, whereas unconfident expressions were highest in mean f0, slowest in speaking rate, with more frequent pauses. Dynamic analyses of f0 and intensity changes across the utterance uncovered distinctive patterns in expression as a function of confidence level at different positions of the utterance. Our findings provide new information on how metacognitive states such as confidence and doubt are communicated by vocal and linguistic cues which permit listeners to arrive at graded impressions of a speaker's feeling of (un)knowing.

Article

— 2016 —

Liu, P., Rigoulot, S., & Pell, M. D. (2016). Cultural immersion alters emotion perception: Neurophysiological evidence from Chinese immigrants to Canada. Social Neuroscience, 12(6), 1–16.

To explore how cultural immersion modulates emotion processing, this study examined how Chinese immigrants to Canada process multisensory emotional expressions, which were compared to existing data from two groups, Chinese and North Americans. Stroop and Oddball paradigms were employed to examine different stages of emotion processing. The Stroop task presented face-voice pairs expressing congruent/incongruent emotions and participants actively judged the emotion of one modality while ignoring the other. A significant effect of cultural immersion was observed in the immigrants' behavioral performance, which showed greater interference from to-be-ignored faces, comparable with what was observed in North Americans. However, this effect was absent in their N400 data, which retained the same pattern as the Chinese. In the Oddball task, where immigrants passively viewed facial expressions with/without simultaneous vocal emotions, they exhibited a larger visual MMN for faces accompanied by voices, again mirroring patterns observed in Chinese. Correlation analyses indicated that the immigrants' living duration in Canada was associated with neural patterns (N400 and visual mismatch negativity) more closely resembling North Americans. Our data suggest that in multisensory emotion processing, adapting to a new culture first leads to behavioral accommodation followed by alterations in brain activities, providing new evidence on human neurocognitive plasticity in communication.

Article

Schwartz, R., & Pell, M. D. (2016). When emotion and expression diverge: The social costs of Parkinson’s disease. Journal of Clinical and Experimental Neuropsychology, 39(3), 211–230.

INTRODUCTION: Patients with Parkinson's disease (PD) are perceived more negatively than their healthy peers, yet it remains unclear what factors contribute to this negative social perception. METHOD: Based on a cohort of 17 PD patients and 20 healthy controls, we assessed how naïve raters judge the emotion and emotional intensity displayed in dynamic facial expressions as adults with and without PD watched emotionally evocative films (Experiment 1), and how age-matched peers naïve to patients' disease status judge their social desirability along various dimensions from audiovisual stimuli (interview excerpts) recorded after certain films (Experiment 2). RESULTS: In Experiment 1, participants with PD were rated as significantly more facially expressive than healthy controls; moreover, ratings demonstrated that PD patients were routinely mistaken for experiencing a negative emotion, whereas controls were rated as displaying a more positive emotion than they reported feeling. In Experiment 2, results showed that age-peers rated PD patients as significantly less socially desirable than control participants. Specifically, PD patients were rated as less involved, interested, friendly, intelligent, optimistic, attentive, and physically attractive than healthy controls. CONCLUSIONS: Taken together, our results point to a disconnect between how PD patients report feeling and attributions that others make about their emotions and social characteristics, underlining significant social challenges of the disease. In particular, changes in the ability to modulate the expression of negative emotions may contribute to the negative social impressions that many PD patients face.

Article

Garrido‐Vásquez, P., Pell, M. D., Paulmann, S., Sehm, B., & Kotz, S. A. (2016). Impaired neural processing of dynamic faces in left-onset Parkinson's disease. Neuropsychologia, 82, 123–133.

Parkinson's disease (PD) affects patients beyond the motor domain. According to previous evidence, one mechanism that may be impaired in the disease is face processing. However, few studies have investigated this process at the neural level in PD. Moreover, research using dynamic facial displays rather than static pictures is scarce, but highly warranted due to the higher ecological validity of dynamic stimuli. In the present study we aimed to investigate how PD patients process emotional and non-emotional dynamic face stimuli at the neural level using event-related potentials. Since the literature has revealed a predominantly right-lateralized network for dynamic face processing, we divided the group into patients with left (LPD) and right (RPD) motor symptom onset (right versus left cerebral hemisphere predominantly affected, respectively). Participants watched short video clips of happy, angry, and neutral expressions and engaged in a shallow gender decision task in order to avoid confounds of task difficulty in the data. In line with our expectations, the LPD group showed significant face processing deficits compared to controls. While there were no group differences in early, sensory-driven processing (fronto-central N1 and posterior P1), the vertex positive potential, which is considered the fronto-central counterpart of the face-specific posterior N170 component, had a reduced amplitude and delayed latency in the LPD group. This may indicate disturbances of structural face processing in LPD. Furthermore, the effect was independent of the emotional content of the videos. In contrast, static facial identity recognition performance in LPD was not significantly different from controls, and comprehensive testing of cognitive functions did not reveal any deficits in this group. We therefore conclude that PD, and more specifically the predominant right-hemispheric affection in left-onset PD, is associated with impaired processing of dynamic facial expressions, which could be one of the mechanisms behind the often reported problems of PD patients in their social lives.

Article

Jiang, X., & Pell, M. D. (2016). The feeling of another’s knowing: How “mixed messages” in speech are reconciled. Journal of Experimental Psychology Human Perception & Performance, 42(9), 1412–1428.

Listeners often encounter conflicting verbal and vocal cues about the speaker's feeling of knowing; these "mixed messages" can reflect online shifts in one's mental state as they utter a statement, or serve different social-pragmatic goals of the speaker. Using a cross-splicing paradigm, we investigated how conflicting cues about a speaker's feeling of (un)knowing change one's perception. Listeners rated the confidence of speakers of utterances containing an initial verbal phrase congruent or incongruent with vocal cues in a subsequent statement, while their brain potentials were tracked. Different forms of conflicts modulated the perceived confidence of the speaker, the extent to which was stronger for female listeners. A confident phrase followed by an unconfident voice enlarged an anteriorly maximized negativity for female listeners and late positivity for male listeners, suggesting that mental representations of another's feeling of knowing in face of this conflict were hampered by increased demands of integration for females and increased demands on updating for males. An unconfident phrase followed by a confident voice elicited a delayed sustained positivity (from 900 ms) in female participants only, suggesting females generated inferences to moderate the conflicting message about speaker knowledge. We highlight ways that verbal and vocal cues are real-time integrated to access a speaker's feeling of (un)knowing, while arguing that females are more sensitive to the social relevance of conflicting speaker cues.

Article

— 2015 —

Jiang, X., & Pell, M. D. (2015). Neural responses towards a speaker's feeling of (un)knowing. Neuropsychologia, 81, 79–93.

During interpersonal communication, listeners must rapidly evaluate verbal and vocal cues to arrive at an integrated meaning about the utterance and about the speaker, including a representation of the speaker's 'feeling of knowing' (i.e., how confident they are in relation to the utterance). In this study, we investigated the time course and neural responses underlying a listener's ability to evaluate speaker confidence from combined verbal and vocal cues. We recorded real-time brain responses as listeners judged statements conveying three levels of confidence with the speaker's voice (confident, close-to-confident, unconfident), which were preceded by meaning-congruent lexical phrases (e.g. I am positive, Most likely, Perhaps). Event-related potentials to utterances with combined lexical and vocal cues about speaker confidence were compared to responses elicited by utterances without the verbal phrase in a previous study (Jiang and Pell, 2015). Utterances with combined cues about speaker confidence elicited reduced N1, P2 and N400 responses when compared to corresponding utterances without the phrase. When compared to confident statements, close-to-confident and unconfident expressions elicited reduced N1 and P2 responses and a late positivity from 900 to 1250 ms; unconfident and close-to-confident expressions were differentiated later in the 1250-1600 ms time window. The effect of lexical phrases on confidence processing differed for male and female participants, with evidence that female listeners incorporated information from the verbal and vocal channels in a distinct manner. Individual differences in trait empathy and trait anxiety also moderated neural responses during confidence processing. Our findings showcase the cognitive processing mechanisms and individual factors governing how we infer a speaker's mental (knowledge) state from the speech signal.

Article

Pell, M. D., Rothermich, K., Liu, P., Paulmann, S., Sethi, S., & Rigoulot, S. (2015). Preferential decoding of emotion from human non-linguistic vocalizations versus speech prosody. Biological Psychology, 111, 14–25.

This study used event-related brain potentials (ERPs) to compare the time course of emotion processing from non-linguistic vocalizations versus speech prosody, to test whether vocalizations are treated preferentially by the neurocognitive system. Participants passively listened to vocalizations or pseudo-utterances conveying anger, sadness, or happiness as the EEG was recorded. Simultaneous effects of vocal expression type and emotion were analyzed for three ERP components (N100, P200, late positive component). Emotional vocalizations and speech were differentiated very early (N100) and vocalizations elicited stronger, earlier, and more differentiated P200 responses than speech. At later stages (450-700 ms), anger vocalizations evoked a stronger late positivity (LPC) than other vocal expressions, which was similar but delayed for angry speech. Individuals with high trait anxiety exhibited early, heightened sensitivity to vocal emotions (particularly vocalizations). These data provide new neurophysiological evidence that vocalizations, as evolutionarily primitive signals, are accorded precedence over speech-embedded emotions in the human voice.

Article

Rothermich, K., & Pell, M. D. (2015). Introducing RISC: A New Video Inventory for Testing Social Perception. PLoS ONE, 10(7), e0133902–e0133902.

Indirect forms of speech, such as sarcasm, jocularity (joking), and 'white lies' told to spare another's feelings, occur frequently in daily life and are a problem for many clinical populations. During social interactions, information about the literal or nonliteral meaning of a speaker unfolds simultaneously in several communication channels (e.g., linguistic, facial, vocal, and body cues); however, to date many studies have employed uni-modal stimuli, for example focusing only on the visual modality, limiting the generalizability of these results to everyday communication. Much of this research also neglects key factors for interpreting speaker intentions, such as verbal context and the relationship of social partners. Relational Inference in Social Communication (RISC) is a newly developed (English-language) database composed of short video vignettes depicting sincere, jocular, sarcastic, and white lie social exchanges between two people. Stimuli carefully manipulated the social relationship between communication partners (e.g., boss/employee, couple) and the availability of contextual cues (e.g. preceding conversations, physical objects) while controlling for major differences in the linguistic content of matched items. Here, we present initial perceptual validation data (N = 31) on a corpus of 920 items. Overall accuracy for identifying speaker intentions was above 80% correct and our results show that both relationship type and verbal context influence the categorization of literal and nonliteral interactions, underscoring the importance of these factors in research on speaker intentions. We believe that RISC will prove highly constructive as a tool in future research on social cognition, inter-personal communication, and the interpretation of speaker intentions in both healthy adults and clinical populations.

Article

Liu, P., Rigoulot, S., & Pell, M. D. (2015). Cultural differences in on-line sensitivity to emotional voices: comparing East and West. Frontiers in Human Neuroscience, 9.

Evidence that culture modulates on-line neural responses to the emotional meanings encoded by vocal and facial expressions was demonstrated recently in a study comparing English North Americans and Chinese (Liu et al., 2015). Here, we compared how individuals from these two cultures passively respond to emotional cues from faces and voices using an Oddball task. Participants viewed in-group emotional faces, with or without simultaneous vocal expressions, while performing a face-irrelevant visual task as the EEG was recorded. A significantly larger visual MMN was observed for Chinese versus English participants when faces were accompanied by voices, suggesting that Chinese were influenced to a larger extent by task-irrelevant vocal cues. These data highlight further differences in how adults from East Asian versus Western cultures process socio-emotional cues, arguing that distinct cultural practices in communication (e.g., display rules) shape neurocognitive activity associated with the early perception and integration of multi-sensory emotional cues.

Article

Jiang, X., & Pell, M. D. (2015). On how the brain decodes vocal cues about speaker confidence. Cortex, 66, 9–34.

In speech communication, listeners must accurately decode vocal cues that refer to the speaker's mental state, such as their confidence or 'feeling of knowing'. However, the time course and neural mechanisms associated with online inferences about speaker confidence are unclear. Here, we used event-related potentials (ERPs) to examine the temporal neural dynamics underlying a listener's ability to infer speaker confidence from vocal cues during speech processing. We recorded listeners' real-time brain responses while they evaluated statements wherein the speaker's tone of voice conveyed one of three levels of confidence (confident, close-to-confident, unconfident) or were spoken in a neutral manner. Neural responses time-locked to event onset show that the perceived level of speaker confidence could be differentiated at distinct time points during speech processing: unconfident expressions elicited a weaker P2 than all other expressions of confidence (or neutral-intending utterances), whereas close-to-confident expressions elicited a reduced negative response in the 330-500 msec and 550-740 msec time window. Neutral-intending expressions, which were also perceived as relatively confident, elicited a more delayed, larger sustained positivity than all other expressions in the 980-1270 msec window for this task. These findings provide the first piece of evidence of how quickly the brain responds to vocal cues signifying the extent of a speaker's confidence during online speech comprehension; first, a rough dissociation between unconfident and confident voices occurs as early as 200 msec after speech onset. At a later stage, further differentiation of the exact level of speaker confidence (i.e., close-to-confident, very confident) is evaluated via an inferential system to determine the speaker's meaning under current task settings. These findings extend three-stage models of how vocal emotion cues are processed in speech comprehension (e.g., Schirmer & Kotz, 2006) by revealing how a speaker's mental state (i.e., feeling of knowing) is simultaneously inferred from vocal expressions.

Article

Rigoulot, S., Pell, M. D., & Armony, J. L. (2015). Time course of the influence of musical expertise on the processing of vocal and musical sounds. Neuroscience, 290, 175–184.

Previous functional magnetic resonance imaging (fMRI) studies have suggested that different cerebral regions preferentially process human voice and music. Yet, little is known on the temporal course of the brain processes that decode the category of sounds and how the expertise in one sound category can impact these processes. To address this question, we recorded the electroencephalogram (EEG) of 15 musicians and 18 non-musicians while they were listening to short musical excerpts (piano and violin) and vocal stimuli (speech and non-linguistic vocalizations). The task of the participants was to detect noise targets embedded within the stream of sounds. Event-related potentials revealed an early differentiation of sound category, within the first 100 ms after the onset of the sound, with mostly increased responses to musical sounds. Importantly, this effect was modulated by the musical background of participants, as musicians were more responsive to music sounds than non-musicians, consistent with the notion that musical training increases sensitivity to music. In late temporal windows, brain responses were enhanced in response to vocal stimuli, but musicians were still more responsive to music. These results shed new light on the temporal course of neural dynamics of auditory processing and reveal how it is impacted by the stimulus category and the expertise of participants.

Article

Jiang, X., Paulmann, S., Robin, J., & Pell, M. D. (2015). More than accuracy: Nonverbal dialects modulate the time course of vocal emotion recognition across cultures. Journal of Experimental Psychology: Human Perception and Performance, 41(3), 597–612.

Using a gating paradigm, this study investigated the nature of the in-group advantage in vocal emotion recognition by comparing 2 distinct cultures. Pseudoutterances conveying 4 basic emotions, expressed in English and Hindi, were presented to English and Hindi listeners. In addition to hearing full utterances, each stimulus was gated from its onset to construct 5 processing intervals to pinpoint when the in-group advantage emerges, and whether this differs when listening to a foreign language (English participants judging Hindi) or a second language (Hindi participants judging English). An index of the mean emotion identification point for each group and unbiased measures of accuracy at each time point were calculated. Results showed that in each language condition, native listeners were faster and more accurate than non-native listeners at recognizing emotions. The in-group advantage emerged in both conditions after processing 400 ms to 500 ms of acoustic information. In the bilingual Hindi group, greater oral proficiency in English predicted faster and more accurate recognition of English emotional expressions. Consistent with dialect theory, our findings provide new evidence that nonverbal dialects impede both the accuracy and the efficiency of vocal emotion processing in cross-cultural settings, even when individuals are highly proficient in the out-group target language.

Article

— 2014 —

Liu, P., Rigoulot, S., & Pell, M. D. (2014). Culture modulates the brain response to human expressions of emotion: Electrophysiological evidence. Neuropsychologia, 67, 1–13.

To understand how culture modulates on-line neural responses to social information, this study compared how individuals from two distinct cultural groups, English-speaking North Americans and Chinese, process emotional meanings of multi-sensory stimuli as indexed by both behaviour (accuracy) and event-related potential (N400) measures. In an emotional Stroop-like task, participants were presented face-voice pairs expressing congruent or incongruent emotions in conditions where they judged the emotion of one modality while ignoring the other (face or voice focus task). Results indicated that while both groups were sensitive to emotional differences between channels (with lower accuracy and higher N400 amplitudes for incongruent face-voice pairs), there were marked group differences in how intruding facial or vocal cues affected accuracy and N400 amplitudes, with English participants showing greater interference from irrelevant faces than Chinese. Our data illuminate distinct biases in how adults from East Asian versus Western cultures process socio-emotional cues, supplying new evidence that cultural learning modulates not only behaviour, but the neurocognitive response to different features of multi-channel emotion expressions.

Article

Rigoulot, S., & Pell, M. D. (2014). Emotion in the voice influences the way we scan emotional faces. Speech Communication, 65, 36–49.

Previous eye-tracking studies have found that listening to emotionally-inflected utterances guides visual behavior towards an emotionally congruent face (e.g., Rigoulot and Pell, 2012). Here, we investigated in more detail whether emotional speech prosody influences how participants scan and fixate specific features of an emotional face that is congruent or incongruent with the prosody. Twenty-one participants viewed individual faces expressing fear, sadness, disgust, or happiness while listening to an emotionally-inflected pseudo-utterance spoken in a congruent or incongruent prosody. Participants judged whether the emotional meaning of the face and voice were the same or different (match/mismatch). Results confirm that there were significant effects of prosody congruency on eye movements when participants scanned a face, although these varied by emotion type; a matching prosody promoted more frequent looks to the upper part of fear and sad facial expressions, whereas visual attention to upper and lower regions of happy (and to some extent disgust) faces was more evenly distributed. These data suggest ways that vocal emotion cues guide how humans process facial expressions in a way that could facilitate recognition of salient visual cues, to arrive at a holistic impression of intended meanings during interpersonal events.

Article

Pell, M. D., Monetta, L., Rothermich, K., Kotz, S. A., Cheang, H. S., & McDonald, S. (2014). Social perception in adults with Parkinson’s disease. Neuropsychology, 28(6), 905–916.

Objective: Our study assessed how nondemented patients with Parkinson's disease (PD) interpret the affective and mental states of others from spoken language (adopt a "theory of mind") in ecologically valid social contexts. A secondary goal was to examine the relationship between emotion processing, mentalizing, and executive functions in PD during interpersonal communication. Method: Fifteen adults with PD and 16 healthy adults completed The Awareness of Social Inference Test, a standardized tool comprised of videotaped vignettes of everyday social interactions (McDonald, Flanagan, Rollins, & Kinch, 2003). Individual subtests assessed participants' ability to recognize basic emotions and to infer speaker intentions (sincerity, lies, sarcasm) from verbal and nonverbal cues, and to judge speaker knowledge, beliefs, and feelings. A comprehensive neuropsychological evaluation was also conducted. Results: Patients with mild-moderate PD were impaired in the ability to infer "enriched" social intentions, such as sarcasm or lies, from nonliteral remarks; in contrast, adults with and without PD showed a similar capacity to recognize emotions and social intentions meant to be literal. In the PD group, difficulties using theory of mind to draw complex social inferences were significantly correlated with limitations in working memory and executive functioning. Conclusions: In early PD, functional compromise of the frontal-striatal-dorsal system yields impairments in social perception and understanding nonliteral speaker intentions that draw upon cognitive theory of mind. Deficits in social perception in PD are exacerbated by a decline in executive resources, which could hamper the strategic deployment of attention to multiple information sources necessary to infer social intentions.

Article

Rigoulot, S., Fish, K., & Pell, M. D. (2014). Neural correlates of inferring speaker sincerity from white lies: An event-related potential source localization study. Brain Research, 1565, 48–62.

During social interactions, listeners weigh the importance of linguistic and extra-linguistic speech cues (prosody) to infer the true intentions of the speaker in reference to what is actually said. In this study, we investigated what brain processes allow listeners to detect when a spoken compliment is meant to be sincere (true compliment) or not ("white lie"). Electroencephalograms of 29 participants were recorded while they listened to Question-Response pairs, where the response was expressed in either a sincere or insincere tone (e.g., "So, what did you think of my presentation?"/"I found it really interesting."). Participants judged whether the response was sincere or not. Behavioral results showed that prosody could be effectively used to discern the intended sincerity of compliments. Analysis of temporal and spatial characteristics of event-related potentials (P200, N400, P600) uncovered significant effects of prosody on P600 amplitudes, which were greater in response to sincere versus insincere compliments. Using low resolution brain electromagnetic tomography (LORETA), we determined that the anatomical sources of this activity were likely located in the (left) insula, consistent with previous reports of insular activity in the perception of lies and concealments. These data extend knowledge of the neurocognitive mechanisms that permit context-appropriate inferences about speaker feelings and intentions during interpersonal communication.

Article

Farrugia, N., Benoit, C., Schwartze, M., Pell, M. D., Obrig, H., Dalla Bella, S., & Kotz, S. A. (2014). Auditory Cueing in Parkinson's Disease: Effects on Temporal Processing and Spontaneous Theta Oscillations. Procedia - Social and Behavioral Sciences, 126, 104–105.

The beneficial effect of auditory cueing on gait performance in Parkinson's disease (PD) has been widely documented. Nevertheless, little is known about the neural underpinnings of this effect and the consequences of auditory cueing beyond improved gait kinematics. The therapy relies on processing the temporal regularity in an auditory signal to which steps are synchronized. We hypothesize that the benefits of auditory cueing involve a temporal processing network comprising the cerebellum, the thalamus, the basal ganglia as well as the supplementary motor area (Kotz and Schwartze, 2011; Schwartze et al., 2011). While deficits in temporal processing in PD have been discussed (Harrington et al., 1998; Pastor et al., 1992), there is increasing evidence of a widespread slowing of resting-state oscillations in PD (e.g., Stoffers et al., 2007), and that such oscillations are linked to symptom severity, cognitive decline and disease progression (Olde Dubbelink et al., 2013). In the current EEG study, we provide evidence that neural responses reflected in both task-induced and resting-state activity are sensitive to cueing therapy. Fifteen patients with PD underwent a one-month auditory cueing therapy (3 times/week for 30 minutes). Patients were tested before, immediately after, and one month after the end of the program (follow-up session). In each testing session, patients completed an EEG protocol consisting of 8 minutes of resting state (alternating 2 minutes of eyes closed and eyes open), followed by an auditory oddball experiment, and by another 8 minutes of resting state. In the oddball task, temporally regular (inter-stimulus-interval, ISI=800 ms) and irregular (random 200-1000 ms ISI) oddball sequences were presented. The sequences consisted of 360 standard (600 Hz) and 90 deviant (660 Hz) equidurational (200 ms) sinusoidal tones. The participants’ task was to count the deviant tones.
Previous studies using this paradigm showed enhanced P300 responses to deviant tones in the regular condition as compared to the irregular condition (Schwartze et al., 2011). Before the cueing therapy PD patients failed to show a difference between deviants elicited in the regular and irregular condition. They showed a comparable difference (to matched controls) between the two conditions after the therapy. Further analysis of the relative power of the resting state oscillations reveals that the relative change in theta power was negatively related to improvement in patients’ walking patterns, suggesting a link between the effect of auditory cueing and functional connectivity in resting state networks. Neural responses associated with temporal regularity as well as spontaneous resting oscillations may provide further insight into compensatory mechanisms induced by auditory cueing in PD.

Article

— 2013 —

Sethi, S., & Pell, M. D. (2013). A study of the effect of auditory prime type on emotional facial expression recognition. McGill Science Undergraduate Research Journal, 8(1), 49–54.

Background: In this study, we investigated the influence of two types of emotional auditory primes - vocalizations and pseudoutterances - on the ability to judge a subsequently presented emotional facial expression in an event-related potential (ERP) study using the facial-affect decision task. We hypothesized that accuracy would be greater for congruent trials than for incongruent trials. This is due to the possibility that a congruent prime would allow the listener to implicitly identify the particular emotion of the face more effectively. We also hypothesized that the normal priming effect would be observed in the N400 for both prime types, i.e. a greater negativity for incongruent trials than for congruent trials. Methods: Emotional primes (vocalization or pseudoutterance) were presented to participants who were then asked to make a judgment regarding whether or not a facial expression conveyed an emotion. Behavioural data on participant accuracy and experimental electroencephalogram (EEG) data were collected and subsequently analyzed for six participants. Results: Behavioural results showed that participants were more accurate in judging faces when primed with vocalizations than pseudoutterances. ERP results revealed that a normal priming effect was observed for vocalizations in the 150 msec - 250 msec temporal window – where greater negativities were produced during incongruent trials than during congruent trials – whereas the reverse effect was observed for pseudoutterances. Few participants were tested (n = 6). Hence, this study is a pilot study preceding a further study conducted with a greater sample size (n = 25) and slight modifications in the methodology (such as the duration of auditory primes). Conclusions: Vocalizations showed the expected priming effect of greater negativities for incongruent trials than for congruent trials, while pseudoutterances unexpectedly showed the opposite effect.
These results suggest that vocalizations may provide more prosodic information in a shorter time and thereby generate the expected congruency effect.

Article

Rigoulot, S., Wassiliwizky, E., & Pell, M. D. (2013). Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition. Frontiers in Psychology, 4, 367.

Recent studies suggest that the time course for recognizing vocal expressions of basic emotion in speech varies significantly by emotion type, implying that listeners uncover acoustic evidence about emotions at different rates in speech (e.g., fear is recognized most quickly whereas happiness and disgust are recognized relatively slowly; Pell and Kotz, 2011). To investigate whether vocal emotion recognition is largely dictated by the amount of time listeners are exposed to speech or the position of critical emotional cues in the utterance, 40 English participants judged the meaning of emotionally-inflected pseudo-utterances presented in a gating paradigm, where utterances were gated as a function of their syllable structure in segments of increasing duration from the end of the utterance (i.e., gated syllable-by-syllable from the offset rather than the onset of the stimulus). Accuracy for detecting six target emotions in each gate condition and the mean identification point for each emotion in milliseconds were analyzed and compared to results from Pell and Kotz (2011). We again found significant emotion-specific differences in the time needed to accurately recognize emotions from speech prosody, and new evidence that utterance-final syllables tended to facilitate listeners' accuracy in many conditions when compared to utterance-initial syllables. The time needed to recognize fear, anger, sadness, and neutral from speech cues was not influenced by how utterances were gated, although happiness and disgust were recognized significantly faster when listeners heard the end of utterances first. Our data provide new clues about the relative time course for recognizing vocally-expressed emotions within the 400-1200 ms time window, while highlighting that emotion recognition from prosody can be shaped by the temporal properties of speech.

Article

— 2012 —

Schwartz, R., & Pell, M. D. (2012). Emotional Speech Processing at the Intersection of Prosody and Semantics. PLoS ONE, 7(10), e47279.

The ability to accurately perceive emotions is crucial for effective social interaction. Many questions remain regarding how different sources of emotional cues in speech (e.g., prosody, semantic information) are processed during emotional communication. Using a cross-modal emotional priming paradigm (Facial affect decision task), we compared the relative contributions of processing utterances with single-channel (prosody-only) versus multi-channel (prosody and semantic) cues on the perception of happy, sad, and angry emotional expressions. Our data show that emotional speech cues produce robust congruency effects on decisions about an emotionally related face target, although no processing advantage occurred when prime stimuli contained multi-channel as opposed to single-channel speech cues. Our data suggest that utterances with prosodic cues alone and utterances with combined prosody and semantic cues both activate knowledge that leads to emotional congruency (priming) effects, but that the convergence of these two information sources does not always heighten access to this knowledge during emotional speech processing.

Article

Garrido‐Vásquez, P., Pell, M. D., Paulmann, S., Strecker, K., Schwarz, J., & Kotz, S. A. (2012). An ERP study of vocal emotion processing in asymmetric Parkinson’s disease. Social Cognitive and Affective Neuroscience, 8(8), 918–927.

Parkinson's disease (PD) has been related to impaired processing of emotional speech intonation (emotional prosody). One distinctive feature of idiopathic PD is motor symptom asymmetry, with striatal dysfunction being strongest in the hemisphere contralateral to the most affected body side. It is still unclear whether this asymmetry may affect vocal emotion perception. Here, we tested 22 PD patients (10 with predominantly left-sided [LPD] and 12 with predominantly right-sided motor symptoms) and 22 healthy controls in an event-related potential study. Sentences conveying different emotional intonations were presented in lexical and pseudo-speech versions. Task varied between an explicit and an implicit instruction. Of specific interest was emotional salience detection from prosody, reflected in the P200 component. We predicted that patients with predominantly right-striatal dysfunction (LPD) would exhibit P200 alterations. Our results support this assumption. LPD patients showed enhanced P200 amplitudes, and specific deficits were observed for disgust prosody, explicit anger processing and implicit processing of happy prosody. Lexical speech was predominantly affected while the processing of pseudo-speech was largely intact. P200 amplitude in patients correlated significantly with left motor scores and asymmetry indices. The data suggest that emotional salience detection from prosody is affected by asymmetric neuronal degeneration in PD.

Article

Pell, M. D., Robin, J., & Paulmann, S. (2012). How quickly do listeners recognize emotional prosody in their native versus a foreign language?

This study investigated whether the recognition of emotions from speech prosody occurs in a similar manner and has a similar time course when adults listen to their native language versus a foreign language. Native English listeners were presented with emotionally-inflected pseudo-utterances produced in English or Hindi which had been gated to different time durations (200, 400, 500, 600, 700 ms). Analyses looked at how accurately participants recognized emotions in each language condition and explored whether particular emotions could be identified from shorter time segments, and whether this was influenced by language experience. Results demonstrated that listeners recognized emotions reliably in both their native and a foreign language; however, they demonstrated an advantage in accuracy and speed for detecting some, but not all, emotions in the native language condition.

Article

Liu, P., & Pell, M. D. (2012). Recognizing vocal emotions in Mandarin Chinese: A validated database of Chinese vocal emotional stimuli. Behavior Research Methods, 44(4), 1042–1051.

To establish a valid database of vocal emotional stimuli in Mandarin Chinese, a set of Chinese pseudosentences (i.e., semantically meaningless sentences that resembled real Chinese) were produced by four native Mandarin speakers to express seven emotional meanings: anger, disgust, fear, sadness, happiness, pleasant surprise, and neutrality. These expressions were identified by a group of native Mandarin listeners in a seven-alternative forced choice task, and items reaching a recognition rate of at least three times chance performance in the seven-choice task were selected as a valid database and then subjected to acoustic analysis. The results demonstrated expected variations in both perceptual and acoustic patterns of the seven vocal emotions in Mandarin. For instance, fear, anger, sadness, and neutrality were associated with relatively high recognition, whereas happiness, disgust, and pleasant surprise were recognized less accurately. Acoustically, anger and pleasant surprise exhibited relatively high mean f0 values and large variation in f0 and amplitude; in contrast, sadness, disgust, fear, and neutrality exhibited relatively low mean f0 values and small amplitude variations, and happiness exhibited a moderate mean f0 value and f0 variation. Emotional expressions varied systematically in speech rate and harmonics-to-noise ratio values as well. This validated database is available to the research community and will contribute to future studies of emotional prosody for a number of purposes. To access the database, please contact pan.liu [at] mail.mcgill.ca.

Article

Rigoulot, S., & Pell, M. D. (2012). Seeing Emotion with Your Ears: Emotional Prosody Implicitly Guides Visual Attention to Faces. PLoS ONE, 7(1), e30740.

Interpersonal communication involves the processing of multimodal emotional cues, particularly facial expressions (visual modality) and emotional speech prosody (auditory modality) which can interact during information processing. Here, we investigated whether the implicit processing of emotional prosody systematically influences gaze behavior to facial expressions of emotion. We analyzed the eye movements of 31 participants as they scanned a visual array of four emotional faces portraying fear, anger, happiness, and neutrality, while listening to an emotionally-inflected pseudo-utterance (Someone migged the pazing) uttered in a congruent or incongruent tone. Participants heard the emotional utterance during the first 1250 milliseconds of a five-second visual array and then performed an immediate recall decision about the face they had just seen. The frequency and duration of first saccades and of total looks in three temporal windows ([0-1250 ms], [1250-2500 ms], [2500-5000 ms]) were analyzed according to the emotional content of faces and voices. Results showed that participants looked longer and more frequently at faces that matched the prosody in all three time windows (emotion congruency effect), although this effect was often emotion-specific (with greatest effects for fear). Effects of prosody on visual attention to faces persisted over time and could be detected long after the auditory information was no longer present. These data imply that emotional prosody is processed automatically during communication and that these cues play a critical role in how humans respond to related visual cues in the environment, such as facial expressions.

Article

— 2011 —

Pell, M. D., & Kotz, S. A. (2011). On the Time Course of Vocal Emotion Recognition. PLoS ONE, 6(11), e27256.

How quickly do listeners recognize emotions from a speaker's voice, and does the time course for recognition vary by emotion type? To address these questions, we adapted the auditory gating paradigm to estimate how much vocal information is needed for listeners to categorize five basic emotions (anger, disgust, fear, sadness, happiness) and neutral utterances produced by male and female speakers of English. Semantically-anomalous pseudo-utterances (e.g., The rivix jolled the silling) conveying each emotion were divided into seven gate intervals according to the number of syllables that listeners heard from sentence onset. Participants (n = 48) judged the emotional meaning of stimuli presented at each gate duration interval, in a successive, blocked presentation format. Analyses looked at how recognition of each emotion evolves as an utterance unfolds and estimated the "identification point" for each emotion. Results showed that anger, sadness, fear, and neutral expressions are recognized more accurately at short gate intervals than happiness, and particularly disgust; however, as speech unfolds, recognition of happiness improves significantly towards the end of the utterance (and fear is recognized more accurately than other emotions). When the gate associated with the emotion identification point of each stimulus was calculated, data indicated that fear (M = 517 ms), sadness (M = 576 ms), and neutral (M = 510 ms) expressions were identified from shorter acoustic events than the other emotions. These data reveal differences in the underlying time course for conscious recognition of basic emotions from vocal expressions, which should be accounted for in studies of emotional speech processing.

Article

Jesso, S., Morlog, D., Ross, S. E., Pell, M. D., Pasternak, S., Mitchell, D., Kertesz, A., & Finger, E. (2011). The effects of oxytocin on social cognition and behaviour in frontotemporal dementia. Brain, 134(9), 2493–2501.

Patients with behavioural variant frontotemporal dementia demonstrate abnormalities in behaviour and social cognition, including deficits in emotion recognition. Recent studies suggest that the neuropeptide oxytocin is an important mediator of social behaviour, enhancing prosocial behaviours and some aspects of emotion recognition across species. The objective of this study was to assess the effects of a single dose of intranasal oxytocin on neuropsychiatric behaviours and emotion processing in patients with behavioural variant frontotemporal dementia. In a double-blind, placebo-controlled, randomized cross-over design, 20 patients with behavioural variant frontotemporal dementia received one dose of 24 IU of intranasal oxytocin or placebo and then completed emotion recognition tasks known to be affected by frontotemporal dementia and by oxytocin. Caregivers completed validated behavioural ratings at 8 h and 1 week following drug administrations. A significant improvement in scores on the Neuropsychiatric Inventory was observed on the evening of oxytocin administration compared with placebo and compared with baseline ratings. Oxytocin was also associated with reduced recognition of angry facial expressions by patients with behavioural variant frontotemporal dementia. Together these findings suggest that oxytocin is a potentially promising, novel symptomatic treatment candidate for patients with behavioural variant frontotemporal dementia and that further study of this neuropeptide in frontotemporal dementia is warranted.

Article

Cheang, H. S., & Pell, M. D. (2011). Recognizing sarcasm without language. Pragmatics & Cognition, 19(2), 203–223.

The goal of the present research was to determine whether certain speaker intentions conveyed through prosody in an unfamiliar language can be accurately recognized. English and Cantonese utterances expressing sarcasm, sincerity, humorous irony, or neutrality through prosody were presented to English and Cantonese listeners unfamiliar with the other language. Listeners identified the communicative intent of utterances in both languages in a crossed design. Participants successfully identified sarcasm spoken in their native language but identified sarcasm at near-chance levels in the unfamiliar language. Both groups were relatively more successful at recognizing the other attitudes when listening to the unfamiliar language (in addition to the native language). Our data suggest that while sarcastic utterances in Cantonese and English share certain acoustic features, these cues are insufficient to recognize sarcasm between languages; rather, this ability depends on (native) language experience.

Article

Paulmann, S., Titone, D., & Pell, M. D. (2011). How emotional prosody guides your way: Evidence from eye movements. Speech Communication, 54(1), 92–107.

This study investigated cross-modal effects of emotional voice tone (prosody) on face processing during instructed visual search. Specifically, we evaluated whether emotional prosodic cues in speech have a rapid, mandatory influence on eye movements to an emotionally-related face, and whether these effects persist as semantic information unfolds. Participants viewed an array of six emotional faces while listening to instructions spoken in an emotionally congruent or incongruent prosody (e.g., “Click on the happy face” spoken in a happy or angry voice). The duration and frequency of eye fixations were analyzed when only prosodic cues were emotionally meaningful (pre-emotional label window: “Click on the/…”), and after emotional semantic information was available (post-emotional label window: “…/happy face”). In the pre-emotional label window, results showed that participants made immediate use of emotional prosody, as reflected in significantly longer and more frequent fixations to emotionally congruent versus incongruent faces. However, when explicit semantic information in the instructions became available (post-emotional label window), the influence of prosody on measures of eye gaze was relatively minimal. Our data show that emotional prosody has a rapid impact on gaze behavior during social information processing, but that prosodic meanings can be overridden by semantic cues when linguistic information is task relevant.

Article

Jaywant, A., & Pell, M. D. (2011). Categorical processing of negative emotions from speech prosody. Speech Communication, 54(1), 1–10.

Everyday communication involves processing nonverbal emotional cues from auditory and visual stimuli. To characterize whether emotional meanings are processed with category-specificity from speech prosody and facial expressions, we employed a cross-modal priming task (the Facial Affect Decision Task; Pell, 2005a) using emotional stimuli that shared the same valence but differed by emotion category. After listening to angry, sad, disgusted, or neutral vocal primes, subjects rendered a facial affect decision about an emotionally congruent or incongruent face target. Our results revealed that participants made fewer errors when judging face targets that conveyed the same emotion as the vocal prime, and responded significantly faster for most emotions (anger and sadness). Surprisingly, participants responded more slowly when the prime and target both conveyed disgust, perhaps due to attention biases for disgust-related stimuli. Our findings suggest that vocal emotional expressions with similar valence are processed with category specificity, and that discrete emotion knowledge implicitly affects the processing of emotional faces between sensory modalities.

Article

Paulmann, S., & Pell, M. D. (2011). Is there an advantage for recognizing multi-modal emotional stimuli? Motivation and Emotion, 35(2), 192–201.

Emotions can be recognized whether conveyed by facial expressions, linguistic cues (semantics), or prosody (voice tone). However, few studies have empirically documented the extent to which multi-modal emotion perception differs from uni-modal emotion perception. Here, we tested whether emotion recognition is more accurate for multi-modal stimuli by presenting stimuli with different combinations of facial, semantic, and prosodic cues. Participants judged the emotion conveyed by short utterances in six channel conditions. Results indicated that emotion recognition is significantly better in response to multi-modal versus uni-modal stimuli. When stimuli contained only one emotional channel, recognition tended to be higher in the visual modality (i.e., facial expressions, semantic information conveyed by text) than in the auditory modality (prosody), although this pattern was not uniform across emotion categories. The advantage for multi-modal recognition may reflect the automatic integration of congruent emotional information across channels which enhances the accessibility of emotion-related knowledge in memory.

Article

— 2010 —

Pell, M. D., Jaywant, A., Monetta, L., & Kotz, S. A. (2010). Emotional speech processing: Disentangling the effects of prosody and semantic cues. Cognition & Emotion, 25(5), 834–853.

To inform how emotions in speech are implicitly processed and registered in memory, we compared how emotional prosody, emotional semantics, and both cues in tandem prime decisions about conjoined emotional faces. Fifty-two participants rendered facial affect decisions (Pell, 2005a), indicating whether a target face represented an emotion (happiness or sadness) or not (a facial grimace), after passively listening to happy, sad, or neutral prime utterances. Emotional information from primes was conveyed by: (1) prosody only; (2) semantic cues only; or (3) combined prosody and semantic cues. Results indicated that prosody, semantics, and combined prosody-semantic cues facilitate emotional decisions about target faces in an emotion-congruent manner. However, the magnitude of priming did not vary across tasks. Our findings highlight that emotional meanings of prosody and semantic cues are systematically registered during speech processing, but with similar effects on associative knowledge about emotions, which is presumably shared by prosody, semantics, and faces.

Article

Paulmann, S., & Pell, M. D. (2010). Contextual influences of emotional speech prosody on face processing: How much is enough? Cognitive Affective & Behavioral Neuroscience, 10(2), 230–242.

The influence of emotional prosody on the evaluation of emotional facial expressions was investigated in an event-related brain potential (ERP) study using a priming paradigm, the facial affective decision task. Emotional prosodic fragments of short (200-msec) and medium (400-msec) duration were presented as primes, followed by an emotionally related or unrelated facial expression (or facial grimace, which does not resemble an emotion). Participants judged whether or not the facial expression represented an emotion. ERP results revealed an N400-like differentiation for emotionally related prime-target pairs when compared with unrelated prime-target pairs. Faces preceded by prosodic primes of medium length led to a normal priming effect (larger negativity for unrelated than for related prime-target pairs), but the reverse ERP pattern (larger negativity for related than for unrelated prime-target pairs) was observed for faces preceded by short prosodic primes. These results demonstrate that brief exposure to prosodic cues can establish a meaningful emotional context that influences related facial processing; however, this context does not always lead to a processing advantage when prosodic information is very short in duration.

Article

Dara, C., & Pell, M. D. (2010). Hemispheric contributions for processing pitch and speech rate cues to emotion: fMRI data.

To determine the neural mechanisms involved in vocal emotion processing, the current study employed functional magnetic resonance imaging (fMRI) to investigate the neural structures engaged in processing acoustic cues to infer emotional meaning. Two critical acoustic cues (pitch and speech rate) were systematically manipulated and presented in a discrimination task. Results confirmed that a bilateral network constituting frontal and temporal regions is engaged when discriminating vocal emotion expressions; however, we observed greater sensitivity to pitch cues in the right mid superior temporal gyrus/sulcus (STG/STS), whereas activation in both left and right mid STG/STS was observed for speech rate processing.

Article

Pell, M. D., Jaywant, A., Monetta, L., & Kotz, S. A. (2010). The contributions of prosody and semantic context in emotional speech processing.

The present study examined the relative contributions of prosody and semantic context in the implicit processing of emotions from spoken language. In three separate tasks, we compared the degree to which happy and sad emotional prosody alone, emotional semantic context alone, and combined emotional prosody and semantic information would prime subsequent decisions about an emotionally congruent or incongruent facial expression. In all three tasks, we observed a congruency effect, whereby prosodic or semantic features of the prime facilitated decisions about emotionally-congruent faces. However, the extent of this priming was similar in the three tasks. Our results imply that prosody and semantic cues hold similar potential to activate emotion-related knowledge in memory when they are implicitly processed in speech, due to underlying connections in associative memory shared by prosody, semantics, and facial displays of emotion.

Article

Paulmann, S., & Pell, M. D. (2010). Dynamic emotion processing in Parkinson's disease as a function of channel availability. Journal of Clinical and Experimental Neuropsychology, 32(8), 822–835.

Parkinson's disease (PD) is linked to impairments for recognizing emotional expressions, although the extent and nature of these communication deficits are uncertain. Here, we compared how adults with and without PD recognize dynamic expressions of emotion in three channels, involving lexical-semantic, prosody, and/or facial cues (each channel was investigated individually and in combination). Results indicated that while emotion recognition increased with channel availability in the PD group, patients performed significantly worse than healthy participants in all conditions. Difficulties processing dynamic emotional stimuli in PD could be linked to striatal dysfunction, which reduces efficient binding of sequential information in the disease.

Article

Dimoska, A., McDonald, S., Pell, M. D., Tate, R., & James, C. (2010). Recognizing vocal expressions of emotion in patients with social skills deficits following traumatic brain injury. Journal of the International Neuropsychological Society, 16(2), 369–382.

Perception of emotion in voice is impaired following traumatic brain injury (TBI). This study examined whether an inability to concurrently process semantic information (the "what") and emotional prosody (the "how") of spoken speech contributes to impaired recognition of emotional prosody and whether impairment is ameliorated when little or no semantic information is provided. Eighteen individuals with moderate-to-severe TBI showing social skills deficits during inpatient rehabilitation were compared with 18 demographically matched controls. Participants completed two discrimination tasks using spoken sentences that varied in the amount of semantic information: that is, (1) well-formed English, (2) a nonsense language, and (3) low-pass filtered speech producing "muffled" voices. Reducing semantic processing demands did not improve perception of emotional prosody. The TBI group were significantly less accurate than controls. Impairment was greater within the TBI group when accessing semantic memory to label the emotion of sentences, compared with simply making "same/different" judgments. Findings suggest an impairment of processing emotional prosody itself rather than semantic processing demands which leads to an over-reliance on the "what" rather than the "how" in conversational remarks. Emotional recognition accuracy was significantly related to the ability to inhibit prepotent responses, consistent with neuroanatomical research suggesting similar ventrofrontal systems subserve both functions.

Article

— 2009 —

Paulmann, S., & Pell, M. D. (2009). Facial expression decoding as a function of emotional meaning status: ERP evidence. Neuroreport, 20(18), 1603–1608.

To further specify the time course of (emotional) face processing, this study compared event-related potentials elicited by faces conveying prototypical basic emotions, nonprototypical affective expressions (grimaces), and neutral faces. Results showed that prototypical and nonprototypical facial expressions could each be differentiated from neutral expressions in three different event-related potential component amplitudes (P200, early negativity, and N400), which are believed to index distinct processing stages in facial expression decoding. On the basis of the distribution of effects, our results suggest that early processing is mediated by shared neural generators for prototypical and nonprototypical facial expressions; however, later processing stages seem to engage distinct subsystems for the three facial expression types investigated according to their emotionality and meaning status.

Article

Pell, M. D., Paulmann, S., Dara, C., Alasseri, A., & Kotz, S. A. (2009). Factors in the recognition of vocally expressed emotions: A comparison of four languages. Journal of Phonetics, 37(4), 417–435.

To understand how language influences the vocal communication of emotion, we investigated how discrete emotions are recognized and acoustically differentiated in four language contexts—English, German, Hindi, and Arabic. Vocal expressions of six emotions (anger, disgust, fear, sadness, happiness, pleasant surprise) and neutral expressions were elicited from four native speakers of each language. Each speaker produced pseudo-utterances (“nonsense speech”) which resembled their native language to express each emotion type, and the recordings were judged for their perceived emotional meaning by a group of native listeners in each language condition. Emotion recognition and acoustic patterns were analyzed within and across languages. Although overall recognition rates varied by language, all emotions could be recognized strictly from vocal cues in each language at levels exceeding chance. Anger, sadness, and fear tended to be recognized most accurately irrespective of language. Acoustic and discriminant function analyses highlighted the importance of speaker fundamental frequency (i.e., relative pitch level and variability) for signalling vocal emotions in all languages. Our data emphasize that while emotional communication is governed by display rules and other social variables, vocal expressions of ‘basic’ emotion in speech exhibit modal tendencies in their acoustic and perceptual attributes which are largely unaffected by language or linguistic similarity.

Article

Jaywant, A., & Pell, M. D. (2009). Listener impressions of speakers with Parkinson’s disease. Journal of the International Neuropsychological Society, 16(1), 49–57.

Parkinson's disease (PD) has several negative effects on speech production and communication. However, few studies have looked at how speech patterns in PD contribute to linguistic and social impressions formed about PD patients from the perspective of listeners. In this study, discourse recordings elicited from nondemented PD speakers (n = 18) and healthy controls (n = 17) were presented to 30 listeners unaware of the speakers' disease status. In separate conditions, listeners rated the discourse samples based on their impressions of the speaker or of the linguistic content. Acoustic measures of the speech samples were analyzed for comparison with listeners' perceptual ratings. Results showed that although listeners rated the content of Parkinsonian discourse as linguistically appropriate (e.g., coherent, well-organized, easy to follow), the PD speakers were perceived as significantly less interested, less involved, less happy, and less friendly than healthy speakers. Negative social impressions demonstrated a relationship to changes in vocal intensity (loudness) and temporal characteristics (dysfluencies) of Parkinsonian speech. Our findings emphasize important psychosocial ramifications of PD that are likely to limit opportunities for communication and social interaction for those affected, because of the negative impressions drawn by listeners based on their speaking voice.

Article

Cheang, H. S., & Pell, M. D. (2009). Acoustic markers of sarcasm in Cantonese and English. The Journal of the Acoustical Society of America, 126(3), 1394–1405.

The goal of this study was to identify acoustic parameters associated with the expression of sarcasm by Cantonese speakers, and to compare the observed features to similar data on English [Cheang, H. S. and Pell, M. D. (2008). Speech Commun. 50, 366-381]. Six native Cantonese speakers produced utterances to express sarcasm, humorous irony, sincerity, and neutrality. Each utterance was analyzed to determine the mean fundamental frequency (F0), F0-range, mean amplitude, amplitude-range, speech rate, and harmonics-to-noise ratio (HNR) (to probe voice quality changes). Results showed that sarcastic utterances in Cantonese were produced with an elevated mean F0, and reductions in amplitude- and F0-range, which differentiated them most from sincere utterances. Sarcasm was also spoken with a slower speech rate and a higher HNR (i.e., less vocal noise) than the other attitudes in certain linguistic contexts. Direct Cantonese-English comparisons revealed one major distinction in the acoustic pattern for communicating sarcasm across the two languages: Cantonese speakers raised mean F0 to mark sarcasm, whereas English speakers lowered mean F0 in this context. These findings emphasize that prosody is instrumental for marking non-literal intentions in speech such as sarcasm in Cantonese as well as in other languages. However, the specific acoustic conventions for communicating sarcasm seem to vary among languages.

Article

Paulmann, S., Pell, M. D., & Kotz, S. A. (2009). Comparative processing of emotional prosody and semantics following basal ganglia infarcts: ERP evidence of selective impairments for disgust and fear. Brain Research, 1295, 159–169.

There is evidence from neuroimaging and clinical studies that functionally link the basal ganglia to emotional speech processes. However, in most previous studies, explicit tasks were administered. Thus, the underlying mechanisms substantiating emotional speech are not separated from possibly process-related task effects. Therefore, the current study tested emotional speech processing in an event-related potential (ERP) experiment using an implicit emotional processing task (probe verification). The interactive time course of emotional prosody in the context of emotional semantics was investigated using a cross-splicing method. As previously demonstrated, combined prosodic and semantic expectancy violations elicit N400-like negativities irrespective of emotional categories in healthy listeners. In contrast, basal ganglia patients show this negativity only for the emotions of happiness and anger, but not for fear or disgust. The current data serve as first evidence that lesions within the left basal ganglia affect the comparative online processing of fear and disgust prosody and semantics. Furthermore, the data imply that previously reported emotional speech recognition deficits in basal ganglia patients may be due to misaligned processing of emotional prosody and semantics.

Article

Monetta, L., Grindrod, C. M., & Pell, M. D. (2009). Irony comprehension and theory of mind deficits in patients with Parkinson's disease. Cortex, 45(8), 972–981.

Many individuals with Parkinson's disease (PD) are known to have difficulties in understanding pragmatic aspects of language. In the present study, a group of eleven non-demented PD patients and eleven healthy control (HC) participants were tested on their ability to interpret communicative intentions underlying verbal irony and lies, as well as on their ability to infer first- and second-order mental states (i.e., theory of mind). Following Winner et al. (1998), participants answered different types of questions about the events which unfolded in stories which ended in either an ironic statement or a lie. Results showed that PD patients were significantly less accurate than HC participants in assigning second-order beliefs during the story comprehension task, suggesting that the ability to make a second-order mental state attribution declines in PD. The PD patients were also less able to distinguish whether the final statement of a story should be interpreted as a joke or a lie, suggesting a failure in pragmatic interpretation abilities. The implications of frontal lobe dysfunction in PD as a source of difficulties with working memory, mental state attributions, and pragmatic language deficits are discussed in the context of these findings.

Article

Pell, M. D., Monetta, L., Paulmann, S., & Kotz, S. A. (2009). Recognizing Emotions in a Foreign Language. Journal of Nonverbal Behavior, 33(2), 107–120.

Expressions of basic emotions (joy, sadness, anger, fear, disgust) can be recognized pan-culturally from the face and it is assumed that these emotions can be recognized from a speaker’s voice, regardless of an individual’s culture or linguistic ability. Here, we compared how monolingual speakers of Argentine Spanish recognize basic emotions from pseudo-utterances (“nonsense speech”) produced in their native language and in three foreign languages (English, German, Arabic). Results indicated that vocal expressions of basic emotions could be decoded in each language condition at accuracy levels exceeding chance, although Spanish listeners performed significantly better overall in their native language (“in-group advantage”). Our findings argue that the ability to understand vocally-expressed emotions in speech is partly independent of linguistic ability and involves universal principles, although this ability is also shaped by linguistic and cultural variables.

Article

— 2008 —

Dara, C., & Pell, M. D. (2008). Effects of acoustic cue manipulations on emotional prosody recognition. The Journal of the Acoustical Society of America, 124(4_Supplement), 2497–2497.

Studies on emotion recognition from prosody have largely focused on the role and effectiveness of isolated acoustic parameters and less is known about how information from these cues is perceived and combined to infer emotional meaning. To better understand how acoustic cues influence recognition of discrete emotions from voice, this study investigated how listeners perceptually combine information from two critical acoustic cues, pitch and speech rate, to identify emotions. For all the utterances, pitch and speech rate measures of the whole utterance were independently manipulated by factors of 1.25 (+25%) and 0.75 (−25%). To examine the influence of one cue with reference to the other cue the three manipulations of pitch (+25%, 0%, and −25%) were crossed with the three manipulations of speech rate (+25%, 0%, and −25%). Pseudoutterances spoken in five emotional tones (happy, sad, angry, fear, and disgust) and neutral that have undergone acoustic cue manipulations were presented to 15 male and 15 female participants for an emotion identification task. Results indicated that both pitch and speech rate are important acoustic parameters to identify emotions and more critically, it is the relative weight of each cue which seems to contribute significantly for categorizing happy, sad, fear, and neutral.

Article

Pell, M. D., Paulmann, S., Dara, C., Alasseri, A., & Kotz, S. A. (2008). Similarities in the acoustic expression of emotions in English, German, Hindi, and Arabic. The Journal of the Acoustical Society of America, 124(4_Supplement), 2496–2496.

Based on the hypothesis that emotion expression is in large part biologically determined (“universal”), this study examined whether spoken utterances conveying seven emotions (anger, disgust, fear, sadness, happiness, surprise, and neutral) demonstrate similar acoustic patterns in four distinct languages (English, German, Hindi, and Arabic). Emotional pseudoutterances (the dirms are in the cindabal) were recorded by four native speakers of each language using an elicitation paradigm. Across languages, approximately 2500 utterances, which were perceptually identified as communicating the intended target emotion, were analyzed for three acoustic parameters: f0Mean, f0Range, and speaking rate. Combined variance in the three acoustic measures contributed significantly to differences among the seven emotions in each language, although f0Mean played the largest role for each language. Disgust, sadness, and neutral were always produced with a low f0Mean, whereas surprise (and usually fear and anger) exhibited an elevated f0Mean. Surprise displayed an extremely wide f0Range and disgust exhibited a much slower speaking rate than the other emotions in each language. Overall, the acoustic measures demonstrated many similarities among languages consistent with the notion of universal patterns of vocal emotion expression, although certain emotions were poorly predicted by the three acoustic measures and probably rely on additional acoustic parameters for perceptual recognition.

Article

Pell, M. D., & Monetta, L. (2008). How Parkinson's Disease Affects Non‐verbal Communication and Language Processing. Language and Linguistics Compass, 2(5), 739–759.

In addition to difficulties that affect movement, many adults with Parkinson's disease (PD) experience changes that negatively impact on receptive aspects of their communication. For example, some PD patients have difficulties processing non‐verbal expressions (facial expressions, voice tone) and many are less sensitive to ‘non‐literal’ or pragmatic meanings of language, at least under certain conditions. This chapter outlines how PD can affect the comprehension of language and non‐verbal expressions and considers how these changes are related to concurrent alterations in cognition (e.g., executive functions, working memory) and motor signs associated with the disease. Our summary underscores that the progressive course of PD can interrupt a number of functional systems that support cognition and receptive language, and in different ways, leading to both primary and secondary impairments of the systems that support linguistic and non‐verbal communication.

Article

Paulmann, S., Schmidt, P. A., Pell, M. D., & Kotz, S. A. (2008). Rapid processing of emotional and voice information as evidenced by ERPs.

Next to linguistic content, the human voice carries speaker identity information (e.g. female/male, young/old) and can also carry emotional information. Although various studies have started to specify the brain regions that underlie the different functions of human voice processing, few studies have aimed to specify the time course underlying these processes. By means of event-related potentials (ERPs) we aimed to determine the time-course of neural responses to emotional speech, speaker identification, and their interplay. While engaged in an implicit voice processing task (probe verification) participants listened to emotional sentences spoken by two female and two male speakers of two different ages (young and middle-aged). For all four speakers rapid emotional decoding was observed as emotional sentences could be differentiated from neutral sentences already within 200 ms after sentence onset (P200). However, results also imply that individual capacity to encode emotional expressions may have an influence on this early emotion detection as the P200 differentiation pattern (neutral vs. emotion) differed for each individual speaker.

Article

Paulmann, S., Pell, M. D., & Kotz, S. A. (2008). Functional contributions of the basal ganglia to emotional prosody: Evidence from ERPs. Brain Research, 1217, 171–178.

The basal ganglia (BG) have been functionally linked to emotional processing [Pell, M.D., Leonard, C.L., 2003. Processing emotional tone from speech in Parkinson's Disease: a role for the basal ganglia. Cogn. Affec. Behav. Neurosci. 3, 275-288; Pell, M.D., 2006. Cerebral mechanisms for understanding emotional prosody in speech. Brain Lang. 97 (2), 221-234]. However, few studies have tried to specify the precise role of the BG during emotional prosodic processing. Therefore, the current study examined deviance detection in healthy listeners and patients with left focal BG lesions during implicit emotional prosodic processing in an event-related brain potential (ERP)-experiment. In order to compare these ERP responses with explicit judgments of emotional prosody, the same participants were tested in a follow-up recognition task. As previously reported [Kotz, S.A., Paulmann, S., 2007. When emotional prosody and semantics dance cheek to cheek: ERP evidence. Brain Res. 1151, 107-118; Paulmann, S. & Kotz, S.A., 2008. An ERP investigation on the temporal dynamics of emotional prosody and emotional semantics in pseudo- and lexical sentence context. Brain Lang. 105, 59-69], deviance of prosodic expectancy elicits a right lateralized positive ERP component in healthy listeners. Here we report a similar positive ERP correlate in BG-patients and healthy controls. In contrast, BG-patients are significantly impaired in explicit recognition of emotional prosody when compared to healthy controls. The current data serve as first evidence that focal lesions in left BG do not necessarily affect implicit emotional prosodic processing but evaluative emotional prosodic processes as demonstrated in the recognition task. The results suggest that the BG may not play a mandatory role in implicit emotional prosodic processing. Rather, executive processes underlying the recognition task may be dysfunctional during emotional prosodic processing.

Article

Pell, M. D., & Skorup, V. (2008). Implicit processing of emotional prosody in a foreign versus native language. Speech Communication, 50(6), 519–530.

To test ideas about the universality and time course of vocal emotion processing, 50 English listeners performed an emotional priming task to determine whether they implicitly recognize emotional meanings of prosody when exposed to a foreign language. Arabic pseudo-utterances produced in a happy, sad, or neutral prosody acted as primes for a happy, sad, or ‘false’ (i.e., non-emotional) face target and participants judged whether the facial expression represents an emotion. The prosody-face relationship (congruent, incongruent) and the prosody duration (600 or 1000 ms) were independently manipulated in the same experiment. Results indicated that English listeners automatically detect the emotional significance of prosody when expressed in a foreign language, although activation of emotional meanings in a foreign language may require increased exposure to prosodic information than when listening to the native language.

Article

Monetta, L., Cheang, H. S., & Pell, M. D. (2008). Understanding speaker attitudes from prosody by adults with Parkinson's disease. Journal of Neuropsychology, 2(2), 415–430.

The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease, with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical 'pseudo-utterances' were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than HC participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).

Article

— 2007 —

Monetta, L., Grindrod, C. M., & Pell, M. D. (2007). Effects of working memory capacity on inference generation during story comprehension in adults with Parkinson's disease. Journal of Neurolinguistics, 21(5), 400–417.

A group of non-demented adults with Parkinson's disease (PD) were studied to investigate how PD affects pragmatic-language processing, and, specifically, to test the hypothesis that the ability to draw inferences from discourse in PD is critically tied to the underlying working memory (WM) capacity of individual patients [Monetta, L., & Pell, M. D. (2007). Effects of verbal working memory deficits on metaphor comprehension in patients with Parkinson's disease. Brain and Language, 101, 80–89]. Thirteen PD patients and a matched group of 16 healthy control (HC) participants performed the Discourse Comprehension Test [Brookshire, R. H., & Nicholas, L. E. (1993). Discourse comprehension test. Tucson, AZ: Communication Skill Builders], a standardized test which evaluates the ability to generate inferences based on explicit or implied information relating to main ideas or details presented in short stories. Initial analyses revealed that the PD group as a whole was significantly less accurate than the HC group when comprehension questions pertained to implied as opposed to explicit information in the stories, consistent with previous findings [Murray, L. L., & Stout, J. C. (1999). Discourse comprehension in Huntington's and Parkinson's diseases. American Journal of Speech–Language Pathology, 8, 137–148]. However, subsequent analyses showed that only a subgroup of PD patients with WM deficits, and not PD patients with WM capacity within the control group range, were significantly impaired for drawing inferences (especially predictive inferences about implied details in the stories) when compared to the control group. These results build on a growing body of literature, which demonstrates that compromise of frontal–striatal systems and subsequent reductions in processing/WM capacity in PD are a major source of pragmatic-language deficits in many PD patients.

Article

Cheang, H. S., & Pell, M. D. (2007). The sound of sarcasm. Speech Communication, 50(5), 366–381.

The present study was conducted to identify possible acoustic cues of sarcasm. Native English speakers produced a variety of simple utterances to convey four different attitudes: sarcasm, humour, sincerity, and neutrality. Following validation by a separate naïve group of native English speakers, the recorded speech was subjected to acoustic analyses for the following features: mean fundamental frequency (F0), F0 standard deviation, F0 range, mean amplitude, amplitude range, speech rate, harmonics-to-noise ratio (HNR, to probe for voice quality changes), and one-third octave spectral values (to probe resonance changes). The results of analyses indicated that sarcasm was reliably characterized by a number of prosodic cues, although one acoustic feature appeared particularly robust in sarcastic utterances: overall reductions in mean F0 relative to all other target attitudes. Sarcasm was also reliably distinguished from sincerity by overall reductions in HNR and in F0 standard deviation. In certain linguistic contexts, sarcasm could be differentiated from sincerity and humour through changes in resonance and reductions in both speech rate and F0 range. Results also suggested a role of language used by speakers in conveying sarcasm and sincerity. It was concluded that sarcasm in speech can be characterized by a specific pattern of prosodic cues in addition to textual cues, and that these acoustic characteristics can be influenced by language used by the speaker.

Article

Dara, C., Monetta, L., & Pell, M. D. (2007). Vocal emotion processing in Parkinson's disease: Reduced sensitivity to negative emotions. Brain Research, 1188, 100–111.

To document the impact of Parkinson's disease (PD) on communication and to further clarify the role of the basal ganglia in the processing of emotional speech prosody, this investigation compared how PD patients identify basic emotions from prosody and judge specific affective properties of the same vocal stimuli, such as valence or intensity. Sixteen non-demented adults with PD and 17 healthy control (HC) participants listened to semantically-anomalous pseudo-utterances spoken in seven emotional intonations (anger, disgust, fear, sadness, happiness, pleasant surprise, neutral) and two distinct levels of perceived emotional intensity (high, low). On three separate occasions, participants classified the emotional meaning of the prosody for each utterance (identification task), rated how positive or negative the stimulus sounded (valence rating task), or rated how intense the emotion was expressed by the speaker (intensity rating task). Results indicated that the PD group was significantly impaired relative to the HC group for categorizing emotional prosody and showed a reduced sensitivity to valence, but not intensity, attributes of emotional expressions conveying anger, disgust, and fear. The findings are discussed in light of the possible role of the basal ganglia in the processing of discrete emotions, particularly those associated with negative vigilance, and of how PD may impact on the sequential processing of prosodic expressions.

Article

Paulmann, S., Pell, M. D., & Kotz, S. A. (2007). How aging affects the recognition of emotional speech. Brain and Language, 104(3), 262–269.

To successfully infer a speaker's emotional state, diverse sources of emotional information need to be decoded. The present study explored to what extent emotional speech recognition of 'basic' emotions (anger, disgust, fear, happiness, pleasant surprise, sadness) differs between different sex (male/female) and age (young/middle-aged) groups in a behavioural experiment. Participants were asked to identify the emotional prosody of a sentence as accurately as possible. As a secondary goal, the perceptual findings were examined in relation to acoustic properties of the sentences presented. Findings indicate that emotion recognition rates differ between the different categories tested and that these patterns varied significantly as a function of age, but not of sex.

Article

Berney, A., Panisset, M., Sadikot, A. F., Ptito, A., Dagher, A., Fraraccio, M., Savard, G., Pell, M. D., & Benkelfat, C. (2007). Mood stability during acute stimulator challenge in Parkinson's disease patients under long‐term treatment with subthalamic deep brain stimulation. Movement Disorders, 22(8), 1093–1096.

Acute and chronic behavioral effects of subthalamic stimulation (STN-DBS) for Parkinson's disease (PD) are reported in the literature. As the technique is relatively new, few systematic studies on the behavioral effects in long-term treated patients are available. To further study the putative effects of STN-DBS on mood and emotional processing, 15 consecutive PD patients under STN-DBS for at least 1 year were tested ON and OFF stimulation while on or off medication, with instruments sensitive to short-term changes in mood and in emotional discrimination. After acute changes in experimental conditions, mood core dimensions (depression, elation, anxiety) and emotion discrimination processing remained remarkably stable, in the face of significant motor changes. Acute stimulator challenge in long-term STN-DBS-treated PD patients does not appear to provoke clinically relevant mood effects.

Article

— 2006 —

Pell, M. D. (2006). Reduced sensitivity to prosodic attitudes in adults with focal right hemisphere brain damage. Brain and Language, 101(1), 64–79.

Although there is a strong link between the right hemisphere and understanding emotional prosody in speech, there are few data on how the right hemisphere is implicated for understanding the emotive "attitudes" of a speaker from prosody. This report describes two experiments which compared how listeners with and without focal right hemisphere damage (RHD) rate speaker attitudes of "confidence" and "politeness" which are signalled in large part by prosodic features of an utterance. The RHD listeners displayed abnormal sensitivity to both the expressed confidence and politeness of speakers, underscoring a major role for the right hemisphere in the processing of emotions and speaker attitudes from prosody, although the source of these deficits may sometimes vary.

Article

Pell, M. D., Alasseri, A., Kotz, S. A., & Paulmann, S. (2006). The voices of anger and disgust: Acoustic correlates in three languages. The Journal of the Acoustical Society of America, 120(5_Supplement), 3093–3093.

Anger and disgust are believed to represent discrete human emotions with unique vocal signatures in spoken language. However, few investigations describe the acoustic dimensions of these vocal expressions, and some researchers have questioned whether disgust is robustly encoded in the vocal channel. This study sought to isolate acoustic parameters that differentiate utterances identified as sounding angry versus disgusted by listeners based on evidence from three separate languages: English, German, and Arabic. Two male and two female speakers of each language produced a list of pseudo-sentences (e.g., Suh fector egzullin tuh boshent) to convey a set of seven different emotions. The recordings were later judged by a group of native listeners to determine what emotional meaning was perceived from the prosodic features of each pseudo-utterance. Individual sentences identified systematically as conveying either anger or disgust (greater than 3× chance target recognition) were then analyzed acoustically for various parameters of fundamental frequency, amplitude, and duration. Analyses compared which acoustic parameter(s) were dominant for identifying anger versus disgust in each language, and whether these patterns appeared to vary across languages, with implications for understanding the specificity and universality of these emotion expressions in the vocal channel.

Article

Cheang, H. S., & Pell, M. D. (2006). An acoustic investigation of Parkinsonian speech in linguistic and emotional contexts. Journal of Neurolinguistics, 20(3), 221–241.

The speech prosody of a group of patients in the early stages of Parkinson's disease (PD) was compared to that of a group of healthy age- and education-matched controls to quantify possible acoustic changes in speech production secondary to PD. Both groups produced standardized speech samples across a number of prosody conditions: phonemic stress, contrastive stress, and emotional prosody. The amplitude, fundamental frequency, and duration of all tokens were measured. PD speakers produced speech that was of lower amplitude than the tokens of healthy speakers in many conditions across all production tasks. Fundamental frequency distinguished the two speaker groups for contrastive stress and emotional prosody production, and duration differentiated the groups for phonemic stress production. It was concluded that motor impairments in PD lead to adverse and varied acoustic changes which affect a number of prosodic contrasts in speech and that these alterations appear to occur in earlier stages of disease progression than is often presumed by many investigators.

Article

Dara, C., & Pell, M. D. (2006). Effects of right-hemisphere damage on explicit and implicit processing of emotional prosody. Brain and Language, 99(1-2), 51–52.

Numerous investigations substantiate a critical role for the right hemisphere for decoding the emotional significance of speech prosody. Until recently, most neuropsychological studies have employed “off-line” paradigms, such as identification and discrimination tasks, to make claims about cerebral mechanisms for processing emotional prosody. These explicit judgments require listeners to consciously appraise prosodic stimuli and match activated knowledge with a verbal emotion label, which requires high executive task demands. These approaches used in isolation may sometimes fail to elaborate whether brain-damaged adults, especially those with right hemisphere damage (RHD), experience a deficit at the level of prosodic processing or due to the demands of associating prosody and verbal information in memory. To address this problem, the current study employed the Facial Affect Decision Task (FADT, Pell, 2005a) to analyze the effects of RHD on emotional meaning activations from prosody. The FADT measures emotion congruency effects (“emotion priming”) during processing of a conjoined prosody and face stimulus, allowing inferences to be made about whether the emotional meanings of prosody are being activated during on-line speech processing. Our studies of young healthy participants demonstrate that prosodic information reliably primes judgments about facial expressions only when both prosody and face belong to the same emotion category (Pell, 2005a, Pell, 2005b). The present study adopted this procedure to evaluate the performance of RHD participants on a relatively implicit task of processing emotional prosody and on traditional explicit tasks of labeling emotional prosody or emotional faces in a forced-choice format.

Article

Monetta, L., Grindrod, C. M., & Pell, M. D. (2006). Inference generation ability during story comprehension in adults with Parkinson’s disease. Brain and Language, 99(1-2), 118–119.

Parkinson’s disease (PD) is a chronic degenerative disorder linked to decreased dopamine production in the basal ganglia and primarily recognized by its motor symptoms. However, recent findings have also shown that many PD patients with cognitive impairments exhibit language difficulties (see Berg, Bjornram, Hartelius, Laakso, & Johnels, 2003). Among these language difficulties, pragmatic communication abilities such as the capacity to draw inferences seem to be impaired in PD patients (Berg et al., 2003). In the present study, the ability of PD patients to draw different types of linguistic inferences was evaluated to characterize the communication skills of individuals with PD in greater detail, and as a model to infer whether disorders of the fronto-striatal circuitry are reliably associated with pragmatic language deficits. In particular, the ability to generate inferences in reference to the working memory (WM) capacity of individual PD patients was investigated to better understand the source of pragmatic language deficits in this population.

Article

Monetta, L., & Pell, M. D. (2006). Effects of verbal working memory deficits on metaphor comprehension in patients with Parkinson’s disease. Brain and Language, 101(1), 80–89.

This research studied one aspect of pragmatic language processing, the ability to understand metaphorical language, to determine whether patients with Parkinson disease (PD) are impaired for these abilities, and whether cognitive resource limitations/fronto-striatal dysfunction contributes to these deficits. Seventeen PD participants and healthy controls (HC) completed a series of neuropsychological tests and performed a metaphor comprehension task following the methods of Gernsbacher and colleagues [Gernsbacher, M. A., Keysar, B., Robertson, R. R. W., & Werner, N. K. (2001). The role of suppression and enhancement in understanding metaphors. Journal of Memory and Language, 45, 433-450.] When participants in the PD group were identified as "impaired" or "unimpaired" relative to the control group on a measure of verbal working memory span, we found that only PD participants with impaired working memory were simultaneously impaired in the processing of metaphorical language. Based on our findings we argue that certain "complex" forms of language processing such as metaphor interpretation are highly dependent on intact fronto-striatal systems for working memory which are frequently, although not always, compromised during the early course of PD.

Article

Pell, M. D. (2006). Implicit recognition of vocal emotions in native and non-native speech.

There is evidence for both cultural-specificity and 'universality' in how listeners recognize vocal expressions of emotion from speech. This paper summarizes some of the early findings using the Facial Affect Decision Task. We provide evidence that English listeners register the emotional meanings of prosody when processing sentences spoken by native (English) as well as non-native (Arabic) speakers who encoded vocal emotions in a culturally appropriate manner. As well, we discuss the timecourse for activating emotion-related knowledge in a native and non-native language, which may differ due to cultural influences on vocal emotion expression.

Article

Dara, C., & Pell, M. D. (2006). The interaction of linguistic and affective prosody in a tone language. The Journal of the Acoustical Society of America, 119(5_Supplement), 3303–3304.

To address how a common set of acoustic properties of speech prosody modulate to convey linguistic and affective meanings concurrently, this study investigated the influence of phonemic tones on the expression of emotion (happy, sad, angry) and linguistic modality (declarative, interrogative) in a tone language, Punjabi. Base stimuli consisted of neutral sentences with one of the three contrastive tones in Punjabi (falling, level, high-rising) varying at the object position of the sentence only. Each of these sentences was elicited as a statement or a question for each of the three emotions by 6 native Punjabi speakers. Utterances were validated for adequate representation of the target emotion, and reliable exemplars for each of the emotions and linguistic modality were subjected to detailed acoustic analysis. Fundamental frequency and duration measures were analyzed for the stressed vowel of the keywords (subject, object, verb) and the whole utterance for each of the tokens. Results demonstrated ways that the acoustic correlates of tone at the word level interact with that of the intonation and emotion at the sentence level, which were further compared with previous results on the effect of focus on intonation and emotional prosody [M. D. Pell, J. Acoust. Soc. Am. 109, 1668–1680 (2001)].

Article

Alasseri, A., & Pell, M. D. (2006). Influence of Emotionality on Discourse Production in Aphasia: A Stimulus Validation Study. The Aphasiology Archive (University of Pittsburgh).

The purpose of this study was to validate experimental video stimuli designed to examine the influence of emotionality on discourse produced by Arabic speakers with aphasia. Ten healthy Arabic speakers described events depicted in 15 videos within three emotional categories: positive, negative, and neutral. Participants responded to a questionnaire judging the videos on three emotion dimensions, logical sequence, and interest. Elicited discourse was analyzed for content units. Results of the questionnaire and content unit analysis were instrumental in selecting a subset of nine videos used in investigating discourse production in aphasia. Results also predicted performance in the aphasia study.

Cheang, H. S., & Pell, M. D. (2006). A study of humour and communicative intention following right hemisphere stroke. Clinical Linguistics & Phonetics, 20(6), 447–462.

This research provides further data regarding non-literal language comprehension following right hemisphere damage (RHD). To assess the impact of RHD on the processing of non-literal language, ten participants presenting with RHD and ten matched healthy control participants were administered tasks tapping humour appreciation and pragmatic interpretation of non-literal language. Although the RHD participants exhibited a relatively intact ability to interpret humour from jokes, their use of pragmatic knowledge about interpersonal relationships in discourse was significantly reduced, leading to abnormalities in their understanding of communicative intentions (CI). Results imply that explicitly detailing CI in discourse facilitates RHD participants' comprehension of non-literal language.

Article

— 2005 —

Pell, M. D. (2005). Prosody–face Interactions in Emotional Processing as Revealed by the Facial Affect Decision Task. Journal of Nonverbal Behavior, 29(4), 193–215.

Previous research employing the facial affect decision task (FADT) indicates that when listeners are exposed to semantically anomalous utterances produced in different emotional tones (prosody), the emotional meaning of the prosody primes decisions about an emotionally congruent rather than incongruent facial expression (Pell, M. D., Journal of Nonverbal Behavior, 29, 45–73). This study undertook further development of the FADT by investigating the approximate timecourse of prosody–face interactions in nonverbal emotion processing. Participants executed facial affect decisions about happy and sad face targets after listening to utterance fragments produced in an emotionally related, unrelated, or neutral prosody, cut to 300, 600, or 1000 ms in duration. Results underscored that prosodic information enduring at least 600 ms was necessary to presumably activate shared emotion knowledge responsible for prosody–face congruity effects.

Article

Pell, M. D., Cheang, H. S., & Leonard, C. L. (2005). The impact of Parkinson’s disease on vocal-prosodic communication from the perspective of listeners. Brain and Language, 97(2), 123–134.

An expressive disturbance of speech prosody has long been associated with idiopathic Parkinson's disease (PD), but little is known about the impact of dysprosody on vocal-prosodic communication from the perspective of listeners. Recordings of healthy adults (n=12) and adults with mild to moderate PD (n=21) were elicited in four speech contexts in which prosody serves a primary function in linguistic or emotive communication (phonemic stress, contrastive stress, sentence mode, and emotional prosody). Twenty independent listeners naive to the disease status of individual speakers then judged the intended meanings conveyed by prosody for tokens recorded in each condition. Findings indicated that PD speakers were less successful at communicating stress distinctions, especially words produced with contrastive stress, which were identifiable to listeners. Listeners were also significantly less able to detect intended emotional qualities of Parkinsonian speech, especially for anger and disgust. Emotional expressions that were correctly recognized by listeners were consistently rated as less intense for the PD group. Utterances produced by PD speakers were frequently characterized as sounding sad or devoid of emotion entirely (neutral). Results argue that motor limitations on the vocal apparatus in PD produce serious and early negative repercussions on communication through prosody, which diminish the social-linguistic competence of Parkinsonian adults as judged by listeners.

Article

Pell, M. D. (2005). Effects of cortical and subcortical brain damage on the processing of emotional prosody.

Cortical and subcortical contributions to the processing of emotional speech prosody were evaluated by testing adults with single focal lesions involving the right hemisphere (n=9), adults with basal ganglia damage in idiopathic Parkinson's disease (n=21), and healthy aging adults (n=33). Participants listened to semantically-anomalous utterances in two conditions (identification, rating) which assessed their recognition of five prosodic emotions. Findings confirmed that both right hemisphere and basal ganglia pathology were associated with impaired comprehension of prosody, although possibly for distinct reasons: right hemisphere compromise produced a more pervasive insensitivity to emotive features of prosodic stimuli, whereas basal ganglia disease produced a milder and more quantitative impairment on these tasks. The implications of these findings for differentiating cortical and subcortical mechanisms involved in prosody processing are considered.

Article

Paulmann, S., Pell, M. D., & Kotz, S. A. (2005). Emotional prosody recognition in BG-patients: Disgust recognition revisited. Brain and Language, 95(1), 143–144.

Disorders of emotional processing (i.e., perception and production of emotions) often restrict daily-life communication of patients. One issue in lesion studies is that emotions are rather complex brain functions and that it is difficult to separate all underlying processes which constitute an emotion. Over the last years, literature has suggested that the basal ganglia (BG), next to right hemispheric cortical structures, play an important role in the evaluation of emotional stimuli. In particular, there has been evidence that the BG modulate perception of disgust, as patients with Parkinson’s disease (Pell & Leonard, 2003) or Huntington’s disease (Sprengelmeyer et al., 1996) display deficits in the recognition of facial expressions as well as in vocal cues of disgust. For example, a single case study by Calder, Keane, Manes, Antoun, and Young (2000) showed that a BG patient suffered from impairment in the recognition of disgust in facial expressions as well as in vocal cues. Even though it is controversially discussed, there is also evidence that the BG are involved in the recognition of fear in facial expressions but not in vocal realizations (Kan, Kawamura, & Hasegawa, 2002). A recent study suggests that, depending on the lesion extent, BG patients suffer also from impairment in recognizing anger (Calder, Keane, Lawrence, & Manes, 2004). That the BG might be involved in processing positive and negative emotions from vocal cues has also been shown in an fMRI study with healthy participants (Kotz, Meyer, & Alter, 2003). It should be noted, though, that in order to specify whether one particular emotional expression (facial and/or vocal) is correlated with one brain structure, more than one emotion needs to be tested. Here, we tested the recognition of emotional prosody in BG lesion patients using four emotions, namely anger, fear, disgust, and happiness, and a neutral baseline. Prosodic categorization was used to investigate whether different emotions can be identified correctly by this patient group.

Article

Pell, M. D. (2005). Cerebral mechanisms for understanding emotional prosody in speech. Brain and Language, 96(2), 221–234.

Hemispheric contributions to the processing of emotional speech prosody were investigated by comparing adults with a focal lesion involving the right (n = 9) or left (n = 11) hemisphere and adults without brain damage (n = 12). Participants listened to semantically anomalous utterances in three conditions (discrimination, identification, and rating) which assessed their recognition of five prosodic emotions under the influence of different task- and response-selection demands. Findings revealed that right- and left-hemispheric lesions were associated with impaired comprehension of prosody, although possibly for distinct reasons: right-hemisphere compromise produced a more pervasive insensitivity to emotive features of prosodic stimuli, whereas left-hemisphere damage yielded greater difficulties interpreting prosodic representations as a code embedded with language content.

Article

Pell, M. D. (2005). Nonverbal Emotion Priming: Evidence from the ‘Facial Affect Decision Task’. Journal of Nonverbal Behavior, 29(1), 45–73.

Affective associations between a speaker’s voice (emotional prosody) and a facial expression were investigated using a new on-line procedure, the Facial Affect Decision Task (FADT). Faces depicting one of four ‘basic’ emotions were paired with utterances conveying an emotionally-related or unrelated prosody, followed by a yes/no judgement of the face as a ‘true’ exemplar of emotion. Results established that prosodic characteristics facilitate the accuracy and speed of decisions about an emotionally congruent target face, supplying empirical support for the idea that information about discrete emotions is shared across major nonverbal channels. The FADT represents a promising tool for future on-line studies of nonverbal processing in both healthy and disordered individuals.

Article

Pell, M. D., & Leonard, C. L. (2005). Facial expression decoding in early Parkinson's disease. Cognitive Brain Research, 23(2-3), 327–340.

The ability to derive emotional and non-emotional information from unfamiliar, static faces was evaluated in 21 adults with idiopathic Parkinson's disease (PD) and 21 healthy control subjects. Participants' sensitivity to emotional expressions was comprehensively assessed in tasks of discrimination, identification, and rating of five basic emotions: happiness, (pleasant) surprise, anger, disgust, and sadness. Subjects also discriminated and identified faces according to underlying phonemic ("facial speech") cues and completed a neuropsychological test battery. Results uncovered limited evidence that the processing of emotional faces differed between the two groups in our various conditions, adding to recent arguments that these skills are frequently intact in non-demented adults with PD [R. Adolphs, R. Schul, D. Tranel, Intact recognition of facial emotion in Parkinson's disease, Neuropsychology 12 (1998) 253-258]. Patients could also accurately interpret facial speech cues and discriminate the identity of unfamiliar faces in a normal manner. There were some indications that basal ganglia pathology in PD contributed to selective difficulties recognizing facial expressions of disgust, consistent with a growing literature on this topic. Collectively, findings argue that abnormalities for face processing are not a consistent or generalized feature of medicated adults with mild-moderate PD, prompting discussion of issues that may be contributing to heterogeneity within this literature. Our results imply a more limited role for the basal ganglia in the processing of emotion from static faces relative to speech prosody, for which the same PD patients exhibited pronounced deficits in a parallel set of tasks [M.D. Pell, C. Leonard, Processing emotional tone from speech in Parkinson's disease: a role for the basal ganglia, Cogn. Affect. Behav. Neurosci. 3 (2003) 275-288]. These diverging patterns allow for the possibility that basal ganglia mechanisms are more engaged by temporally-encoded social information derived from cue sequences over time.

Article

— 2004 —

Pell, M. D. (2004). A method for on-line evaluation of emotional prosody in healthy and disordered populations. Brain and Language, 91(1), 25–26.

The capacity for humans to assign emotional significance to prosodic variations in speech has been studied formally for only the past three decades. During this time, much progress has been made in describing cognitive-brain mechanisms devoted to the perception and social analysis of emotional prosody, highlighting a critical role for the right cerebral hemisphere within a distributed brain network (e.g., Pell, 1998). Until recently, ideas about the comprehension of emotional prosody have been derived almost exclusively from behavioural investigations which elicited off-line (i.e., discrimination or categorization) judgements after listeners were presented an emotionally-intoned utterance. This approach, while extremely valuable, places important executive-task demands on brain-damaged listeners, questioning the origin of prosodic breakdown in certain clinical populations. In addition, off-line categorization or evaluative judgements of emotional prosody do not capture the array of affective and emotional features that are associated with these events and activated during on-line speech analysis. These concerns call for new on-line methods to evaluate social-interpretative processes applied to emotional prosody during the actual course of processing these expressions by listeners. Recently, Pell (in press) adapted principles of the cross-modal lexical decision task (Swinney et al., 1979) to develop the Facial Affect Decision Task (FADT), a semantic priming paradigm capable of detecting underlying meaning activations for emotional prosody on responses to an emotionally related or unrelated face. This report supplied evidence that the emotional value of a prosodic prime facilitates emotional "acceptability" judgements about a conjoined facial expression in accuracy and speed only when the prosody-face events are related by emotion; these priming effects exemplify that emotional attributes of prosodic information were indeed registered in memory by listeners, a phenomenon which may be usefully applied to future on-line studies of emotional prosody comprehension in brain-damaged listeners. First, however, we sought further evidence that would underscore the sensitivity of the FADT for tapping underlying associations of emotional prosody in healthy adults, promoting the reliability of this technique for use in clinical populations.

Article

Cheang, H. S., & Pell, M. D. (2004). The effects of Parkinson’s disease on the production of contrastive stress. The Journal of the Acoustical Society of America, 115(5_Supplement), 2424–2424.

Reduced speech intelligibility has been observed clinically among patients with Parkinson’s disease (PD); one possible contributor to these problems is that motor limitations in PD reduce the ability to mark linguistic contrasts in speech using prosodic cues. This study compared acoustic aspects of the production of contrastive stress (CS) in sentences that were elicited from ten subjects with PD and ten matched control subjects without neurological impairment. Subjects responded to questions that biased them to put emphasis on the first, middle, or last word of target utterances. The mean vowel duration and mean fundamental frequency (F0) of each keyword were then measured, normalized, and analyzed for possible differences in the acoustic cues provided by each group to signal emphatic stress. Both groups demonstrated systematic differences in vowel lengthening between emphasized and unemphasized words across word positions; however, controls were more reliable than PD subjects at modulating the F0 of emphasized words to signal its location in the utterance. Group differences in the F0 measures suggest one possible source of the impoverished intelligibility of Parkinsonian speech and will be investigated in a subsequent study that looks at the direct impact of these changes on emphasis perception by listeners. [Work supported by CIHR.]

Article

— 2003 —

Pell, M. D., & Leonard, C. L. (2003). Processing emotional tone from speech in Parkinson’s disease: A role for the basal ganglia. Cognitive Affective & Behavioral Neuroscience, 3(4), 275–288.

In this study, individuals with Parkinson's disease were tested as a model for basal ganglia dysfunction to infer how these structures contribute to the processing of emotional speech tone (emotional prosody). Nondemented individuals with and without Parkinson's disease (n = 21/group) completed neuropsychological tests and tasks that required them to process the meaning of emotional prosody in various ways (discrimination, identification, emotional feature rating). Individuals with basal ganglia disease exhibited abnormally reduced sensitivity to the emotional significance of prosody in a range of contexts, a deficit that could not be attributed to changes in mood, emotional-symbolic processing, or estimated frontal lobe cognitive resource limitations in most conditions. On the basis of these and broader findings in the literature, it is argued that the basal ganglia provide a critical mechanism for reinforcing the behavioral significance of prosodic patterns and other temporal representations derived from cue sequences (Lieberman, 2000), facilitating cortical elaboration of these events.

Article

Pell, M. D., & Long, T. (2003). Interpretation of prosodic indicators of speaker confidence following right hemisphere damage. Brain and Language, 87(1), 204–205.

Vocal-prosodic attributes of speech expose listeners to a wide range of affective and cognitive dispositions held by the speaker beyond those captured by the well-studied ‘basic’ emotion states such as happiness or anger. According to a pragmatic model of emotive communication (Caffi & Janney, 1994), one of the pragmatic functions served by emotive prosody is to signal ‘evidentiality’ or the degree of confidence of the speaker in what he/she is saying; prosodic cues that serve this emotive device regulate the inferrable correctness, authority, or truth value of the propositional content along a perceived continuum of confident-doubtful. Acoustic-perceptual parameters that signal distinctions along this continuum have received little attention but are likely to include increased loudness, rapid speech rate, shorter/less frequent pause, and a terminal downturn in the intonation contour of utterances perceived as more “confident” (e.g., Kimble & Seidel, 1991). Individuals with focal lesions of the right hemisphere (RHD) are known to present with various disturbances in their communication that impact negatively on the interpretation of emotional aspects of prosody (Pell, 1998), but which likely extend to a wider range of pragmatic failures that impair understanding of the nonliteral, implied, or intended meanings of discourse (see Sabbagh, 1999). Within a social-pragmatic view of how prosodic cues guide discourse interpretation, the current study compared how individuals with and without RHD infer the relative confidence of speakers based on available prosodic or linguistic markers of ‘evidentiality’ in spoken utterances.

— 2002 —

Pell, M. D. (2002). Acoustic profiles of negative emotion. The Journal of the Acoustical Society of America, 112(5_Supplement), 2443–2444.

A study was initiated to acoustically characterize and differentiate discrete categories of negatively valenced emotions conveyed through speech prosody. Utterances elicited from eight encoders (actors) in different emotional tones were perceptually rated by a group of decoders to gauge how strongly each token was associated with the basic emotions of “anger,” “disgust,” and “sadness” using a seven-choice response paradigm. Tokens rated as highly representative of each target emotion by greater than 80% of decoders were examined acoustically. Measures of fundamental frequency (mean, range, sd), amplitude (mean, range, sd), and duration (speech rate, %voiced) were obtained from each token and for utterances spoken in a “neutral” tone by the same encoders. Normalized measures were compared among emotional categories to uncover reliable acoustic dimensions that may have contributed to perceptually distinct vocal symbols of negative emotion states. Results pointed to important differences in duration, amplitude, and especially fundamental frequency in discriminating among prosodic signals representing distinct negative emotions. These findings extend work on the acoustic underpinnings of positive and negative vocalizations in speech [M. D. Pell, J. Acoust. Soc. Am. 109, 1668–1680 (2001)], providing finer specification of these parameters within the family of “negative” emotions. [Work supported by NSERC.]

Pell, M. D. (2002). Surveying emotional prosody in the brain.

Research has long supported a pivotal right hemisphere contribution to the decoding of emotional prosody, although a broader network of cortical and subcortical structures is now thought to support different components of this functional system during input processing. This paper highlights important work implicating the basal ganglia in emotional prosody decoding […] key affective stimulus properties necessary for higher […] interpretive processes. The role of the right hemisphere in […] emotional stimuli is then considered in relation to […] 'functional' and 'auditory-perceptual' capacities of constituent regions. A broader definition of the right hemisphere's jurisdiction in social-emotional behavior is advocated to advance future work in this area, and a new paradigm to tap on-line comprehension of emotional prosody in clinical populations is described.

Pell, M. D. (2002). Evaluation of Nonverbal Emotion in Face and Voice: Some Preliminary Findings on a New Battery of Tests. Brain and Cognition, 48(2-3), 499–504.

This report describes some preliminary attributes of stimuli developed for future evaluation of nonverbal emotion in neurological populations with acquired communication impairments. Facial and vocal exemplars of six target emotions were elicited from four male and four female encoders and then prejudged by 10 young decoders to establish the category membership of each item at an acceptable consensus level. Representative stimuli were then presented to 16 additional decoders to gather indices of how category membership and encoder gender influenced recognition accuracy of emotional meanings in each nonverbal channel. Initial findings pointed to greater facility in recognizing target emotions from facial than vocal stimuli overall and revealed significant accuracy differences among the six emotions in both the vocal and facial channels. The gender of the encoder portraying emotional expressions was also a significant factor in how well decoders recognized specific emotions (disgust, neutral), but only in the facial condition.

— 2001 —

Pell, M. D. (2001). Influence of emotion and focus location on prosody in matched statements and questions. The Journal of the Acoustical Society of America, 109(4), 1668–1680.

Preliminary data were collected on how emotional qualities of the voice (sad, happy, angry) influence the acoustic underpinnings of neutral sentences varying in location of intra-sentential focus (initial, final, no) and utterance "modality" (statement, question). Short (six syllable) and long (ten syllable) utterances exhibiting varying combinations of emotion, focus, and modality characteristics were analyzed for eight elderly speakers following administration of a controlled elicitation paradigm (story completion) and a speaker evaluation procedure. Duration and fundamental frequency (f0) parameters of recordings were scrutinized for "keyword" vowels within each token and for whole utterances. Results generally re-affirmed past accounts of how duration and f0 are encoded on key content words to mark linguistic focus in affectively neutral statements and questions for English. Acoustic data on three "global" parameters of the stimuli (speech rate, mean f0, f0 range) were also largely supportive of previous descriptions of how happy, sad, angry, and neutral utterances are differentiated in the speech signal. Important interactions between emotional and linguistic properties of the utterances emerged which were predominantly (although not exclusively) tied to the modulation of f0; speakers were notably constrained in conditions which required them to manipulate f0 parameters to express emotional and nonemotional intentions conjointly. Sentence length also had a meaningful impact on some of the measures gathered.

Leonard, C. L., Baum, S. R., & Pell, M. D. (2001). The Effect of Compressed Speech on the Ability of Right-Hemisphere-Damaged Patients to Use Context. Cortex, 37(3), 327–344.

The ability of RHD patients to use context under conditions of increased processing demands was examined. Subjects monitored for words in auditorily presented sentences of three context types (normal, semantically anomalous, and random) at three rates of speech: normal, 70% compressed (Experiment 1), and 60% compressed (Experiment 2). Effects of semantics and syntax were found for the RHD and normal groups under the normal rate of speech condition. Using compressed rates of speech, the effect of syntax disappeared, but the effect of semantics remained. Importantly, and contrary to expectations, the RHD group was similar to normals in continuing to demonstrate an effect of semantic context under conditions of increased processing demands. Results are discussed relative to contemporary theories of laterality, based on studies with normals, that suggest that the involvement of the left versus right hemisphere in context use may depend upon the type of contextual information being processed.

Pell, M. D. (2001). Using prosody to resolve temporary syntactic ambiguities in speech production: acoustic data on brain-damaged speakers. Clinical Linguistics & Phonetics, 15(6), 441–456.

Left hemisphere brain lesions resulting in aphasia frequently produce impairments in speech production, including the ability to appropriately transmit linguistic distinctions through sentence prosody. The present investigation gathered preliminary data on how focal brain lesions influence one important aspect of prosody that has been largely ignored in the literature - the production of sentence-level syntactic distinctions that rely on prosodic alterations to disambiguate alternate meanings of a sentence. Utterances characterizing three distinct types of syntactic ambiguities (scope, prepositional phrase attachment, and noun phrase/sentential complement attachment) were elicited from individuals with unilateral left hemisphere damage (LHD), right hemisphere damage (RHD), and adults without brain pathology (NC). A written vignette preceding each ambiguous sentence target biased how the utterance was interpreted and produced. Recorded productions were analysed acoustically to examine parameters of duration (word length, pause) and fundamental frequency (F0) for key constituents specific to each of the ambiguity conditions. Results of the duration analyses demonstrated a preservation of many of the temporal cues to syntactic boundaries in both LHD and RHD patients. The two interpretations of sentences containing 'scope' and 'prepositional phrase attachment' ambiguities were differentiated by all speakers (including LHD and RHD patients) through the production of at least one critical temporal parameter that was consistent across the three groups. Temporal markers of sentences containing 'noun phrase/sentential complement attachment' ambiguities were not found to be encoded consistently within any speaker group and may be less amenable to experimental manipulation in this manner. 
Results of F0 analyses were far less revealing in characterizing different syntactic assignments of the stimuli, and coupled with other findings in the literature, may carry less weight than temporal parameters in this process. Together, results indicate that the ability to disambiguate sentences using prosodic variables is relatively spared subsequent to both LHD and RHD, although it is noteworthy that LHD patients did exhibit deficits regulating other temporal properties of the utterances, consistent with left hemisphere control of speech timing.

— 2000 —

Pell, M. D. (2000). Intonation and emotion. The Journal of the Acoustical Society of America, 108(5_Supplement), 2533–2533.

Preliminary data were gathered on how simulated emotional characteristics of the voice influence the acoustic form of English utterances containing specific combinations of intonational and stress features [extending Eady and Cooper, J. Acoust. Soc. Am. 80, 402–415 (1986)]. Utterances varying in intrasentential focus position (initial, final, none) were elicited as both statements and questions in each of four emotional “modes” (neutral, sad, happy, angry) employing a structured elicitation procedure. Parameters of duration and fundamental frequency (f0) were then determined for the productions elicited from eight elderly speakers to specify important acoustic dimensions associated with specific combinations of stress, “modality,” and emotional features of the stimuli. Results of the acoustic analyses largely reaffirmed past accounts of how contrastive focus is encoded in (affectively neutral) statements and questions for English, and cohered well with the acoustic literature on how basic emotions are expressed vocally for three key acoustic parameters (mean f0, f0 range, speech rate). The impact of emotion on linguistic attributes of prosodic structure was most evident in the speakers’ modulation of f0, which was notably constrained in prosodic conditions where speakers were required to signal “marked” emotional and nonemotional intentions conjointly within the intonation contour. [Work funded by FCAR.]

Leonard, C. L., Baum, S. R., & Pell, M. D. (2000). Context use by right-hemisphere-damaged individuals under a compressed speech condition. Brain and Cognition, 43(1-3), 315–319.

The effect of increased processing demands on context use by RHD individuals was examined using a word-monitoring task. Subjects were required to monitor for a target word in sentences that were either normal, semantically anomalous, or both syntactically and semantically anomalous. Stimuli were presented at two rates of speech: normal and compressed to 70% of normal. Contrary to expectations, the RHD group performed similarly to normals in demonstrating an effect of context at both rates of speech. Results are discussed relative to recent studies of normal brain functioning that suggest that the involvement of the LH versus the RH in context use depends upon the type of contextual information being processed.

— 1999 —

Pell, M. D. (1999). Fundamental Frequency Encoding of Linguistic and Emotional Prosody by Right Hemisphere-Damaged Speakers. Brain and Language, 69(2), 161–192.

To illuminate the nature of the right hemisphere's involvement in expressive prosodic functions, a story completion task was administered to matched groups of right hemisphere-damaged (RHD) and nonneurological control subjects. Utterances which simultaneously specified three prosodic distinctions (emphatic stress, sentence modality, emotional tone) were elicited from each subject group and then subjected to acoustic analysis to examine various fundamental frequency (F0) attributes of the stimuli. Results indicated that RHD speakers tended to produce F0 patterns that resembled normal productions in overall shape, but with significantly less F0 variation. The RHD patients were also less reliable than normal speakers at transmitting emphasis or emotional contrasts when judged from the listener's perspective. Examination of the results across a wide variety of stimulus types pointed to a deficit in successfully implementing continuous aspects of F0 patterns following right hemisphere insult.

Baum, S. R., & Pell, M. D. (1999). The neural bases of prosody: Insights from lesion studies and neuroimaging. Aphasiology, 13(8), 581–608.

This paper reviews the major findings and hypotheses to emerge in the literature concerned with speech prosody. Both production and perception of prosody are considered. Evidence from studies of patients with lateralized left or right hemisphere damage is presented, as well as relevant data from anatomical and functional imaging studies.

Pell, M. D. (1999). The Temporal Organization of Affective and Non-Affective Speech in Patients with Right-Hemisphere Infarcts. Cortex, 35(4), 455–477.

To evaluate the right hemisphere's role in encoding speech prosody, an acoustic investigation of timing characteristics was undertaken in speakers with and without focal right-hemisphere damage (RHD) following cerebrovascular accident. Utterances varying along different prosodic dimensions (emphasis, emotion) were elicited from each speaker using a story completion paradigm, and measures of utterance rate and vowel duration were computed. Results demonstrated parallelism in how RHD and healthy individuals encoded the temporal correlates of emphasis in most experimental conditions. Differences in how RHD speakers employed temporal cues to specify some aspects of prosodic meaning (especially emotional content) were observed and corresponded to a reduction in the perceptibility of prosodic meanings when conveyed by the RHD speakers. Findings indicate that RHD individuals are most disturbed when expressing prosodic representations that vary in a graded (rather than categorical) manner in the speech signal (Blonder, Pickering, Heath et al., 1995; Pell, 1999a).

— 1998 —

Pell, M. D. (1998). Recognition of prosody following unilateral brain lesion: influence of functional and structural attributes of prosodic contours. Neuropsychologia, 36(8), 701–715.

The perception of prosodic distinctions by adults with unilateral right- (RHD) and left-hemisphere (LHD) damage and subjects without brain injury was assessed through six tasks that varied both functional (i.e. linguistic/emotional) and structural (i.e. acoustic) attributes of a common set of base stimuli. Three tasks explored the subjects' ability to perceive local prosodic markers associated with emphatic stress (Focus Perception condition) and three tasks examined the comprehension of emotional-prosodic meanings by the same listeners (Emotion Perception condition). Within each condition, an initial task measured the subjects' ability to recognize each "type" of prosody when all potential acoustic features (but no semantic features) signalled the target response (Baseline). Two additional tasks investigated the extent to which each group's performance on the Baseline task was influenced by duration (D-Neutral) or fundamental frequency (F-Neutral) parameters of the stimuli within each condition. Results revealed that both RHD and LHD patients were impaired, relative to healthy control subjects, in interpreting the emotional meaning of prosodic contours, but that only LHD patients displayed subnormal capacity to perceive linguistic (emphatic) specifications via prosodic cues. The performance of the RHD and LHD patients was also selectively disturbed when certain acoustic properties of the stimuli were manipulated, suggesting that both functional and structural attributes of prosodic patterns may be determinants of prosody lateralization.

Pell, M. D. (1998). Influence of functional and acoustic parameters of intonation contours on prosody lateralization. The Journal of the Acoustical Society of America, 103(5_Supplement), 2890–2890.

The perception of prosody by adults with unilateral right- (RHD) and left-hemisphere (LHD) damage and subjects without brain injury was assessed through six tasks that varied functional (i.e., linguistic/emotional) and acoustic attributes of a common set of stimuli. Three tasks measured the ability to detect prominence cues in short utterances (focus perception) and three tasks examined the perception of emotional-prosodic features by the same listeners (emotion perception). Within each condition, an initial task tested recognition of each “type” of prosody when all naturally occurring acoustic features (but no semantic features) signalled the target response (baseline). Two additional tasks assessed how changes in (a) duration and (b) fundamental frequency parameters of the stimuli influenced performance on the baseline task within each condition. Results indicated that, irrespective of the availability of specific acoustic parameters, recognition of prosody by both RHD and LHD patients was highly influenced by functional attributes of the stimuli (i.e., LHD patients were selectively impaired in detecting linguistic prominence whereas both groups were impaired in deriving the emotional tone). Brain-damaged patients also displayed irregularities when individual acoustic properties of the stimuli were manipulated, indicating that both functional and acoustic attributes of prosody may be determinants of prosody lateralization.

— 1997 —

Baum, S. R., Pell, M. D., Leonard, C. L., & Gordon, J. K. (1997). The Ability of Right- and Left-Hemisphere-Damaged Individuals to Produce and Interpret Prosodic Cues Marking Phrasal Boundaries. Language and Speech, 40(4), 313–330.

Two experiments were conducted with the purpose of investigating the ability of right- and left-hemisphere-damaged individuals to produce and perceive the acoustic correlates to phrase boundaries. In the production experiment, the utterance pink and black and green was elicited in three different conditions corresponding to different arrangements of colored squares. Acoustic analyses revealed that both left- and right-hemisphere-damaged patients exhibited fewer of the expected acoustic patterns in their productions than did normal control subjects. The reduction in acoustic cues to phrase boundaries in the utterances of both patient groups was perceptually salient to three trained listeners. The perception experiment demonstrated a significant impairment in the ability of both left-hemisphere-damaged and right-hemisphere-damaged individuals to perceive phrasal groupings. Results are discussed in relation to current hypotheses concerning the cerebral lateralization of speech prosody.

Pell, M. D., & Baum, S. R. (1997). Unilateral Brain Damage, Prosodic Comprehension Deficits, and the Acoustic Cues to Prosody. Brain and Language, 57(2), 195–214.

Stimuli from two previously presented comprehension tasks of affective and linguistic prosody (Pell & Baum, 1997) were analyzed acoustically and subjected to several discriminant function analyses, following Van Lancker and Sidtis (1992). An analysis of the errors made on these tasks by left-hemisphere-damaged (LHD) and right-hemisphere-damaged (RHD) subjects examined whether each clinical group relied on specific (and potentially different) acoustic features in comprehending prosodic stimuli (Van Lancker & Sidtis, 1992). Analyses also indicated whether the brain-damaged patients tested in Pell and Baum (1997) exhibited perceptual impairments in the processing of intonation. Acoustic analyses of the utterances reaffirmed the importance of F0 cues in signaling affective and linguistic prosody. Analyses of subjects' affective misclassifications did not suggest that LHD and RHD patients were biased by different sets of the acoustic features to prosody in judging their meaning, in contrast to Van Lancker and Sidtis (1992). However, qualitative differences were noted in the ability of LHD and RHD patients to identify linguistic prosody, indicating that LHD subjects may be specifically impaired in decoding linguistically defined categorical features of prosodic patterns.

Pell, M. D., & Baum, S. R. (1997). The Ability to Perceive and Comprehend Intonation in Linguistic and Affective Contexts by Brain-Damaged Adults. Brain and Language, 57(1), 80–99.

Receptive tasks of linguistic and affective prosody were administered to 9 right-hemisphere-damaged (RHD), 10 left-hemisphere-damaged (LHD), and 10 age-matched control (NC) subjects. Two tasks measured subjects' ability to discriminate utterances based solely on prosodic cues, and six tasks required subjects to identify linguistic or affective intonational meanings. Identification tasks manipulated the degree to which the auditory stimuli were structured linguistically, presenting speech-filtered, nonsensical, and semantically well-formed utterances in different tasks. Neither patient group was impaired relative to normals in discriminating prosodic patterns or recognizing affective tone conveyed suprasegmentally, suggesting that neither the LHD nor the RHD patients displayed a receptive disturbance for emotional prosody. The LHD group, however, was differentially impaired on linguistic rather than emotional tasks and performed significantly worse than the NC group on linguistic tasks even when semantic information biased the target response.

Baum, S. R., & Pell, M. D. (1997). Production of affective and linguistic prosody by brain-damaged patients. Aphasiology, 11(2), 177–198.

To test a number of hypotheses concerning the functional lateralization of speech prosody, the ability of unilaterally right-hemisphere-damaged (RHD), unilaterally left-hemisphere-damaged (LHD), and age-matched control subjects (NC) to produce linguistic and affective prosodic contrasts at the sentence level was assessed via acoustic analysis. Multiple aspects of suprasegmental processing were explored, including a manipulation of the type of elicitation task employed (repetition vs reading) and the amount of linguistic structure provided in experimental stimuli (stimuli were either speech-filtered, nonsensical, or semantically well formed). In general, the results demonstrated that both RHD and LHD patients were able to appropriately utilize the acoustic parameters examined (duration, fundamental frequency (F0), amplitude) to differentiate both linguistic and affective sentence types in a manner comparable to NC speakers. Some irregularities in the global modulation of F0 and amplitude by RHD speakers were noted, however. Overall, the present findings do not provide support for previous claims that the right hemisphere is specifically engaged in the production of affective prosody. Alternative models of prosodic processing are noted.

— 1996 —

Pell, M. D. (1996). On the Receptive Prosodic Loss in Parkinson's Disease. Cortex, 32(4), 693–704.

To comprehensively explore how the processing of linguistic and affective prosodic cues is affected by idiopathic Parkinson's disease (PD), a battery of receptive tests was presented to eleven PD patients without intellectual or language impairment and eleven control subjects (NC) matched for age, gender, and educational attainment. Receptive abilities for both low-level (discrimination) and higher-level (identification) prosodic processing were explored; moreover, the identification of prosodic features was tested at both the lexical level (phonemic stress perception) and over the sentential domain (prosodic pattern identification). The results obtained demonstrated a general reduction in the ability of the PD patients to identify the linguistic- and affective-prosodic meaning of utterances relative to NC subjects, without a concurrent loss in the ability to perceive phonemic stress contrasts or discriminate prosodic patterns. However, the qualitative pattern of the PD and NC groups' performance across the various identification conditions tested was remarkably uniform, indicating that only quantitative differences in comprehension abilities may have characterized the two groups. It is hypothesized that the basal ganglia form part of a functional network dedicated to prosodic processing (Blonder et al., 1989) and that the processes required to map prosodic features onto their communicative representations at the sentence level are rendered less efficient by the degenerative course of PD.

Preprints

— 2026 —

Domínguez-Arriola, M. E., Lam, P. C. H., Pérez, A., & Pell, M. D. (2026). How do we align in good conversation? Investigating the link between interaction quality and multimodal interpersonal coordination. bioRxiv (Cold Spring Harbor Laboratory).

Conversations can feel effortlessly engaging or, conversely, difficult and unrewarding. Multiple factors contribute to the experienced quality and outcomes of a conversation, among them how interlocutors align with each other. The present study investigated speech-to-speech, brain-to-speech, and brain-to-brain coordination as markers of interpersonal alignment, examining their relationship with jointly perceived interaction quality and mutual affinity between conversational partners. Pairs of previously unacquainted participants (dyads) engaged in multiple short, free-form conversations on topics of varying interest while their vocal and neural activity were simultaneously recorded in a dual-EEG (“hyperscanning”) setup. We analyzed interlocutors’ prosodic adaptation, neural speech tracking, and neural coordination during each conversation. At the speech-to-speech level, our findings reveal that partners with more positive mutual impressions became more similar in their volume and voice quality over the course of the experiment session, reflecting greater prosodic convergence. At the brain-to-speech level, we found no reliable effect of interaction quality on neural tracking of unfolding speech within any individual region, although topographical differences suggested relative modulation across scalp sites. Finally, at the brain-to-brain level, our findings show that higher perceived interaction quality enhanced inter-brain relationships across frequency bands (alpha and theta) and temporal dependencies (concurrent/near-instantaneous and recurrent/listener-lagging), with the strongest effects observed for concurrent alpha-band coupling. These findings suggest that distinct coordination processes are involved in how interlocutors experience an interaction and how they establish relational affinity, casting new light on the mechanisms that make a conversation worthwhile.

Chen, W., Pell, M. D., & Jiang, X. (2026). Human brains implicitly and rapidly distinguish AI from human voices before decoding prosodic meaning. bioRxiv (Cold Spring Harbor Laboratory).

People encounter AI voices daily. Existing behavioral studies suggest listeners rely on prosodic cues such as intonation and expressiveness to detect audio deepfakes, reporting that AI voices sound prosodically less rich than human voices. To test whether prosodic processing drives deepfake discrimination in the neural time course of voice processing, we recorded electroencephalographic (EEG) data while participants listened to human and AI-generated speakers producing utterances in confident vs. doubtful prosody (tone of voice), with attention directed toward memorizing speaker names. We used voice cloning to control for speaker identity confounds between human and AI voices. Multivariate pattern analysis revealed that neural discrimination of human vs. AI voices emerged rapidly regardless of prosody (confident: 176 ms; doubtful: 134 ms), substantially preceding prosody discrimination (confident vs. doubtful within human voices: 2066 ms; within AI voices: 1366 ms). Acoustic analysis confirmed that prosodic distinctions became classifiable only at utterance offset (90% normalized duration), converging with neural evidence that prosody requires near-complete temporal integration. This temporal dissociation between rapid voice source discrimination and late-emerging prosody decoding suggests that prosody plays a smaller role in audio deepfake detection than listeners retrospectively report. Representational similarity analysis further revealed that spectral envelope features (mel-frequency cepstral coefficients; MFCC), rather than the visually salient high-frequency energy differences, drove neural human–AI discrimination, with MFCC’s earliest independent contribution (228 ms) closely following the MVPA decoding onset (134–176 ms). Future studies may manipulate specific acoustic components to establish the causal sources of this rapid and sustained neural discrimination. 
Significance Statement: People encounter AI voices daily, in phone calls, navigation apps, supermarket checkouts, and subway announcements. Using electroencephalography, we show that the human brain automatically and rapidly distinguishes everyday AI voices from human speech, even without conscious attention to voice source. Although people may attribute this ability to AI voices sounding monotone or prosodically unnatural, the brain relies on subtler acoustic signatures, enabling discrimination before prosodic information becomes available. Attempts to identify the specific acoustic features driving this neural detection were inconclusive, pointing to the need for future causal investigations. We encourage engineers and policymakers to ensure AI voices remain perceptually detectable, as increasingly humanlike AI voices could cognitively disadvantage the general public if they become indistinguishable from human speech.

Conference presentations

— 2026 —

Domínguez-Arriola, M. E., Lam, P. C. H., Pérez, A., & Pell, M. D. (2026). Conversational engagement modulates neural speech tracking in real-time dialogue.

Melo, L. E. H., & Pell, M. D. (2026). Effects of loneliness on ERP responses to supportive speech.

Lam, P. C. H., Domínguez-Arriola, M. E., Pérez, A., & Pell, M. D. (2026). Conversational engagement predicts interpersonal neural synchrony.

— 2018 —

Mauchand, M., Vergis, N., & Pell, M. D. (2018). Ironic tones of voices.

Proc. Speech Prosody 2018

— 2014 —

Jiang, X., & Pell, M. D. (2014). Encoding and decoding confidence information in speech.

Proc. Speech Prosody 2014

Liu, P., & Pell, M. D. (2014). Processing emotional prosody in Mandarin Chinese: A cross-language comparison.

Proc. Speech Prosody 2014

— 2011 —

Garrido‐Vásquez, P., Paulmann, S., Pell, M. D., & Kotz, S. A. (2011). Dynamisches cross-modales Priming von Emotion: Eine Studie mit ereigniskorrelierten Potentialen.

Beiträge zur 53. Tagung experimentell arbeitender Psychologen  ·  2011

— 2010 —

Garrido‐Vásquez, P., Pell, M. D., Paulmann, S., Strecker, K., Schwarz, J., & Kotz, S. A. (2010). Links oder rechts? Wie emotionale Prosodieverarbeitung bei Morbus Parkinson durch die Seitigkeit der motorischen Symptome beeinflusst wird.

KogWis 2010: 10. Tagung der Gesellschaft fuer Kognitionswissenschaft  ·  2010

Garrido‐Vásquez, P., Pell, M. D., Paulmann, S., Strecker, K., Schwarz, J., & Kotz, S. A. (2010). Emotional prosody processing in Parkinson's disease: Sidedness of motor symptoms makes the difference.

17th Annual Meeting of the Cognitive Neuroscience Society (CNS)  ·  2010

Garrido‐Vásquez, P., Pell, M. D., Paulmann, S., Strecker, K., Schwarz, J., & Kotz, S. A. (2010). Einflüsse des Arbeitsgedächtnisses bei der Verarbeitung emotionaler Prosodie in Parkinsonpatienten [Influences of working memory on emotional prosody processing in patients with Parkinson's Disease].

52nd Annual German Experimental Psychology Meeting (TeaP), Saarland University, Saarbrücken, Germany  ·  2010

— 2006 —

Paulmann, S., Pell, M. D., & Kotz, S. A. (2006). ERP-evidence on emotional prosody perception in BG-patients: Selective impairments for vocal expressions of disgust and fear?

Max Planck Digital Library  ·  2006

Datasets and instruments

Citable research tools and datasets, in alphabetical order.

Rothermich, K., & Pell, M. D. (2015). Relational Inference in Social Communication Inventory. PsycTESTS Dataset.
