Multiparty Personality Recognition asks a model to predict the main speaker's personality from a short conversation, as one binary label for each of the Big Five personality traits:
- Agreeableness (AGR): trustworthy, straightforward, generous vs. unreliable, complicated, meager, and boastful
- Conscientiousness (CON): efficient and organized vs. sloppy and careless
- Extroversion (EXT): outgoing, talkative, and energetic vs. reserved and solitary
- Openness (OPN): inventive and curious vs. dogmatic and cautious
- Neuroticism (NEU): sensitive and nervous vs. secure and confident
This is a part of the Character Mining project led by the Emory NLP research group.
To build the dataset, 711 short conversations were extracted from transcripts of the first four seasons of the TV show Friends and then annotated. There are 6405 unique tokens after pre-processing. For each short conversation, three annotators rate the main speaker on each trait with a score of -1, 0, or 1. The task is challenging because only the text is given, and the text alone may not carry enough information to fully capture the social context.
The following shows a multiparty dialogue between Monica and Paul.
s01_e01_c05(0) for Paul
Monica Geller: Oh my God!
Paul: I know, I know, I'm such an idiot. I guess I should have caught on when she started going to the dentist four and five times a week. I mean, how clean can teeth get?
Monica Geller: My brother's going through that right now, he's such a mess. How did you get through it?
Paul: Well, you might try accidentally breaking something valuable of hers, say her-
Monica Geller: -leg?
| Agreeable: | 1 | 0 | -1 |
|---|---|---|---|
| Conscientious: | 1 | 0 | -1 |
| Extraverted: | 1 | 0 | -1 |
| Open to experience: | 1 | 0 | -1 |
| Emotionally Stable: | 1 | 0 | -1 |
The scores of 3 annotators are summed up and converted to binary labels using the median split. This conversation has the following fields:
- scene_id: s01_e01_c05
- character: Paul
- AGR: 1
- CON: 0
- EXT: 1
- OPN: 1
- NEU: 0
- text:
<b>s01_e01_c05(0) for Paul</b><br><br>
<b>Monica Geller</b>: Oh my God!<br><br>
<b>Paul</b>: I know, I know, I'm such an idiot. I guess I should have caught on when she started going to the dentist four and five times a week. I mean, how clean can teeth get?<br><br>
<b>Monica Geller</b>: My brother's going through that right now, he's such a mess. How did you get through it?<br><br>
<b>Paul</b>: Well, you might try accidentally breaking something valuable of hers, say her-<br><br>
<b>Monica Geller</b>: -leg?<br><br>
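The binary labels in the record above come from summing the three annotators' scores and applying a median split. The following is a minimal sketch of that step for a single trait, not the authors' actual code. The annotation values are hypothetical, the name `median_split` is illustrative, and the tie rule (only sums strictly above the median become 1) is an assumption, since the description does not say how ties are handled.

```python
from statistics import median


def median_split(summed_scores):
    """Map summed annotator scores to 0/1 labels, one per conversation."""
    cutoff = median(summed_scores)
    # Assumption: sums strictly above the median become 1, all others 0.
    return [1 if score > cutoff else 0 for score in summed_scores]


# Hypothetical scores for one trait: one row per conversation,
# one column per annotator, each score in {-1, 0, 1}.
annotations = [
    [1, 1, 0],
    [-1, 0, -1],
    [0, 1, 1],
    [0, 0, -1],
]

# Sum the three annotators' scores for each conversation.
summed = [sum(scores) for scores in annotations]

# Convert the sums to binary labels with the median as the cutoff.
labels = median_split(summed)
print(summed, labels)
```

In practice the median would be computed per trait over all conversations in the dataset rather than over a handful of rows.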
Note: The text column contains the conversation as a string in simple HTML format.
Each text starts with its scene id (e.g. s01_e01_c05) and the main speaker (e.g. Paul).
Utterances are separated by <br><br>, and each speaker name is highlighted with <b></b>.
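Given the format described in the note, the text field can be split back into speaker and utterance pairs. The sketch below uses only the Python standard library; the function name `parse_text` and the regular expressions are illustrative assumptions based on the single example shown here, and the `(0)` suffix after the scene id is simply treated as optional because its meaning is not documented in this section.

```python
import re
from html import unescape

# Header such as "<b>s01_e01_c05(0) for Paul</b>" (pattern inferred from the example).
HEADER = re.compile(r"<b>(\S+?)(?:\(\d+\))? for (.+)</b>")
# Utterance such as "<b>Monica Geller</b>: Oh my God!"
TURN = re.compile(r"<b>(.*?)</b>:\s*(.*)", re.DOTALL)


def parse_text(text):
    """Return (scene_id, main_speaker, [(speaker, utterance), ...])."""
    parts = [p.strip() for p in text.split("<br><br>") if p.strip()]
    header = HEADER.fullmatch(parts[0])
    if header is None:
        raise ValueError("unexpected header: " + parts[0])
    utterances = []
    for part in parts[1:]:
        turn = TURN.fullmatch(part)
        if turn is None:
            raise ValueError("unexpected utterance: " + part)
        utterances.append((turn.group(1), unescape(turn.group(2))))
    return header.group(1), header.group(2), utterances


# Shortened copy of the example conversation above.
sample = (
    "<b>s01_e01_c05(0) for Paul</b><br><br>\n"
    "<b>Monica Geller</b>: Oh my God!<br><br>\n"
    "<b>Paul</b>: Well, you might try accidentally breaking something "
    "valuable of hers, say her-<br><br>\n"
    "<b>Monica Geller</b>: -leg?<br><br>\n"
)

scene_id, main_speaker, utterances = parse_text(sample)
for speaker, utterance in utterances:
    print(speaker + ": " + utterance)
```

Keeping the main speaker separate from the utterance list makes it easy to select only that speaker's turns when building model inputs.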
- Automatic Text-based Personality Recognition on Monologues and Multiparty Dialogues Using Attentive Networks and Contextual Embeddings. Hang Jiang, Xianzhe Zhang, and Jinho D. Choi.