PAPER: News stories’ heroes and villains can be detected automatically

March 14, 2018

Tags: methodology, online journalism, representation

Understanding how different actors are framed in news stories is an important part of news literacy, a team of Northwestern University researchers posits. Analysing each story carefully enough to detect these frames, however, is a tall order for the average reader. To assist critical reading, Diego Gomez-Zara, Miriam Boon and Larry Birnbaum have been developing automated role detection software that could be easily installed as a browser plug-in.

The software works by first detecting the relevant entities in a news story. It considers all entities present in the headline, plus the three entities from the main text with the highest relevance scores. These scores are calculated from how often the entity is mentioned and how early in the text its first mention occurs.
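
The entity selection step can be illustrated with a minimal sketch. The names `Mention`, `relevance_scores` and `select_entities` are hypothetical, and the scoring formula (mention count boosted by how early the first mention appears) is an assumption made for illustration; the article does not state the exact weighting the researchers use. The sketch also assumes that entities have already been extracted and that headline entities appearing among the top three body entities are not counted twice.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    entity: str
    position: int  # token offset of the mention in the main text


def relevance_scores(mentions, text_length):
    """Score entities by mention count, boosted when the first mention is early.

    The formula count * (1 + earliness) is an illustrative assumption,
    not the weighting used in the paper.
    """
    counts = {}
    first_seen = {}
    for m in mentions:
        counts[m.entity] = counts.get(m.entity, 0) + 1
        first_seen[m.entity] = min(first_seen.get(m.entity, m.position), m.position)
    scores = {}
    for entity, count in counts.items():
        # earliness is 1.0 for a mention at the very start, near 0.0 at the end
        earliness = 1.0 - first_seen[entity] / max(text_length, 1)
        scores[entity] = count * (1.0 + earliness)
    return scores


def select_entities(headline_entities, mentions, text_length, top_n=3):
    """Return all headline entities plus the top_n body entities by score."""
    scores = relevance_scores(mentions, text_length)
    # Sort by descending score; break ties alphabetically for stable output
    ranked = sorted(scores, key=lambda e: (-scores[e], e))
    selected = list(dict.fromkeys(headline_entities))  # de-duplicate, keep order
    selected += [e for e in ranked[:top_n] if e not in selected]
    return selected


mentions = [
    Mention("Trump", 0), Mention("Trump", 50),
    Mention("Macron", 10), Mention("Paris", 90), Mention("NATO", 95),
]
print(select_entities(["Trump"], mentions, text_length=100))
```

In this toy input, "NATO" is dropped because it is mentioned once and only near the end of the text.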

The detected entities’ roles are determined based on three “dictionaries” – lists of words that are commonly associated with “heroes”, “villains” or “victims”. Each of these lists contains approximately 200 words, hand-picked by the researchers.

The actual analysis combines sentiment and similarity analysis. Words that are classified as negative are compared against the “villain” dictionary, while positive and neutral words are compared against both the “hero” and “victim” dictionaries. The model also takes into account the proximity of words to the entity in question: words that are very close are given more weight than words that are separated by many other words.
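
The following is a rough sketch of how such a scoring scheme could be put together, not the researchers' implementation. Everything in it is an assumption made for illustration: the three tiny word sets stand in for the 200-word dictionaries, a toy negative-word list stands in for a sentiment analyser, exact dictionary membership stands in for a real word-similarity measure, and the proximity weight is taken to be the inverse of the token distance.

```python
# Tiny stand-ins for the hand-picked dictionaries (hypothetical contents).
ROLE_WORDS = {
    "hero": {"rescued", "saved", "praised"},
    "villain": {"attacked", "lied", "threatened"},
    "victim": {"injured", "robbed", "lost"},
}
# Toy sentiment lexicon (assumption); unlisted words are treated as neutral.
NEGATIVE_WORDS = {"attacked", "lied", "threatened"}


def candidate_roles(word):
    """Negative words are checked against 'villain' only; others against both."""
    if word in NEGATIVE_WORDS:
        return ["villain"]
    return ["hero", "victim"]


def similarity(word, role):
    """Exact-match stand-in for a real word-similarity measure."""
    return 1.0 if word in ROLE_WORDS[role] else 0.0


def assign_role(tokens, entity_index):
    """Return (best_role, totals) for the entity at tokens[entity_index]."""
    totals = {"hero": 0.0, "villain": 0.0, "victim": 0.0}
    for i, token in enumerate(tokens):
        if i == entity_index:
            continue
        word = token.lower()
        weight = 1.0 / abs(i - entity_index)  # closer words count more
        for role in candidate_roles(word):
            totals[role] += weight * similarity(word, role)
    best = max(totals, key=totals.get)
    # No matching words at all means there is no evidence for any role
    return (best if totals[best] > 0 else None), totals


tokens = "Firefighters rescued the injured driver".split()
print(assign_role(tokens, 0)[0])  # role for "Firefighters"
print(assign_role(tokens, 4)[0])  # role for "driver"
```

With the same sentence, the two entities receive different roles because "rescued" sits next to "Firefighters" while "injured" sits next to "driver". A real system would replace the exact-match lookup with a graded similarity score, so that words outside the dictionaries can still contribute.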

The researchers tested the software on real-life news stories with promising results. They present an example of two news articles regarding President Donald Trump’s visit to Paris, published by The New York Times and Fox News, respectively. The software detected President Trump as the “villain” of the NYT story, and the “hero” of the Fox News story.

The program is still in development, but the researchers say they are planning to release it to the general public soon. The team, however, emphasizes that the program’s accuracy should be tested against human assessments, which so far has not been done.

The paper “Who is the Hero, the Villain, and the Victim?” was presented at the 23rd International Conference on Intelligent User Interfaces. It is available online in the Association for Computing Machinery’s digital library (open access).

Picture: Untitled by Marc Mueller, licence CC0 1.0.
