brandan chen
NLP Sentiment and Emotion Analysis of Steam Horror Game Reviews
Project Overview
Understanding user feedback is essential for improving game design and user experience. For this project, I aimed to explore how different models interpret sentiments and emotions in user-generated content and compare those insights with actual user sentiments. The analysis focused on reviews for four popular horror games on Steam: Phasmophobia, Dead by Daylight, Lethal Company, and The Outlast Trials.
Sentiment Analysis
In this project, I conducted a comparative sentiment analysis on user reviews for popular Steam horror games. Using BERT-based models, I achieved 80% accuracy when comparing predicted sentiments to the sentiment labels users gave their own Steam reviews.
The first graph shows the true sentiments, Positive or Negative, that users assigned to their own Steam reviews.
The second graph represents results using the bert-base-uncased-finetuned-sst2-v2 model for binary sentiment classification.
The third graph represents results using the twitter-roberta-base-sentiment-latest model to explore a three-class classification: Positive, Negative, and Neutral.
I identified bert-base-uncased-finetuned-sst2-v2 as the more consistent model for capturing user sentiment, especially for binary positive/negative reviews. The twitter-roberta-base-sentiment-latest model performed well, but its Neutral predictions reduced accuracy when compared to actual user reviews, which only carry Positive or Negative labels: a Neutral prediction can never match the true label.
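To make the effect of the Neutral class concrete, here is a minimal sketch of how predicted labels could be scored against Steam's Positive/Negative labels. The helper names (`normalize`, `accuracy`) and the toy data are hypothetical illustrations, not the project's actual code or results; the sketch assumes each model's output has already been reduced to one label string per review.

```python
def normalize(label):
    """Map a raw label such as 'POSITIVE' or ' neutral ' to 'Positive'/'Negative'/'Neutral'."""
    return label.strip().capitalize()


def accuracy(predicted, actual):
    """Fraction of reviews whose predicted label equals the Steam label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    hits = sum(normalize(p) == normalize(a) for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical toy data (assumption for illustration only).
steam_labels = ["Positive", "Positive", "Negative", "Negative"]
binary_preds = ["positive", "positive", "negative", "positive"]
three_class_preds = ["positive", "neutral", "negative", "neutral"]

binary_acc = accuracy(binary_preds, steam_labels)
three_class_acc = accuracy(three_class_preds, steam_labels)
print(f"binary: {binary_acc:.2f}, three-class: {three_class_acc:.2f}")
```

Because Steam reviews are never labeled Neutral, every Neutral prediction counts as a miss under this scoring. An alternative would be to drop Neutral predictions before scoring, but that changes which reviews are being compared.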
Emotion Analysis
For additional insight, I used the emotion-english-distilroberta-base model to classify the emotions expressed in each Steam review. This model predicts Ekman's six basic emotions (anger, disgust, fear, joy, sadness, and surprise) plus a neutral class.
From the graph above, we can see that Lethal Company had the highest percentage of reviews labeled with 'Joy' (54.4%), consistent with its 96% overall positive rating, the highest among the four games.
Although reviewers had the most enjoyment playing Lethal Company, it was actually Phasmophobia that produced the most fear, with 32% of its reviews classified as 'Fear'. This makes sense, as Lethal Company has many comedic elements tied into its gameplay, while Phasmophobia (a ghost-hunting game) is almost purely focused on scaring the player through their interactions with ghosts.
On the flip side, Dead by Daylight seemed to be the least scary of the four games, with only 13% of its reviews classified as 'Fear'. It also had the highest percentage of reviews classified as 'Sadness' (10.6%), 'Disgust' (11.6%), and 'Anger' (9.2%). Recent Steam reviews for the game are more mixed than those of the other games analyzed in this project. Note that Dead by Daylight is the only competitive multiplayer 'horror' game among the four, which could explain the more negative emotions in its Steam reviews. Players have indicated a love/hate relationship with the game due to a toxic community, poor balancing, and repetitiveness.
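The per-game percentages discussed above come down to counting predicted emotion labels per game. The following is a minimal sketch of that aggregation, assuming each review has already been reduced to a `(game, emotion)` pair. The function names and the toy rows (with placeholder game names) are hypothetical and do not reproduce the project's data.

```python
from collections import Counter, defaultdict

# The seven classes the emotion model predicts.
EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]


def emotion_shares(rows):
    """Turn (game, emotion) pairs into {game: {emotion: percent of that game's reviews}}."""
    counts = defaultdict(Counter)
    for game, emotion in rows:
        counts[game][emotion] += 1
    shares = {}
    for game, counter in counts.items():
        total = sum(counter.values())
        shares[game] = {e: round(100 * counter[e] / total, 1) for e in EMOTIONS}
    return shares


def top_game_for(shares, emotion):
    """Return the game with the largest share of reviews for one emotion."""
    return max(shares, key=lambda game: shares[game][emotion])


# Hypothetical toy rows (assumption for illustration only).
rows = [
    ("Game A", "joy"), ("Game A", "joy"), ("Game A", "fear"), ("Game A", "neutral"),
    ("Game B", "fear"), ("Game B", "fear"), ("Game B", "anger"), ("Game B", "joy"),
]
shares = emotion_shares(rows)
print(top_game_for(shares, "fear"), shares["Game B"]["fear"])
```

Using percentages rather than raw counts keeps the comparison fair when games have different numbers of reviews.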
Fine Tuning
As part of the process, I attempted fine-tuning using custom-labeled reviews, aiming to enhance the model’s ability to capture the unique tones and language used in horror game reviews. I collected 50 reviews from various other horror games of a similar genre to the four analyzed in this project. Then, I manually labeled each review with one of four labels:
1. Generally Positive
2. Generally Negative
3. Immersive
4. Repetitive
I then used a Trainer from Hugging Face and split my CSV into training and evaluation datasets. I realized that there was some overlap between the Generally Positive and Immersive labels, as well as between the Generally Negative and Repetitive labels. Unfortunately, due to a lack of training data, fine-tuning didn't work out as expected. Nonetheless, this was great practice as a first attempt at labeling data and fine-tuning!
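As an illustration of the split step only, here is a minimal plain-Python sketch that maps the four label names to integer ids and divides labeled reviews into training and evaluation sets. It is not the Hugging Face Trainer setup itself. Stratifying by label, the 20% evaluation fraction, the seed, and all names here are assumptions chosen for the example; with only 50 reviews, stratifying is one way to make sure every label appears in the evaluation set.

```python
import random
from collections import defaultdict

LABELS = ["Generally Positive", "Generally Negative", "Immersive", "Repetitive"]
LABEL2ID = {name: i for i, name in enumerate(LABELS)}


def stratified_split(rows, eval_fraction=0.2, seed=42):
    """Split (text, label_name) rows into (train, eval) lists of (text, label_id).

    Each label is split separately so the evaluation set keeps the label mix.
    """
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, LABEL2ID[label]))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train_rows, eval_rows = [], []
    for label in LABELS:
        items = by_label.get(label, [])
        rng.shuffle(items)
        # Hold out at least one example per label when there is more than one.
        n_eval = max(1, round(len(items) * eval_fraction)) if len(items) > 1 else 0
        eval_rows.extend(items[:n_eval])
        train_rows.extend(items[n_eval:])
    return train_rows, eval_rows


# Hypothetical placeholder reviews (assumption for illustration only).
rows = [(f"review {i}", LABELS[i % 4]) for i in range(20)]
train_rows, eval_rows = stratified_split(rows)
print(len(train_rows), len(eval_rows))
```

The resulting integer ids are the form a sequence-classification model expects as labels. With this little data, a held-out set of one or two reviews per class gives only a very rough accuracy estimate.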
This step taught me about the vast data requirements of fine-tuning and about how to avoid overfitting. In the future, I would choose categories that are more distinct from one another and supply the model with a lot more training data. Ultimately, I decided to use pre-trained models directly and compare their performance, as fine-tuning did not yield significant accuracy gains with the available data.
Topic Modeling
The project also included a detailed topic modeling approach to extract underlying themes from the reviews. I used scikit-learn's LDA model for topic modeling, identifying 10 topics per game and the top 10 words per topic. I improved the topic modeling process by modifying the stop words, increasing the number of iterations, and making other adjustments. This analysis offered insights into what users enjoyed and disliked in each game, revealing areas for potential improvement and showing how well each game meets player expectations for horror games.
Code
The code for this project was developed in partnership with the one and only ChatGPT!