Predicting Valence and Arousal from Affective Images: A Comparative Analysis of Deep Learning and Random Forest Regressor

Authors: Priyadarshani, Muskan; Miyapuram, Krishna P.
Type: Conference Paper, pp. 350-352
Published: 25 June 2025
ISBN: 9798400711244
DOI: 10.1145/3703323.3703714
Scopus ID: 2-s2.0-105012244623
Repository record: https://repository.iitgn.ac.in/handle/IITG2025/33838 (deposited 2026-01-12)
Full text: https://dl.acm.org/doi/pdf/10.1145/3703323.3703714
Keywords: Arousal | Deep Learning | Emotion | Random Forest | Regression | Valence | Visual Stimuli

Abstract: Emotion recognition from visual stimuli has emerged as a crucial area of research, with wide applications in Human-Computer Interaction (HCI) and mental health monitoring. Understanding and predicting emotional responses to images is a central task in affective computing. Our study uses deep learning and classical machine learning techniques to predict the emotional dimensions of valence and arousal from color images. We used the OASIS image dataset, which contains images across several themes, including objects, scenes, people, and animals, together with their arousal and valence ratings. We applied k-means clustering to these ratings to identify the number of data points in the largest cluster. We used a Convolutional Neural Network (CNN) regressor to extract features from the images and learn their ratings, and separately evaluated the error metrics of both the CNN and a Random Forest regressor. The results indicate that the CNN regression model predicts the emotional dimensions better than the Random Forest regression model: the CNN achieves lower mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE), showing more precise and reliable performance in capturing the complexity of emotional dimensions.
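The abstract does not say how the clustering of ratings was configured. The following is a minimal sketch, assuming plain Lloyd's k-means on (valence, arousal) pairs with a hypothetical k = 2 and a deterministic farthest-first seeding (our choice for reproducibility, not necessarily the authors'). It shows how the size of the largest cluster can be read off the cluster labels. The ratings are toy values, not OASIS data, and `kmeans`, `farthest_first_init`, and `largest_cluster_size` are illustrative names.

```python
import math
from collections import Counter


def farthest_first_init(points, k):
    """Deterministic seeding (assumed): start from the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on (valence, arousal) pairs. Returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


def largest_cluster_size(labels):
    """Number of data points in the most populous cluster."""
    return Counter(labels).most_common(1)[0][1]


# Toy (valence, arousal) pairs, NOT real OASIS ratings.
ratings = [
    (1.8, 2.0), (2.0, 2.2), (2.1, 1.9), (2.2, 2.1), (1.9, 1.8),  # low/low group
    (6.0, 6.1), (5.9, 6.2), (6.1, 5.8),                          # high/high group
]
labels, centroids = kmeans(ratings, k=2)
print(labels, largest_cluster_size(labels))  # -> [0, 0, 0, 0, 0, 1, 1, 1] 5
```

Farthest-first seeding places one initial centroid in each well-separated group, which keeps this toy run deterministic; with real ratings, random restarts or k-means++ seeding are common alternatives, and the choice of k would need its own justification.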
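The model comparison rests on three standard regression errors. For residuals r_i = y_i - ŷ_i over n images, MAE is the mean of |r_i|, MSE is the mean of r_i², and RMSE is the square root of MSE. The sketch below computes them in plain Python for one rating dimension and checks whether one model is lower on all three. The "CNN" and "Random Forest" predictions are made-up placeholders for illustration, not results from the paper, and `regression_errors` is a hypothetical helper.

```python
import math


def regression_errors(y_true, y_pred):
    """Return MAE, MSE and RMSE for paired sequences of true and predicted ratings."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Illustrative valence ratings and predictions (made-up numbers, not paper results).
valence_true = [4.0, 2.5, 5.5, 3.0]
cnn_pred = [3.8, 2.7, 5.2, 3.1]
rf_pred = [4.6, 3.4, 4.8, 3.5]

cnn_err = regression_errors(valence_true, cnn_pred)
rf_err = regression_errors(valence_true, rf_pred)

# Here a model counts as better only if it is lower on every metric.
cnn_better_everywhere = all(cnn_err[m] < rf_err[m] for m in ("MAE", "MSE", "RMSE"))

for name, err in (("CNN", cnn_err), ("Random Forest", rf_err)):
    print(name, {m: round(v, 4) for m, v in err.items()})
print("CNN lower on all metrics:", cnn_better_everywhere)
```

Because RMSE is the square root of MSE, the two always rank models identically on a single target; MAE weighs large errors less heavily and can disagree with them, which is why reporting all three is informative.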