[This project utilized Python code for sentiment analysis. The script and data are available upon request.]
Introduction
In February 2022, Russia launched its full-scale invasion of Ukraine. The invasion drew immediate international condemnation, particularly from Western countries. Western public opinion largely supported their governments in imposing sanctions on Russia and providing aid to the Ukrainian government and people. Russian public opinion, as presented by state media, appeared to be supportive of the invasion, which was framed within Russia as a “special military operation.” In non-Western countries, opinions were more mixed; some were skeptical of NATO’s expansion toward Russia and of the West’s intentions on the global stage.
![](https://cdn.prod.website-files.com/6644282a51425c9bf81f1658/6703ff484d2836c2db22726e_664439a2f4e079d5c55e4a37_protests.jpeg)
Eventually, the prolonged nature of the conflict, which continues to this day, led to increasing war fatigue. Different populations’ views on the war shifted as it began affecting their daily lives. This can be seen among some members of the Republican Party in the United States, who began calling for an end to support for Ukraine as they faced rising gasoline prices. Throughout, information warfare and media narratives have played a major role in shaping divergent public opinions on the conflict. Public opinion remains polarized and continues to shift with the evolving circumstances of the war and its broader geopolitical implications.
With all this in mind, I set out to measure people’s initial impressions of the Russian invasion of Ukraine: their first reactions, before the war’s implications had touched their daily lives. Did people support the invasion? Were they outraged by Putin’s actions? These questions matter because they capture public opinion toward the war before outside spectators were themselves affected by it.
Data Collection and Intention
To collect this information, I found a Reddit post from February 2022 that asked users, “how do you feel about what’s happening in Ukraine right now?” While the post itself is only a sentence long, it was extremely effective at encouraging people to share their opinions, garnering over 5,000 comments. The post has since been archived, so I am confident that these comments are from that time period and not from users visiting the post at a later date.
![](https://cdn.prod.website-files.com/6644282a51425c9bf81f1658/6703ff484d2836c2db227245_66443a3df4e079d5c55ea8f4_Screenshot%25202024-05-15%2520at%252012.29.29%25E2%2580%25AFAM.png)
After finding a seemingly perfect post that attracted a significant number of users, I set out to analyze the comments. Reading each comment and taking notes on its sentiment would have been informative, but with so many comments it would also have been incredibly time-consuming. To resolve this, I decided to employ automated sentiment analysis tools. My first step was to collect the data. Luckily, Reddit provides an API through which I was able to extract all the comments from the post; all I needed was the Python library PRAW. After extracting the data, I processed and cleaned it to prepare it for the text analysis tools. This consisted of tokenization, stemming, and edits to the data structure of the output received from the Reddit API.
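To make this step concrete, here is a minimal sketch of what the collection and cleaning pipeline can look like. The credentials, post URL, and column names below are placeholders rather than the exact values used in this project.

```python
import praw
import pandas as pd
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

nltk.download("punkt", quiet=True)

# Placeholder credentials -- register an app at reddit.com/prefs/apps to get real ones.
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="ukraine-sentiment-study",
)

# Hypothetical URL standing in for the February 2022 post analyzed here.
submission = reddit.submission(url="https://www.reddit.com/r/AskReddit/comments/EXAMPLE/")
submission.comments.replace_more(limit=None)  # expand every "load more comments" link

stemmer = PorterStemmer()
rows = []
for comment in submission.comments.list():
    tokens = word_tokenize(comment.body.lower())
    stems = [stemmer.stem(t) for t in tokens if t.isalpha()]
    rows.append({"body": comment.body, "stems": " ".join(stems), "upvotes": comment.score})

df = pd.DataFrame(rows)
```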
For text analysis, I employed two Python libraries. The first is TextBlob, which returns polarity and subjectivity scores. Polarity scores in TextBlob range from -1 to +1 and indicate how negative or positive the sentiment of a text is. Subjectivity scores in TextBlob range from 0 to 1, with 0 indicating that a text is extremely objective and 1 indicating that it is extremely subjective. The second library is VADER, which returns a variety of polarity-related scores: positive, negative, neutral, and compound scores for each text it analyzes. The positive, negative, and neutral scores indicate to what degree a text carries each of those attributes, while the compound score summarizes that information on a scale from -1 to +1, with -1 indicating an extremely negative sentiment and +1 an extremely positive one.
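The sketch below shows how both libraries can score a single comment and how those scores can be attached to the DataFrame built in the previous step; the column names are my own naming choices.

```python
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def score_comment(text):
    blob = TextBlob(text)
    vader = analyzer.polarity_scores(text)
    return {
        "tb_polarity": blob.sentiment.polarity,         # -1 (negative) to +1 (positive)
        "tb_subjectivity": blob.sentiment.subjectivity,  # 0 (objective) to 1 (subjective)
        "vader_pos": vader["pos"],
        "vader_neg": vader["neg"],
        "vader_neu": vader["neu"],
        "vader_compound": vader["compound"],             # -1 to +1 summary score
    }

scores = pd.DataFrame(df["body"].apply(score_comment).tolist())
df = pd.concat([df, scores], axis=1)
```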
In this project, I decided to include all the scores generated by each library in my dataset. I aimed to gather an abundance of data on the comments’ sentiments so that I would have ample information for descriptive statistics. For the polarity scores from both VADER and TextBlob, I created additional columns that label each text’s sentiment as negative, neutral, or positive. These proved to be helpful additional variables, as they allowed me to perform analyses that use categorical rather than numerical data.
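One way to derive those categorical labels is shown below. The cutoff of ±0.05 is an illustrative assumption, not necessarily the threshold used in the original analysis.

```python
def label_sentiment(score, threshold=0.05):
    """Map a numeric polarity/compound score to a categorical label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

df["tb_label"] = df["tb_polarity"].apply(label_sentiment)
df["vader_label"] = df["vader_compound"].apply(label_sentiment)
```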
Sentiment Analysis
Analyzing the comment data with the Python TextBlob library, I found that a large share of comments were objective and neutral. As displayed in my TextBlob Sentiment Distribution figure (Figure 1), around half of the comments were neutral, with the rest split between positive and negative sentiment.
![](https://cdn.prod.website-files.com/6644282a51425c9bf81f1658/6644384cce09acd75b117bc1_L0LiC3jrmjRtIU2Ag7rxH5wx6WH5cRKlFxND-91-Qm1MwHIMteXVP88RDgF2EvdFc0tit66qCaKLjD_Ws2xCc1_5agMelGTP_EB19YSBFiJSA_4NGqqyOSiwtw-1kazt4k7nsWOK7on2mxSjANg_F_Q.png)
Additionally, when looking at the TextBlob Subjectivity Distribution figure (Figure 2), more than 25% of the comments were rated as extremely objective. Overall, the subjectivity distribution shows that more comments were rated objective than subjective (with varying degrees of both).
![](https://cdn.prod.website-files.com/6644282a51425c9bf81f1658/6644384e4443260c166733d7_UEX4EZhbhJTqgL9nQZXI9qOMrFcx0PfcwwDskjZrgzQ-OvVxccxkgd2L1ITlx6NIwWcva9dO_sTWfLJj8ioVwkRHiIJyGO3TmUgIx-RgumNvbgD69JZUQwtK5_mMFqrzzs_a23oLxMEmAcUcY2YA7ZU.png)
The results of my sentiment analysis using the Python VADER library tell a different story. According to the VADER Sentiment Distribution figure (Figure 3), there were fewer neutral comments and more comments with extreme sentiment values, either extremely negative or extremely positive. This suggests that VADER is more likely to assign extreme values to comments than TextBlob, which tends to be reserved in its sentiment grading and assigns few extreme values.
![](https://cdn.prod.website-files.com/6644282a51425c9bf81f1658/6644384c758f7782d4b01d96_0bx6griaQUJOsK50MCJQIqALUR6VK3vN6lsmOA5Vl-Is4gfQemQHC1D3L94eEQS64sjApN4twPI2dBTGVs8pcEHBQXYjytl92Ea2N8XsnltU8KqPGcrjEh5lQl_BFQXoMEX4du2eshbvk5yM7nThimw.png)
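Distribution figures like Figures 1 through 3 can be produced directly from the scored columns; the sketch below shows one way to do it with matplotlib (the bin counts and styling are assumptions, not a reproduction of the original figures).

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(df["tb_polarity"], bins=20)
axes[0].set_title("TextBlob Polarity Distribution")
axes[1].hist(df["vader_compound"], bins=20)
axes[1].set_title("VADER Compound Distribution")
plt.tight_layout()
plt.show()
```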
To gain a better understanding of where the majority of comments lay on subjectivity and sentiment, I generated summary statistics for the variables produced by the sentiment analysis tools (Figure 4). Overall, the polarity score from TextBlob is positively skewed with a median of 0, the subjectivity score from TextBlob is negatively skewed with a median of 0.4, and the compound score from VADER is positively skewed with a median of 0. Both subjectivity and polarity from TextBlob had standard deviations between 0.25 and 0.3, while VADER’s compound score had a standard deviation of 0.510, suggesting its values are more spread out. Furthermore, I found that objective comments are far more likely to be neutral than subjective comments.
![](https://cdn.prod.website-files.com/6644282a51425c9bf81f1658/6703ff484d2836c2db227248_66443b8bd213d7393ddebc67_figure%25204.png)
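Statistics of the kind reported in Figure 4, along with the objective-versus-neutral comparison, can be computed in a few lines of pandas. Splitting comments into “objective” and “subjective” at a subjectivity of 0.5 is an assumption made for illustration.

```python
# Descriptive statistics for the three numeric sentiment variables.
print(df[["tb_polarity", "tb_subjectivity", "vader_compound"]].describe())

# Share of negative/neutral/positive comments within objective vs. subjective groups.
df["objectivity"] = (df["tb_subjectivity"] < 0.5).map({True: "objective", False: "subjective"})
print(pd.crosstab(df["objectivity"], df["vader_label"], normalize="index").round(3))
```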
To further understand the content of the comments on the post I selected for this report, I generated a few word clouds. The first word cloud, built from all of the comments, included expected words like “Russia,” “Ukraine,” and “Putin.” One of the most interesting words to appear was “NATO.” This is interesting because, after the Reddit post’s creation date, NATO would become increasingly important as pro-Russia voices justified Putin’s actions as a response to NATO encroachment. In my second word cloud, built only from negative sentiment comments, the word “Putin” grew significantly larger, suggesting that negative comments use the word more often than others do. The second word cloud also, understandably, contained words like “wrong” and “propaganda.” Lastly, my third word cloud, focusing on positive sentiment comments, highlighted the word “border,” among others, which might have been used in comments justifying the invasion.
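The word clouds can be generated with the wordcloud package, filtering on the categorical labels created earlier; a brief sketch under those assumptions:

```python
from wordcloud import WordCloud, STOPWORDS

def make_cloud(texts, title):
    cloud = WordCloud(width=800, height=400, stopwords=STOPWORDS,
                      background_color="white").generate(" ".join(texts))
    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()

make_cloud(df["body"], "All comments")
make_cloud(df.loc[df["vader_label"] == "negative", "body"], "Negative comments")
make_cloud(df.loc[df["vader_label"] == "positive", "body"], "Positive comments")
```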
Given that all the word clouds contained the word “Putin,” I wanted to zoom in and gain a fuller understanding of the context in which the word was used. To accomplish this, I filtered the comments by whether they included the word “Putin” and then generated summary statistics of how many comments in each group were positive, neutral, and negative. My findings (Figure 5) show that comments with the word “Putin” tend to be more negative than comments without it. Additionally, comments that mention Putin are far less likely to be neutral (9.62% neutral compared to 25.19%). In this analysis I based the sentiment categories on the compound scores generated by the Python VADER library.
![](https://cdn.prod.website-files.com/6644282a51425c9bf81f1658/6703ff484d2836c2db227236_66443bdb87a2cb7ef50d14de_figure%25205.png)
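The comparison behind Figure 5 amounts to a filter plus a cross-tabulation; a minimal sketch, reusing the VADER-based labels from earlier:

```python
# Flag comments that mention "Putin" (case-insensitive) and compare label shares.
mentions_putin = df["body"].str.contains("putin", case=False, na=False)
comparison = pd.crosstab(mentions_putin, df["vader_label"], normalize="index") * 100
print(comparison.round(2))  # percentage of negative/neutral/positive in each group
```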
Predictive Analysis
Having finished my original plan of analyzing the sentiment of comments made immediately after the outbreak of the Russia-Ukraine war, I decided to explore the topic further by building a machine learning model. The model tries to predict whether a comment’s sentiment is positive or negative (based on its VADER compound score) from its upvotes (similar to the count of likes on other social media platforms), word count, and subjectivity score (generated using TextBlob). For context, I generated a scatter plot of VADER sentiment scores against word count (Figure 6).
![](https://cdn.prod.website-files.com/6644282a51425c9bf81f1658/6644384c64a7df4b863b0445_7mVkXaVfVaWgL55_qqCcBUVnHiJMnruNKnpNEOSqSAoHHQqLAaB0l3_FlNsDJa1rplimH-hKboyVq4Z4-bKMazSpYgVH4FZKcuLFCTRrD2zbCQzRh9I2_N-Vi4pr6srj1iO6kM5l92dDUXsHzaUXV0E.png)
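The write-up does not name the specific classifier used, so the sketch below uses logistic regression from scikit-learn purely as an illustrative stand-in; the feature and column names follow the earlier steps.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Keep only clearly positive or negative comments and build the features.
model_df = df[df["vader_label"] != "neutral"].copy()
model_df["word_count"] = model_df["body"].str.split().str.len()
model_df["target"] = (model_df["vader_label"] == "positive").astype(int)  # 1 = positive, 0 = negative

X = model_df[["upvotes", "word_count", "tb_subjectivity"]]
y = model_df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
```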
My finding was that machine learning struggles to predict whether a comment’s sentiment is negative or positive based on these features for the topic of the Russia-Ukraine war. As seen in the Confusion Matrix Heat Map (Figure 7), the model is more likely to predict that a comment is positive (a score of 1) than negative (a score of 0). It incorrectly predicted 296 comments as positive when they were actually negative, and 75 comments as negative when they were actually positive. The model’s accuracy score was 52%, meaning that it correctly predicted the outcome 52% of the time. Its precision score was 52.64%, meaning that when the model predicts a positive outcome, it is correct about 53% of the time. Its recall score was 81.44%, meaning that the model correctly identifies 81% of all actual positive outcomes. Lastly, its F1-score was 63.95%, indicating a moderate balance between precision and recall, leaning more toward recall.
![](https://cdn.prod.website-files.com/6644282a51425c9bf81f1658/6703ff484d2836c2db227239_66443c5e07df2b66f8a6a2a9_Figure%25207.png)
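The metrics above can be computed with scikit-learn’s evaluation utilities, continuing the sketch from the previous step:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

print(confusion_matrix(y_test, y_pred))            # counts behind the heat map in Figure 7
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("F1:       ", f1_score(y_test, y_pred))
```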
Overall, the machine learning model was better at identifying the positive class, but it was less precise in those positive predictions. The accuracy is not particularly high, indicating that the model struggles to correctly label instances overall. The F1-score provides a more balanced view of the model’s performance, but it still suggests only moderate performance. I would not consider this machine learning model effective. This makes sense, as a comment’s upvote count and word count do not provide the full picture.
Conclusion
Running a sentiment analysis on Reddit comments about the Russian invasion of Ukraine taught me that public sentiment might not be as easy to study as I originally thought. Going into this report, I expected to see an overwhelming majority of negative sentiment, given that Western media emphasizes that the war is neither ethical nor moral. What I saw instead was a seemingly even split of negative and positive sentiment. This led me to go back and inspect the comments manually to see why. What I learned was that sentiment analysis did not account for the subject of each comment. A comment that spoke badly of Russia would register as negative sentiment in the same way that a comment that spoke terribly of the Ukrainian people would. Conversely, a comment praising Putin and a comment praising the resilience of Ukrainians would both receive a positive sentiment score. This leads me to think that this social problem would benefit from a more controlled study in which the subject of each comment is accounted for, for example by asking a more specific question along the lines of “how do you feel about the Russian government?”
![](https://cdn.prod.website-files.com/6644282a51425c9bf81f1658/6644384c9e970cbb0dc49a2c_z74dcIOYNGwIsizkXfpBCf1QFHZkvj7k--MW4-_j2sGeyQP4HETBXiCgzFy4LBCsZJ93iaSZsvqvwxlFlWVJsyzq0C7vZVQndwAqT-C3_kBb1PD4ppaHv8tUzK-BTi32MHWdC4FF8m38Nfe-gLBMv7E.png)