Comparative Sentiment Analysis of News Articles using AWS Comprehend

Sage Inkwell
Dec 19, 2023

Introduction:

In an era where news shapes public opinion and everyone has instant access to information through the internet, analysing the sentiment of news articles is increasingly important. This blog post uses AWS Comprehend to run sentiment analysis on articles from two news websites, CNN and The Guardian, and shows how natural language processing can surface the underlying tone of a news article. The subject here is coverage of the military coup that took place in Myanmar on February 1, 2021.

With the coup now having lasted nearly three years, how do two international news outlets report on the situation, particularly focusing on the military’s ability to maintain power and the public’s chances of reclaiming their home country? More importantly, how does AWS AI analyze these articles?

Part 1: Setting Up the Environment

As with any kind of world-building*, we have to set up our environment before we can begin our journey. For sentiment analysis, that means setting up a Python environment and importing three essential libraries: requests, BeautifulSoup, and boto3. Each plays a vital role: requests brings in the data, BeautifulSoup helps us navigate the complexities of HTML, and boto3 is our gateway to the AWS cloud.
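Before going further, it can help to confirm that the three libraries are actually installed. The check below is only a sketch: the helper name missing_packages and the module-to-package mapping are my own, not part of the original project.

```python
import importlib.util

# Import name -> name to pass to pip (bs4 is published as beautifulsoup4).
REQUIRED = {"requests": "requests", "bs4": "beautifulsoup4", "boto3": "boto3"}


def missing_packages(required=REQUIRED):
    """Return the pip names of any required modules that cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required libraries are available.")
```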

Part 2: Scraping News Articles

I selected articles from two major news outlets, CNN and The Guardian, to compare their coverage of the same topic. Using requests to download each page and BeautifulSoup to extract its text, I scraped the articles’ textual content. Choosing more than one source matters here: a comparison is only as balanced as the outlets it draws on.
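A minimal sketch of the scraping step might look like the following. It assumes the article body lives in ordinary p tags, which is a simplification: real news pages usually need a more specific selector to skip navigation, captions, and ads. The function names, the User-Agent header, and the timeout are illustrative choices, not the original code.

```python
def join_paragraphs(paragraphs):
    """Collapse runs of whitespace in each paragraph and drop empty ones."""
    cleaned = (" ".join(p.split()) for p in paragraphs)
    return "\n".join(p for p in cleaned if p)


def fetch_article_text(url, timeout=10):
    """Download a page and return the text of its <p> elements.

    Assumption: the article text sits in plain <p> tags.
    """
    # Third-party imports are kept local so join_paragraphs can be reused
    # (and tested) without requests or BeautifulSoup installed.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(
        url, timeout=timeout, headers={"User-Agent": "Mozilla/5.0"}
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    soup = BeautifulSoup(response.text, "html.parser")
    return join_paragraphs(
        p.get_text(" ", strip=True) for p in soup.find_all("p")
    )
```

Calling fetch_article_text with the address of an article returns one string of text, ready to be split into segments.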

Part 3: Preparing Data for AWS Comprehend

AWS Comprehend limits how much text it accepts in a single request: the synchronous sentiment detection call takes at most 5 KB of UTF-8 encoded text. To stay safely under that limit, I split each article into segments of about 4,500 characters. This step underscores the need to be mindful of the limitations and capabilities of AI services.
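One way to do the splitting is sketched below. It breaks on whitespace so that words are never cut in half, which is an assumption on my part; the original code may simply have sliced the string at fixed offsets. The function name split_text is hypothetical, and the 4,500 default is the figure used in this post.

```python
def split_text(text, max_chars=4500):
    """Split text into segments of at most max_chars, breaking on whitespace."""
    chunks = []
    current = ""
    for word in text.split():
        # A single "word" longer than the limit has to be cut by force.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) > max_chars:
            chunks.append(current)  # current segment is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Note that this counts characters, not bytes. Text with many non-ASCII characters takes more than one byte per character in UTF-8, so a smaller max_chars may be needed for such text.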

Part 4: Sentiment Analysis Using AWS Comprehend

Here’s where AWS Comprehend comes into play. By sending the text segments to Comprehend, we get a detailed sentiment analysis, categorizing content into positive, negative, neutral, or mixed feelings. This part of the process requires careful setup of AWS credentials and an understanding of how to interpret the results produced by the service.

Utilising AWS Comprehend for sentiment analysis involves several critical steps:

  1. Configuration of AWS Credentials: Before initiating text analysis, it’s imperative to properly configure AWS credentials. This involves setting up an IAM (Identity and Access Management) role with the appropriate permissions to access Comprehend services.
  2. Text Preprocessing: Prior to analysis, text segments should be preprocessed for optimal results. This includes cleaning the text, removing unnecessary characters and images, and possibly segmenting larger texts into manageable parts, as AWS Comprehend has limitations on the size of the text it can process in a single request. (This was done in Part 3.)
  3. API Integration: To use AWS Comprehend, integrate its API into your application. This involves crafting API requests that include the text to be analysed and sending these requests to the AWS Comprehend service.
  4. Interpreting Results: AWS Comprehend returns a JSON object containing the sentiment analysis results. This object includes not only the categorisation of sentiments but also a confidence score for each category. Understanding these scores is crucial for accurately interpreting the sentiment of the text.
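Steps 1, 3, and 4 above can be sketched roughly as follows. boto3 picks up credentials from its standard chain (environment variables, the shared credentials file, or an attached role), so none appear in the code. The region name and both function names are assumptions for illustration; detect_sentiment and the Sentiment and SentimentScore response fields are part of the Comprehend API.

```python
def make_comprehend_client(region_name="us-east-1"):
    """Create a Comprehend client. The region here is an assumed default."""
    import boto3  # imported here so analyze_segments works with any client

    return boto3.client("comprehend", region_name=region_name)


def analyze_segments(client, segments, language_code="en"):
    """Send each text segment to Comprehend and collect label and scores."""
    results = []
    for segment in segments:
        response = client.detect_sentiment(
            Text=segment, LanguageCode=language_code
        )
        results.append(
            {
                # One of POSITIVE, NEGATIVE, NEUTRAL, MIXED
                "label": response["Sentiment"],
                # Confidence per category, keyed Positive/Negative/Neutral/Mixed
                "scores": response["SentimentScore"],
            }
        )
    return results
```

Passing the client in as an argument keeps the analysis function easy to try out with a stand-in object before spending any AWS calls.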

By leveraging AWS Comprehend, you can uncover deep insights into the sentiment of text data, which is invaluable for applications ranging from customer feedback analysis to social media monitoring. However, it’s important to remember that machine learning models, including those used by AWS Comprehend, may not be perfect and should always be used in conjunction with human oversight for best results.
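As one possible way to build in that human oversight, a result can be flagged for manual reading when the model is unsure. The sketch below is not something this project did: the needs_review helper and the 0.6 threshold are hypothetical and would need tuning against real data.

```python
def needs_review(result, min_confidence=0.6):
    """Flag a Comprehend result for a human to check.

    result is a detect_sentiment response (or any dict with the same
    "Sentiment" and "SentimentScore" keys). The threshold is an
    arbitrary illustrative value.
    """
    top_score = max(result["SentimentScore"].values())
    return result["Sentiment"] == "MIXED" or top_score < min_confidence
```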

Part 5: Results and Observations

The sentiment analysis yielded intriguing contrasts between the two news sources. It was fascinating to see how different outlets presented the same news with varying tones. This section will delve into these differences, providing a side-by-side comparison of the sentiment in articles from both sources.

CNN — Sentiment Analysis Results

As mentioned in Part 3, CNN’s article runs to around 13,000 characters, so I split the text into three parts of up to 4,500 characters each and then averaged the three results (shown in Table 3).
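The averaging can be sketched as a plain unweighted mean of each score across the segments. The function names are hypothetical and the numbers in the usage example are made up for illustration; they are not the results from this analysis.

```python
def average_scores(score_dicts):
    """Average each sentiment score across a list of SentimentScore dicts."""
    if not score_dicts:
        raise ValueError("need at least one set of scores")
    count = len(score_dicts)
    return {
        key: sum(scores[key] for scores in score_dicts) / count
        for key in score_dicts[0]
    }


def dominant_sentiment(averaged):
    """Return the category with the highest averaged score."""
    return max(averaged, key=averaged.get)


# Made-up scores for three segments, for illustration only.
example = [
    {"Positive": 0.1, "Negative": 0.6, "Neutral": 0.2, "Mixed": 0.1},
    {"Positive": 0.2, "Negative": 0.5, "Neutral": 0.2, "Mixed": 0.1},
    {"Positive": 0.3, "Negative": 0.4, "Neutral": 0.2, "Mixed": 0.1},
]
averaged = average_scores(example)
print(dominant_sentiment(averaged))
```

A simple mean treats every segment equally. If the last segment is much shorter than the others, weighting each segment by its length would be a reasonable alternative.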

Table 1

The Guardian — Sentiment Analysis Result

Table 2

Comparative Sentiment Analysis Results

Table 3

Conclusion:

This project has not only demonstrated the robust capabilities of AWS Comprehend in performing sentiment analysis but has also shone a light on the nuanced ways in which different news sources report on similar topics. The analysis revealed significant disparities in the sentiment profiles of articles from CNN and The Guardian, underscoring the subjectivity inherent in news reporting.

Through the application of AWS Comprehend, we were able to quantitatively dissect the emotional tone of news content, providing a more objective lens through which to view media biases.

Furthermore, the limitations encountered, such as AWS Comprehend’s character limit, remind us of the ongoing challenges in the field of natural language processing and sentiment analysis. These challenges present opportunities for further research and development, particularly in enhancing the accuracy and efficiency of sentiment analysis tools.

While AWS Comprehend effectively quantified the emotional tone of news content, it also became apparent that the tool struggles with the subtleties of human language. In particular, instances where ‘negative’ phrases were used in a positive or ironic context often misled the AI, highlighting a significant area for improvement in its understanding of nuanced language and context.

Having read and manually analysed the two articles, I can say that the AI, while doing a great job for the most part, did not provide a completely accurate analysis of The Guardian’s and CNN’s takes. There are two possible conclusions to draw from this result: either AI still lacks the ability to analyse human opinions accurately, or more knowledge, experience, and skill with the tool are needed to refine the code until the AI produces satisfactory results.

In conclusion, my exploration into sentiment analysis using AWS Comprehend reinforces the importance of critically engaging with news content. And as technology continues to evolve, tools like AWS Comprehend will become increasingly vital in our quest to navigate and understand the vast ocean of digital information.

Additional Notes:

For those interested in exploring sentiment analysis further, I recommend experimenting with different news sources and types of articles.

Links To The Articles:

Footnotes:

world-building*: a term often used in the context of fiction writing, game design, and film production. It refers to the process of creating a detailed and believable fictional world, complete with its own geography, history, cultures, societies, and rules. This can include designing maps, developing political systems, creating languages, and establishing the social norms and customs of the world’s inhabitants. World-building is crucial in genres like fantasy and science fiction, where the story often takes place in a world that is significantly different from our own.

The term “world-building” was used metaphorically to describe the process of setting up a programming environment for a specific task, in this case, sentiment analysis. Just as authors carefully construct a fictional world for their narratives, programmers must carefully set up their working environment, choosing and configuring the right tools (in this case, Python libraries) to effectively carry out their tasks. This setup is essential for ensuring that the subsequent work, such as data analysis or application development, can be done efficiently and effectively.
