How did the attitude of #ReopenAmerica change over time?
“Reopening America” has been a recurring topic in the news since March 2020. As a consequence of the COVID-19 pandemic, businesses across the US were shut down to comply with government and health protocols, which stalled the national economy. Over the past ten months, businesses have experienced closures that resulted in layoffs and financial losses, then partial reopenings, then additional closures. These topics remain significant today because it is still uncertain when everything will go back to “normal”. Although these policies have saved numerous lives by limiting the spread of the virus, they have also contributed to mental and financial suffering for many individuals. For these reasons, and because the issue touches nearly everyone, reopening America has become a heated topic on social media.
Many people have been taking to Twitter to voice their opinions on this subject matter. Our project aims to inform readers and shed light on the dynamic viewpoints of Twitter users on the topic of reopening America, through temporal and spatial perspectives. Specifically, we are interested in seeing what people generally think about the reopening of America over time, as well as in different geographical regions of the US. We examined tweets including the #ReopenAmerica hashtag from March 17, 2020 to January 24, 2021. We primarily collected data about user information, geographical information, specific time of each tweet, and actual text content of each tweet.
In this project, we aimed to answer the following research questions:
1. How have aggregated tweet attitudes towards #ReopenAmerica changed over time?
2. What were the most common words, topics, and themes in the tweets?
3. How does the geographical distribution of tweets correlate with average tweet sentiment?
4. How has the activity of tweets including the #ReopenAmerica hashtag spiked or dipped in response to current events (e.g., news) within the timeframe of the dataset (03/17/2020 - 01/24/2021)?
Our intended audience is the public along with social media companies. Our interactive website allows our audience to inform themselves about the trends and observations our team has identified in tweets containing the #ReopenAmerica hashtag. Our public audience will also be able to learn more about the average tweet sentiment and the values individuals might share depending on their geographical location in the US. In addition, we hope that social media companies can take note of the information we present and make decisions to help curb the spread of misinformation that might be associated with this hashtag. We address the needs of our audience by providing them with geographical trends and visualizations from this data, which serve as an opportunity to understand a novel aspect of the pandemic at a national scale and to increase the space for communication. The audience will probably ask questions such as: “Does each state represent the entire national conversation on the closure of many businesses?”
Our team used several software tools to build our project, most importantly Python, Tableau, and Wix. With Python, we were able to create a word cloud of the most common words appearing in our dataset’s tweets, scrape additional data from Twitter’s API using the Tweepy and GetOldTweets3 libraries, and use the VADER library to conduct sentiment analysis on the tweets. This sentiment analysis was ultimately used to create several visualizations and charts in Tableau that show aggregated tweet polarity across temporal and spatial dimensions. We chose Tableau to develop our visualizations because the platform’s illustrations and dashboards give audiences an interactive visual experience. To present all of our information and findings, we chose Wix, a popular platform for creating simple websites. Wix allows us to make the presentation of information interactive, as we are ultimately trying to tell a story to our audience.
We chose to use two datasets in our research and analysis, and both consist of tweets including the #ReopenAmerica hashtag. Our first, smaller dataset includes data from March 17, 2020 to March 24, 2020, and has 18 columns including user information, geographical information, the specific time of each tweet, and the actual content of each tweet. There are 5,999 tweets in total in this dataset and 1,707 distinct tweets in terms of content. Our second dataset is essentially an extended version of the first: it is larger and includes data from March 18, 2020 to January 24, 2021. It contains 31,158 tweets in total and 30,055 distinct tweets in terms of content.
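The split between total and distinct tweets can be illustrated with a few lines of Python. This is a minimal sketch rather than the team's actual code: the function name `count_total_and_distinct` and the sample tweets are hypothetical, and it assumes that "distinct in terms of content" means an exact text match after trimming surrounding whitespace.

```python
def count_total_and_distinct(texts):
    """Return (total tweets, tweets with distinct text content)."""
    # Assumption: two tweets share the same content when their text
    # matches exactly after stripping surrounding whitespace.
    normalized = [t.strip() for t in texts]
    return len(normalized), len(set(normalized))


# Hypothetical sample: the second tweet repeats the first.
sample = [
    "Sign the petition #ReopenAmerica",
    "Sign the petition #ReopenAmerica ",
    "Stay home, save lives #ReopenAmerica",
]
total, distinct = count_total_and_distinct(sample)
```

A looser definition of "distinct" (for example, ignoring case or trailing links) would give a smaller distinct count, so the rule used matters when comparing figures.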
Our second, larger dataset was scraped through Twitter’s API, and Python libraries including GetOldTweets3 and Tweepy were used to extract tweets from farther back in time than the seven days the API regularly allows. Since all of the data we obtained was initially unprocessed, our team had to clean several columns of the dataset in order to carry out our planned analyses and create the corresponding visualizations. First, since we wanted to create geography-based graphs, we needed to clean the dataset’s location column. We chose to conduct this cleaning solely on the first dataset, as it was smaller in size and could provide insight into general sentiments towards the beginning of the #ReopenAmerica movement. Computational approaches would not have been very successful in deriving actual state and country data from the scraped location column, as Twitter users can enter anything in the location field of their profiles. We observed entries such as “young and restless midwest” and “Coast 2 Coast Chicago Most :)”, and we were not aware of any open source machine learning tool that could reliably interpret these entries as originating from the US and Chicago, IL respectively. Therefore, we had to manually clean all 5,999 entries of this dataset’s location column, and derive state and country data wherever possible. We were ultimately able to discern that approximately 49.8% of total tweets originated from a particular state in the US, and that approximately 59.2% of total tweets originated from a particular country. Furthermore, we found that 96.9% of tweets originating from an identified country originated in the US, but the remaining 3.1% were tweeted from accounts all around the world!
Our team also wanted to use computational tools to infer sentiment from our scraped collection of tweets. We tried both the TextBlob and VADER Python libraries to infer polarity scores and the resulting sentiment for each tweet, and used Python and Voyant to create several word cloud visualizations to understand the most popular words or phrases in these tweets. Between TextBlob and VADER, we found a few sources online indicating that VADER is a better tool for understanding social media sentiment, so we decided to use our VADER results for all visualizations that required sentiment-related data. Our team ultimately used Tableau to create most of our visualizations, as it served as a great tool for creating interactive and visually appealing diagrams that could effectively communicate our analyses to our audience.
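As a rough illustration of how a VADER polarity score becomes a sentiment label, the sketch below scores each tweet and classifies it by its compound score. It is not the team's code: the helper names (`label_from_compound`, `score_tweets`, `vader_scorer`) are hypothetical, and the cut-offs of +/-0.05 are the ones suggested in the VADER documentation, which the project may or may not have used. The scorer is passed in as a function so that the labelling logic can be tried without installing the `vaderSentiment` package.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label."""
    # Assumption: +/-0.05 cut-offs, as suggested in the VADER documentation.
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_tweets(texts, scorer):
    """Return (text, compound, label) for each tweet.

    `scorer` is any callable that maps a text to a compound score.
    """
    results = []
    for text in texts:
        compound = scorer(text)
        results.append((text, compound, label_from_compound(compound)))
    return results


def vader_scorer():
    """Build a scorer backed by VADER (needs `pip install vaderSentiment`)."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    # polarity_scores returns a dict with neg, neu, pos, and compound keys.
    return lambda text: analyzer.polarity_scores(text)["compound"]
```

With the package installed, `score_tweets(tweets, vader_scorer())` would label a list of tweet texts; any other function that returns a score between -1 and 1 can be substituted for comparison.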
This word cloud, developed with Python, highlights the most popular words from the Twitter corpus, where the relative size of each word or phrase represents the number of times it appeared in our datasets. Initially, the phrase “ReopenAmerica” dominated the word cloud. We decided to omit it from this visualization, as we wanted to see other keywords that were hidden in the data. Once “ReopenAmerica” was removed, we found that the next most common words included “COVID19”, “now”, “May”, and “Trump”. As the subject matter suggests, many users engaging in this topic likely wanted to see the economy reopened immediately, despite the health concerns of the virus. Additionally, many of the tweets were directed at Trump, as he was the President at the time. Another significant term in the word cloud is “May”. As we will see in the next visualization, this topic was particularly active around the month of May, as many local governments had initially decided at the start of the pandemic to evaluate the effects of stay-at-home orders around this time.
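The counting step behind a word cloud, including the removal of the dominant hashtag, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: `top_words` and the sample tweets are hypothetical, and no stop-word filtering is applied, whereas a word cloud library would typically drop common words such as "in" or "the".

```python
import re
from collections import Counter


def top_words(texts, excluded=("reopenamerica",), n=4):
    """Count word frequencies across tweets, skipping excluded terms."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphanumeric runs, so "#ReopenAmerica" and
        # "reopenamerica" are treated as the same token.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in excluded:
                counts[word] += 1
    return counts.most_common(n)


# Hypothetical tweets for illustration only.
sample = [
    "#ReopenAmerica now",
    "Reopen now, not in May #ReopenAmerica",
    "#ReopenAmerica #COVID19 now",
]
top = top_words(sample, n=1)
```

In this sample the excluded hashtag would otherwise tie for first place, which mirrors why it was removed from the visualization.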
This dashboard visualizes all 31,158 tweets in our dataset, from March 17, 2020 to January 24, 2021. As shown on the right-hand side, most tweets were created between March and May 2020, at the initial stages of the economic shutdown. Selecting the sliver for each month on the left filters the view to show that month’s top tweets. As expected, many of the top tweets include #ReopenAmerica. We also observe that many of these tweets are specifically directed towards Donald Trump.
Here, we see a geographical representation of aggregated tweet sentiment surrounding the national lockdown from March 17 through March 24, 2020 (first dataset). After cleaning the geographical aspect of the data as much as we could, we were able to discern that approximately half (49.8%) of total tweets originated from a particular state in the US, which we then mapped to the corresponding states in Tableau. The relative coloring on the map illustrates average tweet sentiment, which we measured using the compound polarity score from our VADER sentiment analysis. A darker shade of orange on a state indicates that tweets from that state had relatively more negative sentiment on average. We clearly see that average tweet sentiment in every state was either neutral (polarity score is approximately 0) or negative (polarity is negative), as there were no states where the average tweet sentiment was positive (polarity is positive). We also see that users from some states felt more strongly negative than others in their tweets containing #ReopenAmerica; the three states with the most negative average tweet polarity were West Virginia, Kansas, and Connecticut. The general trend of negativity we see here makes sense, as most users would choose to use public platforms like Twitter to express dissatisfaction or frustration over the national lockdown or the #ReopenAmerica hashtag. There may also be some bias inherent to the VADER sentiment analysis tool, which we do not have control over. We chose to display the average tweet polarity rather than the raw frequencies of positively and negatively classified tweets for each state because the number of tweets we observed varied widely from state to state. For example, we observed 482 tweets from California but only 2 from Utah, indicating that aggregated measures would be a better way to interpret data at the geographical level.
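The per-state averaging described above can be sketched as follows. This is an assumption-laden illustration, not the team's Tableau calculation: `average_polarity_by_state` is a hypothetical helper, and the scores are made up. Returning the tweet count next to each mean makes it easy to see which averages rest on very few tweets.

```python
from collections import defaultdict
from statistics import mean


def average_polarity_by_state(rows):
    """Return {state: (mean compound score, tweet count)}."""
    by_state = defaultdict(list)
    for state, compound in rows:
        by_state[state].append(compound)
    return {s: (mean(scores), len(scores)) for s, scores in by_state.items()}


# Hypothetical (state, compound score) pairs for illustration only.
rows = [
    ("CA", -0.5),
    ("CA", 0.0),
    ("CA", 0.25),
    ("CA", -0.25),
    ("UT", -0.5),
]
summary = average_polarity_by_state(rows)
```

A mean based on one or two tweets can swing sharply with a single strongly worded message, so the count is worth keeping alongside the average when reading a map like this one.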
The dashboard above displays two components: a bar graph and a timestream of the top five tweeted posts. The bar graph shows how many times each message was tweeted, in descending order. We can see that the data is right skewed, with only five messages that were posted 66 or more times, a significant number relative to all other messages, which were tweeted six times or fewer. To examine these five tweets more closely, we created a timestream graph where the x and y axes represent the time each individual message was sent and the tweet message, respectively. Interestingly, the message that was tweeted the most was given a positive (green) sentiment score by VADER. As it turns out, this tweet was posted 198 times as an announcement encouraging people to sign a petition in order to save jobs and small businesses. Another significant finding was the fourth most tweeted message, which had a strong negative (red) sentiment score and was only posted for 9 days total (from April 20th-29th). This tweet was strongly opinionated, calling for the pay of Congress and Pelosi to be frozen for refusing to reopen America. It may reflect a sudden, strong outburst of emotion from a select group of users in response to the government’s actions, but the outburst did not last beyond this short time period.
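Finding the most repeated messages and the period over which each was posted can be sketched like this. The helper `most_repeated` and the sample messages and dates are hypothetical and only loosely modeled on the description above; the real dashboard was built in Tableau.

```python
from collections import defaultdict
from datetime import date


def most_repeated(tweets, n=5):
    """Return the n most repeated texts as (text, count, first_day, last_day)."""
    days_by_text = defaultdict(list)
    for text, day in tweets:
        days_by_text[text].append(day)
    # Rank messages by how many times they were posted, most frequent first.
    ranked = sorted(days_by_text.items(), key=lambda item: len(item[1]), reverse=True)
    return [(text, len(days), min(days), max(days)) for text, days in ranked[:n]]


# Hypothetical (text, posting date) pairs for illustration only.
sample = [
    ("Sign the petition", date(2020, 4, 1)),
    ("Sign the petition", date(2020, 4, 3)),
    ("Sign the petition", date(2020, 5, 2)),
    ("Freeze their pay", date(2020, 4, 20)),
    ("Freeze their pay", date(2020, 4, 29)),
    ("One-off tweet", date(2020, 6, 1)),
]
top = most_repeated(sample, n=2)
```

The first and last dates give the active window of each message, which is what distinguishes a short burst from a message that keeps circulating.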
The line graph shows two obvious peaks in tweet activity, on March 24, 2020 and April 16, 2020. The big news event relating to #ReopenAmerica around March 24 was President Trump’s warning on the previous day that “Our country was not built to be shut down. We are going to be opening up our country for business because our country was meant to be open” (Collinson). From the top five most retweeted tweets in March, we could tell that most people were against this idea, as they believed that reopening America would cause numerous deaths by spreading the virus. They saw the idea mainly as a way to protect wealthy people’s money (#NotDying4WallStreet) at the cost of lives, especially the lives of health care providers.
On April 16, 2020, President Trump told the nation’s governors that they could begin reopening businesses, restaurants, and other elements of daily life by May 1st or earlier if they wanted to (Donald Trump). This time, according to the most retweeted tweets in April, there was a big shift in people’s attitudes towards #ReopenAmerica. Many people spoke out in favor of reopening because they had lost their jobs as a result of the shutdown and were eager to get back to their normal lives.
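Peaks like the two discussed above can be located by counting tweets per day. The sketch below is a minimal illustration with made-up counts, and `busiest_days` is a hypothetical helper; the project's own line graph was produced in Tableau.

```python
from collections import Counter
from datetime import date


def busiest_days(days, n=2):
    """Return the n dates with the most tweets as (date, count) pairs."""
    return Counter(days).most_common(n)


# Hypothetical posting dates: one entry per tweet.
sample = (
    [date(2020, 3, 24)] * 4
    + [date(2020, 4, 16)] * 3
    + [date(2020, 5, 1)]
)
peaks = busiest_days(sample)
```

Once the peak dates are known, they can be compared against news coverage from the same or the preceding day, as was done for the two peaks here.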
For this research project, our team focused on the widely debated issues surrounding COVID-19 and the resulting economic shutdown. We analyzed tweets from Twitter users across the United States in terms of average sentiment, geography, and timing of tweet creation. We collected data through a variety of methods and used word clouds, data dashboards, bar graphs, and line graphs to illustrate trends in the data that we could use to answer our research questions. Through our analyses, we observed that spikes in tweet activity occurred during times of change or groundbreaking information from our government. We also observed that nationwide sentiment of tweets including the #ReopenAmerica hashtag during March 2020 was negative at the aggregated level across most states, with some states being more polarized than others. Since most of the initial shutdowns occurred around March, we saw most people tweeting about them primarily in negative ways. We observed this kind of spike in activity again around the time when President Trump stated that he did not believe America should continue the shutdown. The turmoil surrounding the clashing opinions from the government and the effect of the shutdowns on small businesses ultimately led to a lot of additional negative sentiment regarding the shutdown. Many people did not like how Trump wanted to reopen America because it could lead to the deaths of many people, but another perspective on the matter was that smaller businesses were suffering major financial losses that sometimes could not be recovered. The most positive tweets were about signing petitions relating to reopening businesses to help keep them from shutting down. It was fairly clear that people were affected by the shutdown differently, and that certain states with economies that were more negatively affected by the shutdown had more negative perspectives about the economic shutdown in their tweets.
In addition to our own conclusions, we found through two articles that each state has different state taxes, which could have also contributed to the different sentiments in each state. Also, depending on the kinds of jobs available within the state, the pandemic has affected people very differently (Badger). In states where there are more working class jobs in comparison to states with more corporate jobs, we see a disproportionate effect of the pandemic (Fowers). Overall, the data collected from these tweets showed a variety of clashing opinions surrounding COVID-19 and the shutdown it caused.
Figure: Word cloud created with a Python library
Figure: Dashboard created in Tableau
Figure: Geo location tracking map created in Tableau
Figure: Dashboard created in Tableau
Figure: Line graph and dashboard created in Tableau
Throughout the course of the project, we scheduled biweekly virtual meetings outside of class to ensure that assignments were completed before each deadline. Within our team of six people, we were able to divide up roles in a clear manner. Kaushal Rao, Wanxin Xie, and Michelle Lee worked on sourcing, cleaning, visualizing, and interpreting the data with Python and Tableau, Keven Michel and Smayra Ramesh helped shape the narrative for the project, and Mauricio Gutierrez assembled all the various components onto our website.
To ensure proper communication between all members, we utilized a GroupMe chat to discuss any questions or issues related to the project. We also had a shared Google Drive to keep track of group notes, project assignments, and the spreadsheets of data that we collaborated on. At the end of each team meeting, we delegated tasks to ensure that everyone had something to work on each week and that none of our contributions were redundant to each other. Luckily, we did not have many issues with lack of communication, as everyone in the group was enthusiastic to complete the project and responded to messages in a timely manner.
Week 2: Meeting; Team Assignment: Twitter Research Questions; Data cleaning
Week 3: Progress check; Visualization creation
Week 4: Progress check; Website content
Week 5: Progress check; Website content; Visualization creation
Week 6: Meeting; Checkpoint Prep (3 min)
Week 7: Progress check; Construction of website
Week 8: Progress check; Finalize website; Proofread of everything
Week 9: Meeting; In-class Practice Presentation Prep (3 min)
Week 10: Meeting; In-class Presentation Prep (5 min)