about the project – hindsightis365

Research question

I seek to understand how sentiment and discourse surrounding candidate Kamala Harris changed from May 1st to November 5th, 2024, through the following sub-research questions:

What type of political content is most effectively circulated and interacted with on X?
How do hashtags influence the circulation of content on X?
How did the platform’s purchase by Elon Musk in 2022 influence the way that content was circulated?
How closely do the discussions Americans have on social media correlate with their voting behavior?
How can X inform campaign strategy, and what can be learned from 2024 when it comes to future American elections and the way X is used?

Platform selection

Billionaire and technology mogul Elon Musk purchased the platform X in 2022 for $44 billion, prompting prominent academics who study social media and political campaigns to discuss his influence over the platform. In the literature review that follows, Ye, Graham, and Corsi all discuss shifts in X’s algorithms after Musk took control of the platform. I selected X as the platform best suited for researching the Harris campaign’s social media efforts during the 2024 election, both to understand how potential algorithmic bias may have influenced the conversation and to understand what types of messaging were most effective on the platform. Since his purchase of X, Musk has removed certain restrictions on formerly banned content (O’Sullivan 2024) and developed new AI features such as the Grok AI chatbot, which allows users to ask questions about political information (O’Sullivan 2024). I wanted to understand how these changes impacted the circulation of information on the platform, especially as it related to the 2024 presidential election. Moreover, X data consists primarily of text, allowing more insight into the sentiments of users than is available via Instagram or TikTok, where users communicate primarily through videos and images.

Literature review

In the introduction to his book, “Electoral Campaigns, Media, and the New World of Digital Politics,” author David Taras discusses the ways in which digital media has transformed our modern system of elections. He argues that the goal of every “contender for power” has always been to shift the public’s focus from issues that benefit their opponents to issues that are favorable to them. “Campaigns are as much about fomenting anger and resentment and constructing the opposition as they are about discussing solutions to problems,” he writes (Taras 2). In a digital landscape, this can be done far more effectively. Politicians who once had only debates and public service announcements now have unlimited access to the public, and the “spreadability” of their ideas is essential, according to Taras. This is because social media allows them to “focus on narrow appeals and tell a story of ‘being one of’ or, at the very least, ‘being one with’” those who are being targeted (Taras 13). Throughout the book he examines a series of international elections in which digital media has played an increasing role, centering his research on its ability to analyze specific demographics of public opinion and tweak personalized messaging. He claims that as identity politics weaponized by digital media have become the norm, it is worth asking whether they exacerbate political divisions or heal them, and that the answer may reveal both the cracks in global democracy and their remedies. It is worth mentioning that Taras is Canadian and the scope of his book is international, so the book may lack some American relevance. Taras passed away in 2022, the same year the book was published, so it does not account for Elon Musk’s acquisition of what was formerly Twitter or the present state of the platform’s use, which limits its relevance for this paper. However, because its breadth is wide and it is rooted in foundational political and election theory, it provides a strong framework for understanding the impact of digital media on elections globally.

Jinyi Ye, Luca Luceri, and Emilio Ferrara at the University of Southern California explore political targeting online in their empirical study, “Auditing Political Exposure Bias: Algorithmic Amplification on Twitter/X Approaching the 2024 U.S. Presidential Election.” Providing a specific glimpse into the way the platform was influenced by Musk himself during the election cycle, they conduct a three-week audit of X’s algorithmic content recommendations using a set of 120 sock-puppet monitoring accounts that capture tweets in their personalized “For You” timelines, aiming to understand the type of content that is served to users that they do not specifically select. The authors create four groups: 30 neutral accounts (default setting, following no one), 30 left-leaning accounts, 30 right-leaning accounts, and 30 balanced accounts. Using the AllSides Political Bias Chart, they select accounts for each group to follow. The tweets are scraped four times per day, and each neutral account receives approximately 500 tweets per session, while each left-leaning, right-leaning, and balanced account receives around 700 tweets per session. They find that while X skews exposure toward a select few high-popularity accounts for all users, right-leaning users experience the highest level of inequality. They also find that neutral accounts with no follow activity are served content that is more right-leaning. Notably, the authors find that X specifically amplifies the accounts of political celebrities and commentators. This study provides helpful insights into the way that X recommends content to its users, and implies that voters who view political information on social media are more likely than not to be faced with content that replicates feelings and beliefs they already have, created by individuals who profit greatly from disseminating them. If users are only shown ideas they agree with, perhaps the “spreadability” of such ideas is not as important as Taras suggests, because the ideas are being fed directly to them. Their paper gathers information from a select two-week period in October of 2024, following October 7th, the one-year anniversary of an attack by rebel organization Hamas against 1,200 Israeli residents. This event and its politicized X coverage could possibly skew the data that these researchers uncover, as could other political events at the time, which is a limitation of the study worth noting. Moreover, USC’s HUMANS lab sponsored the team and has its own API/social media scraper, the details of which the authors do not disclose. Further investigation into the specific technology used to gather this data could improve confidence in the accuracy of its results.

Timothy Graham and Mark Andrejevic of Queensland University of Technology in Brisbane, Australia published a working paper titled “A computational analysis of potential algorithmic bias on platform X during the 2024 US election,” in which they conducted an investigation into platform-wide engagement with the accounts of prominent Republican and Democratic users, along with Elon Musk’s account. Selecting five of the most prominent Democratic and Republican political commentators on X, the authors used an academic API to scrape as many of their posts as possible. They looked at view, retweet, and like counts, using time-stamped data to see how engagement varied throughout the day. Ultimately, they found a “structural break” on July 13, 2024, when Elon Musk’s account showed a substantial increase in view counts, rising by approximately 138.27% compared to his average view count before the change. Across the board, other accounts also experienced an increase in engagement at the same time, averaging 56.93%. Notably, they found that the increase in engagement that came in mid-July benefitted Republican accounts more than Democratic ones, illustrated by the figure to the left. “Musk’s posts received a marked increase in visibility (view counts), amplification (retweet counts), and user interaction (favourite counts) that outpaced the general engagement trends observed across the platform. The unique post-change uplift for Musk’s account suggests the possibility of algorithmic prioritisation or bias, positioning Musk’s content favourably in terms of platform visibility and user engagement,” they conclude (Graham et al. 10). Why July 13th? The researchers conclude that even though the “change point” varies across metrics, the metrics all align on this date, representing a platform-wide alteration. While the academic API used by the team has some limitations, and the study was conducted primarily on the largest Democratic and Republican accounts, this data provides valuable insight into the way that X can be influenced by its owner and notes significant abnormalities that occurred during the 2024 presidential campaign season. This creates a context for the data the following research will examine, as it occurred during the same time period and draws from datasets selected both before and after July of 2024.
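To make the percentage figures above concrete, here is a minimal sketch of how a before-and-after uplift of this kind is computed. The view counts are hypothetical values chosen only to reproduce the arithmetic; they are not taken from Graham and Andrejevic’s data, and the function name is my own. The sketch does not attempt the authors’ structural break detection, only the comparison of averages on either side of a change point.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed as a percentage."""
    if before == 0:
        raise ValueError("baseline average must be non-zero")
    return (after - before) / before * 100


# Hypothetical average view counts, chosen only to illustrate the arithmetic.
avg_views_before = 1_000_000
avg_views_after = 2_382_700

uplift = percent_change(avg_views_before, avg_views_after)
print(f"Uplift after the change point: {uplift:.2f}%")
```

Read this way, an increase of roughly 138% means that the average after the change point is a little more than 2.3 times the average before it.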

When looking at engagement metrics on X, one variable that can impact users’ propensity to engage is the use of hashtags. In their study, “Political Hashtags & the Lost Art of Democratic Discourse,” researchers Eugenia Ha Rim Rho and Melissa Mazmanian from the University of California at Irvine find that, compared to those not exposed to hashtags, users exposed to hashtags are more likely to use words associated with fear, anger, and disgust in their comments, exhibiting more “black and white rhetoric” and less emotional temperance. By way of a randomized controlled experiment, they selected news articles both with and without hashtags in the title, showing a random article to each survey participant, who was told to leave a comment. They then invited researchers to examine the online comments across both types of posts. They found that those shown news posts with hashtags focused on assumed political biases more than on the social issues discussed in the news content. They also found that “the presence of political hashtags in news posts emerged as a significant predictor of anger, fear, and disgust sentiments across comments responding to news posts” (Rho et al.). Describing the control group, they write: “Bringing in the first-person perspective in text rhetorically anchors the discourse to the present moment. Further, employing the first-person language is an effective marker of cognitive processing, perspective-taking, and a heightened sense of authenticity (through the ownership of one’s words). Prefacing arguments with phrases, such as “I think”, commenters from the control group tacitly suggest reflection and consideration of personal values into the reasoning behind their words” (Rho et al.). The study’s findings demonstrate the significance of hashtags when it comes to the way that users engage with and perceive political information online. In the context of the identity politics Taras cites as a key weapon used on social media to create groupings exploited by politicians, Rho et al. reveal that social media can lead users not only to feel more personally identified with a cause, but also to communicate their political beliefs with less critical thinking and less emotional temperance.

Tweets that heighten emotion while carrying a low factual level are what Giulio Corsi at the University of Cambridge calls “toxic tweets.” In his paper, “Evaluating Twitter’s algorithmic amplification of low-credibility content: an observational study,” he conducts observational research on a dataset of 2.7 million tweets about COVID-19 and climate change from a 14-day period in January 2023, comparing tweets by their engagement level and their authors’ follower counts alongside the credibility of each tweet. He also considers toxicity as measured by Jigsaw’s Perspective API, political bias, and verified status to determine whether the Twitter algorithm favors the visibility of low-credibility or high-credibility content. He finds that “tweets containing low-credibility URL domains perform better than tweets that do not across both datasets. Furthermore, high toxicity tweets and those with right-leaning bias see heightened amplification, as do low-credibility tweets from verified accounts” (Corsi 1). In the context of the literature examined, his work provides evidence that a large amount of information circulating online may not be credible but is nonetheless favored by X users. He specifically found that high-toxicity content is favored by engagement-based recommendation systems. This provides insight into the algorithmic context in which the tweets scraped for this paper’s research circulate, and may explain why the syntax or source of some tweets is more popular than others.

The literature on the rise of digital media, and specifically the political influence of X, demonstrates not only that the platform is a strong tool when it comes to the “spreadability” of ideas online (Taras 2022), but also that it creates a sense of identity politics that is amplified by the polarization of information that users are exposed to. The increase in engagement with Musk’s own account and with Republican-dominated political content in July (Graham et al.) is an especially important factor to consider when analyzing the data that this research has collected, as is a shift towards “high toxicity content” (Corsi). The following analysis seeks to fill gaps in the literature by addressing how the nature of X impacted the outcome of the 2024 election and the online discourse surrounding presidential candidate Kamala Harris specifically. Do the trends in political content on X stay there, or do they have implications that influence voters at the ballot box as well?

about the Data

Public dataset

There are multiple datasets used for this project. The primary dataset comes from Cornell University’s arXiv, a free distribution service and open-access archive for data and information. The dataset used for this project was created by Ashwin Balasubramanian, Vito Zou, Hitesh Narayana, Christina You, Luca Luceri, and Emilio Ferrara, and is composed of 22 million publicly available posts from X users between May and August of 2024 discussing the 2024 election. Balasubramanian et al. employed targeted keywords linked to key political figures, events, and emerging issues to capture evolving public sentiment about the election on X. The researchers analyzed their findings after August 1st; however, there is data available through August 27th. To clean the data, I used Python 3 in Google Colaboratory: I removed all NA values and kept only the text of each tweet, its reply and retweet counts, and its hashtags. The date of the tweet was blank for all values, as was the location data. I therefore did not factor in the location or exact date of the data, though doing so would have improved the accuracy of my research. Instead, I focus on the overall sentiment for two major time frames, the month of May and the month of August.
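For readers who want to reproduce a cleaning step like the one described above, here is a minimal sketch using only Python’s standard library. It is not the exact notebook code used for this project: the column names (text, replyCount, retweetCount, hashtags), the function name, and the sample rows are all assumptions made for illustration, and the real dataset’s headers may differ.

```python
import csv
import io

# Hypothetical column names; the real dataset's headers may differ.
KEEP_COLUMNS = ["text", "replyCount", "retweetCount", "hashtags"]


def clean_rows(rows):
    """Keep only the analysis columns and drop rows with any missing value."""
    cleaned = []
    for row in rows:
        kept = {col: (row.get(col) or "").strip() for col in KEEP_COLUMNS}
        # Treat empty strings and the literal "NA" as missing.
        if all(value and value != "NA" for value in kept.values()):
            cleaned.append(kept)
    return cleaned


# Made-up sample standing in for one file of the public dataset.
# The date and location columns are blank, as they were in the source data.
SAMPLE_CSV = """text,replyCount,retweetCount,hashtags,date,location
Harris rally tonight,3,10,#Harris2024,,
,1,2,#Election,,
Debate recap,0,5,NA,,
Polls are open,4,7,#Vote,,
"""

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
cleaned = clean_rows(rows)
print(len(cleaned), "rows kept")
for row in cleaned:
    print(row)
```

In this sample, the row with no tweet text and the row with an NA hashtag field are dropped, and the blank date and location columns are left out of the result entirely.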

API Scraped data

I additionally used the “Easy Twitter Search Scraper,” available on Apify.com, to create a secondary dataset of 98 tweets that contained the keyword “harris” on November 5th, the day of the presidential election. This dataset contains the name of the user and the text of the tweet, as well as the number of replies, likes, and retweets it received.

AP NORC data

I additionally used exit polling data on those who voted in the 2024 election, collected by the Associated Press in collaboration with NORC at the University of Chicago. The poll surveyed more than 120,000 voters across the U.S. from October 28 to November 5 and recorded their age, education level, race or ethnicity, location, and gender.

Pew research data

I used a 2023 survey from the Pew Research Center titled “Americans’ views of and experiences with activism on social media.” This data was collected from 5,101 U.S. adults between May 15th and May 23rd, 2023 as part of a Pew report, “#BlackLivesMatter Turns 10.”

I used a second Pew survey published on June 12, 2024 titled, “How X users view, experience the platform.” This is a survey of 10,287 adult internet users in the United States from March 18 to 24, 2024 as part of a Pew report titled, “How Americans Navigate Politics on TikTok, X, Facebook and Instagram.” Users were asked varying questions about their sentiment for different online social media platforms.

All in Together data

This dataset was created by the national nonpartisan nonprofit All in Together, which seeks to mobilize female voters. They lead public and community leadership programs, especially those targeted at getting Black female voters engaged. The dataset was created by scraping X, Twitter, and Instagram using their “political discourse tracker,” which “identifies and quantifies the number of unique public social media attacks on candidates based on racist or sexist language with data sourced via Infegy Atlas consumer intelligence platform that analyzes text from online conversations.” Their “fight back ratio” tracks the volume of posts that push back against racist or sexist content or foster positive conversations about the race and gender of the candidate. The data was collected between September 23rd, 2024 and September 29th, 2024.