Islamophobia in Social Media Analysis

By Anshul Gupta, Welton Wang, and Timothy Trick

#PervyPasha

#GoBackWhereYouCameFrom

#DontSullyOurSoil

Inspired by the shocking hatred and social stigma towards Muslims portrayed in Kamila Shamsie's Home Fire, we set out to analyze Islamophobia in a real-world context.

What better place to turn than Twitter & Parler, a cesspool of hate?

Machine Learning

To analyze trends at scale, we built and trained an ensemble of machine learning models to classify Islamophobia. Put simply, processing natural language involves representing words as mathematical vectors used to calculate a final output.

Take the phrase "Muslims are (not) Terrorists", for example. Each word in it could be represented as a vector of numbers.
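As a minimal sketch of that idea, the snippet below assigns each word a made-up three-dimensional vector, averages the vectors into one phrase vector, and takes a dot product with a weight vector to get a single score. The numbers, the averaging step, and the names `EMBEDDINGS`, `phrase_vector`, and `score` are all assumptions chosen for illustration; they are not the representation our models actually learned.

```python
# Toy illustration only: these vectors and weights are invented for the
# example and are not taken from the trained models.
EMBEDDINGS = {
    "muslims":    [0.9, 0.1, 0.0],
    "are":        [0.0, 0.0, 0.1],
    "not":        [0.0, -1.0, 0.0],
    "terrorists": [0.1, 0.9, 0.0],
}


def phrase_vector(text):
    """Average the word vectors of a phrase (unknown words are skipped)."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def score(text, weights=(0.0, 1.0, 0.0)):
    """Dot product of the phrase vector with a hypothetical weight vector."""
    return sum(v * w for v, w in zip(phrase_vector(text), weights))


with_not = score("Muslims are not Terrorists")
without_not = score("Muslims are Terrorists")
```

In this toy setup the word "not" pulls the second dimension down, so `with_not` comes out lower than `without_not`. Real language models use far larger vectors and take word order into account.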

Training

To find suitable representations & relationships between words (weights), pre-labeled training data is required. Coupled with some linear algebra, the optimal representation of words can be computed to get a desired output. It's akin to finding a line of best fit.
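To make the line-of-best-fit analogy concrete, here is a small sketch of an ordinary least squares fit on a handful of made-up points. It is only the analogy, not our training procedure: a language model fits many more weights, and usually does so iteratively rather than with a closed-form formula. The function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up "labeled" points lying roughly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
slope, intercept = fit_line(xs, ys)
```

The labeled points play the role of the training data, and the slope and intercept play the role of the weights.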

The Dataset

To build a dataset suitable for training the model, we hand-labeled over 12,000 Twitter & Parler posts, categorizing each post into one of three categories:

None - No Islamophobic content.

General - Generalized Islamophobic hate speech

Terrorism - Hate speech associating Islam/Muslims with terrorism or violent activities

Language is Nuanced

Despite our initial assumptions, it turns out that understanding hate speech is a lot harder than it sounds, especially given that many tweets appear without context.

Take the following:

Implicit: "Three terrorists bombers of the station were Muslim."

It's not explicitly calling Muslims terrorists, or associating Islam itself with any negative values. Yet, as Shamsie writes, "terrorists were never described by the media as 'British Terrorists'...always something interposed between their Britishness and terrorism." We therefore interpreted the conscious decision to name them as Muslim, instead of using another label, as stigmatizing the Muslim population and reinforcing untrue stereotypes, and so as Islamophobic.

The Model

After labeling the dataset, we trained & tuned ~30 model variations over 12 hours, achieving 93% 3-fold cross-validated accuracy on a validation dataset. We then picked three of the best-performing models, each trained on slightly varied data, and merged their probability outputs with a logistic regression classifier.
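For readers unfamiliar with k-fold cross-validation, the sketch below shows the general procedure with k = 3: split the labeled posts into three folds, train on two, measure accuracy on the third, and average the three scores. The helper names (`k_fold_indices`, `cross_validated_accuracy`, `train_majority`) and the tiny dataset are hypothetical, and the majority-label "model" is only a stand-in for a real classifier. Shuffling and stratifying the folds, which is usual in practice, is left out for brevity.

```python
def k_fold_indices(n_samples, k=3):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(texts, labels, train_fn, k=3):
    """Average held-out accuracy over k folds.

    train_fn(train_texts, train_labels) must return a predict(text) function.
    """
    accuracies = []
    for held_out in k_fold_indices(len(texts), k):
        held = set(held_out)
        train_idx = [i for i in range(len(texts)) if i not in held]
        predict = train_fn([texts[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(texts[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def train_majority(train_texts, train_labels):
    """Placeholder 'model': always predicts the most common training label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda text: majority


# Made-up posts and labels, only to exercise the functions.
texts = ["a", "b", "c", "d", "e", "f"]
labels = ["None", "None", "None", "None", "General", "Terrorism"]
accuracy = cross_validated_accuracy(texts, labels, train_majority, k=3)
```

Because every post is held out exactly once, the averaged score reflects performance on data the model did not see during training.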

Yet, 93% isn't perfect.

We noticed several weaknesses in the models. These include:

Referentials

Negatives

Quotations

Text Variations

Linguistic Style

Complex Referentials

Results are varied when it comes to multiple layers of referential statements, such as:
"Islam is a bad religion. Some people agree with this statement (referring to the 1st statement). Others do not agree (referring to the 1st statement). I'm in agreement with the latter (referring to the 3rd statement), and those in the former group are wrong (referring to the 2nd statement)."

Data Analysis

Using our model, we analyzed 939,908 tweets containing the word "Muslim" from September 2nd, 2012 to December 4th, 2021. Here's what we found.

No correlation between terrorism incidents and % of Islamophobic tweets

We zoomed in on data for several major terrorism incidents (2015 Paris Attacks, 2016 Brussels Bombings, 2021 Afghanistan + Liverpool attacks) and calculated the % of tweets that contained either general or terrorism-related Islamophobic content (Formula: tweets tagged as Islamophobic ÷ total tweets). Contrary to what we initially hypothesized, we found no correlation between such incidents and increases in hate speech. (Below: 2 Day M.A., 30 Day Interval)
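The formula and the 2-day moving average can be sketched as follows. The per-day labels are invented, the function names `islamophobic_share` and `moving_average` are hypothetical, and the choice of a trailing (rather than centered) window is an assumption.

```python
def islamophobic_share(labels):
    """Percent of tweets tagged General or Terrorism among all tweets."""
    if not labels:
        return 0.0
    flagged = sum(1 for label in labels if label in ("General", "Terrorism"))
    return 100.0 * flagged / len(labels)


def moving_average(values, window=2):
    """Trailing moving average; early points use whatever history exists."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical per-day model outputs (three days of made-up labels).
daily_labels = [
    ["None", "None", "General", "None"],
    ["None", "Terrorism", "General", "None"],
    ["None", "None", "None", "None"],
]
daily_share = [islamophobic_share(day) for day in daily_labels]
smoothed_share = moving_average(daily_share, window=2)
```

Smoothing over a short window dampens day-to-day noise while still letting a spike around an incident show up, if there is one.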

Decrease in Terrorism Related Islamophobic Speech

We did, however, find a downward trend in the % of Islamophobic tweets that were terrorism-related from 2012 to 2021 (Formula: tweets tagged as Terrorism ÷ tweets tagged as Islamophobic). Overall, there was a ~20% reduction in terrorism-related Islamophobia across the 9 years. (Right: Sampled 365 Day M.A., 9 Year Interval)

Impact of Political Environment

We also noted that Trump's presidency was strongly correlated with a sharp uptick in Islamophobic tweets. Major political events also seem to have varying forms of impact, as expected. It's worth pointing out that correlation is not necessarily causation.

Keyword to Hate Correlation

Besides time-based trends, we also looked at the % of tweets classified as hate speech for a given keyword. We analyzed an additional 131,394 tweets (~5,000 for each keyword) across 48 hours and charted the average percentage (Formula: tweets tagged as Islamophobic ÷ total tweets) for each keyword.
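The per-keyword percentage is a simple group-and-divide. A minimal sketch, assuming each tweet arrives as a (keyword, label) pair, might look like this. The keywords, labels, and the function name `share_by_keyword` are placeholders rather than values from our data.

```python
from collections import defaultdict


def share_by_keyword(tagged_tweets):
    """Map keyword -> percent of its tweets tagged General or Terrorism.

    tagged_tweets is an iterable of (keyword, label) pairs.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for keyword, label in tagged_tweets:
        totals[keyword] += 1
        if label != "None":
            flagged[keyword] += 1
    return {kw: 100.0 * flagged[kw] / totals[kw] for kw in totals}


# Placeholder keywords and labels, not values from the study.
sample = [
    ("keyword_a", "None"), ("keyword_a", "General"),
    ("keyword_a", "None"), ("keyword_a", "Terrorism"),
    ("keyword_b", "None"), ("keyword_b", "None"),
]
shares = share_by_keyword(sample)
```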

Input Text/Tweet

Neutral: Absence of Islamophobic hate speech

General: Non-specific (cultural stereotypes, insults, slander, etc.) Islamophobic hate speech

Terrorism: Hate speech involving the association of Islam with terrorism

Analysis is run on three language models trained on slightly varied data. Results are aggregated by a weighted classifier based on the probabilities from each model.
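One way to picture the aggregation step: the three models' probability outputs are concatenated into a single feature vector and passed through a multinomial logistic regression, whose softmax output is the final prediction. The sketch below shows only that forward pass. The coefficients in `WEIGHTS` and `BIASES` are invented for illustration; in practice they would be fitted on labeled data, and ours are not reproduced here.

```python
import math

LABELS = ["Neutral", "General", "Terrorism"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(model_probs, weights, biases):
    """Multinomial logistic regression over concatenated model outputs.

    model_probs: three lists of 3 class probabilities (one list per model).
    weights: 3 rows (one per output class) of 9 coefficients each.
    """
    features = [p for probs in model_probs for p in probs]
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Hypothetical coefficients: each class listens to its own probability from
# each model, with model 1 weighted slightly higher than models 2 and 3.
WEIGHTS = [
    [3.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0],
    [0.0, 3.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0],
]
BIASES = [0.0, 0.0, 0.0]

outputs = [[0.2, 0.7, 0.1], [0.5, 0.4, 0.1], [0.1, 0.8, 0.1]]
combined = aggregate(outputs, WEIGHTS, BIASES)
prediction = LABELS[combined.index(max(combined))]
```

Here two of the three models lean towards General, and the combined prediction follows them even though the second model slightly prefers Neutral.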

Result

Model 1, Model 2, and Model 3 each report a result for Neutral, General, and Terrorism.

Slander: "Obama is Muslim" vs "John Doe is Muslim"

Vague: "Is he a Muslim?"
