Inspired by the shocking hatred and social stigma towards Muslims portrayed in Kamila Shamsie's Home Fire, we set out to analyze Islamophobia in a real-world context. What better place to turn to than Twitter & Parler, cesspools of hate?
To analyze trends at scale, we built and trained an ensemble of machine learning models to classify Islamophobia. Put simply, processing natural language involves representing words as mathematical vectors used to calculate a final output.
Take the term "Muslims are (not) Terrorists" for example. It could be represented as:
To find suitable representations of words and the relationships between them (weights), pre-labeled training data is required. Coupled with some linear algebra, the best representation of words can then be computed to produce a desired output. It's akin to finding a line of best fit.
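As a rough illustration of the idea (not the models we actually trained), the sketch below assumes a simple bag-of-words representation and a logistic model with one weight per word. The vocabulary, the three training sentences, and their labels are made up for illustration and are not taken from our dataset.

```python
import math

# Hypothetical toy vocabulary; real models work with far larger ones.
VOCAB = ["muslims", "are", "not", "terrorists", "kind", "neighbors"]

def vectorize(text):
    """Represent a sentence as word counts over VOCAB (bag of words)."""
    words = text.lower().split()
    return [words.count(term) for term in VOCAB]

def predict(weights, bias, vector):
    """Weighted sum of the vector, squashed into a probability by a sigmoid."""
    score = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-score))

def fit(examples, epochs=2000, lr=0.5):
    """Fit one weight per word by gradient descent on (text, label) pairs."""
    weights, bias = [0.0] * len(VOCAB), 0.0
    for _ in range(epochs):
        for text, label in examples:
            vector = vectorize(text)
            error = predict(weights, bias, vector) - label
            weights = [w - lr * error * x for w, x in zip(weights, vector)]
            bias -= lr * error
    return weights, bias

# Made-up training data: 1 = hateful, 0 = not hateful.
TRAIN = [
    ("Muslims are terrorists", 1),
    ("Muslims are not terrorists", 0),
    ("Muslims are kind neighbors", 0),
]

weights, bias = fit(TRAIN)
print(vectorize("Muslims are not terrorists"))  # [1, 1, 1, 1, 0, 0]
print(predict(weights, bias, vectorize("Muslims are terrorists")))
print(predict(weights, bias, vectorize("Muslims are not terrorists")))
```

The fitted weight for "not" ends up negative, which is how a single word can flip the output. Word counts ignore word order, which is one reason real language models use richer representations.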
To build a dataset suitable for training the model, we hand-labeled over 12,000 Twitter & Parler posts, categorizing each post into one of three categories:
None - No Islamophobic content.
General - Generalized Islamophobic hate speech
"Muslims in #Congress are taking over our governments, Keep them out!"
"The 2020 Election was Fraudulent and rigged by muhammad followers! #maga"
"You're a Muslim go marry your brother"
"There's 39,000 extreme muslims in Britain atm. Over 3,000 are watched 24/7 Believe it needs 70 security personnel for each one. Costs hundreds of billions per year."
Terrorism - Hate speech associating Islam/Muslims with terrorism or violent activities
"Muslims are Radical Islamic terrorists"
"We need to get these violent criminals out of our government. #Muslims"
"The Quran incites violence against the good people of our republic."
"Not all, but a significant amount of terrorism in the world these days is due to Islamic teachings and principles. Let's adhere to the truth and accept there is a problem."
Contrary to our initial assumptions, understanding hate speech turned out to be a lot harder than it sounds, especially given that many tweets are posted without context.
Take the following:
It's not explicitly calling Muslims terrorists, or associating Islam itself with any negative values. Yet, as Shamsie writes, "terrorists were never described by the media as 'British Terrorists'...always something interposed between their Britishness and terrorism." We therefore interpreted the conscious decision to name them as Muslim, instead of using another label, as stigmatizing the Muslim population and reinforcing untrue stereotypes, and so as Islamophobic.
After labeling the dataset, we trained & tuned ~30 model variations over 12 hours, achieving 93% accuracy (3-fold cross-validated) on a validation dataset. We picked three of the best-performing models, each trained on slightly varied data, and merged the probability outputs from each model with a logistic regression classifier.
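The merging step can be sketched as follows. This is a minimal illustration rather than our actual pipeline: it assumes each base model emits three class probabilities, concatenates them into one feature vector, and fits a small multinomial logistic regression by plain gradient descent. All probabilities and labels in `HELD_OUT` are made up, and every function name is hypothetical.

```python
import math

LABELS = ["None", "General", "Terrorism"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def stack_features(model_probs):
    """Concatenate each base model's class probabilities into one vector."""
    return [p for probs in model_probs for p in probs]

def predict_proba(weights, biases, features):
    """Class probabilities from the meta-classifier for one feature vector."""
    scores = [b + sum(w * x for w, x in zip(row, features))
              for row, b in zip(weights, biases)]
    return softmax(scores)

def fit_meta_classifier(rows, labels, epochs=500, lr=0.5):
    """Multinomial logistic regression trained with plain gradient descent."""
    weights = [[0.0] * len(rows[0]) for _ in LABELS]
    biases = [0.0] * len(LABELS)
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            probs = predict_proba(weights, biases, features)
            for k in range(len(LABELS)):
                error = probs[k] - (1.0 if k == label else 0.0)
                biases[k] -= lr * error
                weights[k] = [w - lr * error * x
                              for w, x in zip(weights[k], features)]
    return weights, biases

# Made-up held-out predictions: for each post, three base models each give
# [P(None), P(General), P(Terrorism)]; the int is the hand-assigned label.
HELD_OUT = [
    ([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05]], 0),
    ([[0.2, 0.7, 0.1], [0.3, 0.6, 0.1], [0.1, 0.8, 0.1]], 1),
    ([[0.1, 0.2, 0.7], [0.2, 0.2, 0.6], [0.1, 0.1, 0.8]], 2),
    ([[0.5, 0.4, 0.1], [0.2, 0.7, 0.1], [0.3, 0.6, 0.1]], 1),
]

rows = [stack_features(probs) for probs, _ in HELD_OUT]
labels = [label for _, label in HELD_OUT]
weights, biases = fit_meta_classifier(rows, labels)

# Merge three (made-up) base-model outputs for a new post.
new_post = stack_features([[0.1, 0.3, 0.6], [0.2, 0.1, 0.7], [0.4, 0.3, 0.3]])
merged = predict_proba(weights, biases, new_post)
print(LABELS[merged.index(max(merged))], merged)
```

Fitting the combiner on predictions for posts the base models did not train on keeps it from simply learning to trust whichever model memorized the training set best.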
Yet, 93% isn't perfect.
We noticed several weaknesses in the models. These include:
Results are varied when it comes to multiple layers of referential statements, such as:
"Islam is a bad religion. Some people agree with this statement (referring to the 1st statement). Others do not agree (referring to the 1st statement). I'm in agreement with the latter (referring to the 3rd statement), and those in the former group are wrong (referring to the 2nd statement)."
Using our model, we analyzed 939,908 tweets containing the word "Muslim" from September 2nd, 2012 to December 4th, 2021. Here's what we found.
We zoomed in on data for several major terrorism incidents (2015 Paris Attacks, 2016 Brussels Bombings, 2021 Afghanistan + Liverpool attacks) and calculated the % of tweets that contained either general or terrorism-related Islamophobic content (Formula: tweets tagged as Islamophobic ÷ total tweets). Contrary to what we initially hypothesized, we found no correlation between such incidents and increases in hate speech. (Below: 2 Day M.A., 30 Day Interval)
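To make the formula and the moving average concrete, here is a minimal sketch. It assumes each classified tweet is a `(day, label)` pair and uses a trailing moving average; the sample rows are made up and the function names are hypothetical, not our actual analysis code.

```python
from collections import defaultdict
from datetime import date

def daily_share(posts):
    """Return {day: percent of that day's tweets tagged Islamophobic}."""
    totals, flagged = defaultdict(int), defaultdict(int)
    for day, label in posts:
        totals[day] += 1
        if label in ("General", "Terrorism"):
            flagged[day] += 1
    return {day: 100.0 * flagged[day] / totals[day] for day in sorted(totals)}

def moving_average(values, window=2):
    """Trailing moving average; early points average over what is available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Made-up classified tweets as (day, label) pairs.
POSTS = [
    (date(2015, 11, 13), "None"),
    (date(2015, 11, 13), "General"),
    (date(2015, 11, 14), "None"),
    (date(2015, 11, 14), "None"),
    (date(2015, 11, 14), "Terrorism"),
    (date(2015, 11, 14), "None"),
    (date(2015, 11, 15), "None"),
]

shares = daily_share(POSTS)
smoothed = moving_average(list(shares.values()), window=2)
print(list(shares.values()))  # [50.0, 25.0, 0.0]
print(smoothed)               # [50.0, 37.5, 12.5]
```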
We did, however, find a downward trend in the % of Islamophobic tweets that were terrorism-related from 2012 to 2021 (Formula: tweets tagged as Terrorism ÷ tweets tagged as Islamophobic). Overall, there was a ~20% reduction in terrorism-related Islamophobia across the 9 years. (Right: Sampled 365 Day M.A., 9 Year Interval)
We also noted that Trump's presidency was strongly correlated with a sharp uptick in Islamophobic tweets. As expected, major political events also seem to have varying degrees of impact. It's worth pointing out that correlation is not necessarily causation.
Besides time-based trends, we also looked at the % of tweets classified as hate speech for a given keyword. We analyzed an additional 131,394 tweets (~5,000 for each keyword) across 48 hours and charted the average percentage (Formula: tweets tagged as Islamophobic ÷ total tweets) for each keyword.
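The per-keyword percentage can be computed along these lines. This is a sketch under stated assumptions: tweets arrive as `(keyword, label)` pairs, and `keyword_a` and `keyword_b` are placeholder names with made-up labels, not keywords or results from our analysis.

```python
from collections import Counter

def share_by_keyword(tagged):
    """Map each keyword to the percent of its tweets tagged Islamophobic."""
    totals, flagged = Counter(), Counter()
    for keyword, label in tagged:
        totals[keyword] += 1
        if label != "None":
            flagged[keyword] += 1
    return {kw: 100.0 * flagged[kw] / totals[kw] for kw in totals}

# Made-up (keyword, label) pairs standing in for classified tweets.
SAMPLE = [
    ("keyword_a", "None"),
    ("keyword_a", "General"),
    ("keyword_a", "None"),
    ("keyword_a", "Terrorism"),
    ("keyword_b", "None"),
    ("keyword_b", "None"),
    ("keyword_b", "None"),
    ("keyword_b", "General"),
]

shares = share_by_keyword(SAMPLE)
# Rank keywords from highest to lowest share for charting.
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
print(ranked)  # [('keyword_a', 50.0), ('keyword_b', 25.0)]
```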
Neutral: Absence of Islamophobic hate speech
General: Non-specific Islamophobic hate speech (cultural stereotypes, insults, slander, etc.)
Terrorism: Hate speech involving the association of Islam with terrorism
Analysis is run on three language models trained on slightly varied data. Results are aggregated by a weighted classifier based on the probabilities from each model.
Models | Result |
---|---|
Neutral | |
General | |
Terrorism |