Fada - Social Computing

Open-source corpora and models for analyzing discourse on social media platforms and in LLM outputs:

Cyberbullying, hate speech, bias, propaganda, AI ethics, and more.

  • A dataset and fine-tuned BERT models. The dataset consists of 16,000 tweets in Hebrew, each labeled with (1) the type of offense (Hate, Abusive, Racism and Violence, Pornographic), (2) the target of the offense, (3) the expressions used in the offense, and (4) the topic or reason for the offense.

  • A corpus of 12,000 Facebook posts in five languages (Arabic, Hebrew, English, French, Hindi), with 2,400 posts in each language, manually annotated for bias and propaganda. The posts were collected during the Israeli War on Gaza, from October 7, 2023, to January 31, 2024.

  • A dataset of 1,800 pairs of ChatGPT responses, created to analyze potential biases related to Palestine and Israel. The dataset covers the 30 articles of international human rights law, with about 60 pairs for each article. Each pair was manually classified into one of three categories (Biased against Palestine, Biased against Israel, No Bias) by 12 well-trained law master’s students.