Fada - Social computing tools and datasets

Fada - Social Computing

Open-source corpora and models for analyzing discourse on social media platforms and LLMs:

Cyberbullying, hate speech, bias, propaganda, AI ethics, and more.


Offensive language detection in Hebrew

Hate speech detection in Hebrew

Dataset and fine-tuned BERT models. The dataset consists of 16,000 tweets in Hebrew, each labeled with (1) the type of offense (Hate, Abusive, Racism and Violence, Pornographic), (2) the target of the offense, (3) the expressions used in the offense, and (4) the topic or reason for the offense.
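The four annotation dimensions can be pictured as one record per tweet. The sketch below is a hypothetical layout, not the schema of the released files: the field names (`text`, `offense_type`, `target`, `expressions`, `topic`), the single offense type per tweet, and the helper `parse_offense_type` are all assumptions made for illustration. Only the four offense-type labels come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class OffenseType(Enum):
    # The four offense types named in the dataset description.
    HATE = "Hate"
    ABUSIVE = "Abusive"
    RACISM_AND_VIOLENCE = "Racism and Violence"
    PORNOGRAPHIC = "Pornographic"


@dataclass(frozen=True)
class AnnotatedTweet:
    # Hypothetical record layout; the released files may use other field names.
    text: str                      # the Hebrew tweet
    offense_type: OffenseType      # (1) type of offense (assumed: one per tweet)
    target: str                    # (2) target of the offense
    expressions: tuple[str, ...]   # (3) expressions used in the offense
    topic: str                     # (4) topic or reason for the offense


def parse_offense_type(label: str) -> OffenseType:
    """Map a raw label string onto OffenseType; raises ValueError if unknown."""
    return OffenseType(label.strip())
```

Using an `Enum` for the offense type makes an unexpected label fail loudly at load time instead of silently creating a fifth category.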

Download
GitHub (Data and Code)

Download
Hugging Face (BERT Models)

Read more
Article

Bias and propaganda detection in social media

Bias and propaganda detection in five languages

A corpus of 12,000 Facebook posts in five languages (Arabic, Hebrew, English, French, Hindi), with 2,400 posts per language, manually annotated for bias and propaganda. The posts were collected during the Israeli War on Gaza, from October 7, 2023, to January 31, 2024.
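Since the corpus is balanced (5 languages × 2,400 posts = 12,000), a quick sanity check after downloading is to count posts per language. This is a minimal sketch that assumes the language of each post is available as a plain string; the input format and the function name `check_balance` are hypothetical, and only the language list and the 2,400-post quota come from the description.

```python
from collections import Counter

# Languages and per-language quota stated in the corpus description.
LANGUAGES = ("Arabic", "Hebrew", "English", "French", "Hindi")
POSTS_PER_LANGUAGE = 2_400


def check_balance(post_languages, quota=POSTS_PER_LANGUAGE):
    """Return {language: count} for every language whose count differs from quota.

    `post_languages` is an iterable with one language name per post
    (a hypothetical input format). An empty dict means each expected
    language has exactly `quota` posts and no unexpected language appears.
    """
    counts = Counter(post_languages)
    # Check the expected languages (missing ones count as 0) plus any extras.
    all_langs = set(LANGUAGES) | set(counts)
    return {lang: counts[lang] for lang in sorted(all_langs) if counts[lang] != quota}
```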

Download
GitHub

Read more
Article

Benchmark for detecting bias in LLMs

Detecting bias in large language models

A dataset of 1,800 pairs of ChatGPT responses, created to analyze potential biases related to Palestine and Israel. It covers the 30 articles of international human rights law, with about 60 pairs per article. Each pair was manually classified by 12 well-trained law master's students into one of three categories: Biased against Palestine, Biased against Israel, or No Bias.
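Once the benchmark is available, a natural first analysis is the share of each category per article. The sketch below assumes each labeled pair can be reduced to an `(article_number, category)` tuple; that input format and the function name `label_shares` are hypothetical. Only the three category names come from the description, and the toy data in any usage is illustrative, not results from the benchmark.

```python
from collections import Counter, defaultdict

# The three categories named in the benchmark description.
CATEGORIES = ("Biased against Palestine", "Biased against Israel", "No Bias")


def label_shares(records):
    """Compute, per article, the fraction of response pairs in each category.

    `records` is an iterable of (article_number, category) tuples
    (a hypothetical format). Returns {article: {category: fraction}},
    with articles in ascending order and every category present.
    """
    by_article = defaultdict(Counter)
    for article, category in records:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        by_article[article][category] += 1

    shares = {}
    for article, counts in sorted(by_article.items()):
        total = sum(counts.values())
        # Counter returns 0 for categories never seen in this article.
        shares[article] = {c: counts[c] / total for c in CATEGORIES}
    return shares
```

Reporting fractions rather than raw counts keeps articles comparable even if they do not all have exactly 60 pairs.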

Coming Soon