Hate Speech Detection

Project status: Concept

oneAPI, Artificial Intelligence, Cloud

Intel Technologies
oneAPI, DevCloud

Overview / Usage

The prevalence of online hate speech and harassment is a significant social issue with potentially severe consequences for individuals and groups. Machine learning has the potential to aid in combating this problem by analysing large amounts of data to identify abusive behaviour patterns. Machine learning techniques can be used to automatically moderate content, detect harmful sentiment, analyse user data to identify users likely to engage in abusive behaviour, identify abusive language patterns, and provide automated responses to support those who have been targeted. By doing so, machine learning can help identify and prevent hate speech and harassment, making the online world a safer and more inclusive place for everyone.

Methodology / Approach

  1. Dataset analysis: The project begins by analyzing the CrowdFlower dataset to understand the data and identify any issues or challenges.
  2. Text pre-processing: The raw text is processed with techniques such as tokenization, stopword removal, stemming, and removal of URLs and mentions.
  3. Feature engineering: The project extracts features such as n-gram TF-IDF weights, sentiment polarity scores, doc2vec vectors, and readability scores. Different feature combinations are evaluated for their impact on classification performance.
  4. Model building and evaluation: Classification models are built with logistic regression, random forest, Naïve Bayes, and SVM, and are evaluated on accuracy and F1 score.
  5. Result analysis: The project analyzes the models' results, identifies the features that contribute most to classification performance, and examines the reasons for misclassifications.
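
The pre-processing and n-gram TF-IDF steps above can be sketched roughly as follows. This is a minimal, standard-library illustration and not the project's actual code: the function names, the tiny stopword list, and the smoothed IDF formula are all assumptions made for the example, and stemming is left out (NLTK's PorterStemmer would be one option for it).

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run would likely use
# a fuller list such as the one shipped with NLTK.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you", "i", "it"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z']+")


def preprocess(text):
    """Lowercase, strip URLs and @mentions, tokenize, and drop stopwords."""
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOPWORDS]


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, max_n=2):
    """Compute smoothed TF-IDF weights over 1..max_n-grams for each document."""
    grams = [
        [g for n in range(1, max_n + 1) for g in ngrams(doc, n)]
        for doc in docs
    ]
    # Document frequency: number of documents containing each n-gram.
    doc_freq = Counter(g for doc in grams for g in set(doc))
    n_docs = len(docs)
    weights = []
    for doc in grams:
        counts = Counter(doc)
        total = sum(counts.values()) or 1  # avoid dividing by zero on empty docs
        weights.append({
            g: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[g])) + 1)
            for g, c in counts.items()
        })
    return weights
```

Calling `tfidf([preprocess(t) for t in texts])` on a list of raw strings returns one dictionary per document, mapping each unigram and bigram to its weight. Those dictionaries could then be combined with other features, such as sentiment or readability scores, before classification.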

Technologies Used

The project is written in Python and uses libraries including NLTK, scikit-learn, Pandas, Matplotlib, and Seaborn for data analysis, text pre-processing, feature extraction, and model building. These libraries provide the functions needed to implement NLP techniques and machine learning algorithms. Jupyter Notebook is used for code development and for presenting the results.
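
As a rough illustration of how these libraries can fit together for model building and evaluation, the sketch below trains a TF-IDF plus logistic regression pipeline with scikit-learn and reports accuracy and macro-averaged F1. The toy sentences and labels are invented placeholders, not the CrowdFlower data, and the chosen parameters (bigram range, split size, random seed) are assumptions rather than the project's settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy placeholder data, invented for illustration only (1 = abusive, 0 = not).
texts = [
    "you are a worthless idiot",
    "shut up you stupid loser",
    "nobody wants you here, get lost",
    "people like you are pathetic",
    "have a great day everyone",
    "thanks for the helpful answer",
    "the weather is lovely today",
    "congratulations on the new job",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold out a stratified quarter of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Unigram and bigram TF-IDF features feeding a logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
macro_f1 = f1_score(y_test, predictions, average="macro", zero_division=0)
print(f"accuracy={accuracy:.2f} macro-F1={macro_f1:.2f}")
```

With only eight sentences the scores carry no meaning; the point is the shape of the workflow. The final estimator in the pipeline could be swapped for another scikit-learn classifier, for example `RandomForestClassifier`, `MultinomialNB`, or `LinearSVC`, to compare algorithms on the same features.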

Repository

https://github.com/Aaryan-2/Hackathon
