Abusive Language Detection using Machine Learning: Test Phase, SRS, Design Phase, and Source Code (Final Deliverable)
Category
Data Science/Machine Learning/Web Programming
Abstract / Introduction
Abusive language is any expression that contains abusive or obscene words. With the rise of social media, millions of comments are posted every day, and the use of offensive language in user comments has increased rapidly. Abusive language in online comments fuels cyberbullying that targets individuals (celebrities, politicians, etc.) and groups of people (defined by country, age, or religion). It is therefore important to detect abusive language in online comments automatically. The admin (student) will develop a system that detects abusive language and measures detection accuracy by applying appropriate machine learning techniques (such as Support Vector Machine, Naïve Bayes, Decision Tree, and Random Forest) to datasets of abusive-language comments. The system will also compare the techniques to determine which detects abusive language best and why.
Functional Requirements:
The admin (student) will perform all of the following functional requirement tasks.
- Data-Collection
- Collect data from any social media platform (such as Facebook, Twitter, Instagram, or YouTube) to detect abusive language. The dataset must contain at least 4000 comments. A sample dataset is shared at the link in the Dataset section below for reference.
- Pre-processing
- Most real-world data is incomplete and contains noisy and missing values, so apply pre-processing to the data. In pre-processing, the admin will normalize the dataset, handle stop words, missing values, noise, and outliers, and remove duplicate values.
- Feature Extraction
- Apply a feature extraction method: Term Frequency–Inverse Document Frequency (TF-IDF), uni-grams (1-grams), bi-grams (2-grams), tri-grams (3-grams), or general N-grams.
- Train & Test Data
- Split data into 70% training and 30% testing data sets.
- Machine Learning Techniques
- Apply at least three classifiers/models drawn from three different machine learning techniques/algorithms (e.g. Naïve Bayes or Multinomial Naïve Bayes; Support Vector Machine with a polynomial or RBF kernel; Decision Tree, Random Tree, or Random Forest).
- Confusion Matrix
- Create a confusion matrix table to describe the performance of a classification model.
- Accuracy Evaluation
- Find the accuracy of each technique and compare the results.
- The project will also identify which machine learning technique detects abusive language best.
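The requirements above can be sketched end to end as follows. This is a minimal illustration, not a prescribed implementation: it assumes scikit-learn is installed (it is not named in the Tools list), that labels are binary (1 = abusive, 0 = not abusive), and that comments are in English. The function names `clean_comment`, `preprocess`, and `compare_classifiers`, the choice of three classifiers, and the toy comments are all hypothetical placeholders; a real run would load the collected dataset of at least 4000 comments instead.

```python
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC


def clean_comment(text):
    """Normalize one comment: lowercase, strip noise, drop English stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # links and @mentions are noise
    text = re.sub(r"[^a-z\s]", " ", text)           # assumption: English letters only
    return " ".join(t for t in text.split() if t not in ENGLISH_STOP_WORDS)


def preprocess(comments, labels):
    """Skip missing values, empty results, and duplicates; return texts and labels."""
    seen, texts, kept = set(), [], []
    for comment, label in zip(comments, labels):
        if comment is None or label is None:
            continue
        cleaned = clean_comment(comment)
        if not cleaned or cleaned in seen:
            continue
        seen.add(cleaned)
        texts.append(cleaned)
        kept.append(label)
    return texts, kept


def compare_classifiers(comments, labels, seed=42):
    """Run a 70/30 split, TF-IDF features, and three classifiers; return metrics."""
    texts, y = preprocess(comments, labels)
    train_txt, test_txt, y_train, y_test = train_test_split(
        texts, y, test_size=0.30, random_state=seed, stratify=y)

    # TF-IDF over uni-grams and bi-grams, fitted on the training split only.
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    X_train = vectorizer.fit_transform(train_txt)
    X_test = vectorizer.transform(test_txt)

    # One model from each of three technique families (illustrative choice).
    models = {
        "Multinomial Naive Bayes": MultinomialNB(),
        "SVM (RBF kernel)": SVC(kernel="rbf"),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    class_labels = sorted(set(y))
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predicted = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, predicted),
            "confusion_matrix": confusion_matrix(y_test, predicted, labels=class_labels),
        }
    return results


# Toy placeholder data (invented for illustration; not from any real dataset).
TOY_COMMENTS = [
    "You are such an idiot, shut up!", "What a stupid loser @user123",
    "Nobody likes you, moron", "Stupid idiot, get lost",
    "You pathetic loser, go away", "Great video, thanks for sharing!",
    "I really enjoyed this post", "Congratulations on the win",
    "Thanks, this was very helpful https://example.com",
    "Lovely photo, have a nice day",
    "Great video, thanks for sharing!",  # duplicate, removed in pre-processing
    None,                                # missing value, removed in pre-processing
]
TOY_LABELS = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

results = compare_classifiers(TOY_COMMENTS, TOY_LABELS)
for name, metrics in results.items():
    print(name, "accuracy:", round(metrics["accuracy"], 2))
    print(metrics["confusion_matrix"])
```

With only a handful of toy comments the printed accuracies carry no meaning; the point is the shape of the workflow. Fitting the vectorizer on the training split alone keeps test comments from influencing the features, which makes the accuracy comparison between techniques fairer.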
Tools:
- Anaconda (Python distribution platform)
- Jupyter Notebook (Open source web application)
- Python (programming language)
- Machine Learning (Technique)
Prerequisite:
Artificial Intelligence, Machine Learning, and Natural Language Processing concepts.
“The admin (student) will cover a short course relevant to the mentioned concepts alongside the initial SRS and design documentation, or consult the links below.”
Helping Material
Python:
https://www.python.org/
https://www.w3schools.com/python/
https://www.tutorialspoint.com/python/index.htm
Feature Extraction Method:
https://towardsdatascience.com/feature-extraction-techniques-d619b56e31be
https://www.analyticsvidhya.com/blog/2021/04/guide-for-feature-extraction-techniques/
https://towardsdatascience.com/tf-idf-for-document-ranking-from-scratch-in-python-on-realworld-dataset-796d339a4089
https://www.analyticsvidhya.com/blog/2021/07/feature-extraction-and-embeddings-in-nlp-abeginners-guide-to-understand-natural-language-processing/
http://uc-r.github.io/creating-text-features
Machine Learning Techniques:
https://towardsdatascience.com/machine-learning-an-introduction-23b84d51e6d0
https://towardsdatascience.com/top-10-algorithms-for-machine-learning-beginners-149374935f3c
https://towardsdatascience.com/10-machine-learning-methods-that-every-data-scientist-shouldknow-3cc96e0eeee9
https://towardsdatascience.com/machine-learning-classifiers-a5cc4e1b0623
https://www.youtube.com/watch?v=fG4e4TUrJ3E
https://www.youtube.com/watch?v=7eh4d6sabA0
Dataset:
https://drive.google.com/file/d/1Jq62ErAQiMpWfEz9_DwSkjmyYdmwWWu6/view?usp=sharing
Supervisor:
Name: Tayyab Waqar