Abusive Language Detection using Machine Learning: Test Phase, SRS, Design Phase, and Source Code Final Deliverable

Category

Data Science/Machine Learning/Web Programming

Abstract / Introduction

Abusive language is any expression that contains abusive or obscene words. With the rise of social media, millions of comments are posted every day, and the use of offensive language in user comments has increased rapidly as a result. Abusive language in online comments fuels cyberbullying that targets individuals (such as celebrities and politicians) and groups of people (defined by country, age, or religion). It is therefore important to analyze and detect abusive language in online comments automatically. The admin (student) will develop a system that detects abusive language and measures accuracy by applying appropriate machine learning techniques (such as Support Vector Machine, Naïve Bayes, Decision Tree, and Random Forest) to datasets of abusive comments. The system will also compare the techniques to determine which are best at detecting abusive language, and why.

Functional Requirements:

The admin (student) will perform all of the following functional requirement tasks.

  1. Data-Collection
    • Collect data from any social media platform (such as Facebook, Twitter, Instagram, or YouTube) to detect abusive language. The dataset must contain at least 4000 comments. A sample dataset is linked under "Dataset" below for reference.
  2. Pre-processing
    • Most real-world data is incomplete and contains noisy and missing values, so apply pre-processing to the data. In pre-processing, the admin will normalize the dataset, handle stop words, missing values, noise, and outliers, and remove duplicate values.
  3. Feature Extraction
    • Apply a feature extraction method: Term Frequency-Inverse Document Frequency (TF-IDF), Uni-Grams (1-Grams), Bi-Grams (2-Grams), Tri-Grams (3-Grams), or general N-Grams.
  4. Train & Test Data
    • Split data into 70% training and 30% testing data sets.
  5. Machine learning Techniques
    • Apply at least three classifiers/models (e.g. Naïve Bayes, Multinomial Naïve Bayes, SVM with a polynomial kernel, SVM with an RBF kernel, Decision Tree, Random Tree, or Random Forest) drawn from three different machine learning techniques/algorithms.
  6. Confusion Matrix
    • Create a confusion matrix table to describe the performance of a classification model.
  7. Accuracy Evaluation
    • Compute the accuracy of every technique and compare the results.
    • The project will also identify which machine learning technique is best at detecting abusive language.
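
The steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the required implementation: the toy comments, the "abusive"/"clean" labels, the stop-word list, and every function name are hypothetical. It uses n-gram counts rather than TF-IDF, and it implements only one classifier (a multinomial Naïve Bayes with add-one smoothing), whereas the requirements ask for at least three.

```python
import math
import random
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "this", "that", "to", "of", "and"}


def preprocess(rows):
    """Step 2: normalize text, drop stop words, missing values, and duplicates."""
    seen, cleaned = set(), []
    for text, label in rows:
        if not text or label is None:  # skip missing values
            continue
        text = re.sub(r"[^a-z\s]", " ", text.lower())  # lowercase, strip noise
        tokens = [t for t in text.split() if t not in STOP_WORDS]
        key = " ".join(tokens)
        if tokens and key not in seen:  # skip empty and duplicate comments
            seen.add(key)
            cleaned.append((tokens, label))
    return cleaned


def ngrams(tokens, n_max=2):
    """Step 3: unigram up to n_max-gram features, each joined by a space."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def split_70_30(rows, seed=0):
    """Step 4: shuffle reproducibly, then cut at 70% training / 30% testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * 0.7)
    return rows[:cut], rows[cut:]


class NaiveBayes:
    """Step 5: multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, feats):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            # Features never seen in training are ignored.
            score += sum(math.log((counts[f] + 1) / denom)
                         for f in feats if f in self.vocab)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def confusion_matrix(y_true, y_pred, positive="abusive"):
    """Step 6: 2x2 confusion matrix counts, with `positive` as the target class."""
    cm = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            cm["TP" if pred == positive else "FN"] += 1
        else:
            cm["FP" if pred == positive else "TN"] += 1
    return cm


def accuracy(cm):
    """Step 7: fraction of predictions that were correct."""
    total = sum(cm.values())
    return (cm["TP"] + cm["TN"]) / total if total else 0.0


# Step 1 stand-in: made-up comments; a real dataset needs at least 4000.
raw = [
    ("You are an idiot!!", "abusive"),
    ("you are an IDIOT", "abusive"),  # duplicate after normalization
    ("Shut up, stupid loser", "abusive"),
    ("What a stupid idiot", "abusive"),
    ("Great video, thanks", "clean"),
    ("Thanks for sharing this", "clean"),
    ("Really great post", "clean"),
    (None, "clean"),  # missing text
]

cleaned = preprocess(raw)
train_rows, test_rows = split_70_30(cleaned)
model = NaiveBayes().fit([ngrams(t) for t, _ in train_rows],
                         [label for _, label in train_rows])
y_true = [label for _, label in test_rows]
y_pred = [model.predict(ngrams(tokens)) for tokens, _ in test_rows]
cm = confusion_matrix(y_true, y_pred)
print(cm, accuracy(cm))
```

In an actual submission, steps 5 to 7 would be repeated for each chosen classifier so that their confusion matrices and accuracies can be compared side by side. With only a handful of toy comments, the printed accuracy says nothing about real performance.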

Tools:

  • Anaconda (Python distribution platform)
  • Jupyter Notebook (Open source web application)
  • Python (programming language)
  • Machine Learning (Technique)

Prerequisite:

Artificial Intelligence, Machine Learning, and Natural Language Processing concepts.

“The admin (student) will cover a short course relevant to the mentioned concepts, in addition to the initial SRS and Design documentation, or see the links below.”

Helping Material

Python:

https://www.python.org/
https://www.w3schools.com/python/
https://www.tutorialspoint.com/python/index.htm

Feature Extraction Method:

https://towardsdatascience.com/feature-extraction-techniques-d619b56e31be
https://www.analyticsvidhya.com/blog/2021/04/guide-for-feature-extraction-techniques/
https://towardsdatascience.com/tf-idf-for-document-ranking-from-scratch-in-python-on-realworld-dataset-796d339a4089
https://www.analyticsvidhya.com/blog/2021/07/feature-extraction-and-embeddings-in-nlp-abeginners-guide-to-understand-natural-language-processing/
http://uc-r.github.io/creating-text-features

Machine Learning Techniques:

https://towardsdatascience.com/machine-learning-an-introduction-23b84d51e6d0
https://towardsdatascience.com/top-10-algorithms-for-machine-learning-beginners-149374935f3c
https://towardsdatascience.com/10-machine-learning-methods-that-every-data-scientist-shouldknow-3cc96e0eeee9
https://towardsdatascience.com/machine-learning-classifiers-a5cc4e1b0623
https://www.youtube.com/watch?v=fG4e4TUrJ3E
https://www.youtube.com/watch?v=7eh4d6sabA0

Dataset:

https://drive.google.com/file/d/1Jq62ErAQiMpWfEz9_DwSkjmyYdmwWWu6/view?usp=sharing

Supervisor:

Name: Tayyab Waqar

 
