Detection of Cyberbullying Using Machine Learning: Test Phase, SRS, Design Phase, and Source Code (Final Deliverable)
Project Domain/ Category
Data Science/ Machine Learning
Abstract/ Preface
Cyberbullying is bullying that takes place over digital devices such as computers, cell phones, and tablets. It can occur on online social media and forums, where people can view, share, comment on, or participate in other people's content. It includes sending, posting, or sharing negative, false, or harmful content about someone else, including personal or private information that causes them embarrassment. Because social networks give bullies a rich environment and a tool for attacking victims, it is important to find appropriate measures to detect cyberbullying on social media. In this project, we will measure the accuracy of appropriate machine learning techniques (e.g., Bayesian, Support Vector Machine, Decision Tree, Random Forest, etc.) on cyberbullying datasets. We will also compare which techniques are better for detecting cyberbullying and explain why.
Functional Requirements
The student will perform all of the following tasks.
1. Data-Collection
• For this project, you can collect data from any social media platform (such as Facebook, Twitter, or YouTube) to detect cyberbullying. Your dataset must contain at least 2000 comments. A dataset is shared at the link below. You can collect more data using the platform's API or manually, and add the collected data to the shared dataset.
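A minimal sketch of merging the shared dataset with newly collected comments and enforcing the 2000-comment minimum, assuming both are CSV files with a text column named `comment` (a hypothetical header name; adjust it to the real file). Collecting through a platform API is platform-specific and is not shown.

```python
import csv

MIN_COMMENTS = 2000  # minimum dataset size stated in the project requirements


def read_comments(lines, column="comment"):
    """Return the non-empty values of one CSV column.

    `lines` is any iterable of CSV text lines (e.g. an open file object);
    `column` is a hypothetical header name, not taken from the real dataset.
    """
    return [row[column] for row in csv.DictReader(lines) if row.get(column)]


def build_dataset(shared, collected, minimum=MIN_COMMENTS):
    """Append collected comments to the shared ones and check the size requirement."""
    combined = list(shared) + list(collected)
    if len(combined) < minimum:
        raise ValueError(
            f"only {len(combined)} comments; at least {minimum} are required"
        )
    return combined
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `read_comments`; the path is whatever the downloaded dataset is saved as.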
2. Data-Preparation
• After collecting the data, you need to prepare the dataset. In this process, you will label the comments into two classes: B (bullying) and NB (non-bullying). You will also need to remove punctuation marks and integers from the dataset.
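The two preparation steps can be sketched in plain Python, assuming each record is a `(comment, label)` pair. The function names are illustrative. Replacing punctuation and digits with spaces, rather than deleting them, is a design choice that keeps neighboring words from merging.

```python
import string

# Map every punctuation mark and digit to a space so that "you're 2 dumb!!"
# becomes "you re dumb" instead of merging adjacent words.
_STRIP_TABLE = str.maketrans({ch: " " for ch in string.punctuation + string.digits})

LABELS = {"B", "NB"}  # bullying / non-bullying, as defined in the project


def clean_comment(text):
    """Remove punctuation marks and integers, then collapse repeated whitespace."""
    return " ".join(text.translate(_STRIP_TABLE).split())


def prepare(rows):
    """Clean each (comment, label) pair; labels are upper-cased and validated."""
    prepared = []
    for comment, label in rows:
        label = label.strip().upper()
        if label not in LABELS:
            raise ValueError(f"unexpected label {label!r}; use B or NB")
        prepared.append((clean_comment(comment), label))
    return prepared
```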
3. Pre-processing
• Most real-world data is incomplete and contains noise and missing values, so you have to apply pre-processing to your data. In pre-processing, you will normalize the dataset, remove duplicate values, and handle noise and outliers, missing values, and stop words.
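A minimal pre-processing sketch, assuming the `(comment, label)` pairs from the preparation step. Here normalization means lowercasing, rows missing a comment or label are dropped, duplicates are detected on the normalized text, and the stop-word set is a small illustrative one (a fuller list, such as NLTK's, could be swapped in). Noise and outlier handling depends on the dataset and is not shown.

```python
# Small illustrative stop-word set (an assumption, not a standard list).
# Pronouns such as "you" are deliberately kept because they may carry
# signal in bullying comments.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "to", "of",
              "and", "or", "in", "on", "at", "it", "this", "that"}


def preprocess(rows):
    """Normalize text, drop missing values and duplicates, remove stop words."""
    seen = set()
    result = []
    for comment, label in rows:
        # Missing values: skip rows without a usable comment or a label.
        if not comment or not comment.strip() or not label:
            continue
        # Normalization: lowercase, then drop stop words.
        tokens = [t for t in comment.lower().split() if t not in STOP_WORDS]
        text = " ".join(tokens)
        # Duplicates: keep only the first occurrence of each normalized text.
        if not text or text in seen:
            continue
        seen.add(text)
        result.append((text, label))
    return result
```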
4. Feature Extraction
• After the pre-processing step, you will apply a feature extraction technique. You can use TF-IDF, Word2Vec, unigram, bigram, trigram, or general n-gram features.
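To show what the n-gram and TF-IDF options compute, here is a from-scratch sketch over pre-processed comments. It uses term frequency divided by the comment's total term count and a smoothed inverse document frequency, idf(t) = ln((1 + N) / (1 + df(t))) + 1, where N is the number of comments and df(t) is the number of comments containing term t. This is one common variant among several; Word2Vec is not covered.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n_range=(1, 2)):
    """Compute one {term: weight} dict per document, using n-grams in n_range."""
    low, high = n_range
    term_counts = []
    for doc in docs:
        tokens = doc.split()
        terms = []
        for n in range(low, high + 1):
            terms.extend(ngrams(tokens, n))
        term_counts.append(Counter(terms))

    # Document frequency: in how many documents each term appears.
    n_docs = len(docs)
    df = Counter()
    for counts in term_counts:
        df.update(counts.keys())
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}

    vectors = []
    for counts in term_counts:
        total = sum(counts.values())
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors
```

In practice, scikit-learn's `TfidfVectorizer(ngram_range=(1, 3))` produces unigram-to-trigram TF-IDF features directly, although its default weighting (raw counts followed by L2 normalization) differs from this sketch.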
5. Train & Test Data
• Split the data into a 70% training set and a 30% testing set.
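A sketch of the 70/30 split in plain Python. Stratifying by label (splitting B and NB separately) is an assumed design choice that keeps the class proportions similar in both sets; the fixed seed makes the split reproducible. The function name is illustrative.

```python
import random
from collections import defaultdict


def stratified_split(rows, train_fraction=0.7, seed=42):
    """Split (text, label) rows into train/test sets, preserving label ratios."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):  # sorted for a deterministic order
        group = by_label[label]
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        test.extend(group[cut:])

    # Mix the classes so neither set is ordered by label.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

scikit-learn offers the same behavior as `train_test_split(texts, labels, test_size=0.3, stratify=labels, random_state=42)`.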
6. Machine Learning Techniques
• In this project, you will use at least four classifiers/models (e.g., Naïve Bayes, Multinomial Naïve Bayes, SVM with a polynomial kernel, SVM with an RBF kernel, Decision Tree, Random Tree, and Random Forest) drawn from four machine learning techniques/algorithms.
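One way to set up the comparison is with scikit-learn, which ships with Anaconda. This sketch pairs a TF-IDF vectorizer with five of the listed models: Multinomial Naïve Bayes, SVMs with polynomial and RBF kernels, a decision tree, and a random forest. Random Tree has no direct scikit-learn equivalent and is omitted. The hyperparameters (polynomial degree 2, 100 trees, bigram features) and the helper name `fit_all` are illustrative assumptions, not tuned values.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=42):
    """Candidate classifiers; hyperparameters are illustrative defaults."""
    return {
        "Multinomial Naive Bayes": MultinomialNB(),
        "SVM (polynomial kernel)": SVC(kernel="poly", degree=2),
        "SVM (RBF kernel)": SVC(kernel="rbf"),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }


def fit_all(train_texts, train_labels, ngram_range=(1, 2)):
    """Fit one TF-IDF + classifier pipeline per model and return them by name."""
    fitted = {}
    for name, model in build_models().items():
        pipeline = make_pipeline(TfidfVectorizer(ngram_range=ngram_range), model)
        pipeline.fit(train_texts, train_labels)
        fitted[name] = pipeline
    return fitted
```

Each returned pipeline vectorizes raw test comments itself, so `fitted[name].predict(test_texts)` gives the predictions needed for the confusion matrix.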
7. Confusion Matrix
• Produce a confusion matrix table to describe the performance of each classification model.
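A confusion matrix counts, for each actual class, how many comments were predicted as each class. A plain-Python sketch for the B/NB labels follows, with rows as actual classes and columns as predicted classes; the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, labels=("B", "NB")):
    """Cell [i][j] counts comments whose actual label is labels[i]
    and whose predicted label is labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def format_matrix(matrix, labels=("B", "NB")):
    """Render the matrix as a small text table (rows: actual, columns: predicted)."""
    lines = ["actual/pred " + " ".join(f"{label:>5}" for label in labels)]
    for label, row in zip(labels, matrix):
        lines.append(f"{label:<11} " + " ".join(f"{count:>5}" for count in row))
    return "\n".join(lines)
```

`sklearn.metrics.confusion_matrix(y_true, y_pred, labels=["B", "NB"])` returns the same row/column layout as a NumPy array.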
8. Accuracy Evaluation
• Find the accuracy of all techniques and compare them.
• This project will also show which machine learning technique is best for detecting cyberbullying.
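Accuracy is the share of correctly classified comments: the diagonal of the confusion matrix divided by its total. With B as the positive class, that is (TP + TN) / (TP + TN + FP + FN). A sketch that scores each model from its confusion matrix and ranks them, best first; the function names are illustrative.

```python
def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def rank_models(matrices):
    """Map {model name: confusion matrix} to (name, accuracy) pairs, best first."""
    scores = {name: accuracy(m) for name, m in matrices.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Accuracy can look high on an imbalanced dataset where most comments are NB, so per-class precision and recall from the same matrices can help explain why one technique performs better than another.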
Tools/ Techniques
• Python (programming language)
• Anaconda (Python distribution platform)
• Jupyter Notebook (open-source web application)
• Machine Learning (techniques)
Prerequisite
Artificial Intelligence, Machine Learning, and Natural Language Processing concepts.
Students will complete a short course on the concepts mentioned above, alongside the SRS and initial design documentation, or consult the links below.
Helping Material
Machine Learning Techniques
https://towardsdatascience.com/machine-learning-an-introduction-23b84d51e6d0
https://towardsdatascience.com/top-10-algorithms-for-machine-learning-beginners-149374935f3c
https://towardsdatascience.com/10-machine-learning-methods-that-every-data-scientist-should-know-3cc96e0eeee9
https://towardsdatascience.com/machine-learning-classifiers-a5cc4e1b0623
Feature Extraction Techniques
https://towardsdatascience.com/feature-extraction-techniques-d619b56e31be
https://www.analyticsvidhya.com/blog/2021/04/guide-for-feature-extraction-techniques/
https://towardsdatascience.com/tf-idf-for-document-ranking-from-scratch-in-python-on-real-world-dataset-796d339a4089
https://www.analyticsvidhya.com/blog/2021/07/feature-extraction-and-embeddings-in-nlp-a-beginners-guide-to-understand-natural-language-processing/
http://uc-r.github.io/creating-text-features
Dataset
https://drive.google.com/file/d/1AfUdn70MfnFirnb7NTu2DTS1AVasnofG/view?usp=sharing