Email Spam and Non-Spam Classification: Test Phase, SRS, Design Phase, and Source Code (Final Deliverable)
Project Domain / Category
Data Science/Machine Learning
Abstract / Introduction
Email has become a powerful tool for communication because it saves a great deal of time and cost. It is one of the most popular and secure media for transferring messages or data online over the Internet. However, because of social networks, most emails contain unwanted information, which is known as spam. Identifying such spam email is one of the important challenges.
In this project we will use Python text classification techniques to identify and classify email spam messages. We will measure accuracy, time, and error rate by applying suitable algorithms (such as NaiveBayes, NaiveBayesMultinomial, and J48) to an email dataset, and we will also compare which algorithm is best for text classification.
Functional Requirements:
The administrator will perform all of these tasks.
1. Collect Data Set
• Gather the email data, which contains both spam and non-spam messages.
2. Pre-processing
• Most real-world data is incomplete and contains noisy and missing values. Therefore, we must apply pre-processing to the data.
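A minimal sketch of this step, assuming each record is a (text, label) pair in which either field may be missing. The function names clean_text and preprocess, and the specific cleaning rules (lowercasing, stripping punctuation, dropping empty records), are illustrative assumptions and not part of the project specification:

```python
import re

def clean_text(text):
    """Lowercase the text, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())

def preprocess(records):
    """Drop records with a missing text or label, clean the rest, and skip texts that end up empty."""
    cleaned = []
    for text, label in records:
        if text is None or label is None:
            continue  # missing value: discard the record
        text = clean_text(text)
        if text:
            cleaned.append((text, label))
    return cleaned

# Hypothetical raw records, including one missing text and one blank text
raw = [
    ("WIN a FREE prize!!!", "spam"),
    (None, "non-spam"),
    ("Meeting at 10am, see you there.", "non-spam"),
    ("   ", "spam"),
]
print(preprocess(raw))
```

Discarding incomplete records is only one possible policy; a real dataset may call for imputing or flagging them instead.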
3. Feature Selection
• After the pre-processing step, we apply a feature selection algorithm; the algorithm used here is the Best First Feature Selection algorithm.
4. Apply Spam Filter Algorithms
• Handle Data: Load the dataset and split it into training and test datasets.
• Summarize Data: Summarize the properties of the training dataset so that we can calculate probabilities and make predictions.
• Make a Prediction: Use the summaries of the dataset to generate a single prediction.
• Make Predictions: Generate predictions given a test dataset and a summarized training dataset.
• Evaluate Accuracy: Evaluate the accuracy of the predictions made for a test dataset as the percentage correct out of all predictions made.
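The five steps above can be sketched in plain Python as a small multinomial Naive Bayes classifier with add-one (Laplace) smoothing. This is one possible reading of the steps, not the project's definitive implementation: the function names, the toy messages, the 75/25 split used for the demo, and the choice of smoothing are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict

def split_dataset(dataset, train_ratio, seed=0):
    """Handle Data: shuffle a copy of the rows and split it into training and test sets."""
    rows = list(dataset)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

def summarize(train):
    """Summarize Data: count messages per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in train:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def predict(summary, text):
    """Make a Prediction: return the class with the highest log-probability."""
    class_counts, word_counts, vocabulary = summary
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def get_predictions(summary, test):
    """Make Predictions: predict a label for every message in the test set."""
    return [predict(summary, text) for text, _ in test]

def accuracy(test, predictions):
    """Evaluate Accuracy: percentage of predictions that match the true labels."""
    correct = sum(1 for (_, label), p in zip(test, predictions) if label == p)
    return 100.0 * correct / len(test)

# Hypothetical toy dataset of (text, label) pairs
messages = [
    ("win free prize now", "spam"),
    ("free money offer", "spam"),
    ("claim your free prize", "spam"),
    ("cheap offer win money", "spam"),
    ("meeting schedule for monday", "non-spam"),
    ("project report attached", "non-spam"),
    ("lunch on monday", "non-spam"),
    ("please review the report", "non-spam"),
]
train, test = split_dataset(messages, train_ratio=0.75)
summary = summarize(train)
predictions = get_predictions(summary, test)
print(f"Accuracy: {accuracy(test, predictions):.1f}%")
```

Working in log-probabilities avoids numeric underflow when many small word probabilities are multiplied together.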
5. Train & Test Data
• Split the data into 70% training and 30% testing data sets.
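One way to perform the 70/30 split is sketched below. It assumes (text, label) rows and additionally stratifies by label so that spam and non-spam keep roughly the same proportions in both sets; stratification, the function name stratified_split, and the fixed seed are assumptions added here and are not stated in the requirement.

```python
import random
from collections import defaultdict

def stratified_split(dataset, train_ratio=0.7, seed=42):
    """Split (text, label) rows into train/test sets, class by class."""
    by_label = defaultdict(list)
    for row in dataset:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        rows = by_label[label]
        rng.shuffle(rows)
        cut = round(len(rows) * train_ratio)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test

# Hypothetical imbalanced dataset: 10 spam and 20 non-spam messages
dataset = [(f"spam message {i}", "spam") for i in range(10)] + \
          [(f"normal message {i}", "non-spam") for i in range(20)]
train, test = stratified_split(dataset)
print(len(train), len(test))  # 21 9
```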
6. Confusion Matrix
• Create a confusion matrix table to describe the performance of a classification model.
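For a two-class problem the confusion matrix has four cells. The sketch below counts them from lists of actual and predicted labels, treating "spam" as the positive class; the label strings, function names, and sample predictions are assumptions made for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive="spam"):
    """Count true/false positives and negatives, with `positive` as the spam class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}

def print_table(cm):
    """Print the matrix with actual classes as rows and predicted classes as columns."""
    print(f"{'':18}{'Predicted spam':>16}{'Predicted non-spam':>20}")
    print(f"{'Actual spam':18}{cm['TP']:>16}{cm['FN']:>20}")
    print(f"{'Actual non-spam':18}{cm['FP']:>16}{cm['TN']:>20}")

# Hypothetical labels for six test messages
actual = ["spam", "spam", "non-spam", "non-spam", "spam", "non-spam"]
predicted = ["spam", "non-spam", "non-spam", "spam", "spam", "non-spam"]
cm = confusion_matrix(actual, predicted)
print_table(cm)
```

In this example one spam message is missed (a false negative) and one legitimate message is flagged as spam (a false positive).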
7. Accuracy
• Find the accuracy of all algorithms and compare them.
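A comparison along these lines could be organized as a small harness that trains each algorithm on the same data and records accuracy, error rate, and elapsed time. In the sketch below the two "algorithms" are deliberately simple placeholders (a majority-class baseline and a keyword rule), not NaiveBayes or J48; all function names and the toy data are hypothetical.

```python
import time
from collections import Counter

def evaluate(name, train_fn, train_rows, test_rows):
    """Train one algorithm, then report its accuracy, error rate, and elapsed time."""
    start = time.perf_counter()
    predict = train_fn(train_rows)
    predictions = [predict(text) for text, _ in test_rows]
    elapsed = time.perf_counter() - start
    correct = sum(p == label for p, (_, label) in zip(predictions, test_rows))
    accuracy = 100.0 * correct / len(test_rows)
    return {"name": name, "accuracy": accuracy,
            "error_rate": 100.0 - accuracy, "seconds": elapsed}

def train_majority(train_rows):
    """Placeholder baseline: always predict the most common training label."""
    majority = Counter(label for _, label in train_rows).most_common(1)[0][0]
    return lambda text: majority

def train_keyword(train_rows):
    """Placeholder rule: predict spam if the message contains a word seen only in spam."""
    spam_words, other_words = set(), set()
    for text, label in train_rows:
        (spam_words if label == "spam" else other_words).update(text.lower().split())
    spam_only = spam_words - other_words
    return lambda text: "spam" if spam_only & set(text.lower().split()) else "non-spam"

# Hypothetical toy data
train_rows = [("win free prize", "spam"),
              ("meeting on monday", "non-spam"),
              ("lunch on friday", "non-spam")]
test_rows = [("free prize inside", "spam"), ("monday lunch", "non-spam")]

results = [
    evaluate("majority baseline", train_majority, train_rows, test_rows),
    evaluate("keyword rule", train_keyword, train_rows, test_rows),
]
for r in results:
    print(f"{r['name']:20} accuracy={r['accuracy']:.1f}% "
          f"error={r['error_rate']:.1f}% time={r['seconds']:.6f}s")
best = max(results, key=lambda r: r["accuracy"])
print("Best:", best["name"])
```

Any real classifier can be plugged in the same way, as long as its training function returns a callable that maps a message to a label.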
Tools:
• Python
• Anaconda
Prerequisite:
Artificial Intelligence concepts, Machine Learning.
Supervisor:
Name: Muhammad Tayyab Waqar