Classification of Research Articles Using NLP and Machine Learning: Test Phase, SRS, Design Phase and Source Code (Final Deliverable)

Project Domain / Category

Artificial Intelligence + Desktop/Web Based Application

Abstract / Introduction

Document classification is one of the most challenging problems in machine learning. In it, an algorithm assigns documents to different classes so that they are easier to manage, search, filter, and analyze. Document classification is generally divided into text classification and visual classification. Text classification determines the type, genre, or theme of a document from its content, which can be achieved using Natural Language Processing (NLP).

In this project, students will classify research articles into categories based on their title and abstract, using NLP and machine learning techniques. The dataset can be downloaded from the link given under the Dataset heading.
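As a minimal sketch of the NLP side of this task, the function below merges an article's title and abstract into one text and normalizes it (lowercasing, keeping alphabetic tokens, dropping stopwords and single letters) before any vectorization. The function name `preprocess_article` and the tiny stopword set are hypothetical choices for illustration; a real project would more likely use a fuller stopword list, for example from NLTK or scikit-learn.

```python
import re

# Hypothetical, deliberately small stopword list for illustration only.
STOPWORDS = {
    "a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
    "of", "on", "or", "that", "the", "this", "to", "we", "with",
}

# Alphabetic runs only; digits and punctuation are discarded.
TOKEN_PATTERN = re.compile(r"[a-z]+")


def preprocess_article(title: str, abstract: str) -> str:
    """Combine title and abstract, lowercase, tokenize, drop stopwords and 1-letter tokens."""
    text = f"{title} {abstract}".lower()
    tokens = TOKEN_PATTERN.findall(text)
    kept = [t for t in tokens if t not in STOPWORDS and len(t) > 1]
    return " ".join(kept)


if __name__ == "__main__":
    cleaned = preprocess_article(
        "Graph Neural Networks for Molecules",
        "We propose a model that predicts the properties of 3D molecules.",
    )
    print(cleaned)
    # → graph neural networks molecules propose model predicts properties molecules
```

Merging the two fields into one string keeps the later vectorizer simple; an alternative design is to vectorize title and abstract separately so the model can weight title words differently.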

Details of the functional requirements are given below.

The system will consist of two modules, each with its own set of requirements. To complete this project, students must fulfil all the requirements of both modules.

Note: Students selecting this project should have the prerequisite knowledge listed in the Pre-requisite section. A student who selects this project without basic prior knowledge of Artificial Intelligence or Machine Learning must work through the resources and tutorials listed in that section in parallel with the SRS and Design Document submissions.

Functional Requirements:

Module 1:

The system will first import the dataset.
The system will perform Exploratory Data Analysis (EDA) and visually display summary statistics, trends, patterns, and insights from the data.
After performing EDA, the system will preprocess the data.
The system will split the data into training and test sets.
The system will train supervised machine learning (and, optionally, deep learning) models on the training data.
- The system must use any 4 machine learning algorithms of the student's choice for training.
- The system may also use any 2 deep learning algorithms of the student's choice for training (optional).
After training ends, the system will evaluate the trained models on the test data.
The system will save the model for future use.
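The splitting, training, evaluation and saving steps listed above could be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions: the `TOY_ARTICLES` rows are hypothetical stand-ins for the real data (one label per article), TF-IDF is one possible text representation, the four classifiers are example choices, and `model.pkl` is a hypothetical file name.

```python
import pickle

from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Four example supervised classifiers; any four algorithms would meet the requirement.
CANDIDATES = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "multinomial_nb": MultinomialNB(),
    "linear_svc": LinearSVC(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Hypothetical toy articles (title + abstract merged); replace with the real dataset.
TOY_ARTICLES = [
    ("convolutional neural network for image classification", "Computer Science"),
    ("transformer language model for text generation", "Computer Science"),
    ("graph algorithm for shortest path computation", "Computer Science"),
    ("reinforcement learning agent for game playing", "Computer Science"),
    ("quantum entanglement in superconducting qubits", "Physics"),
    ("dark matter halo and galaxy rotation curves", "Physics"),
    ("laser cooling of trapped atoms", "Physics"),
    ("magnetic field effects on plasma turbulence", "Physics"),
    ("prime number distribution and the zeta function", "Mathematics"),
    ("topology of manifolds and homology groups", "Mathematics"),
    ("proof of an inequality for convex functions", "Mathematics"),
    ("algebraic geometry of elliptic curves", "Mathematics"),
]


def train_and_compare(texts, labels, test_size=0.25, seed=42):
    """Split the data, fit TF-IDF + each classifier, and score it on the test split."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    results, models = {}, {}
    for name, estimator in CANDIDATES.items():
        # clone() gives each run a fresh, unfitted copy of the classifier.
        model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clone(estimator))
        model.fit(x_train, y_train)
        predicted = model.predict(x_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, predicted),
            "macro_f1": f1_score(y_test, predicted, average="macro", zero_division=0),
        }
        models[name] = model
    # Pick the model with the highest macro-F1 on the test split.
    best_name = max(results, key=lambda n: results[n]["macro_f1"])
    return results, best_name, models[best_name]


def save_model(model, path):
    """Persist the whole fitted pipeline (vectorizer + classifier) for Module 2."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


if __name__ == "__main__":
    texts, labels = zip(*TOY_ARTICLES)
    scores, best, best_model = train_and_compare(list(texts), list(labels))
    for name, metrics in scores.items():
        print(f"{name:20s} accuracy={metrics['accuracy']:.2f} macro_f1={metrics['macro_f1']:.2f}")
    print("best:", best)
    save_model(best_model, "model.pkl")  # hypothetical file name
```

Saving the full pipeline rather than the bare classifier means the same TF-IDF vocabulary is reused at prediction time. Choosing the best model on the test split keeps the sketch short; a separate validation split or cross-validation would give a less optimistic estimate. If the real data carry several 0/1 label columns per article (as the Kaggle dataset's name suggests), each classifier could be wrapped in scikit-learn's `OneVsRestClassifier` and trained on the label matrix instead.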

Module 2:

The system will provide the user with an interface window. Students can build it with any Python GUI library, or as a web page using any Python web framework such as Django, Flask, FastAPI, or Streamlit.
The system will integrate the trained model from Module 1 into Module 2.
The interface should let the user interact with the system by entering a research article's title and abstract and then predicting its category.
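One possible shape for Module 2, sketched with Flask (one of the frameworks named above), is a single page with title and abstract fields that passes the combined text to the trained model's `predict` method. The route, the form field names and the file name `model.pkl` are hypothetical; the sketch assumes the model was saved with `pickle` as a scikit-learn-style pipeline that accepts raw text.

```python
import pickle

from flask import Flask, render_template_string, request

# Minimal inline HTML template; Flask autoescapes values rendered from strings.
PAGE = """
<!doctype html>
<title>Research Article Classifier</title>
<form method="post">
  <p><input name="title" placeholder="Title" size="80"></p>
  <p><textarea name="abstract" rows="8" cols="80" placeholder="Abstract"></textarea></p>
  <p><button type="submit">Predict category</button></p>
</form>
{% if category %}<p>Predicted category: <strong>{{ category }}</strong></p>{% endif %}
"""


def create_app(model):
    """Build a Flask app around any fitted object exposing predict(list_of_texts)."""
    app = Flask(__name__)

    @app.route("/", methods=["GET", "POST"])
    def index():
        category = None
        if request.method == "POST":
            title = request.form.get("title", "")
            abstract = request.form.get("abstract", "")
            text = f"{title} {abstract}".strip()
            if text:  # only predict when the user actually entered something
                category = model.predict([text])[0]
        return render_template_string(PAGE, category=category)

    return app


if __name__ == "__main__":
    # Hypothetical path to the model saved at the end of Module 1.
    with open("model.pkl", "rb") as fh:
        trained_model = pickle.load(fh)
    create_app(trained_model).run(debug=True)
```

Running the script serves the page at http://127.0.0.1:5000/ by default. Passing the model into `create_app` instead of loading it inside the route keeps the page testable with a stub model and avoids reloading the file on every request.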

Pre-requisite:

Students need an understanding of the CS607 (Artificial Intelligence) course.
Students also need a basic understanding of machine learning techniques.
Students can learn machine learning and Artificial Intelligence techniques from the following links:
- https://www.javatpoint.com/machine-learning
- https://ocw.vu.edu.pk/Videos.aspx?cat=Computer+Science%2fInformation+Technology+&course=CS607
- https://www.youtube.com/watch?v=_u-PaJCpwiU&list=PLu0W_9lII9ai6fAMHpacBmJONT7Y4BSG&index=1
- https://vulms.vu.edu.pk/Courses/CS607/Downloads/AI_Complete_handouts_for_Pr pdf

Dataset: https://www.kaggle.com/datasets/shivanandmn/multilabel-classification-dataset
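A minimal loading and EDA sketch for this dataset follows. It assumes, without having verified it here, that the CSV has `ID`, `TITLE` and `ABSTRACT` columns plus one 0/1 column per subject (such as `Computer Science` or `Physics`); check the real headers after downloading and adjust `NON_LABEL_COLUMNS`. The three-row `SAMPLE_CSV` is made-up illustration data, not taken from the dataset.

```python
import io

import pandas as pd

# Assumed non-label columns; every other column is treated as a 0/1 subject label.
NON_LABEL_COLUMNS = ["ID", "TITLE", "ABSTRACT", "text"]

# Made-up sample rows in the assumed layout, for illustration only.
SAMPLE_CSV = """ID,TITLE,ABSTRACT,Computer Science,Physics,Mathematics
1,Graph networks,We study message passing on graphs.,1,0,0
2,Quantum walks,Quantum walks on lattices are analysed.,1,1,0
3,Prime gaps,A bound on gaps between primes.,0,0,1
"""


def load_articles(source):
    """Read the CSV (path or file-like) and add a combined 'text' column."""
    df = pd.read_csv(source)
    df["text"] = df["TITLE"].fillna("") + " " + df["ABSTRACT"].fillna("")
    return df


def summarize(df):
    """Return simple EDA numbers: label counts, labels per article, abstract length."""
    label_cols = [c for c in df.columns if c not in NON_LABEL_COLUMNS]
    abstract_words = df["ABSTRACT"].fillna("").str.split().str.len()
    per_article = df[label_cols].sum(axis=1).value_counts().sort_index()
    return {
        "rows": len(df),
        "label_counts": {k: int(v) for k, v in df[label_cols].sum().items()},
        "labels_per_article": {int(k): int(v) for k, v in per_article.items()},
        "mean_abstract_words": float(abstract_words.mean()),
    }


if __name__ == "__main__":
    articles = load_articles(io.StringIO(SAMPLE_CSV))
    print(summarize(articles))
```

For the real file, `load_articles("train.csv")` (hypothetical file name) would replace the `StringIO` call. The `labels_per_article` count shows whether articles carry more than one subject, which decides between a single-label and a multilabel model, and with matplotlib installed `pd.Series(summary["label_counts"]).plot.bar()` turns the label counts into the visual EDA chart the requirements ask for.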

Python Tutorials:

EDA Tutorials:

https://www.analyticsvidhya.com/blog/2021/05/exploratory-data-analysis-eda-a-step-by-step-guide/#:~:text=EDA%20is%20the%20process%20of,to%20understand%20the%20data%20better.
https://www.geeksforgeeks.org/what-is-exploratory-data-analysis/

Machine Learning Tutorials:

NLP Tutorials:

Deep learning Tutorials:

Tools:

Language: Python (only Python is allowed)

Framework: Anaconda, Tkinter, PyQt5, Django, Flask, etc.

IDE: Jupyter Notebook, Google Colab, PyCharm, Spyder, Visual Studio Code, etc.

Supervisor:

Name: Saad Ahmed
