Article-Summarizer-Using-AI

An AI-based web application that provides concise summaries of articles using advanced natural language processing (NLP) techniques.

Introduction

Article-Summarizer-Using-AI is a web application that summarizes lengthy articles using NLP. Users can upload their own articles or pick a sample from the dataset, then generate a summary in a chosen style with a generative AI model or with an extractive method.

Data Exploration

Dataset

The dataset used for training and evaluation is the PubMed Summarization dataset. It includes articles from PubMed with corresponding abstracts used as summaries.

Loading the Dataset:

```
from datasets import load_dataset

pubmed_data = load_dataset("ccdv/pubmed-summarization", split='train[:1000]')
```

Initial Data Cleaning:

Remove rows with missing values to ensure data quality.

```
pubmed_data = pubmed_data.filter(lambda x: x['article'] is not None and x['abstract'] is not None)
```

Exploratory Data Analysis:
- Examine the distribution of article lengths and summary lengths.
- Identify common topics and terminology within the dataset.
```
print(pubmed_data[0])  # View the first data entry
```
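
The length distributions mentioned above can be summarized with a few word-count statistics. The sketch below is a minimal illustration, not the project's own analysis code: `length_stats` is a hypothetical helper, and the three hand-written rows stand in for `pubmed_data` so the example runs without downloading anything.

```python
from statistics import mean, median


def length_stats(records, field):
    """Return word-count statistics for one text field across all records."""
    counts = [len(record[field].split()) for record in records]
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": mean(counts),
        "median": median(counts),
    }


# Tiny stand-in rows; with the real data, pass `pubmed_data` instead.
sample_rows = [
    {"article": "one two three four five six", "abstract": "one two"},
    {"article": "one two three four", "abstract": "one two three"},
    {"article": "one two", "abstract": "one"},
]

article_stats = length_stats(sample_rows, "article")
abstract_stats = length_stats(sample_rows, "abstract")
print(article_stats)
print(abstract_stats)
```

Comparing the two dictionaries gives a rough sense of how much compression the reference abstracts achieve relative to the full articles.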

Model Selection

Preprocessing

Text Tokenization:

Split text into sentences and words for detailed analysis.

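```
from nltk.tokenize import sent_tokenize, word_tokenize

# article_text holds the raw article string
sentences = sent_tokenize(article_text)
# Tokenize each sentence and flatten the result into a single word list
words = [word for sentence in sentences for word in word_tokenize(sentence)]
```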

Stop Words Removal:

Remove common English words that do not contribute to the summary.

```
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
words = [word for word in words if word.lower() not in stop_words]
```

Lemmatization:

Convert words to their base forms.

```
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
words = [lemmatizer.lemmatize(word.lower()) for word in words]
```
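
The three steps above chain naturally into a single function applied sentence by sentence. The following is a sketch of that flow under stated assumptions: `preprocess` and its parameter names are hypothetical, the tokenizers and lemmatizer are passed in as arguments, and the punctuation filter is an addition that the original snippets do not include. Simple stand-ins are used so the example runs without NLTK.

```python
def preprocess(text, split_sentences, split_words, stop_words, lemmatize):
    """Tokenize text, drop stop words and punctuation, and lemmatize per sentence."""
    processed = []
    for sentence in split_sentences(text):
        tokens = [word.lower() for word in split_words(sentence)]
        # isalnum() drops punctuation tokens; note it also drops hyphenated terms.
        kept = [lemmatize(token) for token in tokens
                if token.isalnum() and token not in stop_words]
        processed.append(kept)
    return processed


# Stand-ins so the sketch runs without NLTK. In the application these would be
# sent_tokenize, word_tokenize, set(stopwords.words('english')), and
# WordNetLemmatizer().lemmatize.
demo = preprocess(
    "The cells were treated. Results are shown.",
    split_sentences=lambda t: [s.strip() + "." for s in t.split(".") if s.strip()],
    split_words=lambda s: s.replace(".", " .").split(),
    stop_words={"the", "were", "are"},
    lemmatize=lambda w: w[:-1] if w.endswith("s") else w,
)
print(demo)
```

Keeping the result grouped by sentence makes it usable for sentence-level scoring later, for example in an extractive summarizer.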

Generative Model

API Configuration:

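Use the google.generativeai library to access the generative model.

```
import google.generativeai as genai
import os

# Read the key from the environment variable named in the .env file;
# os.environ.get returns None if the variable is not set
api_key = os.environ.get('your_api_key')
genai.configure(api_key=api_key)
```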

Model Initialization:
- Set up the generative AI model.
```
model = genai.GenerativeModel()
```

Model Fine-Tuning

Training

Fine-tune the model with the PubMed dataset to improve summary quality.

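```
# Pseudo-code only: `train` is a placeholder, not a method of genai.GenerativeModel.
# Substitute the tuning workflow supported by the chosen model.
model.train(dataset=pubmed_data, epochs=10, learning_rate=0.001)
```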
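
Because the same dataset serves for both training and evaluation, it helps to hold out a slice of rows that the tuning step never sees. The sketch below shows one simple, reproducible way to do that; `train_eval_split`, the 10% default, and the seed are assumptions rather than values taken from the project.

```python
import random


def train_eval_split(rows, eval_fraction=0.1, seed=0):
    """Split rows into (train, evaluation) lists using a seeded shuffle of indices."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    eval_size = max(1, int(len(rows) * eval_fraction))
    eval_indices = set(indices[:eval_size])
    # Preserve the original row order inside each split.
    train = [rows[i] for i in range(len(rows)) if i not in eval_indices]
    evaluation = [rows[i] for i in range(len(rows)) if i in eval_indices]
    return train, evaluation


demo_rows = list(range(10))
demo_train, demo_eval = train_eval_split(demo_rows, eval_fraction=0.2, seed=0)
print(len(demo_train), len(demo_eval))
```

When working directly with a Hugging Face `Dataset`, its built-in `train_test_split` method covers the same need.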

Extractive Summarization

Approach

For extractive summarization, the application uses traditional NLP techniques to identify key sentences from the article without relying on a generative model.

Extractive Summary Script:

Rename the provided extractive_summary.py to app.py and move it to the project root:
```
mv /mnt/data/extractive_summary.py app.py
```

Core Logic:

The extractive summarization script uses statistical and heuristic methods to identify the most important sentences in the text.

```
from nltk.tokenize import sent_tokenize

# Example of extractive summarization
def extractive_summary(text):
    # Split the text into sentences
    sentences = sent_tokenize(text)
    # Placeholder for ranking: select the first 3 sentences
    summary = ' '.join(sentences[:3])
    return summary
```
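
The example above only takes the leading sentences. To illustrate what a statistical ranking could look like, the sketch below scores each sentence by the average corpus frequency of its words and keeps the top scorers in their original order. This is one common heuristic, not necessarily what `extractive_summary.py` does; `rank_sentences` is a hypothetical name, and a regular expression replaces `sent_tokenize` so the example runs with the standard library only.

```python
import re
from collections import Counter


def rank_sentences(text, num_sentences=3):
    """Return the highest-scoring sentences, joined in their original order."""
    # Naive sentence split on ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words_per_sentence = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    freq = Counter(word for words in words_per_sentence for word in words)

    def score(words):
        # Average word frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(words_per_sentence[i]),
                    reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


demo_text = "Dogs bark. Cats sleep. Cats eat fish with cats."
print(rank_sentences(demo_text, num_sentences=1))
```

Stop words are not removed here, so very common words inflate the scores; filtering them first would likely give more meaningful rankings on real articles.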

Integration:

Integrate the extractive summarization logic with the Flask application to allow users to choose between generative and extractive summaries.

```
from flask import request, render_template

@app.route('/summarize', methods=['POST'])
def summarize():
    # Prefer an uploaded file; otherwise fall back to the selected sample
    if 'file' in request.files and request.files['file'].filename != '':
        file = request.files['file']
        article_text = file.read().decode("utf-8")
    else:
        sample_index = int(request.form['sample'])
        article_text = pubmed_data[sample_index]['article']

    style = request.form.get('style', 'brief')
    summary_method = request.form.get('method', 'generative')

    # Dispatch to the generative or extractive summarizer
    if summary_method == 'generative':
        summary_text = preprocess_and_summarize(article_text, style)
    else:
        summary_text = extractive_summary(article_text)

    return render_template('result.html', original=article_text, summary=summary_text)
```
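
The route above assumes that an upload is valid UTF-8 and that the `sample` field is present and in range. A small helper can make those assumptions explicit and turn failures into a readable message. The sketch below is illustrative only: `resolve_article_text` and its error messages are hypothetical and not part of the project.

```python
def resolve_article_text(uploaded_bytes, sample_value, samples):
    """Return article text from an upload if present, else from the chosen sample.

    Raises ValueError with a user-facing message when the input is unusable.
    """
    if uploaded_bytes:
        try:
            return uploaded_bytes.decode("utf-8")
        except UnicodeDecodeError as exc:
            raise ValueError("Uploaded file is not valid UTF-8 text.") from exc
    try:
        index = int(sample_value)
    except (TypeError, ValueError) as exc:
        raise ValueError("Choose a sample or upload a file.") from exc
    if not 0 <= index < len(samples):
        raise ValueError(f"Sample index must be between 0 and {len(samples) - 1}.")
    return samples[index]["article"]


demo_samples = [{"article": "first article"}, {"article": "second article"}]
print(resolve_article_text(None, "1", demo_samples))
```

Inside the route, the `ValueError` could be caught and shown to the user instead of letting the request fail with a server error.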

Evaluation

Evaluate the model's performance using metrics such as ROUGE or BLEU.

```
from nltk.translate.bleu_score import sentence_bleu

# reference_summary: the dataset abstract; generated_summary: the model output
reference = [reference_summary.split()]
candidate = generated_summary.split()
score = sentence_bleu(reference, candidate)
print(f'BLEU Score: {score}')
```
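
ROUGE is mentioned above, but only BLEU is shown. As a rough illustration of the idea, the sketch below computes ROUGE-1, the unigram-overlap variant, with whitespace tokenization. It is a simplified version for intuition; `rouge_1` is a hypothetical helper, and a maintained ROUGE package would be the better choice for reported results.

```python
from collections import Counter


def rouge_1(reference, candidate):
    """Compute unigram-overlap precision, recall, and F1 between two strings."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


demo_scores = rouge_1("the cat sat on the mat", "the cat lay on a mat")
print(demo_scores)
```

Recall measures how much of the reference abstract the summary covers, while precision measures how much of the summary is supported by the reference.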

Web Application Development

Backend

Flask Setup:

Initialize the Flask app and configure the login manager.

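```
from flask import Flask
from flask_login import LoginManager

app = Flask(__name__)
# Placeholder value: replace with a strong random secret and keep it out of version control
app.secret_key = 'your_secret_key'
login_manager = LoginManager(app)
```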

Routes and Authentication:

Implement routes for login, registration, summarization, and logout.

```
from flask import render_template

@app.route('/login', methods=['GET', 'POST'])
def login():
    # login logic here
    return render_template('login.html')
```
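
The login logic itself is left as a placeholder above. Whatever storage is used, registration and login usually come down to storing a salted password hash and comparing against it later. The sketch below shows that core using only the standard library; the in-memory `_users` dictionary, the function names, and the iteration count are assumptions made for illustration, not the project's implementation.

```python
import hashlib
import hmac
import os

_users = {}  # hypothetical in-memory store: username -> (salt, password_hash)


def _hash_password(password, salt):
    """Derive a hash with PBKDF2-HMAC-SHA256; the iteration count is illustrative."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register_user(username, password):
    """Store a new user; return False if the username is already taken."""
    if username in _users:
        return False
    salt = os.urandom(16)
    _users[username] = (salt, _hash_password(password, salt))
    return True


def verify_user(username, password):
    """Return True only if the user exists and the password matches."""
    record = _users.get(username)
    if record is None:
        return False
    salt, stored_hash = record
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(stored_hash, _hash_password(password, salt))
```

In a Flask project, `werkzeug.security.generate_password_hash` and `check_password_hash` provide the same functionality and are the more conventional choice.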

Frontend

Templates:

Create HTML templates for the user interface.

```
<!-- templates/index.html -->
<form action="{{ url_for('summarize') }}" method="post" enctype="multipart/form-data">
    <input type="file" name="file">
    <button type="submit">Summarize</button>
</form>
```

User Experience:
- Ensure a user-friendly interface with clear instructions and feedback.

Installation

Prerequisites

- Python 3.7+
- Flask and Flask-Login
- NLTK
- Hugging Face datasets (for the PubMed sample data)
- Generative AI library (e.g., google.generativeai)
- An API key for the generative AI service

Steps

Clone the Repository:

```
git clone https://github.com/yourusername/Article-Summarizer-Using-AI.git
```

Navigate to the Project Directory:
```
cd Article-Summarizer-Using-AI
```

Create a Virtual Environment:

```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install Dependencies:
```
pip install -r requirements.txt
```
Set Environment Variables:
- Create a .env file with your API key.
```
your_api_key=<YOUR_GENERATIVE_AI_API_KEY>
```
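
Note that `os.environ.get` only sees variables already present in the process environment, so something has to load the .env file. The Flask CLI does this automatically when the python-dotenv package is installed. For other entry points, a minimal loader could look like the sketch below; `load_env_file` and its `environ` parameter are hypothetical names, and the demo writes a throwaway file rather than touching a real key.

```python
import os
import tempfile


def load_env_file(path=".env", environ=None):
    """Read KEY=VALUE lines from path into environ without overwriting existing keys."""
    environ = os.environ if environ is None else environ
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip().strip("'\"")
            environ.setdefault(key, value)
            loaded[key] = value
    return loaded


# Demo against a temporary file and a plain dict, leaving os.environ untouched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, ".env")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("# demo file\nyour_api_key=demo-value\n")
    demo_environ = {}
    demo_loaded = load_env_file(demo_path, environ=demo_environ)
print(demo_loaded)
```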
Download NLTK Data:

The script handles downloading necessary NLTK data.

Usage

Run the Application:
```
flask run --port=5001
```
Access the App:
- Visit http://127.0.0.1:5001 in your browser.
Login/Register:
- Register a new account or log in with existing credentials.
Summarize Articles:
- Upload a text file or choose a sample to summarize.
View Summary:
- The summarized text is displayed on the results page.

Thank you for using Article-Summarizer-Using-AI! We hope you find it useful for your summarization needs.
