Text Normalization and Tokenization

This tool is a Flask web application that preprocesses text with NLTK. It normalizes text, removes stopwords, and splits text into words, sentences, or paragraphs so that it is easier to analyze.

Features

Text Normalization: Convert text to lowercase and remove punctuation for consistency.
Remove Stopwords: Filter out common words (e.g., "and", "the", "is") to focus on meaningful content.
Tokenize into Words: Split text into individual words for detailed analysis.
Tokenize into Sentences: Divide text into sentences to understand its structure.
Tokenize into Paragraphs: Separate text into paragraphs for deeper document analysis.
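
The first two features can be sketched with the standard library alone. This is only an illustration of the idea, not the code in app.py: the function names `normalize` and `remove_stopwords` and the tiny `STOPWORDS` set are assumptions made for this example, and the tool itself uses NLTK's English stopword list.

```python
import string

# Hypothetical, deliberately tiny stopword set for illustration only;
# NLTK's English stopword list is much larger.
STOPWORDS = {"a", "an", "and", "is", "it", "the", "to", "of"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(_PUNCT_TABLE)


def remove_stopwords(text: str) -> list[str]:
    """Normalize, split on whitespace, and drop stopwords."""
    return [word for word in normalize(text).split() if word not in STOPWORDS]


if __name__ == "__main__":
    sample = "Tokenization is an important step."
    print(normalize(sample))         # tokenization is an important step
    print(remove_stopwords(sample))  # ['tokenization', 'important', 'step']
```

Normalizing before filtering means "The" and "the" are treated as the same stopword.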

Usage

Text Normalization: Converts text to lowercase and removes punctuation marks.
Remove Stopwords: Filters out common words to highlight significant content.
Tokenize into Words
- Input: "Tokenization is an important step."
- Output: ["Tokenization", "is", "an", "important", "step", "."]
Tokenize into Sentences
- Input: "Tokenization is important. It breaks down text."
- Output: ["Tokenization is important.", "It breaks down text."]
Tokenize into Paragraphs
- Input: "Tokenization is important. It involves breaking down text into units.\n\nAfter tokenization, further analysis is possible."
- Output: ["Tokenization is important. It involves breaking down text into units.", "After tokenization, further analysis is possible."]
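
The three tokenization examples above can be reproduced with a rough regex-based sketch. It is a stand-in that assumes simple English text and is not NLTK's Punkt tokenizer, so it will mis-split abbreviations such as "e.g." that NLTK handles. The function names are hypothetical.

```python
import re


def tokenize_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def tokenize_sentences(text: str) -> list[str]:
    """Split at whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize_words(text: str) -> list[str]:
    """Return runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    print(tokenize_words("Tokenization is an important step."))
    print(tokenize_sentences("Tokenization is important. It breaks down text."))
    print(tokenize_paragraphs("First paragraph.\n\nSecond paragraph."))
```

Punctuation is kept as its own token so that the word-level output matches the example, where the final "." appears as a separate item.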

Required Modules

NLTK: A toolkit for NLP tasks like tokenization and stopwords removal.
pip install nltk
Download the required resources from a Python session:
import nltk
nltk.download('stopwords')
nltk.download('punkt')
Flask: A web framework for creating Python web applications.
pip install Flask

Python version 3.10 - 3.11

Install all modules at once with:
pip install -r requirements.txt

Run the application with:
python app.py

Repository: hariharasudan3/Text-Normalization-NLP
