This tool simplifies text preprocessing using Natural Language Processing (NLP) techniques. It helps standardize and break down text for easier analysis.
- Text Normalization: Convert text to lowercase and remove punctuation for consistency.
- Remove Stopwords: Filter out common words (e.g., "and", "the", "is") to focus on meaningful content.
- Tokenize into Words: Split text into individual words for detailed analysis.
- Tokenize into Sentences: Divide text into sentences to understand its structure.
- Tokenize into Paragraphs: Separate text into paragraphs for deeper document analysis.
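The first two features can be sketched with the standard library alone. This is a minimal illustration, not the tool's actual code: the function names `normalize` and `remove_stopwords` and the tiny `STOPWORDS` set are assumptions made for the example, and the tool itself presumably draws its stopword list from NLTK.

```python
import string

# Tiny illustrative stopword set (an assumption for this sketch);
# NLTK's English stopword list is considerably larger.
STOPWORDS = {"a", "an", "and", "is", "of", "the", "to"}


def normalize(text):
    """Lowercase the text and delete every ASCII punctuation character."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def remove_stopwords(text):
    """Return the whitespace-separated tokens that are not stopwords."""
    return [word for word in text.split() if word not in STOPWORDS]


cleaned = remove_stopwords(normalize("The cat and the dog: a story!"))
print(cleaned)  # ['cat', 'dog', 'story']
```

Normalizing before filtering matters here: the stopword set is lowercase, so "The" would slip through if the text were filtered first.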
- Text Normalization: Converts text to lowercase and removes punctuation marks.
- Remove Stopwords: Filters out common words to highlight significant content.
- Tokenize into Words
  - Input: "Tokenization is an important step."
  - Output: ["Tokenization", "is", "an", "important", "step", "."]
- Tokenize into Sentences
  - Input: "Tokenization is important. It breaks down text."
  - Output: ["Tokenization is important.", "It breaks down text."]
- Tokenize into Paragraphs
  - Input: "Tokenization is important. It involves breaking down text into units.\n\nAfter tokenization, further analysis is possible."
  - Output: ["Tokenization is important. It involves breaking down text into units.", "After tokenization, further analysis is possible."]
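The three tokenization examples above can be reproduced with a rough standard-library sketch. The regex-based functions below are simplified stand-ins so the example runs without any downloads; the tool itself relies on NLTK, whose tokenizers handle abbreviations and other edge cases far better. The function names are hypothetical.

```python
import re


def tokenize_words(text):
    """Split into word tokens; each punctuation mark becomes its own token."""
    return re.findall(r"\w+|[^\w\s]", text)


def tokenize_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace.

    Abbreviations such as "e.g. this" will be split incorrectly.
    """
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def tokenize_paragraphs(text):
    """Split on blank lines (two newlines, optionally with whitespace between)."""
    parts = re.split(r"\n\s*\n", text)
    return [part.strip() for part in parts if part.strip()]


print(tokenize_words("Tokenization is an important step."))
print(tokenize_sentences("Tokenization is important. It breaks down text."))
print(tokenize_paragraphs("First paragraph.\n\nSecond paragraph."))
```

Splitting paragraphs first, then sentences, then words gives a nested view of a document while keeping each step independent.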
- NLTK: A toolkit for NLP tasks such as tokenization and stopword removal.
pip install nltk
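- Download resources (run once in a Python session):
import nltk
nltk.download('stopwords')
nltk.download('punkt')  # NLTK 3.9+ loads 'punkt_tab' instead; download it too if tokenization raises LookupError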
- Flask: A web framework for creating Python web applications.
pip install Flask
Install all required modules with:
pip install -r requirements.txt
Run the application with:
python app.py