Arabic-Text-Summarizer

Analysis

Document Analysis

The distribution of document lengths is right-skewed.
The train, validation, and test splits share the same distribution, which indicates that they come from the same source.
Documents longer than 2000 tokens are rare.
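
The observations above can be reproduced with a few summary statistics per split. The following is a minimal sketch, not the notebook code used for this analysis: it assumes whitespace tokenization, and the helper names (`token_lengths`, `skewness`, `describe`) and the toy split are hypothetical. A positive skewness coefficient, together with a mean larger than the median, points to a right-skewed distribution.

```python
import statistics


def token_lengths(texts):
    """Return the number of whitespace-separated tokens in each text."""
    return [len(text.split()) for text in texts]


def skewness(values):
    """Fisher-Pearson skewness coefficient; positive means a longer right tail."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / (len(values) * std ** 3)


def describe(texts):
    """Summarize one split: mean, median, maximum length, and skewness."""
    lengths = token_lengths(texts)
    return {
        "mean": statistics.fmean(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
        "skewness": skewness(lengths),
    }


if __name__ == "__main__":
    # Hypothetical toy split: several short documents and one very long one.
    toy_split = ["كلمة " * n for n in (3, 4, 4, 5, 5, 6, 40)]
    print(describe(toy_split))
```

Running `describe` on each split and comparing the resulting dictionaries is one simple way to check that the splits follow a similar distribution.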

Summaries Analysis

The distribution of summary lengths is right-skewed.
Most summaries have 35-40 tokens.
The distribution is almost identical across the three splits.

Document Title Analysis

The distribution of title lengths does not suffer from skewness.
Most articles have titles of 8-10 tokens.
50% of the data points have fewer than 10 tokens in the title.

English Words Analysis

Most Repeated Words

Results

Decoder-Only Models

inceptionai/Jais-family-256m

Metric      Score
rouge1      0.024213605715402403
rouge2      0.0014741946283852877
rougeL      0.024084952075629662
rougeLsum   0.02407977715402647
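
To make the ROUGE scores above easier to interpret, here is a simplified sketch of how ROUGE-N measures n-gram overlap between a reference summary and a generated one. It is not the scorer that produced the numbers in this README: it assumes plain whitespace tokenization and reports only the F1 value, and the function names are hypothetical. Scores range from 0 (no shared n-grams) to 1 (identical n-gram counts).

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, prediction, n=1):
    """Simplified ROUGE-N F1 over whitespace tokens (an assumption)."""
    ref_counts = ngrams(reference.split(), n)
    pred_counts = ngrams(prediction.split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    prediction = "the cat lay on the mat"
    print("rouge1:", rouge_n(reference, prediction, n=1))
    print("rouge2:", rouge_n(reference, prediction, n=2))
```

Read this way, a rouge1 score close to 0.02 means that the generated summaries share very few unigrams with the reference summaries.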

Project Organization

├── README.md <- Top-level README for developers using this project.
├── pyproject.toml <- Black code formatting configuration.
├── .dockerignore <- Files to be ignored when creating the Docker image.
├── .gitignore <- Files to be ignored by git.
├── .pre-commit-config.yaml <- Checks to run before each git commit.
├── .circleci/config.yml <- CircleCI configuration.
├── .pylintrc <- Pylint code linting configuration.
├── Dockerfile <- File used to create the Docker image.
├── environment.yml <- All the dependencies of this project.
├── main.py <- Main file that runs the API server.
├── src <- Source code files used by the project.
│ ├── inference <- Model output generator code.
│ ├── model <- Model files.
│ ├── training <- Model training code.
│ ├── utility <- Utility and constant modules.
├── logs <- Log file path.
├── config <- Config file path.
├── data <- Dataset files.
├── docs <- Documents such as requirements and team collaboration notes.
├── tests <- Unit and performance test case files.
│ ├── cov_html <- Unit test coverage report.

Installation

Development environment used to create this project:
Operating System: Windows 10 Home

Software

Anaconda: 4.8.5 (Anaconda installation)

Python libraries:

Go to the location of the environment.yml file and run:

    conda env create -f environment.yml

Usage

The project serves ML inference through a FastAPI server, which currently returns dummy model output.

Go inside the 'Arabic-Text-Summarizer' folder on the command line.
Run:

    conda activate Arabic-Text-Summarizer  
    python main.py

Open 'http://localhost:5000/docs' in a browser.
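
As an alternative to opening the browser, a short script can confirm that the server is answering. This is only a sketch using the Python standard library: the helper names `docs_url` and `server_is_up` are hypothetical, and the host and port are taken from the URL above.

```python
import urllib.error
import urllib.request


def docs_url(host="localhost", port=5000):
    """Build the URL of the interactive docs page."""
    return f"http://{host}:{port}/docs"


def server_is_up(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    url = docs_url()
    print(url, "is reachable" if server_is_up(url) else "is not reachable")
```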

Unit Testing

Go inside the 'tests' folder on the command line.
Run:

    pytest -vv 
    pytest --cov-report html:tests/cov_html --cov=src tests/
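
For readers new to pytest, the sketch below shows the general shape of a test file that the commands above would collect: pytest picks up functions whose names start with `test_` in files named `test_*.py`. The function `dummy_summarize` is a hypothetical stand-in defined only for this example and is not part of the project's `src` package.

```python
def dummy_summarize(text, max_tokens=5):
    """Hypothetical stand-in for the model: keep the first max_tokens tokens."""
    return " ".join(text.split()[:max_tokens])


def test_summary_respects_token_limit():
    summary = dummy_summarize("one two three four five six seven", max_tokens=3)
    assert summary == "one two three"


def test_short_input_is_returned_unchanged():
    assert dummy_summarize("one two", max_tokens=5) == "one two"


def test_empty_input_gives_empty_summary():
    assert dummy_summarize("") == ""
```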

Performance Testing

Open two terminals and start the main application in the first one:

    python main.py

In the second terminal, go inside the 'tests' folder on the command line.
Run:

    locust -f locust_test.py

Black - Code Formatter

Go inside the 'Arabic-Text-Summarizer' folder on the command line.
Run:

    black src

Pylint - Code Linting

Go inside the 'Arabic-Text-Summarizer' folder on the command line.
Run:

    pylint src

Containerization

Go inside the 'Arabic-Text-Summarizer' folder on the command line.
Run:

    docker build -t myimage .  
    docker run -d --name mycontainer -p 5000:5000 myimage

Pre-commit hooks

Go inside the 'Arabic-Text-Summarizer' folder on the command line.
Run:

    pre-commit install

Whenever git commit is run, the pre-commit hooks are applied automatically.
To test before committing, run:

    pre-commit run

CI/CD using CircleCI

Add the project on the CircleCI website, then monitor the build on every commit.

Contributing

Please create a pull request for any change.

License

NOTE: This software depends on other packages that are licensed under different open source licenses.


ahmedelsayed968/Arabic-Text-Summarizer
