- The distribution is right-skewed.
- The train, validation, and test splits have the same distribution, which indicates that they come from the same source.
- Only a few data points are longer than 2000 tokens.
- The distribution is right-skewed.
- Most data points have 35-40 tokens in the summary.
- The three splits have almost identical distributions.
- The distribution does not suffer from skewness.
- Most articles have 8-10 tokens in the title.
- 50% of the data points have fewer than 10 tokens in the title.
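The length statistics above can be reproduced per split with a short sketch like the one below. It assumes simple whitespace tokenization (the tokenizer behind the reported counts is not stated), and `token_lengths`, `skewness`, and `length_summary` are hypothetical helper names. A positive skewness value indicates a right-skewed distribution.

```python
import statistics
from typing import Iterable


def token_lengths(texts: Iterable[str]) -> list[int]:
    """Count tokens per text, ASSUMING simple whitespace tokenization."""
    return [len(text.split()) for text in texts]


def skewness(values: list[int]) -> float:
    """Population skewness g1 = m3 / m2**1.5; a positive value means right-skewed."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    # A constant list has no spread, so report zero skew instead of dividing by zero.
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def length_summary(texts: Iterable[str]) -> dict[str, float]:
    """Hypothetical helper: median, max, and skewness of token lengths for one split."""
    lengths = token_lengths(texts)
    return {
        "median": statistics.median(lengths),
        "max": max(lengths),
        "skewness": skewness(lengths),
    }
```

Calling `length_summary` separately on the train, validation, and test texts (articles, summaries, or titles) gives numbers that can be compared side by side to check that the splits share the same distribution.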
- inceptionai/Jais-family-256m
| Metric | Score |
|---|---|
| rouge1 | 0.024213605715402403 |
| rouge2 | 0.0014741946283852877 |
| rougeL | 0.024084952075629662 |
| rougeLsum | 0.02407977715402647 |
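ROUGE-1 measures unigram overlap between a generated summary and its reference. The sketch below computes the ROUGE-1 F1 score with clipped unigram counts; it is an illustration, not the implementation that produced the scores above (which is not named here). It splits on whitespace so that Arabic words are kept as tokens: the default tokenizer of the `rouge_score` package keeps only `a-z0-9` characters, so Arabic text needs a custom tokenizer there. `rouge1_f1` and `mean_rouge1` are hypothetical names.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1 from clipped unigram overlap, ASSUMING whitespace tokenization."""
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    # Counter intersection keeps each shared unigram at its minimum count in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def mean_rouge1(references: list[str], candidates: list[str]) -> float:
    """Average the per-example ROUGE-1 F1 over a test set."""
    if len(references) != len(candidates) or not references:
        raise ValueError("references and candidates must be non-empty and equal in length")
    scores = [rouge1_f1(r, c) for r, c in zip(references, candidates)]
    return sum(scores) / len(scores)
```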
├── README.md <- top-level README for developers using this project.
├── pyproject.toml <- black code formatting configurations.
├── .dockerignore <- Files to be ignored when building the Docker image.
├── .gitignore <- Files to be ignored in git check in.
├── .pre-commit-config.yaml <- Things to check before git commit.
├── .circleci/config.yml <- CircleCI configuration.
├── .pylintrc <- Pylint code linting configurations.
├── Dockerfile <- File to build the Docker image.
├── environment.yml <- Stores all the dependencies of this project.
├── main.py <- Main file to run the API server.
├── src <- Source code files to be used by project.
│ ├── inference <- model output generator code
│ ├── model <- model files
│ ├── training <- model training code
│ ├── utility <- contains utility and constant modules.
├── logs <- log file path
├── config <- config file path
├── data <- datasets files
├── docs <- Documents covering requirements, team collaboration, etc.
├── tests <- Unit and performance test case files.
│ ├── cov_html <- Unit test cases coverage report
Development Environment used to create this project:
Operating System: Windows 10 Home
Anaconda: 4.8.5 (Anaconda installation)
Go to the location of the environment.yml file and run:
conda env create -f environment.yml
This project serves ML inference from a FastAPI server that returns dummy model output.
- On the command line, go to the 'Arabic-Text-Summarizer' folder.
- Run:
conda activate Arabic-Text-Summarizer
python main.py
- Open 'http://localhost:5000/docs' in a browser.
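Once the server is running, it can also be called from Python. The sketch below uses only the standard library; the endpoint path `/summarize` and the JSON key `"text"` are hypothetical placeholders, so check the actual routes and request schema on the 'http://localhost:5000/docs' page before using it.

```python
import json
import urllib.request

# HYPOTHETICAL endpoint and payload key: confirm the real ones at http://localhost:5000/docs.
API_URL = "http://localhost:5000/summarize"


def build_summary_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the article text (UTF-8, Arabic kept as-is)."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_summary(text: str, url: str = API_URL) -> dict:
    """Send the request to the running server and decode its JSON response."""
    with urllib.request.urlopen(build_summary_request(text, url), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload logic testable without a running server; `request_summary` is only useful while `python main.py` is running.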
- On the command line, go to the 'tests' folder.
- Run:
pytest -vv
pytest --cov-report html:tests/cov_html --cov=src tests/
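pytest collects functions whose names start with `test_` from files named `test_*.py`. A unit test file in the 'tests' folder could look like the sketch below; `dummy_summarize` is a hypothetical stand-in for the project's inference code in 'src', and the file name (for example `tests/test_inference.py`) is likewise an assumption.

```python
def dummy_summarize(text: str, max_tokens: int = 40) -> str:
    """HYPOTHETICAL stand-in for the inference code: returns the first max_tokens tokens."""
    return " ".join(text.split()[:max_tokens])


def test_summary_respects_token_limit():
    # A long Arabic input must be cut down to at most max_tokens whitespace tokens.
    article = " ".join(["كلمة"] * 500)
    summary = dummy_summarize(article, max_tokens=40)
    assert len(summary.split()) <= 40


def test_empty_input_gives_empty_summary():
    assert dummy_summarize("") == ""
```

In the real test suite, the import of the project's own inference function would replace the stand-in definition.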
- Open two terminals and start the main application in one of them:
python main.py
- In the second terminal, go to the 'tests' folder.
- Run:
locust -f locust_test.py
- On the command line, go to the 'Arabic-Text-Summarizer' folder.
- Run:
black src
- On the command line, go to the 'Arabic-Text-Summarizer' folder.
- Run:
pylint src
- On the command line, go to the 'Arabic-Text-Summarizer' folder.
- Run:
docker build -t myimage .
docker run -d --name mycontainer -p 5000:5000 myimage
- On the command line, go to the 'Arabic-Text-Summarizer' folder.
- Run:
pre-commit install
- Whenever the command git commit is run, the pre-commit hooks will automatically be applied.
- To run the checks before committing, run:
pre-commit run
- Add the project on the CircleCI website, then monitor the build on every commit.
Please create a pull request for any change.
NOTE: This software depends on other packages that are licensed under different open source licenses.