Feature/pip package #1

Open
wants to merge 5 commits into
base: main
31 changes: 31 additions & 0 deletions .github/workflows/deploy.yml
@@ -0,0 +1,31 @@
name: Deploy

on:
  push:
    tags:
      - '*'

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.x'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip setuptools wheel twine

      - name: Build and publish to PyPI
        if: startsWith(github.ref, 'refs/tags/')
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
        run: |
          python setup.py sdist bdist_wheel
          python -m twine upload --repository pypi dist/*
52 changes: 51 additions & 1 deletion README.md
@@ -7,8 +7,58 @@

* We present a list of auto-translated stopwords from English and adapt them to native [South African Bantu Languages](https://pubs.cs.uct.ac.za/id/eprint/1334/1/icadl_2019_banturecognition.pdf)


## Installation

You can install the package via pip:

```bash
pip install -U our-stopwords
```

## Quick Start

* There are two ways to use the installed version of `our-stopwords`:


### 1. Using as a Python Library

```python
import our_stopwords

# List all available languages
available_languages = our_stopwords.list_available_languages()
print("Available languages:", available_languages)

# Get the list of stopwords for a specific language
stopwords = our_stopwords.get_stopwords('ven')
print(stopwords)
# Output: [{'eng': 'a', 'ven': 'a'}, {'eng': 'about', 'ven': 'nga'}, {'eng': 'after', 'ven': 'mulweli'}, ...]
```
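
As a minimal sketch (not part of the package), the records returned by `get_stopwords` could be reduced to a set for filtering tokens. The `records` list is copied from the sample output above, the input tokens are placeholders, and `build_stopword_set` and `remove_stopwords` are hypothetical helper names.

```python
# Sample records in the shape shown above: one dict per stop word,
# keyed by 'eng' and by the language code.
records = [
    {'eng': 'a', 'ven': 'a'},
    {'eng': 'about', 'ven': 'nga'},
    {'eng': 'after', 'ven': 'mulweli'},
]


def build_stopword_set(records, language_code):
    """Collect the lowercased translations for one language into a set."""
    return {r[language_code].lower() for r in records if language_code in r}


def remove_stopwords(tokens, stopword_set):
    """Return the tokens that are not in the stop word set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stopword_set]


ven_stopwords = build_stopword_set(records, 'ven')
filtered = remove_stopwords(['nga', 'example', 'Mulweli', 'tokens'], ven_stopwords)
print(filtered)
```

A set gives constant-time membership checks, which matters once the full stop word list is loaded.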

### 2. Usage from the CLI

#### List Available Languages

You can list all available languages supported by the package:

```bash
our_stopwords list
```

#### Get Stop Words for a Language

To get stop words for a specific language, use the following command (replace `ven` with the language code of your choice):

```bash
our_stopwords get ven
```





-## Usage
+## Manual Usage

- The data is provided in [JSON Lines](https://jsonlines.org/) format. Here is an example of using the stopwords in Python:

Empty file added our_stopwords/__init__.py
Empty file.
85 changes: 85 additions & 0 deletions our_stopwords/__main__.py
@@ -0,0 +1,85 @@
# our_stopwords/__main__.py

import os
import json


DATA_DIR = os.path.join(os.path.dirname(__file__), 'data')


def list_available_languages():
    """
    List all available language codes.

    Returns:
    - list: List of available language codes.
    """
    files = os.listdir(DATA_DIR)
    language_codes = [os.path.splitext(file)[0] for file in files if file.endswith('.jsonl')]
    return language_codes


def get_stopwords(language_code: str):
    """
    Retrieve stop words for a specific language.

    Parameters:
    - language_code (str): Language code (e.g., 'ven' for Venda).

    Returns:
    - list: List of stop words for the specified language.
    """
    # Ensure language code is lowercase
    language_code = language_code.lower()

    # Check if the language code is valid
    valid_codes = list_available_languages()
    if language_code not in valid_codes:
        raise ValueError(f"Unsupported language code '{language_code}'. Please use one of {valid_codes}.")

    # Load stop words from the JSON lines file
    file_path = os.path.join(DATA_DIR, f'{language_code}.jsonl')
    with open(file_path, 'r', encoding='utf-8') as file:
        stop_words = []
        for line in file:
            stop_words.append(json.loads(line.strip()))

    return stop_words


def cli():
    # Build the parser here rather than under the __main__ guard: the
    # console_scripts entry point calls cli() directly, so `parser` and
    # `args` must not depend on module-level setup.
    import argparse

    parser = argparse.ArgumentParser(description="CLI for accessing multilingual stop words for South African Bantu languages.")
    subparsers = parser.add_subparsers(dest='command', title='Commands', description='Valid commands')
    subparsers.add_parser('list', help='List all available languages')
    get_parser = subparsers.add_parser('get', help='Get stop words for a specific language')
    get_parser.add_argument('language_code', type=str, help='Language code (e.g., "ven" for Venda)')

    args = parser.parse_args()

    try:
        if args.command == 'list':
            # list_languages() is not defined in this module; use the loader above
            available_languages = list_available_languages()
            print("Available languages:")
            for lang in available_languages:
                print(f" - {lang}")

        elif args.command == 'get':
            stopwords = get_stopwords(args.language_code)
            for s in stopwords:
                print(s)
        else:
            parser.print_help()
    except Exception as e:
        print(f"Error: {str(e)}")


if __name__ == "__main__":
    cli()


"""Usage:

our_stopwords list
our_stopwords get ven
"""
51 changes: 51 additions & 0 deletions our_stopwords/cli.py
@@ -0,0 +1,51 @@
# our_stopwords/cli.py

import argparse
import json

# The package __init__.py is empty, so import the loaders from the module
# that defines them. get_stopwords is aliased to avoid clashing with the
# CLI command of the same name below.
from our_stopwords.__main__ import get_stopwords as load_stopwords
from our_stopwords.__main__ import list_available_languages


def list_languages():
    """
    CLI command to list all available languages.
    """
    available_languages = list_available_languages()
    print("Available languages:")
    for lang in available_languages:
        print(f" - {lang}")


def get_stopwords(language_code):
    """
    CLI command to get stop words for a specific language.

    Parameters:
    - language_code (str): Language code (e.g., 'ven' for Venda).
    """
    try:
        stopwords = load_stopwords(language_code)
        print(json.dumps(stopwords, indent=2, ensure_ascii=False))
    except ValueError as e:
        print(f"Error: {str(e)}")


def main():
    parser = argparse.ArgumentParser(description="CLI for accessing multilingual stop words for African languages.")

    subparsers = parser.add_subparsers(dest='command', title='Commands', description='Valid commands')

    # Subcommand: list
    list_parser = subparsers.add_parser('list', help='List all available languages')

    # Subcommand: get
    get_parser = subparsers.add_parser('get', help='Get stop words for a specific language')
    get_parser.add_argument('language_code', type=str, help='Language code (e.g., "ven" for Venda)')

    args = parser.parse_args()

    if args.command == 'list':
        list_languages()
    elif args.command == 'get':
        get_stopwords(args.language_code)
    else:
        parser.print_help()


if __name__ == "__main__":
    main()
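
One way to make a CLI like the one above easier to test is to let `main` accept an argument list and return an exit code. The following is a minimal sketch under stated assumptions, not the package's implementation: `STOPWORDS` is a hypothetical in-memory stand-in for the JSONL data files, `build_parser` is a hypothetical helper, and the exit codes are an illustrative choice.

```python
import argparse
import json

# Hypothetical in-memory stand-in for the package's JSONL data files.
STOPWORDS = {
    'ven': [{'eng': 'with', 'ven': 'na'}],
}


def build_parser():
    """Create the parser with the same 'list' and 'get' subcommands."""
    parser = argparse.ArgumentParser(prog='our_stopwords')
    subparsers = parser.add_subparsers(dest='command')
    subparsers.add_parser('list', help='List all available languages')
    get_parser = subparsers.add_parser('get', help='Get stop words for a specific language')
    get_parser.add_argument('language_code')
    return parser


def main(argv=None):
    """Run the CLI; argv=None makes argparse fall back to sys.argv[1:]."""
    parser = build_parser()
    args = parser.parse_args(argv)

    if args.command == 'list':
        for code in sorted(STOPWORDS):
            print(code)
        return 0

    if args.command == 'get':
        code = args.language_code.lower()
        if code not in STOPWORDS:
            print(f"Error: unsupported language code '{code}'")
            return 1
        print(json.dumps(STOPWORDS[code], ensure_ascii=False))
        return 0

    # No subcommand given
    parser.print_help()
    return 2
```

With this shape, a test can call `main(['get', 'ven'])` directly, and a console script can wrap the call as `sys.exit(main())` so that failures produce a non-zero exit status.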
1 change: 1 addition & 0 deletions our_stopwords/data/nso.jsonl
@@ -0,0 +1 @@
{"eng": "with", "nso": "le"}
1 change: 1 addition & 0 deletions our_stopwords/data/ven.jsonl
@@ -0,0 +1 @@
{"eng": "with", "ven": "na"}
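
Each data file is named after a language code, and each record carries an `eng` entry plus a translation under a language key. A small check along the following lines could flag records whose keys do not match the file they live in. This is a sketch only: `find_key_mismatches` is a hypothetical helper, and the rule that the translation key should equal the file's language code is an assumption, not something the package enforces.

```python
import json


def find_key_mismatches(language_code, lines):
    """Return (line_number, record) pairs lacking 'eng' or the language key."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        if 'eng' not in record or language_code not in record:
            problems.append((number, record))
    return problems


# A record whose key matches the language code passes the check.
ok = find_key_mismatches('ven', ['{"eng": "with", "ven": "na"}'])

# A record keyed 'ven' but checked against 'nso' is reported.
bad = find_key_mismatches('nso', ['{"eng": "with", "ven": "le"}'])

print(ok)
print(bad)
```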
31 changes: 31 additions & 0 deletions setup.py
@@ -0,0 +1,31 @@
from setuptools import setup, find_packages

setup(
    name='our_stopwords',
    version='1.0',
    packages=find_packages(),
    include_package_data=True,
    install_requires=[
        'pandas',
        'scikit-learn'
    ],
    package_data={
        'our_stopwords': ['data/*.jsonl'],
    },
    author='Ndamulelo Nemakhavhani',
    author_email='endeesa@yahoo.com',
    description='A package for accessing multilingual stop words for South African Bantu Languages.',
    long_description=open('README.md').read(),
    long_description_content_type='text/markdown',
    url='https://github.com/ndamulelonemakh/our-stopwords',
    classifiers=[
        'Programming Language :: Python :: 3',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
    ],
    entry_points={
        'console_scripts': [
            'our_stopwords = our_stopwords.__main__:cli'
        ]
    },
)