Rename project and cleanup
chrislemke committed Dec 13, 2022
1 parent 6de4af3 commit 7fe3356
Showing 39 changed files with 879 additions and 1,200 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/build-docs.yml
@@ -12,7 +12,7 @@ jobs:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: 3.9
- run: pip install poetry==1.2.1
- run: poetry install
python-version: "3.10"
- run: pip install poetry==1.3.1
- run: poetry install --without dev
- run: poetry run mkdocs gh-deploy --force --clean --verbose
6 changes: 3 additions & 3 deletions .github/workflows/code-cov.yml
@@ -12,9 +12,9 @@ jobs:
python-version: "3.10"
- name: Install dependencies and project
run: |
python -m pip install poetry==1.2.1
poetry install
python -m pip install poetry==1.3.1
poetry install --without devs --without docs
- name: Run tests and collect coverage
run: poetry run pytest --cov feature_reviser --cov-report term-missing --cov-report xml
run: poetry run pytest --cov sk-transformers --cov-report term-missing --cov-report xml
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
6 changes: 3 additions & 3 deletions .github/workflows/deploy-package.yml
@@ -11,7 +11,7 @@ jobs:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: 3.9
- run: pip install poetry==1.2.1
- run: poetry config pypi-token.pypi ${{ secrets.FEATURE_REVISER_UPLOAD_TOKEN }}
python-version: "3.10"
- run: pip install poetry==1.3.1
- run: poetry config pypi-token.pypi ${{ secrets.UPLOAD_TOKEN }}
- run: poetry publish --build
12 changes: 6 additions & 6 deletions .github/workflows/testing.yml
@@ -20,23 +20,23 @@ jobs:
python-version: ${{ matrix.python-version }}
- name: Install dependencies and project
run: |
python -m pip install poetry==1.2.1
python -m pip install poetry==1.3.1
poetry install
- name: Check with isort
run: |
poetry run isort --check-only ./feature_reviser ./tests
poetry run isort --check-only ./src ./tests
- name: Check with black
run: |
poetry run black --check ./feature_reviser ./tests
poetry run black --check ./src ./tests
- name: Check with mypy
run: |
poetry run mypy --config-file=pyproject.toml .
- name: Check with bandit
run: |
poetry run bandit -r ./feature_reviser/*
poetry run bandit -r ./src/*
- name: Lint with pylint
run: |
poetry run pylint --rcfile=pyproject.toml ./feature_reviser ./tests
poetry run pylint --rcfile=pyproject.toml ./src ./tests
- name: Test with pytest
run: |
poetry run pytest --cov feature_reviser --cov-fail-under=90 --cov-report term-missing
poetry run pytest --cov src --cov-fail-under=90 --cov-report term-missing
22 changes: 14 additions & 8 deletions .pre-commit-config.yaml
@@ -15,28 +15,27 @@ repos:
- id: check-toml
- id: debug-statements
- id: fix-byte-order-marker
- id: fix-encoding-pragma
- id: forbid-new-submodules

- repo: https://github.com/psf/black
rev: 22.10.0
rev: 22.12.0
hooks:
- id: black

- repo: https://github.com/PyCQA/pylint
rev: v2.15.5
rev: v2.15.8
hooks:
- id: pylint
args: ["--rcfile=pyproject.toml"]

- repo: https://github.com/PyCQA/isort
rev: 5.10.1
rev: 5.11.1
hooks:
- id: isort
args: ["--profile=black"]

- repo: https://github.com/pre-commit/mirrors-mypy
rev: v0.982
rev: v0.991
hooks:
- id: mypy
args:
@@ -52,14 +51,21 @@ repos:
- "-r"

- repo: https://github.com/asottile/pyupgrade
rev: v2.38.0
rev: v3.3.1
hooks:
- id: pyupgrade

- repo: https://github.com/python-poetry/poetry
rev: 1.2.2
rev: 1.3.1
hooks:
- id: poetry-check
- id: poetry-lock
- id: poetry-export
args: ["--dev", "-f", "requirements.txt", "-o", "requirements.txt"]
args:
[
"--dev",
"--format",
"requirements.txt",
"--output",
"requirements.txt",
]
2 changes: 2 additions & 0 deletions Brewfile
@@ -0,0 +1,2 @@
brew "poetry"
brew "pre-commit"
2 changes: 1 addition & 1 deletion docs/CONTRIBUTING.md
@@ -66,7 +66,7 @@ print(pipeline.fit_transform(df).head())
1 Incredible Hulk Schikaneder
2 Tom and Jerry Futuregarden
```
For a non-dummy examples check out the [`MathExpressionTransformer`](number_transformer-reference.md#feature_reviser.transformer.number_transformer.MathExpressionTransformer) or the [`ValueIndicatorTransformer`](generic_transformer-reference.md#feature_reviser.transformer.generic_transformer.ValueIndicatorTransformer) for a simpler example.
For a non-dummy examples check out the [`MathExpressionTransformer`](number_transformer-reference.md#sk-transformers.transformer.number_transformer.MathExpressionTransformer) or the [`ValueIndicatorTransformer`](generic_transformer-reference.md#sk-transformers.transformer.generic_transformer.ValueIndicatorTransformer) for a simpler example.

## Poetry
We are using [Poetry](https://python-poetry.org/) to manage the dependencies and the virtual environment. If you have not used it before please check out the [documentation](https://python-poetry.org/docs/) to get started.
41 changes: 19 additions & 22 deletions docs/README.md
@@ -1,31 +1,31 @@
![The machine](https://raw.githubusercontent.com/chrislemke/feature-reviser/master/docs/assets/images/image.png)
![The machine](https://raw.githubusercontent.com/chrislemke/sk-transformers/master/docs/assets/images/image.png)

# feature-reviser
# sk-transformers
### A collection of various scikit-learn transformers for all kinds of preprocessing and feature engineering steps 🛠

[![testing](https://github.com/chrislemke/feature-reviser/actions/workflows/testing.yml/badge.svg?branch=main)](https://github.com/chrislemke/feature-reviser/actions/workflows/testing.yml)
[![codecov](https://codecov.io/github/chrislemke/feature-reviser/branch/main/graph/badge.svg?token=LJLXQXX6M8)](https://codecov.io/github/chrislemke/feature-reviser)
[![deploy package](https://github.com/chrislemke/feature-reviser/actions/workflows/deploy-package.yml/badge.svg)](https://github.com/chrislemke/feature-reviser/actions/workflows/deploy-package.yml)
[![pypi](https://img.shields.io/pypi/v/feature-reviser)](https://pypi.org/project/feature-reviser/)
![python version](https://img.shields.io/pypi/pyversions/feature-reviser?logo=python&logoColor=yellow)
[![downloads](https://img.shields.io/pypi/dm/feature-reviser)](https://pypistats.org/packages/feature-reviser)
[![docs](https://img.shields.io/badge/docs-mkdoks%20material-blue)](https://chrislemke.github.io/feature-reviser/)
[![license](https://img.shields.io/github/license/chrislemke/feature-reviser)](https://github.com/chrislemke/feature-reviser/blob/main/LICENSE)
[![testing](https://github.com/chrislemke/sk-transformers/actions/workflows/testing.yml/badge.svg?branch=main)](https://github.com/chrislemke/sk-transformers/actions/workflows/testing.yml)
[![codecov](https://codecov.io/github/chrislemke/sk-transformers/branch/main/graph/badge.svg?token=LJLXQXX6M8)](https://codecov.io/github/chrislemke/sk-transformers)
[![deploy package](https://github.com/chrislemke/sk-transformers/actions/workflows/deploy-package.yml/badge.svg)](https://github.com/chrislemke/sk-transformers/actions/workflows/deploy-package.yml)
[![pypi](https://img.shields.io/pypi/v/sk-transformers)](https://pypi.org/project/sk-transformers/)
![python version](https://img.shields.io/pypi/pyversions/sk-transformers?logo=python&logoColor=yellow)
[![downloads](https://img.shields.io/pypi/dm/sk-transformers)](https://pypistats.org/packages/sk-transformers)
[![docs](https://img.shields.io/badge/docs-mkdoks%20material-blue)](https://chrislemke.github.io/sk-transformers/)
[![license](https://img.shields.io/github/license/chrislemke/sk-transformers)](https://github.com/chrislemke/sk-transformers/blob/main/LICENSE)
[![mypy](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)
[![black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
## Introduction
Every data tabular is different. Every column needs to be treated differently. [Scikit-learn](https://scikit-learn.org/stable/index.html) has a nice [collection of dataset transformers](https://scikit-learn.org/stable/data_transforms.html). But the possibilities of data transformation are infinite - one collection is simply not enough. This project provides a brought collection of data transformers. The idea is simple. It is like a well-equipped toolbox 🧰: You always find the tool you need and sometimes you get inspired by seeing a tool you did not know before. Please feel free to [contribute](https://chrislemke.github.io/feature-reviser/CONTRIBUTING/) your tools and ideas.
Every data tabular is different. Every column needs to be treated differently. [Scikit-learn](https://scikit-learn.org/stable/index.html) has a nice [collection of dataset transformers](https://scikit-learn.org/stable/data_transforms.html). But the possibilities of data transformation are infinite - one collection is simply not enough. This project provides a brought collection of data transformers. The idea is simple. It is like a well-equipped toolbox 🧰: You always find the tool you need and sometimes you get inspired by seeing a tool you did not know before. Please feel free to [contribute](https://chrislemke.github.io/sk-transformers/CONTRIBUTING/) your tools and ideas.

## Installation
If you are using [Poetry](https://python-poetry.org/), you can install the package with the following command:
```bash
poetry add feature-reviser
poetry add sk_transformers
```
If you are using [pip](https://pypi.org/project/pip/), you can install the package with the following command:
```bash
pip install feature-reviser
pip install sk_transformers
```

## installing dependencies
@@ -39,13 +39,13 @@ pip install -r requirements.txt
```

## The transformers
Data preprocessing often involves similar processes. No matter whether it's manipulating strings or numbers, etc. [Scikit-learn's pipeline](https://scikit-learn.org/stable/modules/compose.html#combining-estimators) implementation makes it easy to structure and sequence such preprocessing processes. To take advantage of this, the [`transformer`](https://github.com/chrislemke/feature-reviser/tree/main/feature_reviser/transformer) part of the project contains multiple methods that can be easily pipelined to simplify preprocessing. The list of transformers is open and will be extended permanently. Feel free to [contribute](https://chrislemke.github.io/feature-reviser/CONTRIBUTING/)! 🛠
Data preprocessing often involves similar processes. No matter whether it's manipulating strings or numbers, etc. [Scikit-learn's pipeline](https://scikit-learn.org/stable/modules/compose.html#combining-estimators) implementation makes it easy to structure and sequence such preprocessing processes. To take advantage of this, the [`transformers`](https://github.com/chrislemke/sk-transformers/tree/main/sk-transformers/transformer) contain multiple methods that can be easily pipelined to simplify preprocessing. The list of transformers is open and will be extended permanently. Feel free to [contribute](https://chrislemke.github.io/sk-transformers/CONTRIBUTING/)! 🛠

### Usage
Let's assume you want to use some method from [NumPy's mathematical functions](https://numpy.org/doc/stable/reference/routines.math.html), to sum up the values of column `foo` and column `bar`. You could
use the [`MathExpressionTransformer`](https://chrislemke.github.io/feature-reviser/number_transformer-reference/#feature_reviser.transformer.number_transformer.MathExpressionTransformer):
use the [`MathExpressionTransformer`](https://chrislemke.github.io/sk-transformers/number_transformer-reference/#sk-transformers.transformer.number_transformer.MathExpressionTransformer):
```python
from feature_reviser import MathExpressionTransformer
from sk_transformers import MathExpressionTransformer
import pandas as pd
X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
transformer = MathExpressionTransformer([("foo", "np.sum", "bar", {"axis": 0})])
@@ -56,17 +56,14 @@ array([[1, 4, 5],
[2, 5, 7],
[3, 6, 9]])
```
Even if we only pass one tuple to the transformer - in this example. Like with most other transformers the idea is to simplify preprocessing by giving the possibility to operate on multiple columns at the same time. In this case, the [`MathExpressionTransformer`](https://chrislemke.github.io/feature-reviser/number_transformer-reference/#feature_reviser.transformer.number_transformer.MathExpressionTransformer) has created an extra column with the name `foo_sum_bar`.

## The feature reviser (under construction 🚧)
Finding the best features for your model is hard. In the `feature_selection` part of the project, we try to automate this process to make it a bit easier. This part of the project is still in development and is not yet ready for use. If you want to help, you can find more information in the [contributing guide](https://chrislemke.github.io/feature-reviser/CONTRIBUTING/).
Even if we only pass one tuple to the transformer - in this example. Like with most other transformers the idea is to simplify preprocessing by giving the possibility to operate on multiple columns at the same time. In this case, the [`MathExpressionTransformer`](https://chrislemke.github.io/sk-transformers/number_transformer-reference/#sk-transformers.transformer.number_transformer.MathExpressionTransformer) has created an extra column with the name `foo_sum_bar`.

## Contributing
We're all kind of in the same boat. Preprocessing/feature engineering in data science is somehow very individual - every feature is different and must be handled and processed differently. But somehow we all have the same problems: sometimes date columns have to be changed. Sometimes strings have to be formatted, sometimes durations have to be calculated, etc. There is a huge number of preprocessing possibilities but we all use the same tools.

[Scikit-learns pipelines](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) help to use formalized functions. So why not also share these so-called transformers with others? This open source project has the goal to collect useful preprocessing pipeline steps. Let us all collect what we used for preprocessing and share it with others. This way we can all benefit from each other's work and save a lot of time. So if you have a preprocessing step that you use regularly, please feel free to contribute it to this project. The idea is that this is not only a toolbox but also an inspiration for what is possible. Maybe you have not thought about this preprocessing step before.

Please check out the [guide](https://chrislemke.github.io/feature-reviser/CONTRIBUTING/) on how to contribute to this project.
Please check out the [guide](https://chrislemke.github.io/sk-transformers/CONTRIBUTING/) on how to contribute to this project.

## Further information
For further information, please refer to the [documentation](https://chrislemke.github.io/feature-reviser/).
For further information, please refer to the [documentation](https://chrislemke.github.io/sk-transformers/).
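
Note on the README usage hunk above: it states that `MathExpressionTransformer` adds a `foo_sum_bar` column holding the sum of `foo` and `bar`. As a rough illustration of that result only, and not the library's implementation, the same values can be reproduced with plain pandas and NumPy. The helper name `add_sum_column` is hypothetical and does not appear in the commit.

```python
import numpy as np
import pandas as pd


def add_sum_column(df: pd.DataFrame, left: str, right: str) -> pd.DataFrame:
    """Hypothetical helper: append `<left>_sum_<right>` with the element-wise sum."""
    out = df.copy()  # leave the caller's frame untouched
    # Stack both columns into a (2, n_rows) array and reduce along axis 0,
    # which adds the two columns row by row.
    stacked = [out[left].to_numpy(), out[right].to_numpy()]
    out[f"{left}_sum_{right}"] = np.sum(stacked, axis=0)
    return out


X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
result = add_sum_column(X, "foo", "bar")
print(result.to_numpy())
```

The resulting rows carry the same values as the array shown in the README hunk; whether the transformer computes them this way internally is an assumption.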
2 changes: 1 addition & 1 deletion docs/base_transformer-reference.md
@@ -1 +1 @@
:::feature_reviser.transformer.base_transformer
:::src.transformer.base_transformer
2 changes: 1 addition & 1 deletion docs/datetime_transformer-reference.md
@@ -1 +1 @@
:::feature_reviser.transformer.datetime_transformer
:::src.transformer.datetime_transformer
2 changes: 1 addition & 1 deletion docs/encoder_transformer-reference.md
@@ -1 +1 @@
:::feature_reviser.transformer.encoder_transformer
:::src.transformer.encoder_transformer
2 changes: 1 addition & 1 deletion docs/generic_transformer-reference.md
@@ -1 +1 @@
:::feature_reviser.transformer.generic_transformer
:::src.transformer.generic_transformer
2 changes: 1 addition & 1 deletion docs/number_transformer-reference.md
@@ -1 +1 @@
:::feature_reviser.transformer.number_transformer
:::src.transformer.number_transformer
3 changes: 0 additions & 3 deletions docs/reviser-reference.md

This file was deleted.

3 changes: 0 additions & 3 deletions docs/selector-reference.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/string_transformer-reference.md
@@ -1 +1 @@
:::feature_reviser.transformer.string_transformer
:::src.transformer.string_transformer
2 changes: 1 addition & 1 deletion docs/utils-reference.md
@@ -1 +1 @@
:::feature_reviser.utils
:::src.utils
82 changes: 0 additions & 82 deletions feature_reviser/feature_selection/reviser.py

This file was deleted.
