[INF] Infra upgrades (#1294)

* chore: update pre-commit hooks

  - Removed unused hooks: isort, flake8
  - Added new hook: ruff
  - Updated black and interrogate hooks configuration
  - Commented out darglint due to timeout issues on pre-commit CI

  This update aims to improve code quality checks and ensure consistency across the codebase.

* feat: Add ruff tool configuration to pyproject.toml

  This commit introduces a new feature to the codebase by adding the configuration for the ruff tool in the pyproject.toml file. The configuration includes enabling pycodestyle and Pyflakes codes, allowing fixes for all enabled rules, excluding commonly ignored directories, setting the line length to 88, allowing unused variables when underscore-prefixed, and assuming Python 3.8. It also sets the default complexity level to 10 for the mccabe tool under ruff.

* chore(pyproject.toml): Update target Python version to 3.10

  This commit updates the target Python version in the pyproject.toml file from 3.8 to 3.10.

* chore(environment-dev): update Python and rdkit versions

  - Python version updated from 3.9 to 3.10
  - rdkit version constraint removed

* chore: Comment out darglint and add --fix arg to ruff

  In this commit, the darglint pre-commit hook has been commented out. Additionally, the --fix argument has been added to the ruff pre-commit hook.

* infra: satisfy the shiny new ruff linter

* chore: disable darglint in pre-commit config

  Due to performance issues, darglint has been commented out in the pre-commit configuration. It may be replaced by ruff in the future. See astral-sh/ruff#458 for more details.

* feat: update pre-commit hooks and add pydoclint

  - Updated the version of pre-commit-hooks from v4.4.0 to v4.5.0.
  - Added pydoclint as an interim replacement for darglint with configuration in pyproject.toml.

* refactor(janitor/utils): use isinstance for type checking

  Changed the type checking in the skipna function from using type() to isinstance() for better Pythonic practice.

* refactor: update docstrings and remove redundant comments

  In this commit, we have updated the docstring for the `_get_data_df` method in the `DataDescription` class to provide more detailed information about its functionality. We have also removed the redundant comments from the `__init__` method of the `col` class in `utils.py` as they were not providing any additional value.

* chore: remove darglint checks workflow

  This commit removes the darglint checks workflow from the GitHub actions. The workflow was initially added to run darglint checks manually due to the pre-commit CI timing out. Now that the issue has been resolved, the workflow is no longer needed.

* feat(janitor): add 'col' utility to functions

  This commit introduces the 'col' utility from the utils module into the janitor package. This utility can now be accessed directly from the janitor package.

* refactor(janitor): update import statements and function usage

  - Updated import statement in __init__.py to include DropLabel from functions.utils
  - Modified usage of expand_grid function in expand_grid.py to be directly called instead of through the janitor module

* test: remove redundant dataframe method registration tests

  This commit removes the test_df_registration.py file, which contained redundant tests for dataframe method registration. These tests were not necessary as the registration of these methods is guaranteed by the pandas-flavor library.

* feat(utils): add dynamic_import function and import janitor.chemistry in test

  - Added a new function `dynamic_import` in `janitor/utils.py` that allows for dynamic importing of all modules in a directory.
  - Imported `janitor.chemistry` in `tests/chemistry/test_maccs_keys_fingerprint.py` to ensure it's available during testing.
  - Also added `importlib` and `pathlib.Path` to `janitor/utils.py` to support the new function.

* feat(janitor/functions): add dynamic import functionality

  - Imported dynamic_import from janitor.utils
  - Called dynamic_import function with __name__ as argument

* refactor: update dynamic_import argument and limit test examples

  - In `janitor/functions/__init__.py`, the argument passed to `dynamic_import` has been updated from `__name__` to `Path(__name__)` to leverage the pathlib library for more robust path handling.
  - In `tests/functions/test_conditional_join.py`, the number of examples for several tests has been limited to improve test performance and reduce runtime.

* refactor(janitor/functions): remove unused imports and dynamic import function

  This commit removes the unused imports 'Path' from 'pathlib' and 'dynamic_import' from 'janitor.utils'. It also removes the call to 'dynamic_import' function which is no longer needed.

* refactor(tests): import janitor module in test files

  - Modified the import statements in test_expand_grid.py and test_factorize_columns.py to include the janitor module.
  - This change ensures that the janitor module is explicitly imported in the test files.

* test: import janitor in test_fill_direction.py

  This commit adds an import statement for the janitor module in the test_fill_direction.py file. This is necessary for the proper functioning of the tests in this file.

* refactor(janitor): reorganize function imports and remove unused imports

  This commit reorganizes the function imports in the janitor package to improve code readability and maintainability. It also removes an unused import from the main __init__.py file.

* test: limit max examples in pytest settings to 10

  This commit reduces the maximum number of examples generated by pytest for each test case from unlimited to 10. This change is intended to speed up test execution time without significantly reducing test coverage.

* test: limit max examples in pytest settings to 10

  In an effort to optimize testing time, the maximum number of examples for each test in the pytest settings has been reduced to 10. This change affects multiple test functions in the 'test_conditional_join.py' file.

* test: limit max examples in pytest settings to 10 for multiple test functions

* test: limit max examples in pytest to improve test performance

* feat(devguide): expand section on writing code

  This commit expands the "Write the Code" section in the developer guide. It provides more detailed instructions on best practices for writing code, including committing early and often, staying updated with the dev branch, and writing tests. It also updates the "Check your code" section to include information about pre-commit hooks.

* chore(github-actions): update checkout action and remove test matrix

  This commit updates the version of the checkout action used in the GitHub Actions workflow from v3 to v4. It also removes the matrix strategy for running tests, which previously included "turtle" and "not turtle" subsets. Now, all tests will be run without any subset specification.

* test: Add execution test for conditional_join function

  This commit introduces a new test for the conditional_join function in the test_conditional_join.py file. The test uses an example directly from the conditional_join docstring to verify the function's correct operation.
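For readers unfamiliar with the `type()` to `isinstance()` refactor mentioned in the commit message, the following is a minimal sketch of why the two checks differ. The helper names `is_text_exact` and `is_text`, and the `Label` class, are hypothetical and chosen for illustration only; they are not taken from `janitor/utils.py`, and the actual `skipna` code may look different.

```python
class Label(str):
    """Hypothetical str subclass, used only to demonstrate subclass handling."""


def is_text_exact(value) -> bool:
    # Exact type comparison: a subclass instance does NOT match.
    # pycodestyle-style linters report this pattern as E721.
    return type(value) == str


def is_text(value) -> bool:
    # isinstance() accepts str and any subclass of str.
    return isinstance(value, str)


plain = "nan"
label = Label("nan")

print(is_text_exact(plain), is_text_exact(label))
print(is_text(plain), is_text(label))
```

The exact comparison silently rejects the subclass instance, whereas `isinstance()` treats it as a string, which is usually the intended behaviour for value checks of this kind.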
pyjanitor-devs · Oct 14, 2023 · a4f1c0a
1 parent 4ea22dc
commit a4f1c0a
Showing 101 changed files with 615 additions and 619 deletions.
diff --git a/.github/workflows/darglint-checks.yml b/.github/workflows/darglint-checks.yml
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -27,8 +27,6 @@ jobs:
   run-tests:
     strategy:
       fail-fast: false
-      matrix:
-        test-subset: ["turtle", "not turtle"]
     runs-on: ubuntu-latest
     name: Run pyjanitor test suite
 
@@ -39,7 +37,7 @@ jobs:
 
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
 
       # See: https://github.com/marketplace/actions/setup-miniconda
       - name: Setup miniconda
@@ -58,7 +56,7 @@ jobs:
         run: pytest -v -r a -n auto --color=yes --durations=0 --cov=janitor --cov-append --cov-report term-missing --cov-report xml --doctest-only janitor
 
       - name: Run unit tests
-        run: pytest -v -r a -n auto --color=yes --durations=0 --cov=janitor --cov-append --cov-report term-missing --cov-report xml tests -m "${{ matrix.test-subset }}"
+        run: pytest -v -r a -n auto --color=yes --durations=0 --cov=janitor --cov-append --cov-report term-missing --cov-report xml tests
 
       # https://github.com/codecov/codecov-action
       - name: Upload code coverage

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -8,38 +8,34 @@ repos:
       - id: end-of-file-fixer
       - id: check-yaml
       - id: check-added-large-files
-
   - repo: https://github.com/psf/black
     rev: 23.9.1
     hooks:
       - id: black
         args: [--config, pyproject.toml]
-
-  # - repo: https://github.com/pycqa/isort
-  #   rev: 5.11.2
-  #   hooks:
-  #     - id: isort
-  #       name: isort (python)
-
   - repo: https://github.com/econchick/interrogate
     rev: 1.5.0
     hooks:
       - id: interrogate
         args: [-c, pyproject.toml]
+  # Taking out darglint because it takes too long to run.
+  # It may be superseded by ruff: https://github.com/astral-sh/ruff/issues/458
+  # - repo: https://github.com/terrencepreilly/darglint
+  #   rev: v1.8.1
+  #   hooks:
+  #     - id: darglint
+  #       args: [-v 2] # this config makes the error messages a bit less cryptic.
 
-  - repo: https://github.com/terrencepreilly/darglint
-    rev: v1.8.1
+  # The interim replacement for darglint is pydoclint.
+  - repo: https://github.com/jsh9/pydoclint
+    rev: 0.3.3
     hooks:
-      - id: darglint
-        args: [-v 2] # this config makes the error messages a bit less cryptic.
-
-  - repo: https://github.com/PyCQA/flake8
-    rev: 6.1.0
+      - id: pydoclint
+        args:
+          - "--config=pyproject.toml"
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    # Ruff version.
+    rev: v0.0.292
     hooks:
-      - id: flake8
-        args: [--exclude, nbconvert_config.py]
-
-ci:
-  skip:
-    # FIXME: darglint is timing out on pre-commit CI (cf. #1236, #1246)
-    - darglint
+      - id: ruff
+        args: [--fix]
diff --git a/environment-dev.yml b/environment-dev.yml
@@ -2,7 +2,7 @@ name: pyjanitor-dev
 channels:
   - conda-forge
 dependencies:
-  - python=3.9
+  - python=3.10
   - biopython
   - black=22.12.0 # keep this in sync with `.pre-commit-config.yaml`
   - bump2version=1.0.1
@@ -40,7 +40,7 @@ dependencies:
   - pytest-xdist
   - pytest-doctestplus
   - python-language-server
-  - rdkit=2021.09.3
+  - rdkit
   - recommonmark
   - seaborn
   - twine

diff --git a/janitor/accessors/data_description.py b/janitor/accessors/data_description.py
@@ -13,11 +13,14 @@ class DataDescription:
     """
 
     def __init__(self, data):
-        """Initialize DataDescription class."""
         self._data = data
         self._desc = {}
 
     def _get_data_df(self) -> pd.DataFrame:
+        """Get a table of descriptive information in a DataFrame format.
+
+        :returns: A DataFrame containing the descriptive information.
+        """
         df = self._data
 
         data_dict = {}

diff --git a/janitor/engineering.py b/janitor/engineering.py
@@ -7,7 +7,6 @@
 
 from .utils import check, import_message
 
-
 try:
     import unyt
 except ImportError:

diff --git a/janitor/finance.py b/janitor/finance.py
@@ -9,8 +9,8 @@
 import requests
 
 from janitor.errors import JanitorError
-from .utils import check, deprecated_alias, is_connected
 
+from .utils import check, deprecated_alias, is_connected
 
 currency_set = {
     "AUD",

diff --git a/janitor/functions/__init__.py b/janitor/functions/__init__.py
@@ -43,7 +43,7 @@
 from .expand_grid import expand_grid
 from .factorize_columns import factorize_columns
 from .fill import fill_direction, fill_empty
-from .filter import filter_date, filter_column_isin, filter_on, filter_string
+from .filter import filter_column_isin, filter_date, filter_on, filter_string
 from .find_replace import find_replace
 from .flag_nulls import flag_nulls
 from .get_dupes import get_dupes
@@ -64,7 +64,7 @@
 from .reorder_columns import reorder_columns
 from .round_to_fraction import round_to_fraction
 from .row_to_names import row_to_names
-from .select import select_columns, select_rows, select
+from .select import select, select_columns, select_rows
 from .shuffle import shuffle
 from .sort_column_value_order import sort_column_value_order
 from .sort_naturally import sort_naturally
@@ -76,10 +76,85 @@
 from .truncate_datetime import truncate_datetime_dataframe
 from .update_where import update_where
 from .utils import (
-    patterns,
-    unionize_dataframe_categories,
     DropLabel,
-    get_index_labels,
     col,
     get_columns,
+    get_index_labels,
+    patterns,
+    unionize_dataframe_categories,
 )
+
+__all__ = [
+    "add_columns",
+    "also",
+    "bin_numeric",
+    "case_when",
+    "change_type",
+    "clean_names",
+    "coalesce",
+    "collapse_levels",
+    "complete",
+    "concatenate_columns",
+    "conditional_join",
+    "convert_excel_date",
+    "convert_matlab_date",
+    "convert_unix_date",
+    "count_cumulative_unique",
+    "currency_column_to_numeric",
+    "deconcatenate_column",
+    "drop_constant_columns",
+    "drop_duplicate_columns",
+    "dropnotnull",
+    "encode_categorical",
+    "expand_column",
+    "expand_grid",
+    "factorize_columns",
+    "fill_direction",
+    "fill_empty",
+    "filter_date",
+    "filter_column_isin",
+    "filter_on",
+    "filter_string",
+    "find_replace",
+    "flag_nulls",
+    "get_dupes",
+    "groupby_agg",
+    "groupby_topk",
+    "impute",
+    "jitter",
+    "join_apply",
+    "label_encode",
+    "limit_column_characters",
+    "min_max_scale",
+    "move",
+    "pivot_longer",
+    "pivot_wider",
+    "process_text",
+    "remove_columns",
+    "remove_empty",
+    "rename_column",
+    "rename_columns",
+    "reorder_columns",
+    "round_to_fraction",
+    "row_to_names",
+    "select_columns",
+    "select_rows",
+    "select",
+    "shuffle",
+    "sort_column_value_order",
+    "sort_naturally",
+    "take_first",
+    "then",
+    "to_datetime",
+    "toset",
+    "transform_column",
+    "transform_columns",
+    "truncate_datetime_dataframe",
+    "update_where",
+    "patterns",
+    "unionize_dataframe_categories",
+    "DropLabel",
+    "get_index_labels",
+    "col",
+    "get_columns",
+]
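As a side note on the `__all__` list added to `janitor/functions/__init__.py` above: in Python, `__all__` determines which names a star-import copies out of a module. The sketch below demonstrates that behaviour on a throwaway in-memory module; the module name `demo_functions` and its three functions are hypothetical and are not part of pyjanitor.

```python
import sys
import types

# Source of a hypothetical module: three functions, only two listed in __all__.
source = '''
__all__ = ["clean", "col"]

def clean():
    return "clean"

def col():
    return "col"

def helper():
    return "helper"
'''

# Build the module in memory and register it so it can be imported by name.
demo = types.ModuleType("demo_functions")
exec(source, demo.__dict__)
sys.modules["demo_functions"] = demo

# A star-import copies only the names listed in __all__ into the namespace.
namespace = {}
exec("from demo_functions import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)

# Names left out of __all__ remain reachable through explicit attribute access.
print(demo.helper())
```

Without `__all__`, the star-import would also have copied `helper`, since its name does not start with an underscore.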
diff --git a/janitor/functions/_numba.py b/janitor/functions/_numba.py
@@ -2,14 +2,15 @@
 
 import numpy as np
 import pandas as pd
+from numba import njit, prange
+from pandas.api.types import is_datetime64_dtype, is_extension_array_dtype
+
 from janitor.functions.utils import (
     _generic_func_cond_join,
     _JoinOperator,
-    less_than_join_types,
     greater_than_join_types,
+    less_than_join_types,
 )
-from numba import njit, prange
-from pandas.api.types import is_extension_array_dtype, is_datetime64_dtype
 
 
 def _numba_equi_join(df, right, eqs, ge_gt, le_lt):

diff --git a/janitor/functions/add_columns.py b/janitor/functions/add_columns.py
@@ -1,9 +1,10 @@
+from typing import Any, List, Tuple, Union
+
+import numpy as np
+import pandas as pd
 import pandas_flavor as pf
 
 from janitor.utils import check, deprecated_alias, refactored_function
-import pandas as pd
-from typing import Union, List, Any, Tuple
-import numpy as np
 
 
 @pf.register_dataframe_method

diff --git a/janitor/functions/also.py b/janitor/functions/also.py
@@ -1,7 +1,8 @@
 """Implementation source for chainable function `also`."""
 from typing import Any, Callable
-import pandas_flavor as pf
+
 import pandas as pd
+import pandas_flavor as pf
 
 
 @pf.register_dataframe_method

diff --git a/janitor/functions/bin_numeric.py b/janitor/functions/bin_numeric.py
@@ -1,11 +1,11 @@
 """Implementation source for `bin_numeric`."""
-from typing import Any, Optional, Union, Sequence
-import pandas_flavor as pf
+from typing import Any, Optional, Sequence, Union
+
 import pandas as pd
+import pandas_flavor as pf
 
 from janitor.utils import check, check_column, deprecated_alias
 
-
 ScalarSequence = Sequence[float]
 
 

diff --git a/janitor/functions/case_when.py b/janitor/functions/case_when.py
@@ -1,10 +1,12 @@
 """Implementation source for `case_when`."""
-from pandas.core.common import apply_if_callable
+import warnings
 from typing import Any
-import pandas_flavor as pf
+
 import pandas as pd
+import pandas_flavor as pf
 from pandas.api.types import is_scalar
-import warnings
+from pandas.core.common import apply_if_callable
+
 from janitor.utils import check, find_stack_level
 
 warnings.simplefilter("always", DeprecationWarning)

diff --git a/janitor/functions/clean_names.py b/janitor/functions/clean_names.py
@@ -1,13 +1,14 @@
 """Functions for cleaning columns names."""
-from janitor.utils import deprecated_alias
-from janitor.functions.utils import get_index_labels, _is_str_or_cat
-from pandas.api.types import is_scalar
+import unicodedata
 from typing import Hashable, Optional, Union
+
 import pandas as pd
 import pandas_flavor as pf
+from pandas.api.types import is_scalar
 
 from janitor.errors import JanitorError
-import unicodedata
+from janitor.functions.utils import _is_str_or_cat, get_index_labels
+from janitor.utils import deprecated_alias
 
 
 @pf.register_dataframe_method

diff --git a/janitor/functions/coalesce.py b/janitor/functions/coalesce.py
@@ -1,10 +1,11 @@
 """Function for performing coalesce."""
 from typing import Any, Optional, Union
+
 import pandas as pd
 import pandas_flavor as pf
 
-from janitor.utils import check, deprecated_alias
 from janitor.functions.utils import get_index_labels
+from janitor.utils import check, deprecated_alias
 
 
 @pf.register_dataframe_method