[DO NOT MERGE] TESTING GITHUB ACTIONS PIPELINE FOR PYTHON UPDATE #4044

Open: wants to merge 67 commits into base: master

Changes from all commits (67 commits)
c27d123
Removed torchtext from NGramTokenizer
Oct 25, 2024
8e90e70
Refactored SentencePieceTokenizer
Oct 26, 2024
92a3ec0
removed torchtext
Oct 28, 2024
01f308e
Rewrote tests
Oct 29, 2024
6011fb1
created __about__ file for hatch versioning and updated pyproject
ethanreidel Oct 30, 2024
fd6d146
moved to hatch build system in .toml file
ethanreidel Nov 1, 2024
7e2574f
added dockerfile for hatch
ethanreidel Nov 8, 2024
8e9d32d
Altered dependencies so it works for hatch env create
Nov 17, 2024
9c79935
Updated pytest workflows python versions
Nov 17, 2024
0a06171
Fixed minimal test python version
Nov 17, 2024
4256075
Fixed small error with versioning naming
Nov 17, 2024
a09ef95
Adding ludwig script to pyproject.toml
Nov 17, 2024
687c8ea
added tifffile to dependencies
Nov 17, 2024
123b678
Bumped Python Version for Minimal Test
Nov 17, 2024
da47379
fixed tifffile dep. and removed tests from pytest.yml
Nov 17, 2024
68fb836
Add combinatorial tests to pytest.yaml
Nov 17, 2024
6a896f9
Refined pyproject toml
Nov 19, 2024
8d0f0c4
Removed importlib from dependencies
Nov 19, 2024
5e48bef
Merge remote-tracking branch 'origin/remove_torchtext' into hatch_dev_mh
Nov 20, 2024
a926588
bump torch version to 2.4.1
Nov 20, 2024
5b288f2
fall back to eager for torch dynamo to prevent error
Nov 20, 2024
26fe135
added flake8 ignore line length error
Nov 20, 2024
551eb40
added pytest suite to jobs
Nov 20, 2024
e4161c6
Refactored Matrix Tests
Nov 20, 2024
01533ef
Remove Neuropod from tests.
Nov 20, 2024
b2a1454
Further removed torchtext
Nov 25, 2024
80dc452
try to fix openblas issue
Nov 25, 2024
f45d046
test pythran 0.9
Nov 26, 2024
802748c
fix version code
Nov 26, 2024
68566b8
second fix version code
Nov 26, 2024
a108bb5
pythran via pip
Nov 26, 2024
3607bd2
added prefer binary
Nov 26, 2024
c1af754
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 1, 2024
5279f0c
updated wandb version
ethanreidel Dec 1, 2024
d33982a
Merge branch 'test_python_update' of github.com:ethanreidel/ludwig in…
ethanreidel Dec 2, 2024
500b5fc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 2, 2024
c1eb205
updated matplotlib
ethanreidel Dec 2, 2024
6f90a5f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 2, 2024
a3704a1
fixed invalid error in toml
ethanreidel Dec 2, 2024
294a760
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 2, 2024
0ab5858
testing matplotlib version
ethanreidel Dec 2, 2024
3dea80f
bumped scipy version and added matplotlib to default dependencies
ethanreidel Dec 8, 2024
cbc3d80
commented out combinatorial tests
ethanreidel Jan 14, 2025
56d5302
added GPy step to pytest jobs
ethanreidel Jan 16, 2025
5148870
fixed gpy typo
ethanreidel Jan 16, 2025
17f7846
added download longintrepr file
ethanreidel Jan 16, 2025
7f3975d
added sudo privs to longint file
ethanreidel Jan 16, 2025
1081593
added cython 0.29.35 trying to fix GPy error
ethanreidel Jan 16, 2025
a596ee6
testing install dependency line
ethanreidel Jan 21, 2025
f37c512
fix yml issue in actions
ethanreidel Jan 21, 2025
6d2195f
debugging GPy error
ethanreidel Jan 21, 2025
10b85c1
added logging for successful installs
ethanreidel Jan 22, 2025
6715601
added logging for more GPy debug tests
ethanreidel Jan 22, 2025
67c593c
added more tests
ethanreidel Jan 23, 2025
db2204f
testing sqlalchemy
ethanreidel Jan 23, 2025
b120232
zoopt typo
ethanreidel Jan 23, 2025
6a78456
added distributed, explain, extra tests
ethanreidel Jan 23, 2025
a86c1e7
added final dependencies
ethanreidel Jan 23, 2025
87e0a64
added test install all
ethanreidel Jan 23, 2025
fbcb7e1
testing install all
ethanreidel Jan 23, 2025
10c8537
fix yaml issue
ethanreidel Jan 23, 2025
db68442
testing each group of dependencies
ethanreidel Jan 23, 2025
6c45530
fixed yaml issue
ethanreidel Jan 24, 2025
7025eb9
fixed quote issue
ethanreidel Jan 24, 2025
1363dc2
testing base dependencies
ethanreidel Jan 24, 2025
c3f0823
testing GPy issue
ethanreidel Jan 24, 2025
d148e4a
script to check dependencies
ethanreidel Jan 24, 2025
617 changes: 400 additions & 217 deletions .github/workflows/pytest.yml

Large diffs are not rendered by default.
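Since the 617-line workflow diff is collapsed above, here is a hedged sketch of what the commit messages (Python version matrix updates, per-group dependency installs, pytest suite jobs) suggest the updated job could look like. Every job name, action version, and Python version below is an assumption for orientation only, not the PR's actual pytest.yml:

# Hypothetical sketch only -- not the actual 617-line pytest.yml change.
name: pytest
on: [push, pull_request]
jobs:
  pytest:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12"]  # assumed target versions
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install -U pip
          pip install '.[test]'
      - name: Run test suite
        run: pytest -v tests/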

2 changes: 1 addition & 1 deletion .github/workflows/pytest_slow.yml
@@ -50,7 +50,7 @@ jobs:
python --version
pip --version
python -m pip install -U pip
-pip install torch==2.1.0 torchtext torchvision torchaudio
+pip install torch==2.1.0 torchvision torchaudio
pip install ray==2.3.1
pip install '.[test]'

2 changes: 2 additions & 0 deletions .gitignore
@@ -140,3 +140,5 @@ examples/*/visualizations/

# benchmarking configs
ludwig/benchmarking/configs/
+pytest.xml
+ludwig.code-workspace
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -34,7 +34,7 @@ repos:
- id: pyupgrade
args: [--py36-plus]
- repo: https://github.com/PyCQA/docformatter
-rev: v1.5.1
+rev: 06907d0
hooks:
- id: docformatter
args: [--in-place, --wrap-summaries=115, --wrap-descriptions=120]
2 changes: 1 addition & 1 deletion docker/ludwig-ray-gpu/Dockerfile
@@ -50,7 +50,7 @@ RUN pip install -U pip

WORKDIR /ludwig

-RUN pip install --no-cache-dir torch==2.1.0 torchtext torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
+RUN pip install --no-cache-dir torch==2.1.0 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118

COPY . .
RUN pip install --no-cache-dir '.[full]' --extra-index-url https://download.pytorch.org/whl/cu118
2 changes: 1 addition & 1 deletion docker/ludwig-ray/Dockerfile
@@ -36,7 +36,7 @@ RUN pip install -U pip

WORKDIR /ludwig

-RUN pip install --no-cache-dir torch==2.1.0 torchtext torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
+RUN pip install --no-cache-dir torch==2.1.0 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu

COPY . .
RUN pip install --no-cache-dir '.[full]' --extra-index-url https://download.pytorch.org/whl/cpu
2 changes: 1 addition & 1 deletion docker/ludwig/Dockerfile
@@ -24,7 +24,7 @@ RUN pip install -U pip

WORKDIR /ludwig

-RUN pip install --no-cache-dir torch==2.0.0 torchtext torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
+RUN pip install --no-cache-dir torch==2.0.0 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu

COPY . .
RUN pip install --no-cache-dir '.[full]'
17 changes: 17 additions & 0 deletions docker/ludwig_hatch/Dockerfile
@@ -0,0 +1,17 @@
FROM python:3.12

ENV PATH="/root/.local/bin:$PATH"
RUN apt-get -y update
RUN apt-get -y install pipx
RUN apt-get -y install git libsndfile1 build-essential g++ cmake ffmpeg sox libsox-dev
RUN pipx ensurepath --force
RUN pipx install hatch
RUN python3 -m pip install --upgrade pipx
WORKDIR /ludwig
#COPY /ludwig/ .
COPY . .

RUN hatch env create
RUN hatch build

ENTRYPOINT ["ludwig"]
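A plausible way to build and exercise this new image locally (the tag name is illustrative, not part of the PR); because the ENTRYPOINT is ludwig, any arguments go straight to the CLI:

# Assumed usage; image tag is arbitrary.
docker build -f docker/ludwig_hatch/Dockerfile -t ludwig-hatch .
docker run --rm ludwig-hatch --help   # runs `ludwig --help` via the ENTRYPOINT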
1 change: 1 addition & 0 deletions ludwig/__about__.py
@@ -0,0 +1 @@
__version__ = "1.13.0"
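This new __about__.py becomes the single source of truth for the package version under the hatch build backend. The matching pyproject.toml diff is not rendered in this view; a minimal sketch of the hatchling wiring that would read this file (assumed from the "created __about__ file for hatch versioning" commit, not copied from the PR) is:

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.version]
# hatchling extracts __version__ from this file at build time
path = "ludwig/__about__.py"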
15 changes: 6 additions & 9 deletions ludwig/api.py
@@ -2015,9 +2015,9 @@ def to_torchscript(
# Inputs

:param model_only (bool, optional): If True, only the ECD model will be converted to Torchscript. Else,
-preprocessing and postprocessing steps will also be converted to Torchscript.
-:param device (TorchDevice, optional): If None, the model will be converted to Torchscript on the same device to
-ensure maximum model parity.
+preprocessing and postprocessing steps will also be converted to Torchscript. :param device (TorchDevice,
+optional): If None, the model will be converted to Torchscript on the same device to ensure maximum model
+parity.

# Returns

@@ -2086,11 +2086,8 @@ def create_model(config_obj: Union[ModelConfig, dict], random_seed: int = defaul

# Inputs
:param config_obj: (Union[Config, dict]) Ludwig config object
-:param random_seed: (int, default: ludwig default random seed) Random
-seed used for weights initialization,
-splits and any other random function.
-
-# Return
+:param random_seed: (int, default: ludwig default random seed) Random seed used for weights initialization,
+splits and any other random function. # Return
:return: (ludwig.models.BaseModel) Instance of the Ludwig model object.
"""
if isinstance(config_obj, dict):
@@ -2136,7 +2133,7 @@ def is_merge_and_unload_set(self) -> bool:

# Return

-:return (bool): whether merge_and_unload should be done.
+:return (bool): whether merge_and_unload should be done.
"""
# TODO: In the future, it may be possible to move up the model type check into the BaseModel class.
return self.config_obj.model_type == MODEL_LLM and self.model.is_merge_and_unload_set()
31 changes: 10 additions & 21 deletions ludwig/automl/base_config.py
@@ -79,9 +79,8 @@ class DatasetInfo:
def allocate_experiment_resources(resources: Resources) -> dict:
"""Allocates ray trial resources based on available resources.

-# Inputs
-:param resources (dict) specifies all available GPUs, CPUs and associated
-metadata of the machines (i.e. memory)
+# Inputs :param resources (dict) specifies all available GPUs, CPUs and associated metadata of the machines
+(i.e. memory)

# Return
:return: (dict) gpu and cpu resources per trial
@@ -260,9 +259,7 @@ def get_dataset_info(df: Union[pd.DataFrame, dd.core.DataFrame]) -> DatasetInfo:
inference.

# Inputs
-:param df: (Union[pd.DataFrame, dd.core.DataFrame]) Pandas or Dask dataframe.
-
-# Return
+:param df: (Union[pd.DataFrame, dd.core.DataFrame]) Pandas or Dask dataframe. # Return
:return: (DatasetInfo) Structure containing list of FieldInfo objects.
"""
source = wrap_data_source(df)
@@ -297,9 +294,7 @@ def get_dataset_info_from_source(source: DataSource) -> DatasetInfo:
inference.

# Inputs
-:param source: (DataSource) A wrapper around a data source, which may represent a pandas or Dask dataframe.
-
-# Return
+:param source: (DataSource) A wrapper around a data source, which may represent a pandas or Dask dataframe. # Return
:return: (DatasetInfo) Structure containing list of FieldInfo objects.
"""
row_count = len(source)
@@ -355,10 +350,8 @@ def get_features_config(

# Inputs
:param fields: (List[FieldInfo]) FieldInfo objects for all fields in dataset
-:param row_count: (int) total number of entries in original dataset
-:param target_name (str, List[str]) name of target feature
-
-# Return
+:param row_count: (int) total number of entries in original dataset :param target_name (str, List[str]) name of
+target feature # Return
:return: (dict) section of auto_train config for input_features and output_features
"""
targets = convert_targets(target_name)
@@ -379,10 +372,8 @@ def get_config_from_metadata(metadata: List[FieldMetadata], targets: Set[str] =
"""Builds input/output feature sections of auto-train config using field metadata.

# Inputs
-:param metadata: (List[FieldMetadata]) field descriptions
-:param targets (Set[str]) names of target features
-
-# Return
+:param metadata: (List[FieldMetadata]) field descriptions :param targets (Set[str]) names of target features #
+Return
:return: (dict) section of auto_train config for input_features and output_features
"""
config = {
@@ -405,10 +396,8 @@ def get_field_metadata(fields: List[FieldInfo], row_count: int, targets: Set[str

# Inputs
:param fields: (List[FieldInfo]) FieldInfo objects for all fields in dataset
-:param row_count: (int) total number of entries in original dataset
-:param targets (Set[str]) names of target features
-
-# Return
+:param row_count: (int) total number of entries in original dataset :param targets (Set[str]) names of target
+features # Return
:return: (List[FieldMetadata]) list of objects containing metadata for each field
"""

9 changes: 5 additions & 4 deletions ludwig/backend/_ray210_compat.py
@@ -19,8 +19,8 @@
class TunerRay210(Tuner):
"""HACK(geoffrey): This is a temporary fix to support Ray 2.1.0.

-Specifically, this Tuner ensures that TunerInternalRay210 is called by the class.
-For more details, see TunerInternalRay210.
+Specifically, this Tuner ensures that TunerInternalRay210 is called by the class. For more details, see
+TunerInternalRay210.
"""

def __init__(
@@ -120,8 +120,9 @@ def restore(
class TunerInternalRay210(TunerInternal):
"""HACK(geoffrey): This is a temporary fix to support Ray 2.1.0.

-This TunerInternal ensures that a division by zero is avoided when running zero-CPU hyperopt trials.
-This is fixed in ray>=2.2 (but not ray<=2.1) here: https://github.com/ray-project/ray/pull/30598
+This TunerInternal ensures that a division by zero is avoided when running zero-CPU hyperopt trials. This is fixed
+in ray>=2.2 (but not ray<=2.1) here:
+https://github.com/ray-project/ray/pull/30598
"""

def _expected_utilization(self, cpus_per_trial, cpus_total):
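The body of _expected_utilization is truncated above. As a loose sketch of the guard the class docstring implies — a guess at its shape, assuming the upstream formula divides by cpus_total; this is not the actual Ray or Ludwig code:

def _expected_utilization(cpus_per_trial: float, cpus_total: float) -> float:
    # Hypothetical: short-circuit zero-CPU trials instead of dividing by zero.
    if cpus_total == 0:
        return 0.0
    return cpus_per_trial / cpus_total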
4 changes: 2 additions & 2 deletions ludwig/backend/datasource.py
@@ -88,8 +88,8 @@ def _open_input_source(

The default implementation opens the source path as a sequential input stream.

-Implementations that do not support streaming reads (e.g. that require random
-access) should override this method.
+Implementations that do not support streaming reads (e.g. that require random access) should override this
+method.
"""
if path is None or is_http(path):
return contextlib.nullcontext()
2 changes: 1 addition & 1 deletion ludwig/backend/deepspeed.py
@@ -17,7 +17,7 @@ def __init__(
fp16: Optional[Dict[str, Any]] = None,
bf16: Optional[Dict[str, Any]] = None,
compression_training: Optional[Dict[str, Any]] = None,
-**kwargs
+**kwargs,
):
super().__init__(**kwargs)
self.zero_optimization = zero_optimization
3 changes: 2 additions & 1 deletion ludwig/benchmarking/summary_dataclasses.py
@@ -8,7 +8,8 @@
import ludwig.modules.metric_modules # noqa: F401
from ludwig.benchmarking.utils import format_memory, format_time
from ludwig.globals import MODEL_FILE_NAME, MODEL_HYPERPARAMETERS_FILE_NAME
-from ludwig.modules.metric_registry import get_metric_classes, metric_feature_type_registry # noqa: F401
+from ludwig.modules.metric_registry import get_metric_classes # noqa: F401
+from ludwig.modules.metric_registry import metric_feature_type_registry
from ludwig.types import ModelConfigDict
from ludwig.utils.data_utils import load_json

3 changes: 1 addition & 2 deletions ludwig/callbacks.py
@@ -48,7 +48,7 @@ def on_preprocess_end(self, training_set, validation_set, test_set, training_set
:param test_set: The test set.
:type test_set: ludwig.dataset.base.Dataset
:param training_set_metadata: Values inferred from the training set, including preprocessing settings,
-vocabularies, feature statistics, etc. Same as training_set_metadata.json.
+vocabularies, feature statistics, etc. Same as training_set_metadata.json.
"""

pass
@@ -374,7 +374,6 @@ def prepare_ray_tune(self, train_fn: Callable, tune_config: Dict[str, Any], tune
:param train_fn: The function which runs the experiment trial.
:param tune_config: The ray tune configuration dictionary.
:param tune_callbacks: List of callbacks (not used yet).
-
:returns: Tuple[Callable, Dict] The train_fn and tune_config, which will be passed to ray tune.
"""
return train_fn, tune_config
3 changes: 2 additions & 1 deletion ludwig/config_validation/checks.py
@@ -358,7 +358,8 @@ def check_hyperopt_parameter_dicts(config: "ModelConfig") -> None: # noqa: F821
if config.hyperopt is None:
return

-from ludwig.schema.hyperopt.utils import get_parameter_cls, parameter_config_registry # noqa: F401
+from ludwig.schema.hyperopt.utils import get_parameter_cls # noqa: F401
+from ludwig.schema.hyperopt.utils import parameter_config_registry

for parameter, space in config.hyperopt.parameters.items():
# skip nested hyperopt parameters
6 changes: 4 additions & 2 deletions ludwig/config_validation/validation.py
@@ -11,9 +11,11 @@

# TODO(travis): figure out why we need these imports to avoid circular import error
from ludwig.schema.combiners.utils import get_combiner_jsonschema # noqa
-from ludwig.schema.features.utils import get_input_feature_jsonschema, get_output_feature_jsonschema # noqa
+from ludwig.schema.features.utils import get_input_feature_jsonschema # noqa
+from ludwig.schema.features.utils import get_output_feature_jsonschema
from ludwig.schema.hyperopt import get_hyperopt_jsonschema # noqa
-from ludwig.schema.trainer import get_model_type_jsonschema, get_trainer_jsonschema # noqa
+from ludwig.schema.trainer import get_model_type_jsonschema # noqa
+from ludwig.schema.trainer import get_trainer_jsonschema
from ludwig.schema.utils import unload_jsonschema_from_marshmallow_class

VALIDATION_LOCK = Lock()
1 change: 0 additions & 1 deletion ludwig/contrib.py
@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
-
"""Module for handling contributed support."""

import argparse
1 change: 0 additions & 1 deletion ludwig/contribs/__init__.py
@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
-
"""All contrib classes must implement the `ludwig.callbacks.Callback` interface.

If you don't want to handle the call, either provide an empty method with `pass`, or just don't implement the method.
6 changes: 3 additions & 3 deletions ludwig/data/preprocessing.py
@@ -2086,12 +2086,12 @@ def _preprocess_file_for_training(

:param features: list of all features (input + output)
:param dataset: path to the data
-:param training_set: training data
+:param training_set: training data
:param validation_set: validation data
:param test_set: test data
:param training_set_metadata: train set metadata
-:param skip_save_processed_input: if False, the pre-processed data is saved
-as .hdf5 files in the same location as the csv files with the same names.
+:param skip_save_processed_input: if False, the pre-processed data is saved as .hdf5 files in the same location as
+the csv files with the same names.
:param preprocessing_params: preprocessing parameters
:param random_seed: random seed
:return: training, test, validation datasets, training metadata
4 changes: 2 additions & 2 deletions ludwig/data/sampler.py
@@ -64,8 +64,8 @@ def __len__(self):
def set_epoch(self, epoch):
"""Sets the epoch for this sampler.

-When `shuffle=True`, this ensures all replicas use a different random ordering
-for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.
+When `shuffle=True`, this ensures all replicas use a different random ordering for each epoch. Otherwise, the
+next iteration of this sampler will yield the same ordering.

:param epoch: (int) epoch number
"""
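For context, a self-contained sketch of the epoch-seeded shuffling pattern this docstring describes, mirroring torch's DistributedSampler convention; the class and its names are illustrative, not Ludwig's actual sampler:

import torch

class EpochShuffleSampler:
    """Seeding the RNG with the epoch number gives every replica the same
    fresh ordering each epoch; skipping set_epoch repeats the ordering."""

    def __init__(self, data_len: int, shuffle: bool = True):
        self.data_len = data_len
        self.shuffle = shuffle
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch

    def __iter__(self):
        if not self.shuffle:
            return iter(range(self.data_len))
        g = torch.Generator()
        g.manual_seed(self.epoch)  # same seed on every replica for this epoch
        return iter(torch.randperm(self.data_len, generator=g).tolist())

    def __len__(self) -> int:
        return self.data_len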
8 changes: 3 additions & 5 deletions ludwig/datasets/__init__.py
@@ -242,7 +242,7 @@ def get_datasets_output_features(
:param include_competitions: (bool) whether to include the output features from kaggle competition datasets
:param include_data_modalities: (bool) whether to include the data modalities associated with the prediction task
:return: (dict) dictionary with the output features for each dataset or a dictionary with the output features for
-the specified dataset
+the specified dataset
"""
ordered_configs = OrderedDict(sorted(_get_dataset_configs().items()))
competition_datasets = []
@@ -321,10 +321,8 @@ def _get_hf_dataset_and_subsample(dataset_name: str) -> Tuple[str, Optional[str]

The dataset name should follow the format "{HF_PREFIX}{hf_id}--{hf_subsample}"

-Examples (Dataset Name --> HF ID; HF subsample):
-"hf://wikisql" --> "wikisql"; None
-"hf://ColumbiaNLP/FLUTE" --> "ColumbiaNLP/FLUTE"; None
-"hf://mstz/adult--income" --> "mstz/adult"; "income"
+Examples (Dataset Name --> HF ID; HF subsample): "hf://wikisql" --> "wikisql"; None "hf://ColumbiaNLP/FLUTE" -->
+"ColumbiaNLP/FLUTE"; None "hf://mstz/adult--income" --> "mstz/adult"; "income"
"""
dataset_name = dataset_name[len(HF_PREFIX) :]
dataset_name = dataset_name.split("--")
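A standalone sketch of the name parsing those examples describe; the helper name and return shape are illustrative, and the real logic lives in the truncated function above:

from typing import Optional, Tuple

HF_PREFIX = "hf://"

def parse_hf_dataset_name(dataset_name: str) -> Tuple[str, Optional[str]]:
    # "hf://mstz/adult--income" -> ("mstz/adult", "income")
    name = dataset_name[len(HF_PREFIX):]
    hf_id, sep, subsample = name.partition("--")
    return hf_id, subsample if sep else None

assert parse_hf_dataset_name("hf://mstz/adult--income") == ("mstz/adult", "income")
assert parse_hf_dataset_name("hf://wikisql") == ("wikisql", None)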
17 changes: 5 additions & 12 deletions ludwig/datasets/loaders/mnist.py
@@ -58,11 +58,8 @@ def load_unprocessed_dataframe(self, file_paths: List[str]) -> pd.DataFrame:
def read_source_dataset(self, dataset="training", path="."):
"""Create a directory for training and test and extract all the images and labels to this destination.

-:args:
-dataset (str) : the label for the dataset
-path (str): the raw dataset path
-:returns:
-A tuple of the label for the image, the file array, the size and rows and columns for the image
+:args: dataset (str) : the label for the dataset path (str): the raw dataset path
+:returns: A tuple of the label for the image, the file array, the size and rows and columns for the image
"""
if dataset == "training":
fname_img = os.path.join(path, "train-images-idx3-ubyte")
@@ -87,13 +84,9 @@ def write_output_dataset(self, labels, images, output_dir):
def write_output_dataset(self, labels, images, output_dir):
"""Create output directories where we write out the images.

-:args:
-labels (str) : the labels for the image
-data (np.array) : the binary array corresponding to the image
-output_dir (str) : the output directory that we need to write to
-path (str): the raw dataset path
-:returns:
-A tuple of the label for the image, the file array, the size and rows and columns for the image
+:args: labels (str) : the labels for the image data (np.array) : the binary array corresponding to the
+image output_dir (str) : the output directory that we need to write to path (str): the raw dataset path
+:returns: A tuple of the label for the image, the file array, the size and rows and columns for the image
"""
# create child image output directories
output_dirs = [os.path.join(output_dir, str(i)) for i in range(NUM_LABELS)]
6 changes: 4 additions & 2 deletions ludwig/datasets/loaders/split_loaders.py
@@ -21,10 +21,12 @@

class RandomSplitLoader(DatasetLoader):
"""Adds a random split column to the dataset, with fixed proportions of:
-train: 70%
+
+train: 70%
validation: 10%
test: 20%
."""
.
"""

def transform_dataframe(self, dataframe: pd.DataFrame) -> pd.DataFrame:
df = super().transform_dataframe(dataframe)
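A minimal sketch of the fixed-proportion split column that docstring describes, using Ludwig's 0/1/2 encoding for train/validation/test; the function name and seed handling are illustrative:

import numpy as np
import pandas as pd

def add_random_split(df: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    # 0 = train (70%), 1 = validation (10%), 2 = test (20%)
    rng = np.random.default_rng(seed)
    out = df.copy()
    out["split"] = rng.choice([0, 1, 2], size=len(out), p=[0.7, 0.1, 0.2])
    return out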
2 changes: 1 addition & 1 deletion ludwig/decoders/llm_decoders.py
@@ -1,3 +1,4 @@
+# flake8: noqa: E501
import logging
import re
from typing import Any, Dict, List, Union
@@ -91,7 +92,6 @@ def __init__(
# Transformer Tokenizers
self.tokenizer_vocab_size = self.tokenizer.tokenizer.vocab_size
else:
-# TorchText Tokenizers
self.tokenizer_vocab_size = len(self.tokenizer.vocab)

# Maximum number of new tokens that will be generated