Train the recognizer #243

clanfreak1988 · 2023-09-18T18:30:49Z

Good evening,
I am trying to run the following example to train the recognizer:
https://keras-ocr.readthedocs.io/en/latest/examples/end_to_end_training.html#train-the-recognizer
Sample code:
`import string

import matplotlib.pyplot as plt
import sklearn.model_selection
import keras_ocr

data_dir = '.'
alphabet = string.digits + string.ascii_letters + '!?. '
recognizer_alphabet = ''.join(sorted(set(alphabet.lower())))
fonts = keras_ocr.data_generation.get_fonts(
alphabet=alphabet,
cache_dir=data_dir
)
backgrounds = keras_ocr.data_generation.get_backgrounds(cache_dir=data_dir)
text_generator = keras_ocr.data_generation.get_text_generator(alphabet=alphabet)
print('The first generated text is:', next(text_generator))

def get_train_val_test_split(arr):
    train, valtest = sklearn.model_selection.train_test_split(arr, train_size=0.8, random_state=42)
    val, test = sklearn.model_selection.train_test_split(valtest, train_size=0.5, random_state=42)
    return train, val, test


background_splits = get_train_val_test_split(backgrounds)
font_splits = get_train_val_test_split(fonts)

image_generators = [
    keras_ocr.data_generation.get_image_generator(
        height=640,
        width=640,
        text_generator=text_generator,
        font_groups={
            alphabet: current_fonts
        },
        backgrounds=current_backgrounds,
        font_size=(60, 120),
        margin=50,
        rotationX=(-0.05, 0.05),
        rotationY=(-0.05, 0.05),
        rotationZ=(-15, 15)
    ) for current_fonts, current_backgrounds in zip(
        font_splits,
        background_splits
    )
]

# See what the first validation image looks like.
image, lines = next(image_generators[1])
text = keras_ocr.data_generation.convert_lines_to_paragraph(lines)
print('The first generated validation image (below) contains:', text)
plt.imshow(image)`

When I run this example, it downloads everything and filters the fonts, but the returned list of fonts is empty.
I get the following error:
  File "C:\Users\User\PycharmProjects\dev\main.py", line 54, in <module>
    font_splits = get_train_val_test_split(fonts)
  File "C:\Users\User\PycharmProjects\dev\main.py", line 48, in get_train_val_test_split
    train, valtest = sklearn.model_selection.train_test_split(arr, train_size=0.8, random_state=42)
  File "C:\Users\User\PycharmProjects\dev\venv\lib\site-packages\sklearn\utils\_param_validation.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\User\PycharmProjects\dev\venv\lib\site-packages\sklearn\model_selection\_split.py", line 2617, in train_test_split
    n_train, n_test = _validate_shuffle_split(
  File "C:\Users\User\PycharmProjects\dev\venv\lib\site-packages\sklearn\model_selection\_split.py", line 2273, in _validate_shuffle_split
    raise ValueError(
ValueError: With n_samples=0, test_size=None and train_size=0.8, the resulting train set will be empty. Adjust any of the aforementioned parameters.
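
The `ValueError` above is raised by scikit-learn only because the list passed in is empty (`n_samples=0`). A small guard before the split would surface that earlier and with a clearer message. This is only a sketch: `require_non_empty` is a hypothetical helper, not part of keras-ocr or scikit-learn.

```python
def require_non_empty(name, items):
    """Return `items` unchanged, or raise a descriptive error if it is empty.

    Hypothetical helper for illustration; `name` is only used in the message.
    """
    if len(items) == 0:
        raise ValueError(
            f"'{name}' is empty, so it cannot be split into train/val/test sets. "
            "Check the step that produced it."
        )
    return items
```

It could be called right before the split, for example `font_splits = get_train_val_test_split(require_non_empty('fonts', fonts))`, so that the failure points at the empty font list rather than at the split itself.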

The function referenced below seems to return False every time, which looks like a bug:

keras-ocr/keras_ocr/data_generation.py

Line 115 in b9c5a58

def font_supports_alphabet(filepath, alphabet):
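
One way to narrow this down is to run the support check one character at a time for each font file. If every character is rejected for every font, the check itself is probably failing, rather than the alphabet being unsupported. The sketch below assumes the check is passed in as a callable with the same `(filepath, alphabet)` signature shown above; `rejected_characters` is a hypothetical helper name.

```python
def rejected_characters(font_paths, alphabet, supports):
    """Map each font path to the characters of `alphabet` that `supports` rejects.

    `supports` is any callable taking (filepath, alphabet) and returning a bool,
    e.g. keras_ocr.data_generation.font_supports_alphabet (assumed usage).
    """
    report = {}
    for path in font_paths:
        # Test each character on its own so a single bad glyph can be identified.
        report[path] = [ch for ch in alphabet if not supports(path, ch)]
    return report


# Possible usage (assumes the downloaded fonts are .ttf files somewhere under data_dir):
# import glob, os
# paths = glob.glob(os.path.join(data_dir, '**', '*.ttf'), recursive=True)
# report = rejected_characters(paths, alphabet,
#                              keras_ocr.data_generation.font_supports_alphabet)
# print(sum(1 for chars in report.values() if not chars), 'fonts pass the check')
```

An empty list for a font means that font passes for the whole alphabet; a list equal to the full alphabet for every font points at the check rather than at the fonts.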

OS: Windows 10
Python: 3.9.0
keras-ocr: 0.9.2
If you need more information about this issue, please ask.
