Train the recognizer #243

Open
clanfreak1988 opened this issue Sep 18, 2023 · 0 comments

Good evening,
I am trying to run the following example from the documentation to train the recognizer:
https://keras-ocr.readthedocs.io/en/latest/examples/end_to_end_training.html#train-the-recognizer
Sample code:

```python
import string

import keras_ocr
import matplotlib.pyplot as plt
import sklearn.model_selection

data_dir = '.'
alphabet = string.digits + string.ascii_letters + '!?. '
recognizer_alphabet = ''.join(sorted(set(alphabet.lower())))
# Download the fonts and drop those missing characters from `alphabet`.
fonts = keras_ocr.data_generation.get_fonts(
    alphabet=alphabet,
    cache_dir=data_dir
)
backgrounds = keras_ocr.data_generation.get_backgrounds(cache_dir=data_dir)
text_generator = keras_ocr.data_generation.get_text_generator(alphabet=alphabet)
print('The first generated text is:', next(text_generator))


def get_train_val_test_split(arr):
    # 80% train, then split the remaining 20% evenly into validation and test.
    train, valtest = sklearn.model_selection.train_test_split(arr, train_size=0.8, random_state=42)
    val, test = sklearn.model_selection.train_test_split(valtest, train_size=0.5, random_state=42)
    return train, val, test


background_splits = get_train_val_test_split(backgrounds)
font_splits = get_train_val_test_split(fonts)

# One image generator each for the train, validation, and test splits.
image_generators = [
    keras_ocr.data_generation.get_image_generator(
        height=640,
        width=640,
        text_generator=text_generator,
        font_groups={
            alphabet: current_fonts
        },
        backgrounds=current_backgrounds,
        font_size=(60, 120),
        margin=50,
        rotationX=(-0.05, 0.05),
        rotationY=(-0.05, 0.05),
        rotationZ=(-15, 15)
    ) for current_fonts, current_backgrounds in zip(
        font_splits,
        background_splits
    )
]

# See what the first validation image looks like.
image, lines = next(image_generators[1])
text = keras_ocr.data_generation.convert_lines_to_paragraph(lines)
print('The first generated validation image (below) contains:', text)
plt.imshow(image)
plt.show()  # needed to display the figure when running as a script
```

When I run this example, it downloads everything and filters the fonts, but the returned font list is empty.
I get the following error:
```
  File "C:\Users\User\PycharmProjects\dev\main.py", line 54, in <module>
    font_splits = get_train_val_test_split(fonts)
  File "C:\Users\User\PycharmProjects\dev\main.py", line 48, in get_train_val_test_split
    train, valtest = sklearn.model_selection.train_test_split(arr, train_size=0.8, random_state=42)
  File "C:\Users\User\PycharmProjects\dev\venv\lib\site-packages\sklearn\utils\_param_validation.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\User\PycharmProjects\dev\venv\lib\site-packages\sklearn\model_selection\_split.py", line 2617, in train_test_split
    n_train, n_test = _validate_shuffle_split(
  File "C:\Users\User\PycharmProjects\dev\venv\lib\site-packages\sklearn\model_selection\_split.py", line 2273, in _validate_shuffle_split
    raise ValueError(
ValueError: With n_samples=0, test_size=None and train_size=0.8, the resulting train set will be empty. Adjust any of the aforementioned parameters.
```
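The traceback shows that `train_test_split` received zero samples: the `fonts` list returned by `get_fonts` is empty, and the problem only surfaces later at the split. A minimal sketch, assuming you want the script to fail early with a clearer message, is a split helper that checks for empty input first. The helper name `split_80_10_10` is hypothetical, and it uses a plain `random.Random` shuffle as a stand-in for scikit-learn's `train_test_split`, so the split sizes are roughly 80/10/10 but the items landing in each split will differ from scikit-learn's.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(items: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: shuffle, then split ~80/10/10 into train/val/test.

    Stand-in for two chained train_test_split calls; raises early on empty input.
    """
    if len(items) == 0:
        raise ValueError(
            "Nothing to split: the input list is empty. "
            "If this is the font list, get_fonts() filtered out every font."
        )
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n_train = math.floor(0.8 * len(shuffled))
    valtest = shuffled[n_train:]
    n_val = math.floor(0.5 * len(valtest))  # split the remainder in half
    return shuffled[:n_train], valtest[:n_val], valtest[n_val:]
```

With this guard, an empty `fonts` list produces a message pointing at font filtering instead of scikit-learn's generic `n_samples=0` error.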

The function `font_supports_alphabet(filepath, alphabet)` returns `False` every time, so every font is filtered out.
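One possible cause, offered as an assumption and not verified against the keras-ocr 0.9.2 source: if `font_supports_alphabet` measures each character inside a broad `try`/`except` that returns `False` on any exception, then an error unrelated to the font itself would make every font look unsupported. For example, Pillow 10 removed `FreeTypeFont.getsize`, so code that still calls it raises `AttributeError`. The sketch below is a hypothetical diagnostic, `first_failure`, that runs a per-character check without swallowing the exception; it is demonstrated on a stand-in `FakeFont` class rather than a real font file.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_failure(
    check: Callable[[str], object], alphabet: Iterable[str]
) -> Optional[Tuple[str, BaseException]]:
    """Hypothetical diagnostic: return (character, exception) for the first
    character whose check raises, or None if every character passes.

    Unlike a bare `except: return False`, this keeps the actual exception.
    """
    for character in alphabet:
        try:
            check(character)
        except Exception as exc:  # report the error instead of hiding it
            return character, exc
    return None


# Stand-in font object that lacks `getsize`, mimicking a method that
# no longer exists in the installed library.
class FakeFont:
    def getbbox(self, text: str) -> Tuple[int, int, int, int]:
        return (0, 0, 10 * len(text), 10)


font = FakeFont()
print(first_failure(lambda ch: font.getsize(ch), "ab"))  # ('a', AttributeError(...))
print(first_failure(lambda ch: font.getbbox(ch), "ab"))  # None
```

With a real font, one could load it once with `font = PIL.ImageFont.truetype(path)` (where `path` is one of the downloaded font files) and pass `check=font.getbbox`; an exception there would point at the environment rather than the font.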

- OS: Windows 10
- Python: 3.9.0
- keras-ocr: 0.9.2
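The environment details above give the Python and keras-ocr versions. Font filtering may also depend on other packages (Pillow and fontTools are assumed here, not confirmed by the report), so a quick way to gather their versions is the standard library's `importlib.metadata`. A minimal sketch; the helper name `collect_versions` is hypothetical.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def collect_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the Python version plus installed package versions (None if missing)."""
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


# Pillow and fonttools are included on the assumption that font filtering uses them.
for name, ver in collect_versions(["keras-ocr", "Pillow", "fonttools", "scikit-learn"]).items():
    print(f"{name}: {ver or 'not installed'}")
```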
If you need more information about this issue, please let me know.
