Quieter logs in LanKit test suite #135

Merged
merged 3 commits into from
Aug 24, 2023

richard-rogers
Contributor

Raise the log level around tests that intentionally pass bad inputs. Log normal input/output module initialization at info instead of warn.
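
One way to raise the level around a test with intentionally bad input is a small context manager. This is a minimal sketch and not necessarily how this PR does it: the helper name `quiet_logger` is hypothetical, and the logger name is the one that appears in the tracebacks in this conversation.

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_logger(name, level=logging.CRITICAL):
    """Temporarily raise the named logger's level, restoring it on exit."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore the original level even if the test body raises.
        logger.setLevel(previous)


# Hypothetical usage: suppress expected ERROR records while feeding bad input.
with quiet_logger("whylogs.experimental.core.udf_schema") as log:
    log.error("expected failure on intentionally bad input")  # not emitted
```

Restoring the previous level in `finally` keeps the quieting scoped to the one test, so genuine errors elsewhere in the suite are still reported.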


load-test:
- poetry run pytest langkit/tests --load
+ poetry run pytest langkit/tests -o log_level=WARN -o log_cli=true --load
Collaborator

ERROR-level output is the most common problem I see when running the load tests; I think the standard unit tests are OK at INFO?

e.g.

ERROR    whylogs.experimental.core.udf_schema:udf_schema.py:77 Evaluating UDF response.monosyllable_count failed
Traceback (most recent call last):
  File "/home/jamie/.cache/pypoetry/virtualenvs/langkit-EeFODeF5-py3.8/lib/python3.8/site-packages/whylogs/experimental/core/udf_schema.py", line 74, in _apply_udfs_on_row
    new_columns[new_col] = udf(values)[0]
  File "/home/jamie/projects/v1/TextMetricsToolkit/langkit/textstat.py", line 49, in wrappee
    return [stat(input) for input in text[column]]
  File "/home/jamie/projects/v1/TextMetricsToolkit/langkit/textstat.py", line 49, in <listcomp>
    return [stat(input) for input in text[column]]
  File "/home/jamie/.cache/pypoetry/virtualenvs/langkit-EeFODeF5-py3.8/lib/python3.8/site-packages/textstat/textstat.py", line 1407, in monosyllabcount
    word_list = self.remove_punctuation(text).split()
  File "/home/jamie/.cache/pypoetry/virtualenvs/langkit-EeFODeF5-py3.8/lib/python3.8/site-packages/textstat/textstat.py", line 268, in remove_punctuation
    text = re.sub(punctuation_regex, '', text)
  File "/usr/lib/python3.8/re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
ERROR    whylogs.experimental.core.udf_schema:udf_schema.py:77 Evaluating UDF response.difficult_words failed
Traceback (most recent call last):
  File "/home/jamie/.cache/pypoetry/virtualenvs/langkit-EeFODeF5-py3.8/lib/python3.8/site-packages/whylogs/experimental/core/udf_schema.py", line 74, in _apply_udfs_on_row
    new_columns[new_col] = udf(values)[0]
  File "/home/jamie/projects/v1/TextMetricsToolkit/langkit/textstat.py", line 49, in wrappee
    return [stat(input) for input in text[column]]
  File "/home/jamie/projects/v1/TextMetricsToolkit/langkit/textstat.py", line 49, in <listcomp>
    return [stat(input) for input in text[column]]
  File "/home/jamie/.cache/pypoetry/virtualenvs/langkit-EeFODeF5-py3.8/lib/python3.8/site-packages/textstat/textstat.py", line 920, in difficult_words
    return len(self.difficult_words_list(text, syllable_threshold))
  File "/home/jamie/.cache/pypoetry/virtualenvs/langkit-EeFODeF5-py3.8/lib/python3.8/site-packages/textstat/textstat.py", line 942, in difficult_words_list
    words = set(re.findall(r"[\w\='‘’]+", text.lower()))
AttributeError: 'int' object has no attribute 'lower'
ERROR    whylogs.experimental.core.udf_schema:udf_schema.py:77 Evaluating UDF response.aggregate_reading_level failed
Traceback (most recent call last):
  File "/home/jamie/.cache/pypoetry/virtualenvs/langkit-EeFODeF5-py3.8/lib/python3.8/site-packages/whylogs/experimental/core/udf_schema.py", line 74, in _apply_udfs_on_row
    new_columns[new_col] = udf(values)[0]
  File "/home/jamie/projects/v1/TextMetricsToolkit/langkit/textstat.py", line 60, in wrappee
    return [stat(input, float_output=True) for input in text[column]]
  File "/home/jamie/projects/v1/TextMetricsToolkit/langkit/textstat.py", line 60, in <listcomp>
    return [stat(input, float_output=True) for input in text[column]]
  File "/home/jamie/.cache/pypoetry/virtualenvs/langkit-EeFODeF5-py3.8/lib/python3.8/site-packages/textstat/textstat.py", line 1191, in text_standard
    lower = self._legacy_round(self.flesch_kincaid_grade(text))
  File "/home/jamie/.cache/pypoetry/virtualenvs/langkit-EeFODeF5-py3.8/lib/python3.8/site-packages/textstat/textstat.py", line 711, in flesch_kincaid_grade
    sentence_length = self.avg_sentence_length(text)
  File "/home/jamie/.cache/pypoetry/virtualenvs/langkit-EeFODeF5-py3.8/lib/python3.8/site-packages/textstat/textstat.py", line 400, in avg_sentence_length
    asl = float(self.lexicon_count(text) / self.sentence_count(text))
  File "/home/jamie/.cache/pypoetry/virtualenvs/langkit-EeFODeF5-py3.8/lib/python3.8/site-packages/textstat/textstat.py", line 295, in lexicon_count
    text = self.remove_punctuation(text)
  File "/home/jamie/.cache/pypoetry/virtualenvs/langkit-EeFODeF5-py3.8/lib/python3.8/site-packages/textstat/textstat.py", line 268, in remove_punctuation
    text = re.sub(punctuation_regex, '', text)
  File "/usr/lib/python3.8/re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
ERROR    whylogs.experimental.core.udf_schema:udf_schema.py:77 Evaluating UDF response.custom_group_count failed
Traceback (most recent call last):
  File "/home/jamie/.cache/pypoetry/virtualenvs/langkit-EeFODeF5-py3.8/lib/python3.8/site-packages/whylogs/experimental/core/udf_schema.py", line 74, in _apply_udfs_on_row
    new_columns[new_col] = udf(values)[0]
  File "/home/jamie/projects/v1/TextMetricsToolkit/langkit/count_regexes.py", line 26, in wrappee
    return [count_patterns(pattern_group, input) for input in text[column]]
  File "/home/jamie/projects/v1/TextMetricsToolkit/langkit/count_regexes.py", line 26, in <listcomp>
    return [count_patterns(pattern_group, input) for input in text[column]]
  File "/home/jamie/projects/v1/TextMetricsToolkit/langkit/count_regexes.py", line 18, in count_patterns
    if expression.search(text):
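
The tracebacks above share one cause: a non-string value reaches a function that expects text. The following is a minimal reproduction sketch using only the standard library. The regex is a hypothetical stand-in for the punctuation pattern in textstat, which may differ, and `describe_failure` is an illustrative helper, not part of LangKit.

```python
import re

# Hypothetical stand-in for textstat's punctuation pattern; the real one may differ.
PUNCTUATION = re.compile(r"[^\w\s]")


def remove_punctuation(text):
    """Strip punctuation; raises TypeError when text is not str or bytes-like."""
    return PUNCTUATION.sub("", text)


def describe_failure(value):
    """Return the exception type name raised for value, or None on success."""
    try:
        remove_punctuation(value).lower()
    except (TypeError, AttributeError) as exc:
        return type(exc).__name__
    return None


print(describe_failure("Hello, world!"))  # string input is processed normally
print(describe_failure(42))               # integer input fails inside re
```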

Contributor Author

After this PR, any ERRORs logged should reflect real unexpected errors that we should investigate.
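
A check along these lines could enforce that expectation. This is a sketch under assumptions: `RecordCollector` and `errors_logged_by` are hypothetical helpers built on the standard `logging` module, not code from this PR. In a pytest suite the built-in `caplog` fixture would likely serve the same purpose.

```python
import logging


class RecordCollector(logging.Handler):
    """Collect every record at or above the handler's level."""

    def __init__(self, level=logging.ERROR):
        super().__init__(level)
        self.records = []

    def emit(self, record):
        self.records.append(record)


def errors_logged_by(func, logger_name=""):
    """Run func and return ERROR-or-higher records that reach the given logger.

    An empty logger_name selects the root logger, which sees records
    propagated from every child logger.
    """
    collector = RecordCollector()
    logger = logging.getLogger(logger_name)
    logger.addHandler(collector)
    try:
        func()
    finally:
        logger.removeHandler(collector)
    return collector.records


def noisy():
    # Hypothetical code under test that logs an unexpected failure.
    logging.getLogger("demo.noisy").error("unexpected failure")


print(len(errors_logged_by(noisy)))
```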

Contributor Author

whyuser@222bf565ee10:/workspace/langkit-config1$ make load-test
poetry run pytest langkit/tests -o log_level=WARN -o log_cli=true --load
========================================================================= test session starts ==========================================================================
platform linux -- Python 3.8.17, pytest-7.4.0, pluggy-1.2.0
rootdir: /workspace/langkit-config1
collected 28 items                                                                                                                                                     

langkit/tests/test_callback_handler.py::test_callback_passthroughs_undefined_ok PASSED                                                                           [  3%]
langkit/tests/test_callback_handler.py::test_callback_passthroughs_undefined_no_args PASSED                                                                      [  7%]
langkit/tests/test_callback_handler.py::test_callback_passthroughs_defined_functions PASSED                                                                      [ 10%]
langkit/tests/test_callback_handler.py::test_callback_passthroughs_defined_logging_functions PASSED                                                              [ 14%]
langkit/tests/test_callback_handler.py::test_callback_instance_handler_defined PASSED                                                                            [ 17%]
langkit/tests/test_callback_handler.py::test_callback_instance_handler_with_metadata PASSED                                                                      [ 21%]
langkit/tests/test_callback_handler.py::test_callback_instance_handler_defined_getattr PASSED                                                                    [ 25%]
langkit/tests/test_callback_handler.py::test_callback_instance_three_ply_class_hierarchy PASSED                                                                  [ 28%]
langkit/tests/test_count_patterns.py::test_count_patterns[False] PASSED                                                                                          [ 32%]
langkit/tests/test_count_patterns.py::test_count_patterns[True] PASSED                                                                                           [ 35%]
langkit/tests/test_injections.py::test_injections PASSED                                                                                                         [ 39%]
langkit/tests/test_injections.py::test_injections_long_prompt PASSED                                                                                             [ 42%]
langkit/tests/test_input_output.py::test_init_call PASSED                                                                                                        [ 46%]
langkit/tests/test_input_output.py::test_custom_encoder PASSED                                                                                                   [ 50%]
langkit/tests/test_input_output.py::test_similarity PASSED                                                                                                       [ 53%]
langkit/tests/test_nlp_scores.py::test_bleu_score PASSED                                                                                                         [ 57%]
langkit/tests/test_patterns.py::test_ptt[False] PASSED                                                                                                           [ 60%]
langkit/tests/test_patterns.py::test_ptt[True] PASSED                                                                                                            [ 64%]
langkit/tests/test_patterns.py::test_individual_patterns_isolated PASSED                                                                                         [ 67%]
langkit/tests/test_sentiment.py::test_sentiment PASSED                                                                                                           [ 71%]
langkit/tests/test_textstat.py::test_textstat PASSED                                                                                                             [ 75%]
langkit/tests/test_themes.py::test_init_call PASSED                                                                                                              [ 78%]
langkit/tests/test_themes.py::test_theme_custom PASSED                                                                                                           [ 82%]
langkit/tests/test_themes.py::test_theme PASSED                                                                                                                  [ 85%]
langkit/tests/test_themes.py::test_themes_with_json_string PASSED                                                                                                [ 89%]
langkit/tests/test_themes.py::test_themes_standalone PASSED                                                                                                      [ 92%]
langkit/tests/test_toxicity.py::test_toxicity PASSED                                                                                                             [ 96%]
langkit/tests/test_toxicity.py::test_toxicity_long_response PASSED                                                                                               [100%]

=========================================================================== warnings summary ===========================================================================
.venv/lib/python3.8/site-packages/textstat/textstat.py:7
  /workspace/langkit-config1/.venv/lib/python3.8/site-packages/textstat/textstat.py:7: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

langkit/tests/test_injections.py::test_injections
  /workspace/langkit-config1/.venv/lib/python3.8/site-packages/transformers/models/open_llama/modeling_open_llama.py:42: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
    logger.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================================================== 28 passed, 2 warnings in 30.36s ====================================================================

Collaborator

OK, that looks good. We don't need to set the level to WARNING explicitly since that is the default; better to drop -o log_level=WARN from the commands.

Also, I don't see anything that looks broken in the unit tests with INFO logging, so why switch that off?
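
For context on the WARNING default, here is a small sketch of how the standard library resolves a logger's level. It assumes nothing else in the process has reconfigured the root logger, and the logger name `langkit.demo` is hypothetical.

```python
import logging

# A logger with no explicit level reports NOTSET and defers to its ancestors.
logger = logging.getLogger("langkit.demo")  # hypothetical logger name
print(logging.getLevelName(logger.level))

# The effective level is resolved up the hierarchy to the root logger,
# which the standard library creates at WARNING.
print(logging.getLevelName(logger.getEffectiveLevel()))

# INFO records are therefore dropped unless a level is set explicitly.
print(logger.isEnabledFor(logging.INFO))
```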

Contributor Author

Just for quietness... We can set it to whatever the consensus level is.

@@ -78,7 +78,7 @@ def meteor_score(text):
return result

else:
- diagnostic_logger.warning(
+ diagnostic_logger.info(
Contributor Author

Can we add the reference corpus to LangKitConfig so that we don't have to explicitly reinitialize this module?

Collaborator

jamie256 left a comment

LGTM!

richard-rogers merged commit 344d346 into main Aug 24, 2023
richard-rogers deleted the dev/richard/quiet branch August 24, 2023 05:38