Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Watching a multiline string causes problems #12732

Open
CoconutMacaroon opened this issue Aug 17, 2024 · 1 comment · May be fixed by #12736
Open

Watching a multiline string causes problems #12732

CoconutMacaroon opened this issue Aug 17, 2024 · 1 comment · May be fixed by #12736

Comments

@CoconutMacaroon
Copy link
Contributor

CoconutMacaroon commented Aug 17, 2024

Caution

Do not attempt to reproduce this on the production SmokeDetector instance. It will very likely break stuff and cause problems.

Summary

Attempting to use !!/watch-force to watch a multiline chat message breaks the watched_keywords.txt file by improperly inserting a newline.

Background information

There was a desire to modify a watch in CHQ, but the watch was longer than the limit for a 'normal' chat message. This isn't inherently an issue if it gets modified without the use of chat.

While multiline messages can get around that restriction, that also introduces a newline character (\n). From the regex perspective, that can be addressed (either by adding {0} directly after the newline or by starting a regex comment before the newline and ending it afterwards, as noted by @makyen).

However, it also breaks SmokeDetector as that inserts a newline directly into the watched_keywords.txt file, which breaks it. This was wisely foreseen by Makyen, hence not testing on production.

Details

  • For reference, here is the string I tested with. The newline (created with Shift + Enter) has been replaced with \n for this GitHub issue. 000000[trimmed for brevity]000000(?# foo\nbar)
  • SD can receive multiline chat messages, but at least in !!/bisect, it treats it as a \n. It just handles them problematically for watches (and presumably blacklists).
  • Attempting to watch a multiline string in chat, as I tried to here, breaks stuff. Specifically, it produced this as the end of watched_keywords.txt.
     65274  1723875289      cocomac 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000(?# foo
     65275  bar)
    
    And this console error:
    [06:19:34.736] watched_keywords.txt:65275:not enough values to unpack (expected 3, got 1)
      File "/home/ubuntu/SmokeDetector/blacklists.py", line 105, in parse
        when, by_whom, what = line.rstrip().split('\t')
        ^^^^^^^^^^^^^^^^^^^
    
    [06:19:40.636] Global blacklists loaded
    
  • Here is the relevant part of the console logs after attempting a bisect of some text that would match the regex.
    [06:21:42.072] Command received: !!/bisect 00000000000000000000000000000000000000000000000000000000000000000000000000000
    000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    000000000000000000000000000000<span>&hellip;</span>
    [06:21:50.007] Attempted to send status ping but metasmoke_host is undefined. Not sent.
    [06:22:10.883] regex._regex_core.error: missing ) at position 975
    2024-08-17 06:22:10.878529 UTC
      File "/home/ubuntu/SmokeDetector/chatcommunicate.py", line 530, in __call__
        result = self.__func__(*processed_args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ubuntu/SmokeDetector/chatcommands.py", line 1691, in bisect
        results_bookended = bisect_regex_in_n_size_chunks(64, s, bookended_regexes, bookend=True, timeout=timeout)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ubuntu/SmokeDetector/chatcommands.py", line 1624, in bisect_regex_in_n_size_chunks
        results = bisect_regex(test_text, chunk, bookend=bookend, timeout=timeout)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ubuntu/SmokeDetector/chatcommands.py", line 1573, in bisect_regex
        compiled = regex_compile_no_cache(formatted_regex, city=findspam.city_list, ignore_unused=True)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ubuntu/SmokeDetector/helpers.py", line 491, in regex_compile_no_cache
        return regex_raw_compile(regex_text, flags, ignore_unused, kwargs, False)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/dist-packages/regex/regex.py", line 542, in _compile
        raise error(caught_exception.msg, caught_exception.pattern,
    
    
    
      File "/home/ubuntu/SmokeDetector/chatcommunicate.py", line 530, in __call__
        result = self.__func__(*processed_args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ubuntu/SmokeDetector/chatcommands.py", line 1691, in bisect
        results_bookended = bisect_regex_in_n_size_chunks(64, s, bookended_regexes, bookend=True, timeout=timeout)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ubuntu/SmokeDetector/chatcommands.py", line 1624, in bisect_regex_in_n_size_chunks
        results = bisect_regex(test_text, chunk, bookend=bookend, timeout=timeout)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ubuntu/SmokeDetector/chatcommands.py", line 1573, in bisect_regex
        compiled = regex_compile_no_cache(formatted_regex, city=findspam.city_list, ignore_unused=True)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ubuntu/SmokeDetector/helpers.py", line 491, in regex_compile_no_cache
        return regex_raw_compile(regex_text, flags, ignore_unused, kwargs, False)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/dist-packages/regex/regex.py", line 542, in _compile
        raise error(caught_exception.msg, caught_exception.pattern,
    

What problem has occurred? What issues has it caused?

It broke. Details are above. As this was on a testing instance, it didn't impact the production instance.

On my testing instance, it broke the watched_keywords.txt file, which would need manual intervention to fix. When I tested locally, it failed to push the broken watched_keywords.txt file to GitHub, but I don't know if that's due to my testing setup or this bug. Various Git-related SD commands might be able to restore it to a functional state, although I did not test that, so I cannot confirm if that would be a remedy if this happened to a production SD instance.

What would you like to happen/not happen?

"like to happen"

  • Reject a multiline chat watch with a helpful message. Something like Multiline chat watches are not supported. Consider using Git (or pinging someone that can) as an alternative. might work*
  • Accept the multiline watch and correctly input it into the relevant file. This would also require deciding how to treat the newline (leave it as-is, remove the newline, remove everything after & including the first newline, etc.)

*I don't know what the preference is on the best way to request a manual Git-based change, so if that path is chosen, someone should probably confirm that wording.

"like to not happen"

It should not corrupt files.

@MastSE
Copy link
Contributor

MastSE commented Aug 17, 2024

First of all, thanks for testing this and taking the effort not to do so in production. I'm in favour of rejecting the multiline chat watch with a helpful message. We already have a system in place to deal with 'wrong' watch suggestions, adding another detection with message seems like the least complicated fix for this bug.

Consider long watches are not necessarily something we want a lot of on the watchlist. If lists of exclusions need to increase in length (which triggered this test) regularly, it must be understood at some point it may be better to split the watch into multiple watches that are more specific and don't require as many exclusions. Watches with many exclusions are too dangerous to upgrade to the blacklist if you don't know when a new exception will pop up.

For the few cases where this isn't possible, making the change through Git (or requesting the change to be made by someone else) seems like the appropriate solution. This also makes it easier to see how a watch has mutated over time, in my opinion.

If this is considered a complicated process, we can consider amending the wiki with instructions. Working with longer regex are somewhat of an advanced part of Charcoal. Getting people who are interested in that more involved on Git isn't a bad thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

2 participants