From d8c1b4f939e2f507eeec7e925570b710ab7fb040 Mon Sep 17 00:00:00 2001
From: Justin Russo
Date: Sun, 13 Jun 2021 16:47:44 -0400
Subject: [PATCH 01/30] chore: clean up readme

---
 README.md | 298 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 181 insertions(+), 117 deletions(-)

diff --git a/README.md b/README.md
index 856e6ba..019de4c 100644
--- a/README.md
+++ b/README.md
@@ -1,50 +1,60 @@
 # TwitchMarkovChain
-Twitch Bot for generating messages based on what it learned from chat
+
+Twitch Bot for generating messages based on what it learned from chat

 ---

-# Explanation
-When the bot has started, it will start listening to chat messages in the channel listed in the `settings.txt` file. Any chat message not sent by a denied user will be learned from. Whenever someone then requests a message to be generated, a [Markov Chain](https://en.wikipedia.org/wiki/Markov_chain) will be used with the learned data to generate a sentence. Note that the bot is unaware of the meaning of any of its inputs and outputs. This means it can use bad language if it was taught to use bad language by people in chat. You can add a list of banned words it should never learn or say. Use at your own risk.
+## Explanation
+
+When the bot has started, it will start listening to chat messages in the channel listed in the `settings.txt` file. Any chat message not sent by a denied user will be learned from. Whenever someone then requests a message to be generated, a [Markov Chain](https://en.wikipedia.org/wiki/Markov_chain) will be used with the learned data to generate a sentence. **Note that the bot is unaware of the meaning of any of its inputs and outputs. This means it can use bad language if it was taught to use bad language by people in chat. You can add a list of banned words it should never learn or say. Use at your own risk.** Whenever a message is deleted from chat, its contents will be unlearned at 5 times the rate a normal message is learned from.
The bot will avoid learning from commands, or from messages containing links. --- -# How it works -## Sentence Parsing +## How it works + +### Sentence Parsing + To explain how the bot works, I will provide an example situation with two messages that are posted in Twitch chat. The messages are: -
Curly fries are the worst kind of fries
-Loud people are the reason I don't go to the movies anymore
-
+ +> Curly fries are the worst kind of fries +> Loud people are the reason I don't go to the movies anymore + Let's start with the first sentence and parse it like the bot will. To do so, we will split up the sentence in sections of `keyLength + 1` words. As `keyLength` has been set to `2` in the [Settings](#settings) section, each section has `3` words. -

- Curly fries are the worst kind of fries
+
+```txt
+Curly fries are the worst kind of fries
 [Curly fries:are]
       [fries are:the]
             [are the:worst]
                 [the worst:kind]
                     [worst kind:of]
                           [kind of:fries]
-
+```
+
For each of these sections of three words, the last word is considered the output, while all other words are considered inputs. These words are then turned into a variation of a [Grammar](https://en.wikipedia.org/wiki/Formal_grammar):
-
+
+```txt
 "Curly fries" -> "are"
 "fries are"   -> "the"
 "are the"     -> "worst"
 "the worst"   -> "kind"
 "worst kind"  -> "of"
 "kind of"     -> "fries"
-
-This can be considered a mathematical function that, when given input "the worst", will output "kind".
-In order for the program to know where sentences begin, we also add the first `keyLength` words to a seperate Database table, where a list of possible starts of sentences reside.
+```
+
+This can be considered a mathematical function that, when given input "the worst", will output "kind".
+In order for the program to know where sentences begin, we also add the first `keyLength` words to a separate database table, where a list of possible starts of sentences resides.

This exact same process is applied to the second sentence as well. After doing so, the resulting grammar (and our corresponding database table) looks like:
-
+
+```txt
 "Curly fries" -> "are"
 "fries are"   -> "the"
-"are the"     -> "worst" | "reason"
+"are the"     -> "worst" | "reason"
 "the worst"   -> "kind"
 "worst kind"  -> "of"
 "kind of"     -> "fries"
@@ -57,139 +67,193 @@ This exact same process is applied to the second sentence as well. After doing s
 "go to"       -> "the"
 "to the"      -> "movies"
 "the movies"  -> "anymore"
-
+``` + and in the database table for starts of sentences: -
+
+```txt
 "Curly fries"
 "Loud people"
-
-Note that the | is considered to be *"or"*. In the case of the bold text above, it could be read as: if the given input is "are the", then the output is either *"worst"* **or** *"reason"*. +``` -In practice, more frequent phrases will have higher precedence. The more often a phrase is said, the more likely it is to be generated. +Note that the | is considered to be _"or"_. In the case of the bold text above, it could be read as: if the given input is "are the", then the output is either _"worst"_ **or** _"reason"_. + +In practice, more frequent phrases will have higher precedence. The more often a phrase is said, the more likely it is to be generated. --- -## Generation +### Generation + +When a message is generated with `!generate`, a random start of a sentence is picked from the database table of starts of sentences. In our example the randomly picked start is _"Curly fries"_. -When a message is generated with `!generate`, a random start of a sentence is picked from the database table of starts of sentences. In our example the randomly picked start is *"Curly fries"*. +Now, in a loop: -Now, in a loop:
-- The output for the input is generated via the grammar.
-- And the input for the next iteration in the loop is shifted:
- - Remove the first word from the input.
- - Add the new output word to the end of the input.
+
+- The output for the input is generated via the grammar.
+- And the input for the next iteration in the loop is shifted:
+  - Remove the first word from the input.
+  - Add the new output word to the end of the input.

-So, the input starts as *"Curly Fries"*. The output for this input is generated via the grammar, which gives us *"are"*. Then, the input is updated. *"Curly"* is removed, and *"are"* is added to the input. The new input for the next iteration will be *"Fries are"* as a result. This process repeats until no more words can be generated, or if a word limit is reached.
+So, the input starts as _"Curly fries"_. The output for this input is generated via the grammar, which gives us _"are"_. Then, the input is updated. _"Curly"_ is removed, and _"are"_ is added to the input. The new input for the next iteration will be _"fries are"_ as a result. This process repeats until no more words can be generated, or until a word limit is reached.

A more programmatic example of this would be:

```python
-# This initial sentence is either from the database for starts of sentences, 
+# This initial sentence is either from the database for starts of sentences,
 # or from words passed in Twitch chat
 sentence = ["Curly", "fries"]
 for i in range(sentence_length):
-    # Generate a word using last 2 words in the partial sentence, 
+    # Generate a word using the last 2 words in the partial sentence,
     # and append it to the partial sentence
     sentence.append(generate(sentence[-2:]))
```

It's common for an input sequence to have multiple possible outputs, as we can see in the `"are the"` entry of the previous grammar. This allows learned information from multiple messages to be merged into one message. For instance, some potential outputs from the given example are
-
Curly fries are the reason I don't go to the movies anymore
+ +> Curly fries are the reason I don't go to the movies anymore + or -
Loud people are the worst kind of fries
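The `generate(...)` helper used in the README's Python snippet above is never defined there; the sketch below is one hypothetical, self-contained way to model it, with an in-memory dictionary standing in for the bot's database tables (`grammar`, `generate_sentence`, and `max_words` are illustrative names, not the bot's actual API):

```python
import random

# Hypothetical in-memory grammar built from the two example sentences.
# The real bot stores these key -> output mappings in a database instead.
grammar = {
    ("Curly", "fries"): ["are"],
    ("fries", "are"): ["the"],
    ("are", "the"): ["worst", "reason"],
    ("the", "worst"): ["kind"],
    ("worst", "kind"): ["of"],
    ("kind", "of"): ["fries"],
    ("the", "reason"): ["I"],
    ("reason", "I"): ["don't"],
    ("I", "don't"): ["go"],
    ("don't", "go"): ["to"],
    ("go", "to"): ["the"],
    ("to", "the"): ["movies"],
    ("the", "movies"): ["anymore"],
}

def generate_sentence(start, max_words=25):
    """Extend `start` one word at a time until no learned continuation
    exists or the word limit is reached."""
    sentence = list(start)
    while len(sentence) < max_words:
        options = grammar.get(tuple(sentence[-2:]))
        if not options:  # no learned continuation: the sentence ends here
            break
        sentence.append(random.choice(options))
    return " ".join(sentence)

print(generate_sentence(["Curly", "fries"]))
```

Because `random.choice` picks uniformly among the stored outputs, storing a word once per time it was seen (rather than deduplicating it, as this sketch does) would reproduce the frequency weighting described earlier, where more common phrases are generated more often.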
+ +> Loud people are the worst kind of fries --- -# Commands +## Commands + Chat members can generate chat-like messages using the following commands (Note that they are aliases): -
!generate [words]
-!g [words]
+ +```txt +!generate [words] +!g [words] +``` + Example: -
!g Curly
+ +```txt +!g Curly +``` + Result (for example): -
Curly fries are the reason I don't go to the movies anymore
-- The bot will, when given this command, try to complete the start of the sentence which was given.
- - If it cannot, an appropriate error message will be sent to chat.
-- Any number of words may be given, including none at all.
-- Everyone can use it.
+ +```txt +Curly fries are the reason I don't go to the movies anymore +``` + +- The bot will, when given this command, try to complete the start of the sentence which was given. + - If it cannot, an appropriate error message will be sent to chat. +- Any number of words may be given, including none at all. +- Everyone can use it. Furthermore, chat members can find a link to [How it works](#how-it-works) by using one of the following commands: -
!ghelp
-!genhelp
-!generatehelp
+ +```txt +!ghelp +!genhelp +!generatehelp +``` + The use of this command makes the bot post this message in chat: -
Learn how this bot generates sentences here: https://github.com/CubieDev/TwitchMarkovChain#how-it-works
+ +> Learn how this bot generates sentences here: --- -## Streamer commands -All of these commands can be whispered to the bot account, or typed in chat.
+ +### Streamer commands + +All of these commands can be whispered to the bot account, or typed in chat. To disable the bot from generating messages, while still learning from regular chat messages: -
!disable
+ +```txt +!disable +``` + After disabling the bot, it can be re-enabled using: -
!enable
+ +```txt +!enable +``` + Changing the cooldown between generations is possible with one of the following two commands: -
!setcooldown <seconds>
-!setcd <seconds>
+ +```txt +!setcooldown +!setcd +``` + Example: -
!setcd 30
+ +```txt +!setcd 30 +``` + Which sets the cooldown between generations to 30 seconds. --- -## Moderator commands -All of these commands must be whispered to the bot account.
-Moderators (and the broadcaster) can modify the blacklist to prevent the bot learning words it shouldn't.
+ +### Moderator commands + +All of these commands must be whispered to the bot account. +Moderators (and the broadcaster) can modify the blacklist to prevent the bot learning words it shouldn't. To add `word` to the blacklist, a moderator can whisper the bot: -
!blacklist word
+ +```txt +!blacklist +``` + Similarly, to remove `word` from the blacklist, a moderator can whisper the bot: -
!whitelist word
+ +```txt +!whitelist +``` + And to check whether `word` is already on the blacklist or not, a moderator can whisper the bot: -
!check word
+ +```txt +!check +``` --- -# Settings +## Settings + This bot is controlled by a `settings.txt` file, which has the following structure: + ```json { - "Host": "irc.chat.twitch.tv", - "Port": 6667, - "Channel": "#", - "Nickname": "", - "Authentication": "oauth:", - "DeniedUsers": [ - "StreamElements", - "Nightbot", - "Moobot", - "Marbiebot" - ], - "Cooldown": 20, - "KeyLength": 2, - "MaxSentenceWordAmount": 25, - "HelpMessageTimer": 7200, - "AutomaticGenerationTimer": -1 + "Host": "irc.chat.twitch.tv", + "Port": 6667, + "Channel": "#", + "Nickname": "", + "Authentication": "oauth:", + "DeniedUsers": ["StreamElements", "Nightbot", "Moobot", "Marbiebot"], + "Cooldown": 20, + "KeyLength": 2, + "MaxSentenceWordAmount": 25, + "HelpMessageTimer": 7200, + "AutomaticGenerationTimer": -1 } ``` -| **Parameter** | **Meaning** | **Example** | -| -------------------- | ----------- | ----------- | -| Host | The URL that will be used. Do not change. | "irc.chat.twitch.tv" | -| Port | The Port that will be used. Do not change. | 6667 | -| Channel | The Channel that will be connected to. | "#CubieDev" | -| Nickname | The Username of the bot account. | "CubieB0T" | -| Authentication | The OAuth token for the bot account. | "oauth:pivogip8ybletucqdz4pkhag6itbax" | -| DeniedUsers | The list of bot account who's messages should not be learned from. The bot itself it automatically added to this. | ["StreamElements", "Nightbot", "Moobot", "Marbiebot"] | -| Cooldown | A cooldown in seconds between successful generations. If a generation fails (eg inputs it can't work with), then the cooldown is not reset and another generation can be done immediately. | 20 | -| KeyLength | A technical parameter which, in my previous implementation, would affect how closely the output matches the learned inputs. In the current implementation the database structure does not allow this parameter to be changed. Do not change. | 2 | -| MaxSentenceWordAmount | The maximum number of words that can be generated. 
Prevents absurdly long and spammy generations. | 25 |
-| HelpMessageTimer | The amount of seconds between sending help messages that links to [How it works](#how-it-works). -1 for no help messages. | 7200 |
-| AutomaticGenerationTimer| The amount of seconds between sending a generation, as if someone wrote `!g`. -1 for no automatic generations. | -1 |

| **Parameter**            | **Meaning**                                                                                                                                                                                                                                | **Example**                                           |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------- |
| Host                     | The URL that will be used. Do not change.                                                                                                                                                                                                  | "irc.chat.twitch.tv"                                  |
| Port                     | The Port that will be used. Do not change.                                                                                                                                                                                                 | 6667                                                  |
| Channel                  | The Channel that will be connected to.                                                                                                                                                                                                     | "#CubieDev"                                           |
| Nickname                 | The Username of the bot account.                                                                                                                                                                                                           | "CubieB0T"                                            |
| Authentication           | The OAuth token for the bot account.                                                                                                                                                                                                       | "oauth:pivogip8ybletucqdz4pkhag6itbax"                |
| DeniedUsers              | The list of bot accounts whose messages should not be learned from. The bot itself is automatically added to this.                                                                                                                         | ["StreamElements", "Nightbot", "Moobot", "Marbiebot"] |
| Cooldown                 | A cooldown in seconds between successful generations. If a generation fails (e.g. inputs it can't work with), then the cooldown is not reset and another generation can be done immediately.                                                | 20                                                    |
| KeyLength                | A technical parameter which, in my previous implementation, would affect how closely the output matches the learned inputs. In the current implementation the database structure does not allow this parameter to be changed. Do not change. | 2                                                     |
| MaxSentenceWordAmount    | The maximum number of words that can be generated. Prevents absurdly long and spammy generations.                                                                                                                                          | 25                                                    |
| HelpMessageTimer         | The number of seconds between sending help messages that link to [How it works](#how-it-works). -1 for no help messages.                                                                                                                   | 7200                                                  |
| AutomaticGenerationTimer | The number of seconds between sending a generation, as if someone wrote `!g`. -1 for no automatic generations.                                                                                                                             | -1                                                    |

-*Note that the example OAuth token is not an actual token, but merely a generated string to give an indication what it might look like.*
+_Note that the example OAuth token is not an actual token, but merely a generated string to give an indication what it might look like._

-I got my real OAuth token from https://twitchapps.com/tmi/.
+I got my real OAuth token from <https://twitchapps.com/tmi/>.

---

-## Blacklist
+### Blacklist

You may add words to a blacklist by adding them on a separate line in `blacklist.txt`. Each word is case insensitive. By default, this file only contains `` and ``, which are required for the current implementation.

@@ -197,33 +261,33 @@ Words can also be added or removed from the blacklist via whispers, as is descri

---

-# Requirements
-* [Python 3.6+](https://www.python.org/downloads/)
-* [Module requirements](requirements.txt)
-Install these modules using `pip install -r requirements.txt` in the commandline. +## Requirements + +- [Python 3.6+](https://www.python.org/downloads/) +- [Module requirements](requirements.txt) + - Install these modules using `pip install -r requirements.txt` in the commandline. Among these modules is my own [TwitchWebsocket](https://github.com/tomaarsen/TwitchWebsocket) wrapper, which makes making a Twitch chat bot a lot easier. This repository can be seen as an implementation using this wrapper. --- -# Other Twitch Bots - -* [TwitchAIDungeon](https://github.com/CubieDev/TwitchAIDungeon) -* [TwitchGoogleTranslate](https://github.com/CubieDev/TwitchGoogleTranslate) -* [TwitchCubieBotGUI](https://github.com/CubieDev/TwitchCubieBotGUI) -* [TwitchCubieBot](https://github.com/CubieDev/TwitchCubieBot) -* [TwitchRandomRecipe](https://github.com/CubieDev/TwitchRandomRecipe) -* [TwitchUrbanDictionary](https://github.com/CubieDev/TwitchUrbanDictionary) -* [TwitchRhymeBot](https://github.com/CubieDev/TwitchRhymeBot) -* [TwitchWeather](https://github.com/CubieDev/TwitchWeather) -* [TwitchDeathCounter](https://github.com/CubieDev/TwitchDeathCounter) -* [TwitchSuggestDinner](https://github.com/CubieDev/TwitchSuggestDinner) -* [TwitchPickUser](https://github.com/CubieDev/TwitchPickUser) -* [TwitchSaveMessages](https://github.com/CubieDev/TwitchSaveMessages) -* [TwitchMMLevelPickerGUI](https://github.com/CubieDev/TwitchMMLevelPickerGUI) (Mario Maker 2 specific bot) -* [TwitchMMLevelQueueGUI](https://github.com/CubieDev/TwitchMMLevelQueueGUI) (Mario Maker 2 specific bot) -* [TwitchPackCounter](https://github.com/CubieDev/TwitchPackCounter) (Streamer specific bot) -* [TwitchDialCheck](https://github.com/CubieDev/TwitchDialCheck) (Streamer specific bot) -* [TwitchSendMessage](https://github.com/CubieDev/TwitchSendMessage) (Meant for debugging purposes) - +## Other Twitch Bots + +- [TwitchAIDungeon](https://github.com/CubieDev/TwitchAIDungeon) +- 
[TwitchGoogleTranslate](https://github.com/CubieDev/TwitchGoogleTranslate) +- [TwitchCubieBotGUI](https://github.com/CubieDev/TwitchCubieBotGUI) +- [TwitchCubieBot](https://github.com/CubieDev/TwitchCubieBot) +- [TwitchRandomRecipe](https://github.com/CubieDev/TwitchRandomRecipe) +- [TwitchUrbanDictionary](https://github.com/CubieDev/TwitchUrbanDictionary) +- [TwitchRhymeBot](https://github.com/CubieDev/TwitchRhymeBot) +- [TwitchWeather](https://github.com/CubieDev/TwitchWeather) +- [TwitchDeathCounter](https://github.com/CubieDev/TwitchDeathCounter) +- [TwitchSuggestDinner](https://github.com/CubieDev/TwitchSuggestDinner) +- [TwitchPickUser](https://github.com/CubieDev/TwitchPickUser) +- [TwitchSaveMessages](https://github.com/CubieDev/TwitchSaveMessages) +- [TwitchMMLevelPickerGUI](https://github.com/CubieDev/TwitchMMLevelPickerGUI) (Mario Maker 2 specific bot) +- [TwitchMMLevelQueueGUI](https://github.com/CubieDev/TwitchMMLevelQueueGUI) (Mario Maker 2 specific bot) +- [TwitchPackCounter](https://github.com/CubieDev/TwitchPackCounter) (Streamer specific bot) +- [TwitchDialCheck](https://github.com/CubieDev/TwitchDialCheck) (Streamer specific bot) +- [TwitchSendMessage](https://github.com/CubieDev/TwitchSendMessage) (Meant for debugging purposes) From 084a60d511c7e1c8c6361675224cd8919c4b2feb Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 17:01:11 -0400 Subject: [PATCH 02/30] feat: implement bot owner to have same power as broadcaster --- MarkovChainBot.py | 8 ++++++-- README.md | 1 + Settings.py | 2 ++ 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/MarkovChainBot.py b/MarkovChainBot.py index 6465682..2c2d2d9 100644 --- a/MarkovChainBot.py +++ b/MarkovChainBot.py @@ -63,13 +63,14 @@ def __init__(self): live=True) self.ws.start_bot() - def set_settings(self, host, port, chan, nick, auth, denied_users, cooldown, key_length, max_sentence_length, help_message_timer, automatic_generation_timer): + def set_settings(self, host, port, 
chan, nick, auth, denied_users, bot_owner, cooldown, key_length, max_sentence_length, help_message_timer, automatic_generation_timer):
         self.host = host
         self.port = port
         self.chan = chan
         self.nick = nick
         self.auth = auth
         self.denied_users = [user.lower() for user in denied_users] + [self.nick.lower()]
+        self.bot_owner = bot_owner.lower()
         self.cooldown = cooldown
         self.key_length = key_length
         self.max_sentence_length = max_sentence_length
@@ -436,7 +437,10 @@ def check_if_other_command(self, message) -> bool:

     def check_if_streamer(self, m) -> bool:
         # True if the user is the streamer
-        return m.user == m.channel
+        return m.user == m.channel or self.check_if_owner(m)
+
+    def check_if_owner(self, m) -> bool:
+        return m.user == self.bot_owner

     def check_link(self, message) -> bool:
         # True if message contains a link
diff --git a/README.md b/README.md
index 019de4c..275df8e 100644
--- a/README.md
+++ b/README.md
@@ -246,6 +246,7 @@ This bot is controlled by a `settings.txt` file, which has the following structu
 | MaxSentenceWordAmount | The maximum number of words that can be generated. Prevents absurdly long and spammy generations. | 25 |
 | HelpMessageTimer | The number of seconds between sending help messages that link to [How it works](#how-it-works). -1 for no help messages. | 7200 |
 | AutomaticGenerationTimer | The number of seconds between sending a generation, as if someone wrote `!g`. -1 for no automatic generations. | -1 |
+| BotOwner | The Twitch username of the bot's owner.
Gives the owner the same power as the channel owner | "TestUser" | _Note that the example OAuth token is not an actual token, but merely a generated string to give an indication what it might look like._ diff --git a/Settings.py b/Settings.py index aaa90e1..24867a8 100644 --- a/Settings.py +++ b/Settings.py @@ -61,6 +61,7 @@ def __init__(self, bot): data["Nickname"], data["Authentication"], data["DeniedUsers"], + data["BotOwner"], data["Cooldown"], data["KeyLength"], data["MaxSentenceWordAmount"], @@ -87,6 +88,7 @@ def write_default_settings_file(): "Nickname": "", "Authentication": "oauth:", "DeniedUsers": ["StreamElements", "Nightbot", "Moobot", "Marbiebot"], + "BotOwner": "", "Cooldown": 20, "KeyLength": 2, "MaxSentenceWordAmount": 25, From 3f34867c99f52cb106e2f97df23dad7ab5d0ad0e Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 17:04:43 -0400 Subject: [PATCH 03/30] feat: allow disabling of whispering globally --- MarkovChainBot.py | 46 ++++++++++++++++++++++++++-------------------- Settings.py | 6 ++++-- 2 files changed, 30 insertions(+), 22 deletions(-) diff --git a/MarkovChainBot.py b/MarkovChainBot.py index 2c2d2d9..4eab83c 100644 --- a/MarkovChainBot.py +++ b/MarkovChainBot.py @@ -63,7 +63,7 @@ def __init__(self): live=True) self.ws.start_bot() - def set_settings(self, host, port, chan, nick, auth, denied_users, bot_owner, cooldown, key_length, max_sentence_length, help_message_timer, automatic_generation_timer): + def set_settings(self, host, port, chan, nick, auth, denied_users, bot_owner, cooldown, key_length, max_sentence_length, help_message_timer, automatic_generation_timer, should_whisper): self.host = host self.port = port self.chan = chan @@ -76,6 +76,7 @@ def set_settings(self, host, port, chan, nick, auth, denied_users, bot_owner, co self.max_sentence_length = max_sentence_length self.help_message_timer = help_message_timer self.automatic_generation_timer = automatic_generation_timer + self.should_whisper = should_whisper def 
message_handler(self, m): try: @@ -101,17 +102,17 @@ def message_handler(self, m): elif m.type in ("PRIVMSG", "WHISPER"): if m.message.startswith("!enable") and self.check_if_streamer(m): if self._enabled: - self.ws.send_whisper(m.user, "The !generate is already enabled.") + self.send_whisper(m.user, "The !generate is already enabled.") else: - self.ws.send_whisper(m.user, "Users can now !generate message again.") + self.send_whisper(m.user, "Users can now !generate message again.") self._enabled = True elif m.message.startswith("!disable") and self.check_if_streamer(m): if self._enabled: - self.ws.send_whisper(m.user, "Users can now no longer use !generate.") + self.send_whisper(m.user, "Users can now no longer use !generate.") self._enabled = False else: - self.ws.send_whisper(m.user, "The !generate is already disabled.") + self.send_whisper(m.user, "The !generate is already disabled.") elif m.message.startswith(("!setcooldown", "!setcd")) and self.check_if_streamer(m): split_message = m.message.split(" ") @@ -119,13 +120,13 @@ def message_handler(self, m): try: cooldown = int(split_message[1]) except ValueError: - self.ws.send_whisper(m.user, f"The parameter must be an integer amount, eg: !setcd 30") + self.send_whisper(m.user, f"The parameter must be an integer amount, eg: !setcd 30") return self.cooldown = cooldown Settings.update_cooldown(cooldown) - self.ws.send_whisper(m.user, f"The !generate cooldown has been set to {cooldown} seconds.") + self.send_whisper(m.user, f"The !generate cooldown has been set to {cooldown} seconds.") else: - self.ws.send_whisper(m.user, f"Please add exactly 1 integer parameter, eg: !setcd 30.") + self.send_whisper(m.user, f"Please add exactly 1 integer parameter, eg: !setcd 30.") if m.type == "PRIVMSG": @@ -136,7 +137,7 @@ def message_handler(self, m): if self.check_if_generate(m.message): if not self._enabled: if not self.db.check_whisper_ignore(m.user): - self.ws.send_whisper(m.user, "The !generate has been turned off. 
!nopm to stop me from whispering you.") + self.send_whisper(m.user, "The !generate has been turned off. !nopm to stop me from whispering you.") return cur_time = time.time() @@ -154,7 +155,7 @@ def message_handler(self, m): self.ws.send_message(sentence) else: if not self.db.check_whisper_ignore(m.user): - self.ws.send_whisper(m.user, f"Cooldown hit: {self.prev_message_t + self.cooldown - cur_time:0.2f} out of {self.cooldown:.0f}s remaining. !nopm to stop these cooldown pm's.") + self.send_whisper(m.user, f"Cooldown hit: {self.prev_message_t + self.cooldown - cur_time:0.2f} out of {self.cooldown:.0f}s remaining. !nopm to stop these cooldown pm's.") logger.info(f"Cooldown hit with {self.prev_message_t + self.cooldown - cur_time:0.2f}s remaining") return @@ -235,12 +236,12 @@ def message_handler(self, m): if m.message == "!nopm": logger.debug(f"Adding {m.user} to Do Not Whisper.") self.db.add_whisper_ignore(m.user) - self.ws.send_whisper(m.user, "You will no longer be sent whispers. Type !yespm to reenable. ") + self.send_whisper(m.user, "You will no longer be sent whispers. Type !yespm to reenable. ") elif m.message == "!yespm": logger.debug(f"Removing {m.user} from Do Not Whisper.") self.db.remove_whisper_ignore(m.user) - self.ws.send_whisper(m.user, "You will again be sent whispers. Type !nopm to disable again. ") + self.send_whisper(m.user, "You will again be sent whispers. Type !nopm to disable again. ") # Note that I add my own username to this list to allow me to manage the # blacklist in channels of my bot in channels I am not modded in. 
@@ -254,9 +255,9 @@ def message_handler(self, m): self.blacklist.append(word) logger.info(f"Added `{word}` to Blacklist.") self.write_blacklist(self.blacklist) - self.ws.send_whisper(m.user, "Added word to Blacklist.") + self.send_whisper(m.user, "Added word to Blacklist.") else: - self.ws.send_whisper(m.user, "Expected Format: `!blacklist word` to add `word` to the blacklist") + self.send_whisper(m.user, "Expected Format: `!blacklist word` to add `word` to the blacklist") # Removing from the blacklist elif self.check_if_our_command(m.message, "!whitelist"): @@ -266,22 +267,22 @@ def message_handler(self, m): self.blacklist.remove(word) logger.info(f"Removed `{word}` from Blacklist.") self.write_blacklist(self.blacklist) - self.ws.send_whisper(m.user, "Removed word from Blacklist.") + self.send_whisper(m.user, "Removed word from Blacklist.") except ValueError: - self.ws.send_whisper(m.user, "Word was already not in the blacklist.") + self.send_whisper(m.user, "Word was already not in the blacklist.") else: - self.ws.send_whisper(m.user, "Expected Format: `!whitelist word` to remove `word` from the blacklist.") + self.send_whisper(m.user, "Expected Format: `!whitelist word` to remove `word` from the blacklist.") # Checking whether a word is in the blacklist elif self.check_if_our_command(m.message, "!check"): if len(m.message.split()) == 2: word = m.message.split()[1].lower() if word in self.blacklist: - self.ws.send_whisper(m.user, "This word is in the Blacklist.") + self.send_whisper(m.user, "This word is in the Blacklist.") else: - self.ws.send_whisper(m.user, "This word is not in the Blacklist.") + self.send_whisper(m.user, "This word is not in the Blacklist.") else: - self.ws.send_whisper(m.user, "Expected Format: `!check word` to check whether `word` is on the blacklist.") + self.send_whisper(m.user, "Expected Format: `!check word` to check whether `word` is on the blacklist.") elif m.type == "CLEARMSG": # If a message is deleted, its contents will be 
unlearned @@ -416,6 +417,11 @@ def send_automatic_generation_message(self) -> None: else: logger.info("Attempted to output automatic generation message, but there is not enough learned information yet.") + def send_whisper(self, user, message): + if self.should_whisper: + self.ws.send_whisper(user, message) + return + def check_filter(self, message) -> bool: # Returns True if message contains a banned word. for word in message.translate(self.punct_trans_table).lower().split(): diff --git a/Settings.py b/Settings.py index 24867a8..80d59f5 100644 --- a/Settings.py +++ b/Settings.py @@ -66,7 +66,8 @@ def __init__(self, bot): data["KeyLength"], data["MaxSentenceWordAmount"], data["HelpMessageTimer"], - data["AutomaticGenerationTimer"]) + data["AutomaticGenerationTimer"], + data["ShouldWhisper"], except ValueError: logger.error("Error in settings file.") @@ -93,7 +94,8 @@ def write_default_settings_file(): "KeyLength": 2, "MaxSentenceWordAmount": 25, "HelpMessageTimer": 7200, - "AutomaticGenerationTimer": -1 + "AutomaticGenerationTimer": -1, + "ShouldWhisper": True, } f.write(json.dumps(standard_dict, indent=4, separators=(",", ": "))) From 5ba082e1af6b22325e1fd941333d2ee29c64403a Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 17:05:36 -0400 Subject: [PATCH 04/30] refactor: additional logging --- MarkovChainBot.py | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/MarkovChainBot.py b/MarkovChainBot.py index 4eab83c..6706a3e 100644 --- a/MarkovChainBot.py +++ b/MarkovChainBot.py @@ -102,17 +102,19 @@ def message_handler(self, m): elif m.type in ("PRIVMSG", "WHISPER"): if m.message.startswith("!enable") and self.check_if_streamer(m): if self._enabled: - self.send_whisper(m.user, "The !generate is already enabled.") + self.send_whisper(m.user, "The generate command is already enabled.") else: - self.send_whisper(m.user, "Users can now !generate message again.") + self.send_whisper(m.user, "Users can now use generate command 
again.") self._enabled = True + logger.info("Users can now use generate command again.") elif m.message.startswith("!disable") and self.check_if_streamer(m): if self._enabled: - self.send_whisper(m.user, "Users can now no longer use !generate.") + self.send_whisper(m.user, "Users can now no longer use generate command.") self._enabled = False + logger.info("Users can now no longer use generate command.") else: - self.send_whisper(m.user, "The !generate is already disabled.") + self.send_whisper(m.user, "The generate command is already disabled.") elif m.message.startswith(("!setcooldown", "!setcd")) and self.check_if_streamer(m): split_message = m.message.split(" ") From ef1962c5ba6525195f06f2218af29232a882adfe Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 17:08:41 -0400 Subject: [PATCH 05/30] feat: add disabling generate command while still generating at interval --- MarkovChainBot.py | 6 +++++- Settings.py | 2 ++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/MarkovChainBot.py b/MarkovChainBot.py index 6706a3e..be5fd53 100644 --- a/MarkovChainBot.py +++ b/MarkovChainBot.py @@ -63,7 +63,7 @@ def __init__(self): live=True) self.ws.start_bot() - def set_settings(self, host, port, chan, nick, auth, denied_users, bot_owner, cooldown, key_length, max_sentence_length, help_message_timer, automatic_generation_timer, should_whisper): + def set_settings(self, host, port, chan, nick, auth, denied_users, bot_owner, cooldown, key_length, max_sentence_length, help_message_timer, automatic_generation_timer, should_whisper, enable_generate_command): self.host = host self.port = port self.chan = chan @@ -77,6 +77,7 @@ def set_settings(self, host, port, chan, nick, auth, denied_users, bot_owner, co self.help_message_timer = help_message_timer self.automatic_generation_timer = automatic_generation_timer self.should_whisper = should_whisper + self.enable_generate_command = enable_generate_command def message_handler(self, m): try: @@ -137,6 +138,9 
@@ def message_handler(self, m): return if self.check_if_generate(m.message): + if not self.enable_generate_command and not self.check_if_streamer(m): + return + if not self._enabled: if not self.db.check_whisper_ignore(m.user): self.send_whisper(m.user, "The !generate has been turned off. !nopm to stop me from whispering you.") diff --git a/Settings.py b/Settings.py index 80d59f5..0ee94fb 100644 --- a/Settings.py +++ b/Settings.py @@ -68,6 +68,7 @@ def __init__(self, bot): data["HelpMessageTimer"], data["AutomaticGenerationTimer"], data["ShouldWhisper"], + data["EnableGenerateCommand"]) except ValueError: logger.error("Error in settings file.") @@ -96,6 +97,7 @@ def write_default_settings_file(): "HelpMessageTimer": 7200, "AutomaticGenerationTimer": -1, "ShouldWhisper": True, + "EnableGenerateCommand": True } f.write(json.dumps(standard_dict, indent=4, separators=(",", ": "))) From 63a71f6368252e4298b40dbc9be8413ef5434942 Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 17:08:56 -0400 Subject: [PATCH 06/30] refactor: disable HelpMessageTimer setting by default --- Settings.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Settings.py b/Settings.py index 0ee94fb..df0e3df 100644 --- a/Settings.py +++ b/Settings.py @@ -94,7 +94,7 @@ def write_default_settings_file(): "Cooldown": 20, "KeyLength": 2, "MaxSentenceWordAmount": 25, - "HelpMessageTimer": 7200, + "HelpMessageTimer": -1, "AutomaticGenerationTimer": -1, "ShouldWhisper": True, "EnableGenerateCommand": True From 37579805375ccc7660c77540570e6aa094428280 Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 17:14:23 -0400 Subject: [PATCH 07/30] chore: add gitignore file --- .gitignore | 144 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 144 insertions(+) create mode 100644 .gitignore diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..3d5317a --- /dev/null +++ b/.gitignore @@ -0,0 +1,144 @@ +# Byte-compiled / optimized / 
DLL files +__pycache__/ +*.py[cod] +*$py.class + +# C extensions +*.so + +# Distribution / packaging +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +share/python-wheels/ +*.egg-info/ +.installed.cfg +*.egg +MANIFEST + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. +*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.nox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +*.py,cover +.hypothesis/ +.pytest_cache/ +cover/ + +# Translations +*.mo +*.pot + +# Django stuff: +*.log +local_settings.py +db.sqlite3 +db.sqlite3-journal + +# Flask stuff: +instance/ +.webassets-cache + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +.pybuilder/ +target/ + +# Jupyter Notebook +.ipynb_checkpoints + +# IPython +profile_default/ +ipython_config.py + +# pyenv +# For a library or package, you might want to ignore these files since the code is +# intended to run in multiple environments; otherwise, check them in: +# .python-version + +# pipenv +# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. +# However, in case of collaboration, if having platform-specific dependencies or dependencies +# having no cross-platform support, pipenv may install dependencies that don't work, or not +# install all needed dependencies. +#Pipfile.lock + +# PEP 582; used by e.g. 
github.com/David-OConnor/pyflow +__pypackages__/ + +# Celery stuff +celerybeat-schedule +celerybeat.pid + +# SageMath parsed files +*.sage.py + +# Environments +.env +.venv +env/ +venv/ +ENV/ +env.bak/ +venv.bak/ + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ +.dmypy.json +dmypy.json + +# Pyre type checker +.pyre/ + +# pytype static type analyzer +.pytype/ + +# Cython debug symbols +cython_debug/ + +# IDE +.vscode/ + +# Database files +*.db \ No newline at end of file From 8f87d895e984626878b009cfa7c6760583b5b22d Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 17:15:01 -0400 Subject: [PATCH 08/30] refactor: change settings file to .json extension --- .gitignore | 4 ++++ Settings.py | 6 +++--- settings.txt | 18 ------------------ 3 files changed, 7 insertions(+), 21 deletions(-) delete mode 100644 settings.txt diff --git a/.gitignore b/.gitignore index 3d5317a..5c0cd32 100644 --- a/.gitignore +++ b/.gitignore @@ -140,5 +140,9 @@ cython_debug/ # IDE .vscode/ +# Settings +# This file should be automatically generated at launch if not created already +settings.json + # Database files *.db \ No newline at end of file diff --git a/Settings.py b/Settings.py index df0e3df..eb23b9b 100644 --- a/Settings.py +++ b/Settings.py @@ -3,9 +3,9 @@ logger = logging.getLogger(__name__) class Settings: - """ Loads data from settings.txt into the bot """ + """ Loads data from settings.json into the bot """ - PATH = os.path.join(os.getcwd(), "settings.txt") + PATH = os.path.join(os.getcwd(), "settings.json") def __init__(self, bot): try: @@ -80,7 +80,7 @@ def __init__(self, bot): @staticmethod def write_default_settings_file(): - # If the file is missing, create a standardised settings.txt file + # If the file is missing, create a standardised settings.json file # With all parameters required. 
with open(Settings.PATH, "w") as f: standard_dict = { diff --git a/settings.txt b/settings.txt deleted file mode 100644 index 82513c6..0000000 --- a/settings.txt +++ /dev/null @@ -1,18 +0,0 @@ -{ - "Host": "irc.chat.twitch.tv", - "Port": 6667, - "Channel": "#", - "Nickname": "", - "Authentication": "oauth:", - "DeniedUsers": [ - "StreamElements", - "Nightbot", - "Moobot", - "Marbiebot" - ], - "Cooldown": 20, - "KeyLength": 2, - "MaxSentenceWordAmount": 25, - "HelpMessageTimer": 7200, - "AutomaticGenerationTimer": -1 -} \ No newline at end of file From b22d9d5e8cf61912ea53d254979cb5bcc148b50e Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 17:15:35 -0400 Subject: [PATCH 09/30] feat: strip message before tokenizing sentences --- MarkovChainBot.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/MarkovChainBot.py b/MarkovChainBot.py index be5fd53..649d315 100644 --- a/MarkovChainBot.py +++ b/MarkovChainBot.py @@ -196,14 +196,14 @@ def message_handler(self, m): else: # Try to split up sentences. 
Requires nltk's 'punkt' resource try: - sentences = sent_tokenize(m.message) + sentences = sent_tokenize(m.message.strip()) # If 'punkt' is not downloaded, then download it, and retry except LookupError: logger.debug("Downloading required punkt resource...") import nltk nltk.download('punkt') logger.debug("Downloaded required punkt resource.") - sentences = sent_tokenize(m.message) + sentences = sent_tokenize(m.message.strip()) for sentence in sentences: # Get all seperate words From b1c0416715d6ded02a4cde2bfb267d9207287c1e Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 22:14:53 -0400 Subject: [PATCH 10/30] refactor: add typing to methods --- Database.py | 28 ++++++++++++++-------------- Log.py | 2 +- MarkovChainBot.py | 23 ++++++++++++----------- Settings.py | 8 +++++--- Timer.py | 3 ++- 5 files changed, 34 insertions(+), 30 deletions(-) diff --git a/Database.py b/Database.py index 1089c1b..de98583 100644 --- a/Database.py +++ b/Database.py @@ -3,7 +3,7 @@ logger = logging.getLogger(__name__) class Database: - def __init__(self, channel): + def __init__(self, channel: str): self.db_name = f"MarkovChain_{channel.replace('#', '').lower()}.db" self._execute_queue = [] @@ -135,7 +135,7 @@ def progress(status, remaining, total): # Index 0 is for "A", 1 for "B", and 26 for everything else self.word_frequency = [11.6, 4.4, 5.2, 3.1, 2.8, 4, 1.6, 4.2, 7.3, 0.5, 0.8, 2.4, 3.8, 2.2, 7.6, 4.3, 0.2, 2.8, 6.6, 15.9, 1.1, 0.8, 5.5, 0.1, 0.7, 0.1, 0.5] - def add_execute_queue(self, sql, values=None): + def add_execute_queue(self, sql: str, values = None): if values is not None: self._execute_queue.append([sql, values]) else: @@ -144,7 +144,7 @@ def add_execute_queue(self, sql, values=None): if len(self._execute_queue) > 25: self.execute_commit() - def execute_commit(self, fetch=False): + def execute_commit(self, fetch: bool = False): if self._execute_queue: with sqlite3.connect(self.db_name) as conn: cur = conn.cursor() @@ -156,7 +156,7 @@ def 
execute_commit(self, fetch=False): if fetch: return cur.fetchall() - def execute(self, sql, values=None, fetch=False): + def execute(self, sql: str, values = None, fetch: bool = False): with sqlite3.connect(self.db_name) as conn: cur = conn.cursor() if values is None: @@ -167,31 +167,31 @@ def execute(self, sql, values=None, fetch=False): if fetch: return cur.fetchall() - def get_suffix(self, character): + def get_suffix(self, character: str): if character.lower() in (string.ascii_lowercase): return character.upper() return "_" - def add_whisper_ignore(self, username): + def add_whisper_ignore(self, username: str): self.execute("INSERT OR IGNORE INTO WhisperIgnore(username) SELECT ?", (username,)) - def check_whisper_ignore(self, username): + def check_whisper_ignore(self, username: str): return self.execute("SELECT username FROM WhisperIgnore WHERE username = ?;", (username,), fetch=True) - def remove_whisper_ignore(self, username): + def remove_whisper_ignore(self, username: str): self.execute("DELETE FROM WhisperIgnore WHERE username = ?", (username,)) def check_equal(self, l): # Check if a list contains of items that are all identical return not l or l.count(l[0]) == len(l) - def get_next(self, index, words): + def get_next(self, index: int, words): # Get all items data = self.execute(f"SELECT word3, count FROM MarkovGrammar{self.get_suffix(words[0][0])}{self.get_suffix(words[1][0])} WHERE word1 = ? AND word2 = ?;", words, fetch=True) # Return a word picked from the data, using count as a weighting factor return None if len(data) == 0 else self.pick_word(data, index) - def get_next_initial(self, index, words): + def get_next_initial(self, index: int, words): # Get all items data = self.execute(f"SELECT word3, count FROM MarkovGrammar{self.get_suffix(words[0][0])}{self.get_suffix(words[1][0])} WHERE word1 = ? AND word2 = ? 
AND word3 != '';", words, fetch=True) # Return a word picked from the data, using count as a weighting factor @@ -205,19 +205,19 @@ def get_next_single(self, index, word): return None if len(data) == 0 else [word] + [self.pick_word(data, index)] """ - def get_next_single_initial(self, index, word): + def get_next_single_initial(self, index: int, word: str): # Get all items data = self.execute(f"SELECT word2, count FROM MarkovGrammar{self.get_suffix(word[0])}{random.choices(string.ascii_uppercase + '_', weights=self.word_frequency)[0]} WHERE word1 = ? AND word2 != '';", (word,), fetch=True) # Return a word picked from the data, using count as a weighting factor return None if len(data) == 0 else [word] + [self.pick_word(data, index)] - def get_next_single_start(self, word): + def get_next_single_start(self, word: str): # Get all items data = self.execute(f"SELECT word2, count FROM MarkovStart{self.get_suffix(word[0])} WHERE word1 = ?;", (word,), fetch=True) # Return a word picked from the data, using count as a weighting factor return None if len(data) == 0 else [word] + [self.pick_word(data)] - def pick_word(self, data, index=0): + def pick_word(self, data, index: int = 0): # Pick a random starting key from a weighted list # Note that the values are weighted based on index. return random.choices(data, weights=[tup[1] * ((index+1)/15) if tup[0] == "" else tup[1] for tup in data])[0][0] @@ -251,7 +251,7 @@ def add_rule_queue(self, item): def add_start_queue(self, item): self.add_execute_queue(f'INSERT OR REPLACE INTO MarkovStart{self.get_suffix(item[0][0])} (word1, word2, count) VALUES (?, ?, coalesce((SELECT count + 1 FROM MarkovStart{self.get_suffix(item[0][0])} WHERE word1 = ? COLLATE BINARY AND word2 = ? 
COLLATE BINARY), 1))', values=item + item) - def unlearn(self, message): + def unlearn(self, message: str): words = message.split(" ") tuples = [(words[i], words[i+1], words[i+2]) for i in range(0, len(words) - 2)] # Unlearn start of sentence from MarkovStart diff --git a/Log.py b/Log.py index 36e398b..8e60842 100644 --- a/Log.py +++ b/Log.py @@ -2,7 +2,7 @@ import logging.config class Log(): - def __init__(self, main_file): + def __init__(self, main_file: str): # Dynamically change size set up for name in the logger this_file = os.path.basename(main_file) diff --git a/MarkovChainBot.py b/MarkovChainBot.py index 649d315..c1b0a6a 100644 --- a/MarkovChainBot.py +++ b/MarkovChainBot.py @@ -1,8 +1,9 @@ +from typing import List, Tuple from Log import Log Log(__file__) -from TwitchWebsocket import TwitchWebsocket +from TwitchWebsocket import Message, TwitchWebsocket from nltk.tokenize import sent_tokenize import threading, socket, time, logging, re, string @@ -79,7 +80,7 @@ def set_settings(self, host, port, chan, nick, auth, denied_users, bot_owner, co self.should_whisper = should_whisper self.enable_generate_command = enable_generate_command - def message_handler(self, m): + def message_handler(self, m: Message): try: if m.type == "366": logger.info(f"Successfully joined channel: #{m.channel}") @@ -304,7 +305,7 @@ def message_handler(self, m): except Exception as e: logger.exception(e) - def generate(self, params) -> "Tuple[str, bool]": + def generate(self, params: List[str]) -> "Tuple[str, bool]": # Check for commands or recursion, eg: !generate !generate if len(params) > 0: @@ -380,7 +381,7 @@ def extract_modifiers(self, emotes: str) -> list: pass return output - def write_blacklist(self, blacklist) -> None: + def write_blacklist(self, blacklist: List[str]) -> None: logger.debug("Writing Blacklist...") with open("blacklist.txt", "w") as f: f.write("\n".join(sorted(blacklist, key=lambda x: len(x), reverse=True))) @@ -423,12 +424,12 @@ def 
send_automatic_generation_message(self) -> None: else: logger.info("Attempted to output automatic generation message, but there is not enough learned information yet.") - def send_whisper(self, user, message): + def send_whisper(self, user: str, message: str): if self.should_whisper: self.ws.send_whisper(user, message) return - def check_filter(self, message) -> bool: + def check_filter(self, message: str) -> bool: # Returns True if message contains a banned word. for word in message.translate(self.punct_trans_table).lower().split(): if word in self.blacklist: @@ -439,22 +440,22 @@ def check_if_our_command(self, message: str, *commands: "Tuple[str]") -> bool: # True if the first "word" of the message is either exactly command, or in the tuple of commands return message.split()[0] in commands - def check_if_generate(self, message) -> bool: + def check_if_generate(self, message: str) -> bool: # True if the first "word" of the message is either !generate or !g. return self.check_if_our_command(message, "!generate", "!g") - def check_if_other_command(self, message) -> bool: + def check_if_other_command(self, message: str) -> bool: # Don't store commands, except /me return message.startswith(("!", "/", ".")) and not message.startswith("/me") - def check_if_streamer(self, m) -> bool: + def check_if_streamer(self, m: Message) -> bool: # True if the user is the streamer return m.user == m.channel or self.check_if_owner(m) - def check_if_owner(self, m) -> bool: + def check_if_owner(self, m: Message) -> bool: return m.user == self.bot_owner; - def check_link(self, message) -> bool: + def check_link(self, message: str) -> bool: # True if message contains a link return self.link_regex.search(message) diff --git a/Settings.py b/Settings.py index eb23b9b..78d55b5 100644 --- a/Settings.py +++ b/Settings.py @@ -1,5 +1,7 @@ - import json, os, logging + +from MarkovChainBot import MarkovChain + logger = logging.getLogger(__name__) class Settings: @@ -7,7 +9,7 @@ class Settings: PATH 
= os.path.join(os.getcwd(), "settings.json") - def __init__(self, bot): + def __init__(self, bot: MarkovChain): try: # Try to load the file using json. # And pass the data to the Bot class instance if this succeeds. @@ -102,7 +104,7 @@ def write_default_settings_file(): f.write(json.dumps(standard_dict, indent=4, separators=(",", ": "))) @staticmethod - def update_cooldown(cooldown): + def update_cooldown(cooldown: int): with open(Settings.PATH, "r") as f: settings = f.read() data = json.loads(settings) diff --git a/Timer.py b/Timer.py index 94646b6..d3c98c3 100644 --- a/Timer.py +++ b/Timer.py @@ -1,4 +1,5 @@ import threading, logging +from typing import Callable logger = logging.getLogger(__name__) @@ -7,7 +8,7 @@ class LoopingTimer(threading.Thread): Thread that will continuously run `target(*args, **kwargs)` every `interval` seconds, until program termination. """ - def __init__(self, interval, target, *args, **kwargs) -> None: + def __init__(self, interval: int, target: Callable[[], None], *args, **kwargs) -> None: threading.Thread.__init__(self) self.interval = interval self.target = target From 33f43ab5dd48a86dcaabacc4b4689cc56b924775 Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 22:15:11 -0400 Subject: [PATCH 11/30] refactor: remove unused import --- MarkovChainBot.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MarkovChainBot.py b/MarkovChainBot.py index c1b0a6a..9920094 100644 --- a/MarkovChainBot.py +++ b/MarkovChainBot.py @@ -5,7 +5,7 @@ from TwitchWebsocket import Message, TwitchWebsocket from nltk.tokenize import sent_tokenize -import threading, socket, time, logging, re, string +import socket, time, logging, re, string from Settings import Settings from Database import Database From a28b9c15f693cb9743b441398332a2d8195c60ca Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 22:28:04 -0400 Subject: [PATCH 12/30] refactor: type settings data --- Settings.py | 21 +++++++++++++++++++-- 1 file 
changed, 19 insertions(+), 2 deletions(-) diff --git a/Settings.py b/Settings.py index 78d55b5..c999a44 100644 --- a/Settings.py +++ b/Settings.py @@ -1,9 +1,26 @@ import json, os, logging +from typing import List, TypedDict from MarkovChainBot import MarkovChain logger = logging.getLogger(__name__) +class SettingsData(TypedDict): + Host: str + Port: int + Channel: str + Nickname: str + Authentication: str + DeniedUsers: List[str] + BotOwner: str + Cooldown: int + KeyLength: int + MaxSentenceWordAmount: int + HelpMessageTimer: int + AutomaticGenerationTimer: int + ShouldWhisper: bool + EnableGenerateCommand: bool + class Settings: """ Loads data from settings.json into the bot """ @@ -15,7 +32,7 @@ def __init__(self, bot: MarkovChain): # And pass the data to the Bot class instance if this succeeds. with open(Settings.PATH, "r") as f: settings = f.read() - data = json.loads(settings) + data: SettingsData = json.loads(settings) # "BannedWords" is only a key in the settings in older versions. # We moved to a separate file for blacklisted words. if "BannedWords" in data: @@ -85,7 +102,7 @@ def write_default_settings_file(): # If the file is missing, create a standardised settings.json file # With all parameters required. 
with open(Settings.PATH, "w") as f: - standard_dict = { + standard_dict: SettingsData = { "Host": "irc.chat.twitch.tv", "Port": 6667, "Channel": "#", From 875d1c23dcda3adfcd4384d4881db1248b6075ac Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 22:40:24 -0400 Subject: [PATCH 13/30] fix: remove circular import --- Settings.py | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/Settings.py b/Settings.py index c999a44..ef8ae63 100644 --- a/Settings.py +++ b/Settings.py @@ -1,8 +1,6 @@ import json, os, logging from typing import List, TypedDict -from MarkovChainBot import MarkovChain - logger = logging.getLogger(__name__) class SettingsData(TypedDict): @@ -26,7 +24,7 @@ class Settings: PATH = os.path.join(os.getcwd(), "settings.json") - def __init__(self, bot: MarkovChain): + def __init__(self, bot): try: # Try to load the file using json. # And pass the data to the Bot class instance if this succeeds. From 8a922e4e8d1a53182f14ea7235494d94d1a4a955 Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 22:42:41 -0400 Subject: [PATCH 14/30] refactor: pass json data to set_settings instead Cleans up the args of the method --- MarkovChainBot.py | 32 ++++++++++++++++---------------- Settings.py | 15 +-------------- 2 files changed, 17 insertions(+), 30 deletions(-) diff --git a/MarkovChainBot.py b/MarkovChainBot.py index 9920094..c15a02e 100644 --- a/MarkovChainBot.py +++ b/MarkovChainBot.py @@ -7,7 +7,7 @@ from nltk.tokenize import sent_tokenize import socket, time, logging, re, string -from Settings import Settings +from Settings import Settings, SettingsData from Database import Database from Timer import LoopingTimer @@ -64,21 +64,21 @@ def __init__(self): live=True) self.ws.start_bot() - def set_settings(self, host, port, chan, nick, auth, denied_users, bot_owner, cooldown, key_length, max_sentence_length, help_message_timer, automatic_generation_timer, should_whisper, enable_generate_command): - self.host = host - 
self.port = port - self.chan = chan - self.nick = nick - self.auth = auth - self.denied_users = [user.lower() for user in denied_users] + [self.nick.lower()] - self.bot_owner = bot_owner.lower() - self.cooldown = cooldown - self.key_length = key_length - self.max_sentence_length = max_sentence_length - self.help_message_timer = help_message_timer - self.automatic_generation_timer = automatic_generation_timer - self.should_whisper = should_whisper - self.enable_generate_command = enable_generate_command + def set_settings(self, data: SettingsData): + self.host = data["Host"] + self.port = data["Port"] + self.chan = data["Channel"] + self.nick = data["Nickname"] + self.auth = data["Authentication"] + self.denied_users = [user.lower() for user in data["DeniedUsers"]] + [self.nick.lower()] + self.bot_owner = data["BotOwner"].lower() + self.cooldown = data["Cooldown"] + self.key_length = data["KeyLength"] + self.max_sentence_length = data["MaxSentenceWordAmount"] + self.help_message_timer = data["HelpMessageTimer"] + self.automatic_generation_timer = data["AutomaticGenerationTimer"] + self.should_whisper = data["ShouldWhisper"] + self.enable_generate_command = data["EnableGenerateCommand"] def message_handler(self, m: Message): try: diff --git a/Settings.py b/Settings.py index ef8ae63..38d5f94 100644 --- a/Settings.py +++ b/Settings.py @@ -72,20 +72,7 @@ def __init__(self, bot): with open(Settings.PATH, "w") as f: f.write(json.dumps(data, indent=4, separators=(",", ": "))) - bot.set_settings(data["Host"], - data["Port"], - data["Channel"], - data["Nickname"], - data["Authentication"], - data["DeniedUsers"], - data["BotOwner"], - data["Cooldown"], - data["KeyLength"], - data["MaxSentenceWordAmount"], - data["HelpMessageTimer"], - data["AutomaticGenerationTimer"], - data["ShouldWhisper"], - data["EnableGenerateCommand"]) + bot.set_settings(data) except ValueError: logger.error("Error in settings file.") From b7053350409190de0d21d8060c90b32465a14542 Mon Sep 17 00:00:00 
2001 From: Justin Russo Date: Sun, 13 Jun 2021 22:43:30 -0400 Subject: [PATCH 15/30] refactor: set default help_message_timer to -1 --- MarkovChainBot.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MarkovChainBot.py b/MarkovChainBot.py index c15a02e..01cde11 100644 --- a/MarkovChainBot.py +++ b/MarkovChainBot.py @@ -24,7 +24,7 @@ def __init__(self): self.cooldown = 20 self.key_length = 2 self.max_sentence_length = 20 - self.help_message_timer = 7200 + self.help_message_timer = -1 self.automatic_generation_timer = -1 self.prev_message_t = 0 self._enabled = True From 6f0bdb216d3976f713cd79cb588b16974c1c7dce Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 22:51:32 -0400 Subject: [PATCH 16/30] feat: add additional sentence if END is below minimum sentence words --- MarkovChainBot.py | 11 +++++++++-- Settings.py | 2 ++ 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/MarkovChainBot.py b/MarkovChainBot.py index 01cde11..5927b6c 100644 --- a/MarkovChainBot.py +++ b/MarkovChainBot.py @@ -24,6 +24,7 @@ def __init__(self): self.cooldown = 20 self.key_length = 2 self.max_sentence_length = 20 + self.min_sentence_length = -1 self.help_message_timer = -1 self.automatic_generation_timer = -1 self.prev_message_t = 0 @@ -75,6 +76,7 @@ def set_settings(self, data: SettingsData): self.cooldown = data["Cooldown"] self.key_length = data["KeyLength"] self.max_sentence_length = data["MaxSentenceWordAmount"] + self.min_sentence_length = data["MinSentenceWordAmount"] self.help_message_timer = data["HelpMessageTimer"] self.automatic_generation_timer = data["AutomaticGenerationTimer"] self.should_whisper = data["ShouldWhisper"] @@ -350,9 +352,14 @@ def generate(self, params: List[str]) -> "Tuple[str, bool]": else: word = self.db.get_next(i, key) - # Return if next word is the END if word == "" or word == None: - break + if i < self.min_sentence_length: + key = self.db.get_start() + for entry in key: + sentence.append(entry) + word = 
self.db.get_next_initial(i, key) + else: + break # Otherwise add the word sentence.append(word) diff --git a/Settings.py b/Settings.py index 38d5f94..5291811 100644 --- a/Settings.py +++ b/Settings.py @@ -14,6 +14,7 @@ class SettingsData(TypedDict): Cooldown: int KeyLength: int MaxSentenceWordAmount: int + MinSentenceWordAmount: int HelpMessageTimer: int AutomaticGenerationTimer: int ShouldWhisper: bool @@ -98,6 +99,7 @@ def write_default_settings_file(): "Cooldown": 20, "KeyLength": 2, "MaxSentenceWordAmount": 25, + "MinSentenceWordAmount": -1, "HelpMessageTimer": -1, "AutomaticGenerationTimer": -1, "ShouldWhisper": True, From 569e853152f3a5916678ec538e99f36b32e0e138 Mon Sep 17 00:00:00 2001 From: Justin Russo Date: Sun, 13 Jun 2021 23:07:33 -0400 Subject: [PATCH 17/30] docs: add newly added settings to settings table in readme --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 275df8e..9e2da6b 100644 --- a/README.md +++ b/README.md @@ -244,9 +244,12 @@ This bot is controlled by a `settings.txt` file, which has the following structu | Cooldown | A cooldown in seconds between successful generations. If a generation fails (eg inputs it can't work with), then the cooldown is not reset and another generation can be done immediately. | 20 | | KeyLength | A technical parameter which, in my previous implementation, would affect how closely the output matches the learned inputs. In the current implementation the database structure does not allow this parameter to be changed. Do not change. | 2 | | MaxSentenceWordAmount | The maximum number of words that can be generated. Prevents absurdly long and spammy generations. | 25 | +| MinSentenceWordAmount | The minimum number of words that can be generated. If a generated message is shorter than this number, additional sentences are appended. Prevents very short messages. 
-1 to disable | -1 | | HelpMessageTimer | The amount of seconds between sending help messages that links to [How it works](#how-it-works). -1 for no help messages. | 7200 | | AutomaticGenerationTimer | The amount of seconds between sending a generation, as if someone wrote `!g`. -1 for no automatic generations. | -1 | | BotOwner | The owner of the bot's twitch username. Gives the owner the same power as the channel owner | "TestUser" | +| ShouldWhisper | Whether the bot may whisper users. Set to false to prevent the bot from attempting to whisper | true | +| EnableGenerateCommand | Globally enables/disables the generate command | true | _Note that the example OAuth token is not an actual token, but merely a generated string to give an indication what it might look like._ From 4cfeb2d87de37ac9a25f87ef9ffee284ce91403c Mon Sep 17 00:00:00 2001 From: Tom Aarsen Date: Mon, 21 Jun 2021 13:15:44 +0200 Subject: [PATCH 18/30] Reintroduce support for Python 3.6 and 3.7 I love the use of TypedDict to further improve third party type checking, but support for the commonly used Python 3.6 and 3.7 has priority.
--- Settings.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/Settings.py b/Settings.py index 5291811..4054501 100644 --- a/Settings.py +++ b/Settings.py @@ -1,5 +1,9 @@ import json, os, logging -from typing import List, TypedDict +from typing import List +try: + from typing import TypedDict +except ImportError: + TypedDict = object logger = logging.getLogger(__name__) From 1c6d7ac68c18524aaaca5fd05de84ab81f7f3dc5 Mon Sep 17 00:00:00 2001 From: Tom Aarsen Date: Mon, 21 Jun 2021 13:59:17 +0200 Subject: [PATCH 19/30] Automatic update of settings file from old version --- Settings.py | 128 ++++++++++++++++++++++++++++++++-------------------- 1 file changed, 80 insertions(+), 48 deletions(-) diff --git a/Settings.py b/Settings.py index 4054501..81544b2 100644 --- a/Settings.py +++ b/Settings.py @@ -29,7 +29,29 @@ class Settings: PATH = os.path.join(os.getcwd(), "settings.json") + DEFAULTS = { + "Host": "irc.chat.twitch.tv", + "Port": 6667, + "Channel": "#", + "Nickname": "", + "Authentication": "oauth:", + "DeniedUsers": ["StreamElements", "Nightbot", "Moobot", "Marbiebot"], + "BotOwner": "", + "Cooldown": 20, + "KeyLength": 2, + "MaxSentenceWordAmount": 25, + "MinSentenceWordAmount": -1, + "HelpMessageTimer": -1, + "AutomaticGenerationTimer": -1, + "ShouldWhisper": True, + "EnableGenerateCommand": True + } + def __init__(self, bot): + + # Potentially update the settings structure used to the newest version + self.update_v2() + try: # Try to load the file using json. # And pass the data to the Bot class instance if this succeeds. @@ -38,36 +60,7 @@ def __init__(self, bot): data: SettingsData = json.loads(settings) # "BannedWords" is only a key in the settings in older versions. # We moved to a separate file for blacklisted words. 
- if "BannedWords" in data: - logger.info("Updating Blacklist system to new version...") - try: - with open("blacklist.txt", "r+") as f: - logger.info("Moving Banned Words to the blacklist.txt file...") - # Read the data, and split by word or phrase, then add BannedWords - banned_list = f.read().split("\n") + data["BannedWords"] - # Remove duplicates and sort by length, longest to shortest - banned_list = sorted(list(set(banned_list)), key=lambda x: len(x), reverse=True) - # Clear file, and then write in the new data - f.seek(0) - f.truncate(0) - f.write("\n".join(banned_list)) - logger.info("Moved Banned Words to the blacklist.txt file.") - - except FileNotFoundError: - with open("blacklist.txt", "w") as f: - logger.info("Moving Banned Words to a new blacklist.txt file...") - # Remove duplicates and sort by length, longest to shortest - banned_list = sorted(list(set(data["BannedWords"])), key=lambda x: len(x), reverse=True) - f.write("\n".join(banned_list)) - logger.info("Moved Banned Words to a new blacklist.txt file.") - - # Remove BannedWords list from data dictionary, and then write it to the settings file - del data["BannedWords"] - - with open(Settings.PATH, "w") as f: - f.write(json.dumps(data, indent=4, separators=(",", ": "))) - - logger.info("Updated Blacklist system to new version.") + self.update_v1(data) # Automatically update the settings.txt to the new version. 
if "HelpMessageTimer" not in data or "AutomaticGenerationTimer" not in data: @@ -87,29 +80,68 @@ def __init__(self, bot): Settings.write_default_settings_file() raise ValueError("Please fix your settings.txt file that was just generated.") + @staticmethod + def update_v1(data: SettingsData): + """Update settings file to remove the BannedWords field, in favor for a blacklist.txt file.""" + if "BannedWords" in data: + logger.info("Updating Blacklist system to new version...") + try: + with open("blacklist.txt", "r+") as f: + logger.info("Moving Banned Words to the blacklist.txt file...") + # Read the data, and split by word or phrase, then add BannedWords + banned_list = f.read().split("\n") + data["BannedWords"] + # Remove duplicates and sort by length, longest to shortest + banned_list = sorted(list(set(banned_list)), key=lambda x: len(x), reverse=True) + # Clear file, and then write in the new data + f.seek(0) + f.truncate(0) + f.write("\n".join(banned_list)) + logger.info("Moved Banned Words to the blacklist.txt file.") + + except FileNotFoundError: + with open("blacklist.txt", "w") as f: + logger.info("Moving Banned Words to a new blacklist.txt file...") + # Remove duplicates and sort by length, longest to shortest + banned_list = sorted(list(set(data["BannedWords"])), key=lambda x: len(x), reverse=True) + f.write("\n".join(banned_list)) + logger.info("Moved Banned Words to a new blacklist.txt file.") + + # Remove BannedWords list from data dictionary, and then write it to the settings file + del data["BannedWords"] + + with open(Settings.PATH, "w") as f: + f.write(json.dumps(data, indent=4, separators=(",", ": "))) + + logger.info("Updated Blacklist system to new version.") + + @staticmethod + def update_v2(): + """Converts `settings.txt` to `settings.json`, and adds missing new fields.""" + try: + # Try to load the old settings.txt file using json. 
+ with open("settings.txt", "r") as f: + settings = f.read() + data: SettingsData = json.loads(settings) + # Add missing fields from Settings.DEFAULTS to data + corrected_data = {**Settings.DEFAULTS, **data} + + # Write the new settings file + with open(Settings.PATH, "w") as f: + f.write(json.dumps(corrected_data, indent=4, separators=(",", ": "))) + + os.remove("settings.txt") + + logger.info("Updated Settings system to new version. See \"settings.json\" for new fields, and README.md for information on these fields.") + + except FileNotFoundError: + pass + @staticmethod def write_default_settings_file(): # If the file is missing, create a standardised settings.json file # With all parameters required. with open(Settings.PATH, "w") as f: - standard_dict: SettingsData = { - "Host": "irc.chat.twitch.tv", - "Port": 6667, - "Channel": "#", - "Nickname": "", - "Authentication": "oauth:", - "DeniedUsers": ["StreamElements", "Nightbot", "Moobot", "Marbiebot"], - "BotOwner": "", - "Cooldown": 20, - "KeyLength": 2, - "MaxSentenceWordAmount": 25, - "MinSentenceWordAmount": -1, - "HelpMessageTimer": -1, - "AutomaticGenerationTimer": -1, - "ShouldWhisper": True, - "EnableGenerateCommand": True - } - f.write(json.dumps(standard_dict, indent=4, separators=(",", ": "))) + f.write(json.dumps(Settings.DEFAULTS, indent=4, separators=(",", ": "))) @staticmethod def update_cooldown(cooldown: int): From 0387afcffc5f2ab617534f6e052657d550ff3a66 Mon Sep 17 00:00:00 2001 From: Tom Aarsen Date: Mon, 21 Jun 2021 14:57:38 +0200 Subject: [PATCH 20/30] Moved logging, renamed bot_owner, should_whisper * Sometimes still send whispers, even if disabled in settings (in case of updating settings etc.)
* Rename check_if_streamer to check_if_permission --- Log.py | 31 ++++++++---------- MarkovChainBot.py | 83 ++++++++++++++++++++--------------------------- README.md | 4 +-- Settings.py | 52 ++++++++++++++--------------- 4 files changed, 77 insertions(+), 93 deletions(-) diff --git a/Log.py b/Log.py index 8e60842..3f702cb 100644 --- a/Log.py +++ b/Log.py @@ -1,27 +1,22 @@ -import logging, os, json +import logging +import os +import json import logging.config + class Log(): - def __init__(self, main_file: str): + def __init__(self, logger: logging.Logger, main_file: str): # Dynamically change size set up for name in the logger this_file = os.path.basename(main_file) - + + from Settings import Settings + # If you have a logging config like me, use it if "PYTHON_LOGGING_CONFIG" in os.environ: - logging.config.fileConfig(os.environ.get("PYTHON_LOGGING_CONFIG"), defaults={"logfilename": this_file.replace(".py", "_") + Log.get_channel() + ".log"}) + logging.config.fileConfig(os.environ.get("PYTHON_LOGGING_CONFIG"), + defaults={"logfilename": this_file.replace(".py", "_") + Settings.get_channel() + ".log"}, + disable_existing_loggers=False) else: # If you don't, use a standard config that outputs some INFO in the console - logging.basicConfig(level=logging.INFO, format=f'[%(asctime)s] [%(name)s] [%(levelname)-8s] - %(message)s') - - @staticmethod - def get_channel(): - try: - with open(os.path.join(os.getcwd(), "settings.txt"), "r") as f: - settings = f.read() - data = json.loads(settings) - return data["Channel"].replace("#", "").lower() - - except FileNotFoundError: - from Settings import Settings - Settings.write_default_settings_file() - raise ValueError("Please fix your settings.txt file that was just generated.") \ No newline at end of file + logging.basicConfig( + level=logging.INFO, format=f'[%(asctime)s] [%(name)s] [%(levelname)-8s] - %(message)s') diff --git a/MarkovChainBot.py b/MarkovChainBot.py index 5927b6c..aad3878 100644 --- a/MarkovChainBot.py +++ 
b/MarkovChainBot.py @@ -1,7 +1,5 @@ from typing import List, Tuple -from Log import Log -Log(__file__) from TwitchWebsocket import Message, TwitchWebsocket from nltk.tokenize import sent_tokenize @@ -15,18 +13,6 @@ class MarkovChain: def __init__(self): - self.host = None - self.port = None - self.chan = None - self.nick = None - self.auth = None - self.denied_users = None - self.cooldown = 20 - self.key_length = 2 - self.max_sentence_length = 20 - self.min_sentence_length = -1 - self.help_message_timer = -1 - self.automatic_generation_timer = -1 self.prev_message_t = 0 self._enabled = True # This regex should detect similar phrases as links as Twitch does @@ -72,14 +58,14 @@ def set_settings(self, data: SettingsData): self.nick = data["Nickname"] self.auth = data["Authentication"] self.denied_users = [user.lower() for user in data["DeniedUsers"]] + [self.nick.lower()] - self.bot_owner = data["BotOwner"].lower() + self.allowed_users = [user.lower() for user in data["AllowedUsers"]] self.cooldown = data["Cooldown"] self.key_length = data["KeyLength"] self.max_sentence_length = data["MaxSentenceWordAmount"] self.min_sentence_length = data["MinSentenceWordAmount"] self.help_message_timer = data["HelpMessageTimer"] self.automatic_generation_timer = data["AutomaticGenerationTimer"] - self.should_whisper = data["ShouldWhisper"] + self.whisper_cooldown = data["WhisperCooldown"] self.enable_generate_command = data["EnableGenerateCommand"] def message_handler(self, m: Message): @@ -104,35 +90,35 @@ def message_handler(self, m: Message): logger.info(m.message) elif m.type in ("PRIVMSG", "WHISPER"): - if m.message.startswith("!enable") and self.check_if_streamer(m): + if m.message.startswith("!enable") and self.check_if_permissions(m): if self._enabled: - self.send_whisper(m.user, "The generate command is already enabled.") + self.ws.send_whisper(m.user, "The generate command is already enabled.") else: - self.send_whisper(m.user, "Users can now use generate command again.") 
+ self.ws.send_whisper(m.user, "Users can now use generate command again.") self._enabled = True logger.info("Users can now use generate command again.") - elif m.message.startswith("!disable") and self.check_if_streamer(m): + elif m.message.startswith("!disable") and self.check_if_permissions(m): if self._enabled: - self.send_whisper(m.user, "Users can now no longer use generate command.") + self.ws.send_whisper(m.user, "Users can now no longer use generate command.") self._enabled = False logger.info("Users can now no longer use generate command.") else: - self.send_whisper(m.user, "The generate command is already disabled.") + self.ws.send_whisper(m.user, "The generate command is already disabled.") - elif m.message.startswith(("!setcooldown", "!setcd")) and self.check_if_streamer(m): + elif m.message.startswith(("!setcooldown", "!setcd")) and self.check_if_permissions(m): split_message = m.message.split(" ") if len(split_message) == 2: try: cooldown = int(split_message[1]) except ValueError: - self.send_whisper(m.user, f"The parameter must be an integer amount, eg: !setcd 30") + self.ws.send_whisper(m.user, f"The parameter must be an integer amount, eg: !setcd 30") return self.cooldown = cooldown Settings.update_cooldown(cooldown) - self.send_whisper(m.user, f"The !generate cooldown has been set to {cooldown} seconds.") + self.ws.send_whisper(m.user, f"The !generate cooldown has been set to {cooldown} seconds.") else: - self.send_whisper(m.user, f"Please add exactly 1 integer parameter, eg: !setcd 30.") + self.ws.send_whisper(m.user, f"Please add exactly 1 integer parameter, eg: !setcd 30.") if m.type == "PRIVMSG": @@ -141,7 +127,7 @@ def message_handler(self, m: Message): return if self.check_if_generate(m.message): - if not self.enable_generate_command and not self.check_if_streamer(m): + if not self.enable_generate_command and not self.check_if_permissions(m): return if not self._enabled: @@ -150,7 +136,7 @@ def message_handler(self, m: Message): return 
cur_time = time.time() - if self.prev_message_t + self.cooldown < cur_time or self.check_if_streamer(m): + if self.prev_message_t + self.cooldown < cur_time or self.check_if_permissions(m): if self.check_filter(m.message): sentence = "You can't make me say that, you madman!" else: @@ -165,7 +151,7 @@ def message_handler(self, m: Message): else: if not self.db.check_whisper_ignore(m.user): self.send_whisper(m.user, f"Cooldown hit: {self.prev_message_t + self.cooldown - cur_time:0.2f} out of {self.cooldown:.0f}s remaining. !nopm to stop these cooldown pm's.") - logger.info(f"Cooldown hit with {self.prev_message_t + self.cooldown - cur_time:0.2f}s remaining") + logger.info(f"Cooldown hit with {self.prev_message_t + self.cooldown - cur_time:0.2f}s remaining.") return # Send help message when requested. @@ -245,17 +231,17 @@ def message_handler(self, m: Message): if m.message == "!nopm": logger.debug(f"Adding {m.user} to Do Not Whisper.") self.db.add_whisper_ignore(m.user) - self.send_whisper(m.user, "You will no longer be sent whispers. Type !yespm to reenable. ") + self.ws.send_whisper(m.user, "You will no longer be sent whispers. Type !yespm to reenable. ") elif m.message == "!yespm": logger.debug(f"Removing {m.user} from Do Not Whisper.") self.db.remove_whisper_ignore(m.user) - self.send_whisper(m.user, "You will again be sent whispers. Type !nopm to disable again. ") + self.ws.send_whisper(m.user, "You will again be sent whispers. Type !nopm to disable again. ") # Note that I add my own username to this list to allow me to manage the # blacklist in channels of my bot in channels I am not modded in. # I may modify this and add a "allowed users" field in the settings file. 
- elif m.user.lower() in self.mod_list + ["cubiedev"]: + elif m.user.lower() in self.mod_list + ["cubiedev"] + self.allowed_users: # Adding to the blacklist if self.check_if_our_command(m.message, "!blacklist"): if len(m.message.split()) == 2: @@ -264,9 +250,9 @@ def message_handler(self, m: Message): self.blacklist.append(word) logger.info(f"Added `{word}` to Blacklist.") self.write_blacklist(self.blacklist) - self.send_whisper(m.user, "Added word to Blacklist.") + self.ws.send_whisper(m.user, "Added word to Blacklist.") else: - self.send_whisper(m.user, "Expected Format: `!blacklist word` to add `word` to the blacklist") + self.ws.send_whisper(m.user, "Expected Format: `!blacklist word` to add `word` to the blacklist") # Removing from the blacklist elif self.check_if_our_command(m.message, "!whitelist"): @@ -276,22 +262,22 @@ def message_handler(self, m: Message): self.blacklist.remove(word) logger.info(f"Removed `{word}` from Blacklist.") self.write_blacklist(self.blacklist) - self.send_whisper(m.user, "Removed word from Blacklist.") + self.ws.send_whisper(m.user, "Removed word from Blacklist.") except ValueError: - self.send_whisper(m.user, "Word was already not in the blacklist.") + self.ws.send_whisper(m.user, "Word was already not in the blacklist.") else: - self.send_whisper(m.user, "Expected Format: `!whitelist word` to remove `word` from the blacklist.") + self.ws.send_whisper(m.user, "Expected Format: `!whitelist word` to remove `word` from the blacklist.") # Checking whether a word is in the blacklist elif self.check_if_our_command(m.message, "!check"): if len(m.message.split()) == 2: word = m.message.split()[1].lower() if word in self.blacklist: - self.send_whisper(m.user, "This word is in the Blacklist.") + self.ws.send_whisper(m.user, "This word is in the Blacklist.") else: - self.send_whisper(m.user, "This word is not in the Blacklist.") + self.ws.send_whisper(m.user, "This word is not in the Blacklist.") else: - self.send_whisper(m.user, "Expected 
Format: `!check word` to check whether `word` is on the blacklist.") + self.ws.send_whisper(m.user, "Expected Format: `!check word` to check whether `word` is on the blacklist.") elif m.type == "CLEARMSG": # If a message is deleted, its contents will be unlearned @@ -432,7 +418,7 @@ def send_automatic_generation_message(self) -> None: logger.info("Attempted to output automatic generation message, but there is not enough learned information yet.") def send_whisper(self, user: str, message: str): - if self.should_whisper: + if self.whisper_cooldown: self.ws.send_whisper(user, message) return @@ -455,16 +441,19 @@ def check_if_other_command(self, message: str) -> bool: # Don't store commands, except /me return message.startswith(("!", "/", ".")) and not message.startswith("/me") - def check_if_streamer(self, m: Message) -> bool: - # True if the user is the streamer - return m.user == m.channel or self.check_if_owner(m) - - def check_if_owner(self, m: Message) -> bool: - return m.user == self.bot_owner; + def check_if_permissions(self, m: Message) -> bool: + """True if the user has heightened permissions. + + E.g. permissions to bypass cooldowns, update settings, disable the bot, etc. + True for the streamer themselves, and the users set as the allowed users. + """ + return m.user == m.channel or m.user in self.allowed_users def check_link(self, message: str) -> bool: # True if message contains a link return self.link_regex.search(message) if __name__ == "__main__": + from Log import Log + Log(logger, __file__) MarkovChain() diff --git a/README.md b/README.md index 9e2da6b..e68e65a 100644 --- a/README.md +++ b/README.md @@ -247,8 +247,8 @@ This bot is controlled by a `settings.txt` file, which has the following structu | MinSentenceWordAmount | The minimum number of words that can be generated. Additional sentences will be generated if a message is shorter than this number. Prevents very small messages.
-1 to disable | -1 | | HelpMessageTimer | The amount of seconds between sending help messages that links to [How it works](#how-it-works). -1 for no help messages. | 7200 | | AutomaticGenerationTimer | The amount of seconds between sending a generation, as if someone wrote `!g`. -1 for no automatic generations. | -1 | -| BotOwner | The owner of the bot's twitch username. Gives the owner the same power as the channel owner | "TestUser" | -| ShouldWhisper | Prevents the bot from attempting to whisper users | true | +| AllowedUsers | A list of users with heightened permissions. Gives these users the same power as the channel owner, allowing them to bypass cooldowns, set cooldowns, disable or enable the bot, etc. | ["CubieDev", "Limmy"] | +| WhisperCooldown | Prevents the bot from attempting to whisper users the remaining cooldown. | true | | EnableGenerateCommand | Globally enables/disables the generate command | true | _Note that the example OAuth token is not an actual token, but merely a generated string to give an indication what it might look like._ diff --git a/Settings.py b/Settings.py index 81544b2..67b20e3 100644 --- a/Settings.py +++ b/Settings.py @@ -14,14 +14,14 @@ class SettingsData(TypedDict): Nickname: str Authentication: str DeniedUsers: List[str] - BotOwner: str + AllowedUsers: List[str] Cooldown: int KeyLength: int MaxSentenceWordAmount: int MinSentenceWordAmount: int HelpMessageTimer: int AutomaticGenerationTimer: int - ShouldWhisper: bool + WhisperCooldown: bool EnableGenerateCommand: bool class Settings: @@ -36,41 +36,37 @@ class Settings: "Nickname": "", "Authentication": "oauth:", "DeniedUsers": ["StreamElements", "Nightbot", "Moobot", "Marbiebot"], - "BotOwner": "", + "AllowedUsers": [], "Cooldown": 20, "KeyLength": 2, "MaxSentenceWordAmount": 25, "MinSentenceWordAmount": -1, "HelpMessageTimer": -1, "AutomaticGenerationTimer": -1, - "ShouldWhisper": True, + "WhisperCooldown": True, "EnableGenerateCommand": True } def __init__(self, bot): - + 
settings = Settings.read_settings() + bot.set_settings(settings) + + @staticmethod + def read_settings(): # Potentially update the settings structure used to the newest version - self.update_v2() + Settings.update_v2() try: # Try to load the file using json. # And pass the data to the Bot class instance if this succeeds. with open(Settings.PATH, "r") as f: - settings = f.read() - data: SettingsData = json.loads(settings) + text_settings = f.read() + settings: SettingsData = json.loads(text_settings) # "BannedWords" is only a key in the settings in older versions. # We moved to a separate file for blacklisted words. - self.update_v1(data) - - # Automatically update the settings.txt to the new version. - if "HelpMessageTimer" not in data or "AutomaticGenerationTimer" not in data: - data["HelpMessageTimer"] = data.get("HelpMessageTimer", 7200) # Default is once per 2 hours - data["AutomaticGenerationTimer"] = data.get("AutomaticGenerationTimer", -1) # Default is never: -1 - - with open(Settings.PATH, "w") as f: - f.write(json.dumps(data, indent=4, separators=(",", ": "))) + Settings.update_v1(settings) - bot.set_settings(data) + return settings except ValueError: logger.error("Error in settings file.") @@ -78,18 +74,18 @@ def __init__(self, bot): except FileNotFoundError: Settings.write_default_settings_file() - raise ValueError("Please fix your settings.txt file that was just generated.") + raise ValueError("Please fix your settings file that was just generated.") + @staticmethod - def update_v1(data: SettingsData): + def update_v1(settings: SettingsData): """Update settings file to remove the BannedWords field, in favor of a blacklist.txt file.""" - if "BannedWords" in data: + if "BannedWords" in settings: logger.info("Updating Blacklist system to new version...") try: with open("blacklist.txt", "r+") as f: logger.info("Moving Banned Words to the blacklist.txt file...") # Read the data, and split by word or phrase, then add BannedWords - banned_list = 
f.read().split("\n") + data["BannedWords"] + banned_list = f.read().split("\n") + settings["BannedWords"] # Remove duplicates and sort by length, longest to shortest banned_list = sorted(list(set(banned_list)), key=lambda x: len(x), reverse=True) # Clear file, and then write in the new data @@ -102,15 +98,15 @@ def update_v1(data: SettingsData): with open("blacklist.txt", "w") as f: logger.info("Moving Banned Words to a new blacklist.txt file...") # Remove duplicates and sort by length, longest to shortest - banned_list = sorted(list(set(data["BannedWords"])), key=lambda x: len(x), reverse=True) + banned_list = sorted(list(set(settings["BannedWords"])), key=lambda x: len(x), reverse=True) f.write("\n".join(banned_list)) logger.info("Moved Banned Words to a new blacklist.txt file.") # Remove BannedWords list from data dictionary, and then write it to the settings file - del data["BannedWords"] + del settings["BannedWords"] with open(Settings.PATH, "w") as f: - f.write(json.dumps(data, indent=4, separators=(",", ": "))) + f.write(json.dumps(settings, indent=4, separators=(",", ": "))) logger.info("Updated Blacklist system to new version.") @@ -152,3 +148,7 @@ def update_cooldown(cooldown: int): with open(Settings.PATH, "w") as f: f.write(json.dumps(data, indent=4, separators=(",", ": "))) + @classmethod + def get_channel(cls): + settings = Settings.read_settings() + return settings["Channel"].replace("#", "").lower() \ No newline at end of file From 04b2d98012aee5ef3ae5dba011b99ab22b0f2f30 Mon Sep 17 00:00:00 2001 From: Tom Aarsen Date: Mon, 21 Jun 2021 15:08:24 +0200 Subject: [PATCH 21/30] Set default help message timer to once every 5 hours --- README.md | 14 +++++++++----- Settings.py | 2 +- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index e68e65a..23ae09b 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,7 @@ Twitch Bot for generating messages based on what it learned from chat ## Explanation -When the bot has started, it 
 will start listening to chat messages in the channel listed in the `settings.txt` file. Any chat message not sent by a denied user will be learned from. Whenever someone then requests a message to be generated, a [Markov Chain](https://en.wikipedia.org/wiki/Markov_chain) will be used with the learned data to generate a sentence. **Note that the bot is unaware of the meaning of any of its inputs and outputs. This means it can use bad language if it was taught to use bad language by people in chat. You can add a list of banned words it should never learn or say. Use at your own risk.** +When the bot has started, it will start listening to chat messages in the channel listed in the `settings.json` file. Any chat message not sent by a denied user will be learned from. Whenever someone then requests a message to be generated, a [Markov Chain](https://en.wikipedia.org/wiki/Markov_chain) will be used with the learned data to generate a sentence. **Note that the bot is unaware of the meaning of any of its inputs and outputs. This means it can use bad language if it was taught to use bad language by people in chat. You can add a list of banned words it should never learn or say. Use at your own risk.** Whenever a message is deleted from chat, its contents will be unlearned at 5 times the rate a normal message is learned from. The bot will avoid learning from commands or from messages containing links.
@@ -215,7 +215,7 @@ And to check whether `word` is already on the blacklist or not, a moderator can ## Settings -This bot is controlled by a `settings.txt` file, which has the following structure: +This bot is controlled by a `settings.json` file, which has the following structure: ```json { @@ -225,11 +225,15 @@ This bot is controlled by a `settings.txt` file, which has the following structu "Nickname": "", "Authentication": "oauth:", "DeniedUsers": ["StreamElements", "Nightbot", "Moobot", "Marbiebot"], + "AllowedUsers": [], "Cooldown": 20, "KeyLength": 2, "MaxSentenceWordAmount": 25, - "HelpMessageTimer": 7200, - "AutomaticGenerationTimer": -1 + "MinSentenceWordAmount": -1, + "HelpMessageTimer": 18000, + "AutomaticGenerationTimer": -1, + "WhisperCooldown": true, + "EnableGenerateCommand": true } ``` @@ -245,7 +249,7 @@ This bot is controlled by a `settings.txt` file, which has the following structu | KeyLength | A technical parameter which, in my previous implementation, would affect how closely the output matches the learned inputs. In the current implementation the database structure does not allow this parameter to be changed. Do not change. | 2 | | MaxSentenceWordAmount | The maximum number of words that can be generated. Prevents absurdly long and spammy generations. | 25 | | MinSentenceWordAmount | The minimum number of words that can be generated. Additional sentences will be generated if a message is shorter than this number. Prevents very small messages. -1 to disable | -1 | -| HelpMessageTimer | The number of seconds between sending help messages that link to [How it works](#how-it-works). -1 for no help messages. | 7200 | +| HelpMessageTimer | The number of seconds between sending help messages that link to [How it works](#how-it-works). -1 for no help messages. Defaults to once every 5 hours. | 18000 | | AutomaticGenerationTimer | The number of seconds between sending a generation, as if someone wrote `!g`. -1 for no automatic generations.
| -1 | | AllowedUsers | A list of users with heightened permissions. Gives these users the same power as the channel owner, allowing them to bypass cooldowns, set cooldowns, disable or enable the bot, etc. | ["CubieDev", "Limmy"] | | WhisperCooldown | Prevents the bot from attempting to whisper users the remaining cooldown. | true | diff --git a/Settings.py b/Settings.py index 67b20e3..2f32834 100644 --- a/Settings.py +++ b/Settings.py @@ -41,7 +41,7 @@ class Settings: "KeyLength": 2, "MaxSentenceWordAmount": 25, "MinSentenceWordAmount": -1, - "HelpMessageTimer": -1, + "HelpMessageTimer": 60 * 60 * 5, # 18000 seconds, 5 hours "AutomaticGenerationTimer": -1, "WhisperCooldown": True, "EnableGenerateCommand": True From 2948c5be4237b59b8271d931518fee6d9e7746db Mon Sep 17 00:00:00 2001 From: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Date: Mon, 21 Jun 2021 15:11:53 +0200 Subject: [PATCH 22/30] Added sample settings.json --- settings.json | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 settings.json diff --git a/settings.json b/settings.json new file mode 100644 index 0000000..588ba80 --- /dev/null +++ b/settings.json @@ -0,0 +1,17 @@ +{ + "Host": "irc.chat.twitch.tv", + "Port": 6667, + "Channel": "#", + "Nickname": "", + "Authentication": "oauth:", + "DeniedUsers": ["StreamElements", "Nightbot", "Moobot", "Marbiebot"], + "AllowedUsers": [], + "Cooldown": 20, + "KeyLength": 2, + "MaxSentenceWordAmount": 25, + "MinSentenceWordAmount": -1, + "HelpMessageTimer": 18000, + "AutomaticGenerationTimer": -1, + "WhisperCooldown": true, + "EnableGenerateCommand": true +} From 23f5e57dba6864fe6b5c75a30d2e9782eb7ef9d5 Mon Sep 17 00:00:00 2001 From: Tom Aarsen Date: Mon, 21 Jun 2021 21:58:43 +0200 Subject: [PATCH 23/30] Remove settings.json Git will attempt to push it, even if it's in .gitignore, because the default file is already on git. So, it's probably best to remove it from git, so people don't accidentally push their authentications. 
--- settings.json | 17 ----------------- 1 file changed, 17 deletions(-) delete mode 100644 settings.json diff --git a/settings.json b/settings.json deleted file mode 100644 index 588ba80..0000000 --- a/settings.json +++ /dev/null @@ -1,17 +0,0 @@ -{ - "Host": "irc.chat.twitch.tv", - "Port": 6667, - "Channel": "#", - "Nickname": "", - "Authentication": "oauth:", - "DeniedUsers": ["StreamElements", "Nightbot", "Moobot", "Marbiebot"], - "AllowedUsers": [], - "Cooldown": 20, - "KeyLength": 2, - "MaxSentenceWordAmount": 25, - "MinSentenceWordAmount": -1, - "HelpMessageTimer": 18000, - "AutomaticGenerationTimer": -1, - "WhisperCooldown": true, - "EnableGenerateCommand": true -} From 602d4c3dc84dedc58f4961ab7220ffbabc2422c9 Mon Sep 17 00:00:00 2001 From: Tom Aarsen Date: Mon, 21 Jun 2021 22:00:37 +0200 Subject: [PATCH 24/30] Improved docstrings heavily Also improved efficiency of get_start in Database.py --- Database.py | 408 ++++++++++++++++++++++++++++++++++++++-------- Log.py | 2 +- MarkovChainBot.py | 142 ++++++++++++---- Settings.py | 63 +++++-- 4 files changed, 497 insertions(+), 118 deletions(-) diff --git a/Database.py b/Database.py index de98583..a6ce307 100644 --- a/Database.py +++ b/Database.py @@ -1,8 +1,83 @@ import sqlite3, logging, random, string +from typing import Any, List, Optional, Tuple logger = logging.getLogger(__name__) class Database: + + """ + The database created is called `MarkovChain_{channel}.db`, + and populated with 27 + 27^2 = 756 tables. Firstly, 27 tables with the structure of + "MarkovStart{char}", i.e. called: + > MarkovStartA + > MarkovStartB + > ... + > MarkovStartZ + > MarkovStart_ + These tables store the first two words of a sentence, alongside a "count" frequency. + The suffix of the table name is the first character of the first word in the entry. 
+ + For example, from a sentence "I am the developer of this bot", "I am" is learned by creating + or updating an entry in MarkovStartI where the first word is "I", the second word is "am", + and the "count" value increments every time the sequence "I am" was learned. + + If instead we learn, "[he said hello]", then "[he said" is learned by creating or updating + an entry in MarkovStart_. + + + + Alongside the MarkovStart... tables, there are 729 tables called "MarkovGrammar{char}{char}", + i.e. called: + > MarkovGrammarAA + > MarkovGrammarAB + > ... + > MarkovGrammarAZ + > MarkovGrammarA_ + > MarkovGrammarBA + > MarkovGrammarBB + > ... + > MarkovGrammar_Z + > MarkovGrammar__ + These tables store 3-grams, alongside a "count" frequency of this 3-gram. The suffix of the + table name is the first character of the first word in the 3-gram, with the first character + of the second word in the 3-gram. + + If we revisit the example of "I am the developer of this bot", we learn the following 3-grams: + > "I am the" + > "am the developer" + > "the developer of" + > "developer of this" + > "of this bot" + > "this bot " + The 3-gram "am the developer" will be placed in MarkovGrammarAT, by creating or updating an entry + where the first word is "am", the second is "the", and the third "developer", while the "count" + frequency is incremented every time the 3-gram "am the developer" is learned. + + + + The core of the knowledge base are the MarkovGrammar tables, which can be used to create + functions that take a certain number of words as input, and then generate a new word. For example: + Given "I am", we can use the MarkovGrammarIA table to look for entries that have "I" as the first word, + and "am" as the second word. If there are multiple options, we can use the "count" frequency as + weights to pick an appropriate "next word". + + + + Important notes: + - Learning is *case sensitive*. The 3-gram "YOU ARE A" will become a different entry than "you are a". 
+ This is most important when learning emotes, where the distinction between "Kappa" and "kappa" truly is important. + - Generating is *case insensitive*. Generating when using "YOU ARE" as the previous words to use in e.g. self.get_next() + will get the same results as generating using "you are". + + - Both learning and generating are *punctuation sensitive*. "Hello, how are" will learn and generate differently than + "Hello how are", as the first word is taken as "Hello,", which differs from "Hello". + A solution is to completely remove punctuation before learning and before generating, + essentially ignoring that it exists. + However, this is not entirely desirable. In a perfect world, we would like to learn "hello," + and "hello" differently, just like "HELLO" and "hello", but allow generating from "hello" + to both get results from "hello" and "hello,". + """ + def __init__(self, channel: str): self.db_name = f"MarkovChain_{channel.replace('#', '').lower()}.db" self._execute_queue = [] @@ -11,6 +86,48 @@ def __init__(self, channel: str): # My ideas for such an implementation have increased the generation time by ~5x. # This was not worth it for me. I may revisit this at some point. + self.update_v1(channel) + self.update_v2() + + # Create database tables.
+ for first_char in list(string.ascii_uppercase) + ["_"]: + self.add_execute_queue(f""" + CREATE TABLE IF NOT EXISTS MarkovStart{first_char} ( + word1 TEXT COLLATE NOCASE, + word2 TEXT COLLATE NOCASE, + count INTEGER, + PRIMARY KEY (word1 COLLATE BINARY, word2 COLLATE BINARY) + ); + """) + for second_char in list(string.ascii_uppercase) + ["_"]: + self.add_execute_queue(f""" + CREATE TABLE IF NOT EXISTS MarkovGrammar{first_char}{second_char} ( + word1 TEXT COLLATE NOCASE, + word2 TEXT COLLATE NOCASE, + word3 TEXT COLLATE NOCASE, + count INTEGER, + PRIMARY KEY (word1 COLLATE BINARY, word2 COLLATE BINARY, word3 COLLATE BINARY) + ); + """) + sql = """ + CREATE TABLE IF NOT EXISTS WhisperIgnore ( + username TEXT, + PRIMARY KEY (username) + ); + """ + self.add_execute_queue(sql) + self.execute_commit() + + # Used for randomly picking a Markov Grammar if only one word is given + # Index 0 is for "A", 1 for "B", etc. Then, 26 is for "_" + self.word_frequency = [11.6, 4.4, 5.2, 3.1, 2.8, 4, 1.6, 4.2, 7.3, 0.5, 0.8, 2.4, 3.8, 2.2, 7.6, 4.3, 0.2, 2.8, 6.6, 15.9, 1.1, 0.8, 5.5, 0.1, 0.7, 0.1, 0.5] + + def update_v1(self, channel: str): + """Update the Database structure from a deprecated version to a newer one. + + Args: + channel (str): The name of the Twitch channel on which the bot is running. + """ # If an old version of the Database is used, update the database if ("MarkovGrammarA",) in self.execute("SELECT name FROM sqlite_master WHERE type='table';", fetch=True): @@ -94,6 +211,14 @@ def progress(status, remaining, total): logger.info("Finished Updating Database to new version.") + def update_v2(self): + """Update the Database structure from a deprecated version to a newer one. + + This update involves a typo. + + Args: + channel (str): The name of the Twitch channel on which the bot is running. 
+ """ # Resolve typo in Database if self.execute("SELECT * FROM PRAGMA_TABLE_INFO('MarkovGrammarAA') WHERE name='occurances';", fetch=True): logger.info("Updating Database to new version...") @@ -103,39 +228,18 @@ def progress(status, remaining, total): self.execute(f"ALTER TABLE MarkovStart{first_char} RENAME COLUMN occurances TO count;") logger.info("Finished Updating Database to new version.") - for first_char in list(string.ascii_uppercase) + ["_"]: - self.add_execute_queue(f""" - CREATE TABLE IF NOT EXISTS MarkovStart{first_char} ( - word1 TEXT COLLATE NOCASE, - word2 TEXT COLLATE NOCASE, - count INTEGER, - PRIMARY KEY (word1 COLLATE BINARY, word2 COLLATE BINARY) - ); - """) - for second_char in list(string.ascii_uppercase) + ["_"]: - self.add_execute_queue(f""" - CREATE TABLE IF NOT EXISTS MarkovGrammar{first_char}{second_char} ( - word1 TEXT COLLATE NOCASE, - word2 TEXT COLLATE NOCASE, - word3 TEXT COLLATE NOCASE, - count INTEGER, - PRIMARY KEY (word1 COLLATE BINARY, word2 COLLATE BINARY, word3 COLLATE BINARY) - ); - """) - sql = """ - CREATE TABLE IF NOT EXISTS WhisperIgnore ( - username TEXT, - PRIMARY KEY (username) - ); - """ - self.add_execute_queue(sql) - self.execute_commit() + def add_execute_queue(self, sql: str, values: Tuple[Any] = None) -> None: + """Add query and corresponding values to a queue, to be executed all at once. - # Used for randomly picking a Markov Grammar if only one word is given - # Index 0 is for "A", 1 for "B", and 26 for everything else - self.word_frequency = [11.6, 4.4, 5.2, 3.1, 2.8, 4, 1.6, 4.2, 7.3, 0.5, 0.8, 2.4, 3.8, 2.2, 7.6, 4.3, 0.2, 2.8, 6.6, 15.9, 1.1, 0.8, 5.5, 0.1, 0.7, 0.1, 0.5] - - def add_execute_queue(self, sql: str, values = None): + This entire queue can be executed with `self.execute_commit`, + and the queue is automatically executed if there are more than 25 waiting queries. + + Args: + sql (str): The SQL query to add, potentially with "?" for where + a value ought to be filled in. 
+ values ([Tuple[Any]], optional): Optional tuple of values to replace "?" in SQL queries. + Defaults to None. + """ if values is not None: self._execute_queue.append([sql, values]) else: @@ -144,7 +248,16 @@ def add_execute_queue(self, sql: str, values = None): if len(self._execute_queue) > 25: self.execute_commit() - def execute_commit(self, fetch: bool = False): + def execute_commit(self, fetch: bool = False) -> Any: + """Execute the SQL queries added to the queue with `self.add_execute_queue`. + + Args: + fetch (bool, optional): Whether to return the fetchall() of the SQL queries. + Defaults to False. + + Returns: + Any: The returned values from the SQL queries if `fetch` is true, otherwise None. + """ if self._execute_queue: with sqlite3.connect(self.db_name) as conn: cur = conn.cursor() @@ -156,7 +269,20 @@ def execute_commit(self, fetch: bool = False): if fetch: return cur.fetchall() - def execute(self, sql: str, values = None, fetch: bool = False): + def execute(self, sql: str, values: Tuple[Any] = None, fetch: bool = False): + """Execute the SQL query with the corresponding values, potentially returning a result. + + Args: + sql (str): The SQL query to add, potentially with "?" for where + a value ought to be filled in. + values ([Tuple[Any]], optional): Optional tuple of values to replace "?" in SQL queries. + Defaults to None. + fetch (bool, optional): Whether to return the fetchall() of the SQL queries. + Defaults to False. + + Returns: + Any: The returned values from the SQL queries if `fetch` is true, otherwise None. + """ with sqlite3.connect(self.db_name) as conn: cur = conn.cursor() if values is None: @@ -167,79 +293,193 @@ def execute(self, sql: str, values = None, fetch: bool = False): if fetch: return cur.fetchall() - def get_suffix(self, character: str): - if character.lower() in (string.ascii_lowercase): + def get_suffix(self, character: str) -> str: + """Transform a character into a member of string.ascii_lowercase or "_". 
+ + Args: + character (str): The character to normalize. + + Returns: + str: The normalized character + """ + if character.lower() in string.ascii_lowercase: return character.upper() return "_" - def add_whisper_ignore(self, username: str): + def add_whisper_ignore(self, username: str) -> None: + """Add `username` to the WhisperIgnore table, indicating that they do not wish to be whispered. + + Args: + username (str): The username of the user who no longer wants to be whispered. + """ self.execute("INSERT OR IGNORE INTO WhisperIgnore(username) SELECT ?", (username,)) - def check_whisper_ignore(self, username: str): + def check_whisper_ignore(self, username: str) -> List[Tuple[str]]: + """Returns a non-empty list only if `username` is in the WhisperIgnore table. + + Otherwise, returns an empty list. Is used to ensure that a user who doesn't want to be + whispered is never whispered. + + Args: + username (str): The username of the user to check. + + Returns: + List[Tuple[str]]: Either an empty list, or [('test_user',)]. + Allows the use of `if not check_whisper_ignore(user): whisper(user)` + """ return self.execute("SELECT username FROM WhisperIgnore WHERE username = ?;", (username,), fetch=True) - def remove_whisper_ignore(self, username: str): + def remove_whisper_ignore(self, username: str) -> None: + """Remove `username` from the WhisperIgnore table, indicating that they want to be whispered again. + + Args: + username (str): The username of the user who wants to be whispered again. + """ self.execute("DELETE FROM WhisperIgnore WHERE username = ?", (username,)) - def check_equal(self, l): - # Check if a list contains of items that are all identical - return not l or l.count(l[0]) == len(l) + def check_equal(self, l: List[Any]) -> bool: + """True if `l` consists of items that are all identical + + Useful for checking if we're learning that a sequence of the same words leads to the same word, + which can cause infinite loops when generating. 
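To make the suffix-based table routing above concrete, here is a stand-alone sketch of the same normalization; `table_for` is a hypothetical helper invented for this illustration, not a method of the bot:

```python
import string

def get_suffix(character: str) -> str:
    # Map a word's first character onto "A".."Z", or "_" for anything
    # else (digits, punctuation, emote names starting with symbols, ...).
    if character.lower() in string.ascii_lowercase:
        return character.upper()
    return "_"

def table_for(word1: str, word2: str) -> str:
    # Hypothetical helper: name of the MarkovGrammar table that would
    # hold 3-grams whose first two words are `word1` and `word2`.
    return f"MarkovGrammar{get_suffix(word1[0])}{get_suffix(word2[0])}"
```

Partitioning the 3-grams over 27 × 27 tables this way keeps each table (and its index) small, which is the efficiency gain the v1 update above is after.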
- def get_next(self, index: int, words):
+ Args:
+ l (List[Any]): The list of objects for which we want to check if they are all identical.
+
+ Returns:
+ bool: True if `l` consists of items that are all identical.
+ """
+ return [l[0]] * len(l) == l
+
+ def get_next(self, index: int, words: List[str]) -> Optional[str]:
+ """Generate the next word in the sentence using learned data, given the previous `key_length` words.
+
+ `key_length` is set to 2 by default, and cannot easily be changed.
+
+ Args:
+ index (int): The index of this new word in the sentence.
+ words (List[str]): The previous 2 words.
+
+ Returns:
+ Optional[str]: The next word in the sentence, generated given the learned data.
+ """
 # Get all items
 data = self.execute(f"SELECT word3, count FROM MarkovGrammar{self.get_suffix(words[0][0])}{self.get_suffix(words[1][0])} WHERE word1 = ? AND word2 = ?;", words, fetch=True)
 # Return a word picked from the data, using count as a weighting factor
 return None if len(data) == 0 else self.pick_word(data, index)
- def get_next_initial(self, index: int, words):
+ def get_next_initial(self, index: int, words: List[str]) -> Optional[str]:
+ """Generate the next word in the sentence using learned data, given the previous `key_length` words.
+
+ `key_length` is set to 2 by default, and cannot easily be changed.
+ Similar to `get_next`, with the exception that it cannot immediately generate "".
+
+ Args:
+ index (int): The index of this new word in the sentence.
+ words (List[str]): The previous 2 words.
+
+ Returns:
+ Optional[str]: The next word in the sentence, generated given the learned data.
+ """
 # Get all items
 data = self.execute(f"SELECT word3, count FROM MarkovGrammar{self.get_suffix(words[0][0])}{self.get_suffix(words[1][0])} WHERE word1 = ? AND word2 = ?
AND word3 != '';", words, fetch=True)
 # Return a word picked from the data, using count as a weighting factor
 return None if len(data) == 0 else self.pick_word(data, index)
-
- """
- def get_next_single(self, index, word):
- # Get all items
- data = self.execute(f"SELECT word2, count FROM MarkovGrammar{self.get_suffix(word[0])} WHERE word1 = ?;", (word,), fetch=True)
- # Return a word picked from the data, using count as a weighting factor
- return None if len(data) == 0 else [word] + [self.pick_word(data, index)]
- """
-
- def get_next_single_initial(self, index: int, word: str):
+
+ def get_next_single_initial(self, index: int, word: str) -> Optional[List[str]]:
+ """Generate the next word in the sentence using learned data, given the previous word.
+
+ Randomly picks a start character for the second word by weighting all uppercase letters and "_" by their word frequency.
+
+ Args:
+ index (int): The index of this new word in the sentence.
+ word (str): The previous word.
+
+ Returns:
+ Optional[List[str]]: The previous and newly generated word in the sentence as a list, generated given the learned data.
+ So, the previous word is taken directly from the input of this method, and the second word is generated.
+ """
 # Get all items
 data = self.execute(f"SELECT word2, count FROM MarkovGrammar{self.get_suffix(word[0])}{random.choices(string.ascii_uppercase + '_', weights=self.word_frequency)[0]} WHERE word1 = ? AND word2 != '';", (word,), fetch=True)
 # Return a word picked from the data, using count as a weighting factor
 return None if len(data) == 0 else [word] + [self.pick_word(data, index)]

- def get_next_single_start(self, word: str):
+ def get_next_single_start(self, word: str) -> Optional[List[str]]:
+ """Generate the second word in the sentence using learned data, given the very first word in the sentence.
+
+ Args:
+ word (str): The first word in the sentence.
+
+ Returns:
+ Optional[List[str]]: The first and second word in the sentence as a list, generated given the learned data.
+ So, the first word is taken directly from the input of this method, and the second word is generated.
+ """
 # Get all items
 data = self.execute(f"SELECT word2, count FROM MarkovStart{self.get_suffix(word[0])} WHERE word1 = ?;", (word,), fetch=True)
 # Return a word picked from the data, using count as a weighting factor
 return None if len(data) == 0 else [word] + [self.pick_word(data)]

- def pick_word(self, data, index: int = 0):
- # Pick a random starting key from a weighted list
- # Note that the values are weighted based on index.
- return random.choices(data, weights=[tup[1] * ((index+1)/15) if tup[0] == "" else tup[1] for tup in data])[0][0]
+ def pick_word(self, data: List[Tuple[str, int]], index: int = 0) -> str:
+ """Randomly pick a word from `data` with word frequency as the weight.
+
+ `index` is further used to decrease the weight of the end-of-sentence token ("")
+ for the first 15 words in the sequence, and to increase its weight afterwards.
+
+ Args:
+ data (List[Tuple[str, int]]): A list of word-frequency pairs, e.g.
+ [('"the', 1), ('long', 1), ('well', 5), ('an', 2), ('a', 3), ('much', 1)]
+ index (int, optional): The index of the newly generated word in the sentence.
+ Used for modifying how often the end-of-sentence token occurs. Defaults to 0.
+
+ Returns:
+ str: The pseudo-randomly picked word.
+ """
+ return random.choices(data,
+ weights=[tup[-1] * ((index+1)/15) if tup[0] == "" else tup[-1] for tup in data]
+ )[0][0]
+
+ def get_start(self) -> List[str]:
+ """Get a list of two words that mark the start of a sentence.
+
+ This is randomly gathered from MarkovStart{character}.
-
- def get_start(self):
+ Returns:
+ List[str]: A list of two starting words, such as ["I", "am"].
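The end-of-sentence weighting that `pick_word` applies can be sketched in isolation. This is an approximation of the described behaviour, not the bot's exact method; the fixed seed is only for a reproducible demonstration:

```python
import random

def pick_word(data, index=0):
    # `data` holds (word, count) pairs; "" is the end-of-sentence token.
    # Scaling the end token's count by (index + 1) / 15 makes sentences
    # unlikely to end within their first ~15 words, and likelier after.
    weights = [count * ((index + 1) / 15) if word == "" else count
               for word, count in data]
    return random.choices(data, weights=weights)[0][0]

random.seed(0)  # fixed seed, purely for a reproducible demonstration
data = [("you", 3), ("", 3)]
# At index 0 the end token weighs 3 * (1/15) = 0.2 versus 3 for "you";
# at index 29 it would weigh 3 * 2 = 6 and dominate instead.
early_picks = [pick_word(data, index=0) for _ in range(100)]
```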
+ """ # Find one character start from - character = random.choice(list(string.ascii_lowercase) + ["_"]) + character = random.choices(list(string.ascii_lowercase) + ["_"], + weights=self.word_frequency, + k=1)[0] - # Get all items + # Get all first word, second word, frequency triples, + # e.g. [("I", "am", 3), ("You", "are", 2), ...] data = self.execute(f"SELECT * FROM MarkovStart{character};", fetch=True) - # Add each item "count" times - start_list = [list(tup[:-1]) for tup in data for _ in range(tup[-1])] - # If nothing has ever been said - if len(start_list) == 0: + if len(data) == 0: return [] + + # Return a (weighted) randomly chosen 2-gram + return list(random.choices(data, + weights=[tup[-1] for tup in data], + k=1)[0][:-1]) + + def add_rule_queue(self, item: List[str]) -> None: + """Adds a rule to the queue, ready to be entered into the knowledge base, given a 3-gram `item`. + + The rules on the queue are added with `self.add_execute_queue`, + which automatically executes the queries in the queue when there are enough queries waiting. - # Pick a random starting key from this weighted list - return random.choice(start_list) + Whenever `item` consists of three identical words, e.g. ["Kappa", "Kappa", "Kappa"], then + we perform no learning. If we did, this could cause infinite recursion in generation. - def add_rule_queue(self, item): + Args: + item (List[str]): A 3-gram, e.g. ['How', 'are', 'you']. This is learned by placing this + in the MarkovGrammarHA table, where it can be seen as: + *Given ["How", "are"], then "you" is a potential output* + The frequency of this word as an output is then incremented, + allowing for weighted picking of outputs. + """ # Filter out recursive case. 
if self.check_equal(item): return @@ -248,22 +488,48 @@ def add_rule_queue(self, item): return self.add_execute_queue(f'INSERT OR REPLACE INTO MarkovGrammar{self.get_suffix(item[0][0])}{self.get_suffix(item[1][0])} (word1, word2, word3, count) VALUES (?, ?, ?, coalesce((SELECT count + 1 FROM MarkovGrammar{self.get_suffix(item[0][0])}{self.get_suffix(item[1][0])} WHERE word1 = ? COLLATE BINARY AND word2 = ? COLLATE BINARY AND word3 = ? COLLATE BINARY), 1))', values=item + item) - def add_start_queue(self, item): + def add_start_queue(self, item: List[str]) -> None: + """Adds a rule to the queue, ready to be entered into the knowledge base, given a 2-gram `item`. + + The rules on the queue are added with `self.add_execute_queue`, + which automatically executes the queries in the queue when there are enough queries waiting. + + Args: + item (List[str]): A 2-gram, e.g. ['How', 'are']. This is learned by placing this + in the MarkovStartH table, where it can be randomly (with frequency as weight) + picked as a start of a sentence. + """ self.add_execute_queue(f'INSERT OR REPLACE INTO MarkovStart{self.get_suffix(item[0][0])} (word1, word2, count) VALUES (?, ?, coalesce((SELECT count + 1 FROM MarkovStart{self.get_suffix(item[0][0])} WHERE word1 = ? COLLATE BINARY AND word2 = ? COLLATE BINARY), 1))', values=item + item) - def unlearn(self, message: str): + def unlearn(self, message: str) -> None: + """Remove frequency of 3-grams from `message` from the knowledge base. + + Useful when a message is deleted - usually we want the bot to say those things less frequently. + The frequency count for each of the 3-grams is reduced by 5, i.e. the message is unlearned by 5 + times the rate that a message is learned. + + If this means the frequency for the 3-gram becomes negative, + we delete the 3-gram from the knowledge base entirely. + + Args: + message (str): The message to unlearn. 
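The count bookkeeping that `add_rule_queue` and `unlearn` perform through SQL can be mimicked with a plain dictionary. This toy model only illustrates the arithmetic (increment by 1 on learning, decrement by 5 on unlearning, delete at zero or below); the real bot does this with SQLite statements:

```python
# Toy in-memory stand-in for the MarkovGrammar count columns.
counts = {}

def learn(three_gram):
    # Skip the recursive case: "Kappa Kappa Kappa" could loop forever
    # during generation if it were ever learned.
    if len(set(three_gram)) == 1:
        return
    counts[three_gram] = counts.get(three_gram, 0) + 1

def unlearn(three_gram):
    # Deleted messages are unlearned at 5x the learning rate.
    counts[three_gram] = counts.get(three_gram, 0) - 5
    if counts[three_gram] <= 0:
        del counts[three_gram]

for _ in range(6):
    learn(("How", "are", "you"))
learn(("Kappa", "Kappa", "Kappa"))  # ignored by the recursion guard
unlearn(("How", "are", "you"))      # count drops from 6 to 1
```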
+ """ words = message.split(" ") + # Construct 3-grams tuples = [(words[i], words[i+1], words[i+2]) for i in range(0, len(words) - 2)] + # Unlearn start of sentence from MarkovStart if len(words) > 1: # Reduce "count" by 5 self.add_execute_queue(f'UPDATE MarkovStart{self.get_suffix(words[0][0])} SET count = count - 5 WHERE word1 = ? AND word2 = ?;', values=(words[0], words[1], )) # Delete if count is now less than 0. self.add_execute_queue(f'DELETE FROM MarkovStart{self.get_suffix(words[0][0])} WHERE word1 = ? AND word2 = ? AND count <= 0;', values=(words[0], words[1], )) + # Unlearn all 3 word sections from Grammar for (word1, word2, word3) in tuples: # Reduce "count" by 5 self.add_execute_queue(f'UPDATE MarkovGrammar{self.get_suffix(word1[0])}{self.get_suffix(word2[0])} SET count = count - 5 WHERE word1 = ? AND word2 = ? AND word3 = ?;', values=(word1, word2, word3, )) # Delete if count is now less than 0. self.add_execute_queue(f'DELETE FROM MarkovGrammar{self.get_suffix(word1[0])}{self.get_suffix(word2[0])} WHERE word1 = ? AND word2 = ? AND word3 = ? 
AND count <= 0;', values=(word1, word2, word3, )) + self.execute_commit() diff --git a/Log.py b/Log.py index 3f702cb..982887d 100644 --- a/Log.py +++ b/Log.py @@ -5,7 +5,7 @@ class Log(): - def __init__(self, logger: logging.Logger, main_file: str): + def __init__(self, main_file: str): # Dynamically change size set up for name in the logger this_file = os.path.basename(main_file) diff --git a/MarkovChainBot.py b/MarkovChainBot.py index aad3878..f3c89b8 100644 --- a/MarkovChainBot.py +++ b/MarkovChainBot.py @@ -9,6 +9,9 @@ from Database import Database from Timer import LoopingTimer +from Log import Log +Log(__file__) + logger = logging.getLogger(__name__) class MarkovChain: @@ -51,22 +54,27 @@ def __init__(self): live=True) self.ws.start_bot() - def set_settings(self, data: SettingsData): - self.host = data["Host"] - self.port = data["Port"] - self.chan = data["Channel"] - self.nick = data["Nickname"] - self.auth = data["Authentication"] - self.denied_users = [user.lower() for user in data["DeniedUsers"]] + [self.nick.lower()] - self.allowed_users = [user.lower() for user in data["AllowedUsers"]] - self.cooldown = data["Cooldown"] - self.key_length = data["KeyLength"] - self.max_sentence_length = data["MaxSentenceWordAmount"] - self.min_sentence_length = data["MinSentenceWordAmount"] - self.help_message_timer = data["HelpMessageTimer"] - self.automatic_generation_timer = data["AutomaticGenerationTimer"] - self.whisper_cooldown = data["WhisperCooldown"] - self.enable_generate_command = data["EnableGenerateCommand"] + def set_settings(self, settings: SettingsData): + """Fill class instance attributes based on the settings file. + + Args: + settings (SettingsData): The settings dict with information from the settings file. 
+ """ + self.host = settings["Host"] + self.port = settings["Port"] + self.chan = settings["Channel"] + self.nick = settings["Nickname"] + self.auth = settings["Authentication"] + self.denied_users = [user.lower() for user in settings["DeniedUsers"]] + [self.nick.lower()] + self.allowed_users = [user.lower() for user in settings["AllowedUsers"]] + self.cooldown = settings["Cooldown"] + self.key_length = settings["KeyLength"] + self.max_sentence_length = settings["MaxSentenceWordAmount"] + self.min_sentence_length = settings["MinSentenceWordAmount"] + self.help_message_timer = settings["HelpMessageTimer"] + self.automatic_generation_timer = settings["AutomaticGenerationTimer"] + self.whisper_cooldown = settings["WhisperCooldown"] + self.enable_generate_command = settings["EnableGenerateCommand"] def message_handler(self, m: Message): try: @@ -293,7 +301,18 @@ def message_handler(self, m: Message): except Exception as e: logger.exception(e) - def generate(self, params: List[str]) -> "Tuple[str, bool]": + def generate(self, params: List[str] = None) -> "Tuple[str, bool]": + """Given an input sentence, generate the remainder of the sentence using the learned data. + + Args: + params (List[str]): A list of words to use as an input to use as the start of generating. + + Returns: + Tuple[str, bool]: A tuple of a sentence as the first value, and a boolean indicating + whether the generation succeeded as the second value. 
+ """ + if params is None: + params = [] # Check for commands or recursion, eg: !generate !generate if len(params) > 0: @@ -311,6 +330,7 @@ def generate(self, params: List[str]) -> "Tuple[str, bool]": elif len(params) == 1: # First we try to find if this word was once used as the first word in a sentence: key = self.db.get_next_single_start(params[0]) + print(key) if key == None: # If this failed, we try to find the next word in the grammar as a whole key = self.db.get_next_single_initial(0, params[0]) @@ -362,7 +382,15 @@ def generate(self, params: List[str]) -> "Tuple[str, bool]": return " ".join(sentence), True - def extract_modifiers(self, emotes: str) -> list: + def extract_modifiers(self, emotes: str) -> List[str]: + """Extract emote modifiers from emotes, such as the the horizontal flip. + + Args: + emotes (str): String containing all emotes used in the message. + + Returns: + List[str]: List of strings that show modifiers, such as "_HZ" for horizontal flip. + """ output = [] try: while emotes: @@ -375,12 +403,18 @@ def extract_modifiers(self, emotes: str) -> list: return output def write_blacklist(self, blacklist: List[str]) -> None: + """Write blacklist.txt given a list of banned words. + + Args: + blacklist (List[str]): The list of banned words to write. 
+ """ logger.debug("Writing Blacklist...") with open("blacklist.txt", "w") as f: f.write("\n".join(sorted(blacklist, key=lambda x: len(x), reverse=True))) logger.debug("Written Blacklist.") def set_blacklist(self) -> None: + """Read blacklist.txt and set `self.blacklist` to the list of banned words.""" logger.debug("Loading Blacklist...") try: with open("blacklist.txt", "r") as f: @@ -393,7 +427,7 @@ def set_blacklist(self) -> None: self.write_blacklist(self.blacklist) def send_help_message(self) -> None: - # Send a Help message to the connected chat, as long as the bot wasn't disabled + """Send a Help message to the connected chat, as long as the bot wasn't disabled.""" if self._enabled: logger.info("Help message sent.") try: @@ -402,11 +436,12 @@ def send_help_message(self) -> None: logger.warning(f"[OSError: {error}] upon sending help message. Ignoring.") def send_automatic_generation_message(self) -> None: - # Send an automatic generation message to the connected chat, - # as long as the bot wasn't disabled, just like if someone - # typed "!g" in chat. + """Send an automatic generation message to the connected chat. + + As long as the bot wasn't disabled, just like if someone typed "!g" in chat. + """ if self._enabled: - sentence, success = self.generate([]) + sentence, success = self.generate() if success: logger.info(sentence) # Try to send a message. Just log a warning on fail @@ -417,28 +452,62 @@ def send_automatic_generation_message(self) -> None: else: logger.info("Attempted to output automatic generation message, but there is not enough learned information yet.") - def send_whisper(self, user: str, message: str): + def send_whisper(self, user: str, message: str) -> None: + """Optionally send a whisper, only if "WhisperCooldown" is True. + + Args: + user (str): The user to potentially whisper. 
+ message (str): The message to potentially whisper + """ if self.whisper_cooldown: self.ws.send_whisper(user, message) - return def check_filter(self, message: str) -> bool: - # Returns True if message contains a banned word. + """Returns True if message contains a banned word. + + Args: + message (str): The message to check. + """ for word in message.translate(self.punct_trans_table).lower().split(): if word in self.blacklist: return True return False def check_if_our_command(self, message: str, *commands: "Tuple[str]") -> bool: - # True if the first "word" of the message is either exactly command, or in the tuple of commands + """True if the first "word" of the message is in the tuple of commands + + Args: + message (str): The message to check for a command. + commands (Tuple[str]): A tuple of commands. + + Returns: + bool: True if the first word in message is one of the commands. + """ return message.split()[0] in commands def check_if_generate(self, message: str) -> bool: - # True if the first "word" of the message is either !generate or !g. + """True if the first "word" of the message is either !generate or !g. + + Args: + message (str): The message to check for !generate or !g. + + Returns: + bool: True if the first word in message is !generate or !g. + """ return self.check_if_our_command(message, "!generate", "!g") def check_if_other_command(self, message: str) -> bool: - # Don't store commands, except /me + """True if the message is any command, except /me. + + Is used to avoid learning and generating commands. + + Args: + message (str): The message to check. + + Returns: + bool: True if the message is any potential command (starts with a '!', '/' or '.') + with the exception of /me. + """ return message.startswith(("!", "/", ".")) and not message.startswith("/me") def check_if_permissions(self, m: Message) -> bool: @@ -446,14 +515,23 @@ def check_if_permissions(self, m: Message) -> bool: E.g. 
permissions to bypass cooldowns, update settings, disable the bot, etc. True for the streamer themselves, and the users set as the allowed users. + + Args: + m (Message): The Message object that was sent from Twitch. + Has `user` and `channel` attributes. """ return m.user == m.channel or m.user in self.allowed_users def check_link(self, message: str) -> bool: - # True if message contains a link + """True if `message` contains a link. + + Args: + message (str): The message to check for a link. + + Returns: + bool: True if the message contains a link. + """ return self.link_regex.search(message) if __name__ == "__main__": - from Log import Log - Log(logger, __file__) MarkovChain() diff --git a/Settings.py b/Settings.py index 2f32834..a8d663c 100644 --- a/Settings.py +++ b/Settings.py @@ -29,7 +29,7 @@ class Settings: PATH = os.path.join(os.getcwd(), "settings.json") - DEFAULTS = { + DEFAULTS: SettingsData = { "Host": "irc.chat.twitch.tv", "Port": 6667, "Channel": "#", @@ -47,12 +47,29 @@ class Settings: "EnableGenerateCommand": True } - def __init__(self, bot): + def __init__(self, bot) -> None: + """Initialize the MarkovChain bot instance with the contents of the settings file + + Args: + bot (MarkovChain): The MarkovChain bot instance. + """ settings = Settings.read_settings() bot.set_settings(settings) @staticmethod - def read_settings(): + def read_settings() -> dict: + """Read the settings file and return the contents as a dict. + + Updates the settings file from an old version, if needed. + + Raises: + ValueError: Whenever the settings.json file is not valid JSON. + FileNotFoundError: Whenever the settings file was not found. + Will generate a new default settings file. + + Returns: + dict: The contents of the settings.json file. 
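The settings-loading flow that `read_settings` describes — parse the JSON file if present, otherwise generate a default file and ask the user to fix it — can be approximated like this. The trimmed `DEFAULTS` dict and `read_settings` below are illustrative sketches, not the module's actual code:

```python
import json
import os
import tempfile

# Only a few of the real fields are shown here.
DEFAULTS = {"Host": "irc.chat.twitch.tv", "Port": 6667, "Cooldown": 20}

def read_settings(path):
    # Parse the JSON settings file if present; otherwise write a
    # default file and ask the user to fill it in.
    try:
        with open(path, "r") as f:
            return json.load(f)
    except FileNotFoundError:
        with open(path, "w") as f:
            json.dump(DEFAULTS, f, indent=4, separators=(",", ": "))
        raise ValueError("Please fix your settings file that was just generated.")

path = os.path.join(tempfile.mkdtemp(), "settings.json")
try:
    settings = read_settings(path)   # first run: file is missing
except ValueError:
    settings = read_settings(path)   # default file now exists
```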
+ """ # Potentially update the settings structure used to the newest version Settings.update_v2() @@ -62,8 +79,6 @@ def read_settings(): with open(Settings.PATH, "r") as f: text_settings = f.read() settings: SettingsData = json.loads(text_settings) - # "BannedWords" is only a key in the settings in older versions. - # We moved to a separate file for blacklisted words. Settings.update_v1(settings) return settings @@ -77,8 +92,14 @@ def read_settings(): raise ValueError("Please fix your settings file that was just generated.") @staticmethod - def update_v1(settings: SettingsData): - """Update settings file to remove the BannedWords field, in favor for a blacklist.txt file.""" + def update_v1(settings: SettingsData) -> None: + """Update settings file to remove the BannedWords field, in favor for a blacklist.txt file. + + Args: + settings (SettingsData): [description] + """ + # "BannedWords" is only a key in the settings in older versions. + # We moved to a separate file for blacklisted words. if "BannedWords" in settings: logger.info("Updating Blacklist system to new version...") try: @@ -111,7 +132,7 @@ def update_v1(settings: SettingsData): logger.info("Updated Blacklist system to new version.") @staticmethod - def update_v2(): + def update_v2() -> None: """Converts `settings.txt` to `settings.json`, and adds missing new fields.""" try: # Try to load the old settings.txt file using json. @@ -130,25 +151,39 @@ def update_v2(): logger.info("Updated Settings system to new version. See \"settings.json\" for new fields, and README.md for information on these fields.") except FileNotFoundError: + # If settings.txt does not exist, then we're not on an old version. pass @staticmethod - def write_default_settings_file(): - # If the file is missing, create a standardised settings.json file - # With all parameters required. 
+ def write_default_settings_file() -> None:
+ """Create a standardised settings file with default values."""
 with open(Settings.PATH, "w") as f:
 f.write(json.dumps(Settings.DEFAULTS, indent=4, separators=(",", ": ")))
 @staticmethod
- def update_cooldown(cooldown: int):
+ def update_cooldown(cooldown: int) -> None:
+ """Update the "Cooldown" value in the settings file.
+
+ Args:
+ cooldown (int): The number of seconds of cooldown
+ between outputted generations.
+ """
 with open(Settings.PATH, "r") as f:
 settings = f.read()
 data = json.loads(settings)
+ data["Cooldown"] = cooldown
+
 with open(Settings.PATH, "w") as f:
 f.write(json.dumps(data, indent=4, separators=(",", ": ")))
- @classmethod
- def get_channel(cls):
+ @staticmethod
+ def get_channel() -> str:
+ """Get the "Channel" value from the settings file.
+
+ Returns:
+ str: The name of the Channel described in the settings file.
+ Stripped of "#" and converted to lowercase.
+ """
 settings = Settings.read_settings()
 return settings["Channel"].replace("#", "").lower()
\ No newline at end of file

From cf14dda39f56fd61e550a688c22cb0354c3ad7c5 Mon Sep 17 00:00:00 2001
From: Tom Aarsen
Date: Tue, 22 Jun 2021 22:02:28 +0200
Subject: [PATCH 25/30] Large rework of how punctuation is handled

- Punctuation (commas, dots, apostrophes, etc.) is now split off as separate words.
- Update the database accordingly. Create a ..._backup.db file with the old database.
- Automatically update the settings.json file
- Further improved code quality
- Add a Tokenizer file that handles splitting up a sentence into tokens, and merging tokens back into sentences.
- Added a SentenceSeparator value, which is placed in between sentences, when multiple sentences are generated (only when the first sentence was too short according to MinSentenceWordAmount)
---
 Database.py | 401 +++++++++++++++++++++++++++++++++-------
 Log.py | 4 +-
 MarkovChainBot.py | 75 ++++++---
 Settings.py | 4 +-
 Tokenizer.py | 120 ++++++++++++++
 5 files changed, 518 insertions(+), 86 deletions(-)
 create mode 100644 Tokenizer.py

diff --git a/Database.py b/Database.py
index a6ce307..4090fec 100644
--- a/Database.py
+++ b/Database.py
@@ -1,8 +1,12 @@
-import sqlite3, logging, random, string
+import sqlite3
+import logging
+import random
+import string
 from typing import Any, List, Optional, Tuple
 logger = logging.getLogger(__name__)
+
 class Database:
 """
@@ -68,7 +72,11 @@ class Database:
 This is most important when learning emotes, where the distinction between "Kappa" and "kappa" truly is important.
 - Generating is *case insensitive*. Generating when using "YOU ARE" as the previous words to use in e.g. self.get_next() will get the same results as generating using "you are".
-
+
+ - Learning and generating is *punctuation insensitive*. Each sentence is tokenized to split commas, dots, apostrophes, etc.
+ As a result, the sentence "Hello, I'm Tom!" is tokenized to: ["Hello", ",", "I", "'m", "Tom", "!"]. Then, 3-grams of this
+ token list are learned.
+
 - Both learning and generating is *punctuation sensitive*. "Hello, how are" will learn and generate differently than "Hello how are", as the first word is taken as "Hello,", which differs from "Hello". A solution is to completely remove punctuation. Before learning, before generating, etc.
@@ -81,13 +89,11 @@ class Database:
 def __init__(self, channel: str):
 self.db_name = f"MarkovChain_{channel.replace('#', '').lower()}.db"
 self._execute_queue = []
-
- # TODO: Punctuation insensitivity.
- # My ideas for such an implementation have increased the generation time by ~5x.
- # This was not worth it for me.
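The tokenization the commit describes — "Hello, I'm Tom!" becoming `["Hello", ",", "I", "'m", "Tom", "!"]` — can be approximated with a small regex. This is a sketch of the idea, not the repository's actual Tokenizer module:

```python
import re

# One token is either an apostrophe suffix like 'm / 're / 's,
# a run of word characters, or a single punctuation mark.
TOKEN_RE = re.compile(r"'\w+|\w+|[^\w\s]")

def tokenize(sentence):
    return TOKEN_RE.findall(sentence)

def detokenize(tokens):
    # Rejoin tokens, re-attaching punctuation and apostrophe suffixes
    # to the preceding word instead of inserting a space.
    out = ""
    for tok in tokens:
        if out and tok[0] not in ",.!?;:'":
            out += " "
        out += tok
    return out
```

Splitting this way lets `!g hello` continue into "hello, how are you?", since "," and "hello" are now independent entries in the knowledge base.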
I may revisit this at some point. + # Ensure the database is updated to the newest version self.update_v1(channel) self.update_v2() + self.update_v3(channel) # Create database tables. for first_char in list(string.ascii_uppercase) + ["_"]: @@ -98,7 +104,7 @@ def __init__(self, channel: str): count INTEGER, PRIMARY KEY (word1 COLLATE BINARY, word2 COLLATE BINARY) ); - """) + """, auto_commit=False) for second_char in list(string.ascii_uppercase) + ["_"]: self.add_execute_queue(f""" CREATE TABLE IF NOT EXISTS MarkovGrammar{first_char}{second_char} ( @@ -108,7 +114,7 @@ def __init__(self, channel: str): count INTEGER, PRIMARY KEY (word1 COLLATE BINARY, word2 COLLATE BINARY, word3 COLLATE BINARY) ); - """) + """, auto_commit=False) sql = """ CREATE TABLE IF NOT EXISTS WhisperIgnore ( username TEXT, @@ -120,8 +126,9 @@ def __init__(self, channel: str): # Used for randomly picking a Markov Grammar if only one word is given # Index 0 is for "A", 1 for "B", etc. Then, 26 is for "_" - self.word_frequency = [11.6, 4.4, 5.2, 3.1, 2.8, 4, 1.6, 4.2, 7.3, 0.5, 0.8, 2.4, 3.8, 2.2, 7.6, 4.3, 0.2, 2.8, 6.6, 15.9, 1.1, 0.8, 5.5, 0.1, 0.7, 0.1, 0.5] - + self.word_frequency = [11.6, 4.4, 5.2, 3.1, 2.8, 4, 1.6, 4.2, 7.3, 0.5, 0.8, 2.4, + 3.8, 2.2, 7.6, 4.3, 0.2, 2.8, 6.6, 15.9, 1.1, 0.8, 5.5, 0.1, 0.7, 0.1, 0.5] + def update_v1(self, channel: str): """Update the Database structure from a deprecated version to a newer one. 
@@ -130,20 +137,24 @@ def update_v1(self, channel: str): """ # If an old version of the Database is used, update the database if ("MarkovGrammarA",) in self.execute("SELECT name FROM sqlite_master WHERE type='table';", fetch=True): - + logger.info("Creating backup before updating Database...") # Connect to both the new and backup, backup, and close both + def progress(status, remaining, total): logging.debug(f'Copied {total-remaining} of {total} pages...') - conn = sqlite3.connect(f"MarkovChain_{channel.replace('#', '').lower()}.db") - back_conn = sqlite3.connect(f"MarkovChain_{channel.replace('#', '').lower()}_backup.db") + conn = sqlite3.connect( + f"MarkovChain_{channel.replace('#', '').lower()}.db") + back_conn = sqlite3.connect( + f"MarkovChain_{channel.replace('#', '').lower()}_backup.db") with back_conn: conn.backup(back_conn, pages=1000, progress=progress) conn.close() back_conn.close() logger.info("Created backup before updating Database...") - - logger.info("Updating Database to new version for improved efficiency...") + + logger.info( + "Updating Database to new version for improved efficiency...") # Rename ...Other to ..._ self.add_execute_queue(f""" @@ -166,18 +177,22 @@ def progress(status, remaining, total): self.execute_commit() # Copy data from Other to _ and remove Other - self.add_execute_queue("INSERT INTO MarkovGrammar_ SELECT * FROM MarkovGrammarOther;") - self.add_execute_queue("INSERT INTO MarkovStart_ SELECT * FROM MarkovStartOther;") + self.add_execute_queue( + "INSERT INTO MarkovGrammar_ SELECT * FROM MarkovGrammarOther;") + self.add_execute_queue( + "INSERT INTO MarkovStart_ SELECT * FROM MarkovStartOther;") self.add_execute_queue("DROP TABLE MarkovGrammarOther") self.add_execute_queue("DROP TABLE MarkovStartOther") self.execute_commit() - # Copy all data from MarkovGrammarx where x is some digit to MarkovGrammar_, + # Copy all data from MarkovGrammarx where x is some digit to MarkovGrammar_, # Same with MarkovStart. 
for character in (list(string.digits)): - self.add_execute_queue(f"INSERT INTO MarkovGrammar_ SELECT * FROM MarkovGrammar{character}") + self.add_execute_queue( + f"INSERT INTO MarkovGrammar_ SELECT * FROM MarkovGrammar{character}") self.add_execute_queue(f"DROP TABLE MarkovGrammar{character}") - self.add_execute_queue(f"INSERT INTO MarkovStart_ SELECT * FROM MarkovStart{character}") + self.add_execute_queue( + f"INSERT INTO MarkovStart_ SELECT * FROM MarkovStart{character}") self.add_execute_queue(f"DROP TABLE MarkovStart{character}") self.execute_commit() @@ -193,9 +208,11 @@ def progress(status, remaining, total): PRIMARY KEY (word1 COLLATE BINARY, word2 COLLATE BINARY, word3 COLLATE BINARY) ); """) - self.add_execute_queue(f"INSERT INTO MarkovGrammar{first_char}{second_char} SELECT * FROM MarkovGrammar{first_char} WHERE word2 LIKE \"{second_char}%\";") - self.add_execute_queue(f"DELETE FROM MarkovGrammar{first_char} WHERE word2 LIKE \"{second_char}%\";") - + self.add_execute_queue( + f"INSERT INTO MarkovGrammar{first_char}{second_char} SELECT * FROM MarkovGrammar{first_char} WHERE word2 LIKE \"{second_char}%\";") + self.add_execute_queue( + f"DELETE FROM MarkovGrammar{first_char} WHERE word2 LIKE \"{second_char}%\";") + self.add_execute_queue(f""" CREATE TABLE IF NOT EXISTS MarkovGrammar{first_char}_ ( word1 TEXT COLLATE NOCASE, @@ -205,10 +222,11 @@ def progress(status, remaining, total): PRIMARY KEY (word1 COLLATE BINARY, word2 COLLATE BINARY, word3 COLLATE BINARY) ); """) - self.add_execute_queue(f"INSERT INTO MarkovGrammar{first_char}_ SELECT * FROM MarkovGrammar{first_char};") + self.add_execute_queue( + f"INSERT INTO MarkovGrammar{first_char}_ SELECT * FROM MarkovGrammar{first_char};") self.add_execute_queue(f"DROP TABLE MarkovGrammar{first_char}") self.execute_commit() - + logger.info("Finished Updating Database to new version.") def update_v2(self): @@ -224,11 +242,201 @@ def update_v2(self): logger.info("Updating Database to new version...") for 
first_char in list(string.ascii_uppercase) + ["_"]: for second_char in list(string.ascii_uppercase) + ["_"]: - self.execute(f"ALTER TABLE MarkovGrammar{first_char}{second_char} RENAME COLUMN occurances TO count;") - self.execute(f"ALTER TABLE MarkovStart{first_char} RENAME COLUMN occurances TO count;") + self.execute( + f"ALTER TABLE MarkovGrammar{first_char}{second_char} RENAME COLUMN occurances TO count;") + self.execute( + f"ALTER TABLE MarkovStart{first_char} RENAME COLUMN occurances TO count;") logger.info("Finished Updating Database to new version.") - def add_execute_queue(self, sql: str, values: Tuple[Any] = None) -> None: + def update_v3(self, channel: str) -> None: + """Update the Database structure to mark punctuation as a separate word. + + Previously, "Hello," was a valid single word. Now, it would be split as "Hello" and ",". + This allows people to generate "!g hello", and have the bot generate "hello, how are you?", + or have "!g it" result in "it's a wonderful day". + + This first copies `MarkovChain_{channel}.db` to `MarkovChain_{channel}_modified.db`. + This new copy is then modified. The original is never changed, to avoid issues when the + update is interrupted. As a result, running the program again will just re-attempt the + update. + + Upon completing the update, the original database is renamed to + `MarkovChain_{channel}_backup.db`, while the newly modified `MarkovChain_{channel}_modified.db` + is renamed to `MarkovChain_{channel}.db`. + + *This `MarkovChain_{channel}_backup.db` file can safely be deleted, as it is NOT used* + + This function also adds a `Version` table, and sets the version to 3. + + Args: + channel (str): The name of the Twitch channel on which the bot is running. + """ + + # Get Database version. Throws OperationalError if the Version table does not exist, + # in which case we definitely want to upgrade. 
+ try: + version = self.execute( + "SELECT version FROM Version ORDER BY version DESC LIMIT 1;", fetch=True) + except sqlite3.OperationalError: + version = [] + + # Whether to upgrade + if not version or version[0][0] < 3: + logger.info( + "Updating Database to new version - supports better punctuation handling.") + + from shutil import copyfile + import os + from Tokenizer import tokenize + from nltk import ngrams + channel = channel.replace('#', '').lower() + copyfile(f"MarkovChain_{channel}.db", + f"MarkovChain_{channel}_modified.db") + logger.info( + f"Created a copy of the database called \"MarkovChain_{channel}_modified.db\". The update will modify this file.") + + # Temporarily set self.db_name to the modified one + self.db_name = f"MarkovChain_{channel.replace('#', '').lower()}_modified.db" + + # Create database tables. + for first_char in list(string.ascii_uppercase) + ["_"]: + table = f"MarkovStart{first_char}" + self.add_execute_queue(f""" + CREATE TABLE IF NOT EXISTS {table}_modified ( + word1 TEXT COLLATE NOCASE, + word2 TEXT COLLATE NOCASE, + count INTEGER, + PRIMARY KEY (word1 COLLATE BINARY, word2 COLLATE BINARY) + ); + """, auto_commit=False) + for second_char in list(string.ascii_uppercase) + ["_"]: + table = f"MarkovGrammar{first_char}{second_char}" + self.add_execute_queue(f""" + CREATE TABLE IF NOT EXISTS {table}_modified ( + word1 TEXT COLLATE NOCASE, + word2 TEXT COLLATE NOCASE, + word3 TEXT COLLATE NOCASE, + count INTEGER, + PRIMARY KEY (word1 COLLATE BINARY, word2 COLLATE BINARY, word3 COLLATE BINARY) + ); + """, auto_commit=False) + self.execute_commit() + + def modify_start(table: str) -> None: + """Read all data from `table`, re-tokenize it, distribute the new first 2 tokens to _modified tables, and drop `table`. + + Args: + table (str): The name of the table to work on. 
+ """ + data = self.execute(f"SELECT * FROM {table};", fetch=True) + for tup in data: + # Remove "count" from tup for now + count = tup[-1] + tup = tup[:-1] + + raw_string = " ".join(tup) + tokenized = tokenize(raw_string) + two_gram = tokenized[:2] + # if "you're" in raw_string: + # import pdb; pdb.set_trace() + if len(two_gram) == 1: + import pdb + pdb.set_trace() + self.add_execute_queue(f'INSERT OR REPLACE INTO MarkovStart{self.get_suffix(two_gram[0][0])}_modified (word1, word2, count) VALUES (?, ?, coalesce((SELECT count + {count} FROM MarkovStart{self.get_suffix(two_gram[0][0])}_modified WHERE word1 = ? COLLATE BINARY AND word2 = ? COLLATE BINARY), 1))', + values=two_gram + two_gram, + auto_commit=False) + + self.execute(f"DROP TABLE {table};") + + def modify_grammar(table: str) -> None: + """Read all data from `table`, re-tokenize it, distribute the new 3-grams to _modified tables, and drop `table`. + + Args: + table (str): The name of the table to work on. + """ + data = self.execute(f"SELECT * FROM {table};", fetch=True) + for tup in data: + # Remove "count" from tup for now + count = tup[-1] + tup = tup[:-1] + + # If ends on "", ignore that in in the tuple, as we don't want it to get + # tokenized. + end = False + if tup[-1] == "": + end = True + tup = tup[:-1] + + raw_string = " ".join(tup) + tokenized = tokenize(raw_string) + + # Re-add "" + if end: + tokenized.append("") + + for ngram in ngrams(tokenized, 3): + # Filter out recursive case. + if self.check_equal(ngram): + continue + self.add_execute_queue(f'INSERT OR REPLACE INTO MarkovGrammar{self.get_suffix(ngram[0][0])}{self.get_suffix(ngram[1][0])}_modified (word1, word2, word3, count) VALUES (?, ?, ?, coalesce((SELECT count + {count} FROM MarkovGrammar{self.get_suffix(ngram[0][0])}{self.get_suffix(ngram[1][0])}_modified WHERE word1 = ? COLLATE BINARY AND word2 = ? COLLATE BINARY AND word3 = ? 
COLLATE BINARY), 1))', + values=ngram + ngram, + auto_commit=False) + + self.execute(f"DROP TABLE {table};") + + # Modify all tables + i = 0 + total = 27 * 27 + 27 # The number of tables to convert + for first_char in list(string.ascii_uppercase) + ["_"]: + table = f"MarkovStart{first_char}" + modify_start(table) + i += 1 + for second_char in list(string.ascii_uppercase) + ["_"]: + table = f"MarkovGrammar{first_char}{second_char}" + modify_grammar(table) + i += 1 + logger.debug( + f"[{i / total * 100:.2f}%] Scheduled updates for the tables for words starting in {first_char}.") + logger.info("Starting to execute table update...") + self.execute_commit() + logger.info("Finished executing table update.") + + # Rename the _modified tables to normal tables again + for first_char in list(string.ascii_uppercase) + ["_"]: + table = f"MarkovStart{first_char}" + self.add_execute_queue( + f"ALTER TABLE {table}_modified RENAME TO {table};", auto_commit=False) + for second_char in list(string.ascii_uppercase) + ["_"]: + table = f"MarkovGrammar{first_char}{second_char}" + self.add_execute_queue( + f"ALTER TABLE {table}_modified RENAME TO {table};", auto_commit=False) + self.execute_commit() + + # Turn the non-modified, old version of the Database into a "_backup.db" file, + # and turn the modified file into the new main file. + os.rename(f"MarkovChain_{channel}.db", + f"MarkovChain_{channel}_backup.db") + os.rename(f"MarkovChain_{channel}_modified.db", + f"MarkovChain_{channel}.db") + + # Revert to using .db instead of _modified.db + self.db_name = f"MarkovChain_{channel.replace('#', '').lower()}.db" + + # Add a version entry + self.execute("""CREATE TABLE IF NOT EXISTS Version ( + version INTEGER + );""") + self.execute("DELETE FROM Version;") + self.execute("INSERT INTO Version (version) VALUES (3);") + + logger.info( + f"Renamed original database file \"MarkovChain_{channel}.db\" to \"MarkovChain_{channel}_backup.db\".
This file is *not* used, and can safely be deleted.") + logger.info( + f"Renamed updated database file \"MarkovChain_{channel}_modified.db\" to \"MarkovChain_{channel}.db\".") + logger.info( + f"This updated \"MarkovChain_{channel}.db\" will be used to drive the Twitch bot.") + + def add_execute_queue(self, sql: str, values: Tuple[Any] = None, auto_commit: bool = True) -> None: """Add query and corresponding values to a queue, to be executed all at once. This entire queue can be executed with `self.execute_commit`, @@ -245,9 +453,9 @@ def add_execute_queue(self, sql: str, values: Tuple[Any] = None) -> None: else: self._execute_queue.append([sql]) # Commit these executes if there are more than 25 queries - if len(self._execute_queue) > 25: + if auto_commit and len(self._execute_queue) > 25: self.execute_commit() - + def execute_commit(self, fetch: bool = False) -> Any: """Execute the SQL queries added to the queue with `self.add_execute_queue`. @@ -292,7 +500,7 @@ def execute(self, sql: str, values: Tuple[Any] = None, fetch: bool = False): conn.commit() if fetch: return cur.fetchall() - + def get_suffix(self, character: str) -> str: """Transform a character into a member of string.ascii_lowercase or "_". @@ -312,8 +520,12 @@ def add_whisper_ignore(self, username: str) -> None: Args: username (str): The username of the user who no longer wants to be whispered. """ - self.execute("INSERT OR IGNORE INTO WhisperIgnore(username) SELECT ?", (username,)) - + self.execute(""" + INSERT OR IGNORE INTO WhisperIgnore(username) + SELECT ?;""", + values=(username,) + ) + def check_whisper_ignore(self, username: str) -> List[Tuple[str]]: """Returns a non-empty list only if `username` is in the WhisperIgnore table. @@ -327,15 +539,22 @@ def check_whisper_ignore(self, username: str) -> List[Tuple[str]]: List[Tuple[str]]: Either an empty list, or [('test_user',)]. 
Allows the use of `if not check_whisper_ignore(user): whisper(user)` """ - return self.execute("SELECT username FROM WhisperIgnore WHERE username = ?;", (username,), fetch=True) + return self.execute(""" + SELECT username FROM WhisperIgnore + WHERE username = ?;""", + values=(username,), + fetch=True) - def remove_whisper_ignore(self, username: str) -> None: + def remove_whisper_ignore(self, username: str) -> None: """Remove `username` from the WhisperIgnore table, indicating that they want to be whispered again. Args: username (str): The username of the user who wants to be whispered again. """ - self.execute("DELETE FROM WhisperIgnore WHERE username = ?", (username,)) + self.execute(""" + DELETE FROM WhisperIgnore + WHERE username = ?;""", + values=(username,)) def check_equal(self, l: List[Any]) -> bool: """True if `l` consists of items that are all identical @@ -364,7 +583,11 @@ def get_next(self, index: int, words: List[str]) -> Optional[str]: Optional[str]: The next word in the sentence, generated given the learned data. """ # Get all items - data = self.execute(f"SELECT word3, count FROM MarkovGrammar{self.get_suffix(words[0][0])}{self.get_suffix(words[1][0])} WHERE word1 = ? AND word2 = ?;", words, fetch=True) + data = self.execute(f""" + SELECT word3, count FROM MarkovGrammar{self.get_suffix(words[0][0])}{self.get_suffix(words[1][0])} + WHERE word1 = ? AND word2 = ?;""", + values=words, + fetch=True) # Return a word picked from the data, using count as a weighting factor return None if len(data) == 0 else self.pick_word(data, index) @@ -382,7 +605,11 @@ def get_next_initial(self, index: int, words) -> Optional[str]: Optional[str]: The next word in the sentence, generated given the learned data. """ # Get all items - data = self.execute(f"SELECT word3, count FROM MarkovGrammar{self.get_suffix(words[0][0])}{self.get_suffix(words[1][0])} WHERE word1 = ? AND word2 = ? 
AND word3 != '';", words, fetch=True) + data = self.execute(f""" + SELECT word3, count FROM MarkovGrammar{self.get_suffix(words[0][0])}{self.get_suffix(words[1][0])} + WHERE word1 = ? AND word2 = ? AND word3 != '';""", + values=words, + fetch=True) # Return a word picked from the data, using count as a weighting factor return None if len(data) == 0 else self.pick_word(data, index) @@ -399,8 +626,15 @@ def get_next_single_initial(self, index: int, word: str) -> Optional[List[str]]: Optional[List[str]]: The previous and newly generated word in the sentence as a list, generated given the learned data. So, the previous word is taken directly from the input of this method, and the second word is generated. """ + # Randomly pick first character for the second word + char_two = random.choices(string.ascii_uppercase + '_', + weights=self.word_frequency)[0] # Get all items - data = self.execute(f"SELECT word2, count FROM MarkovGrammar{self.get_suffix(word[0])}{random.choices(string.ascii_uppercase + '_', weights=self.word_frequency)[0]} WHERE word1 = ? AND word2 != '';", (word,), fetch=True) + data = self.execute(f""" + SELECT word2, count FROM MarkovGrammar{self.get_suffix(word[0])}{char_two} + WHERE word1 = ? AND word2 != '';""", + values=(word,), + fetch=True) # Return a word picked from the data, using count as a weighting factor return None if len(data) == 0 else [word] + [self.pick_word(data, index)] @@ -415,7 +649,11 @@ def get_next_single_start(self, word: str) -> Optional[List[str]]: So, the first word is taken directly from the input of this method, and the second word is generated.
""" # Get all items - data = self.execute(f"SELECT word2, count FROM MarkovStart{self.get_suffix(word[0])} WHERE word1 = ?;", (word,), fetch=True) + data = self.execute(f""" + SELECT word2, count FROM MarkovStart{self.get_suffix(word[0])} + WHERE word1 = ?;""", + values=(word,), + fetch=True) # Return a word picked from the data, using count as a weighting factor return None if len(data) == 0 else [word] + [self.pick_word(data)] @@ -434,9 +672,14 @@ def pick_word(self, data: List[Tuple[str, int]], index: int = 0) -> str: Returns: str: The pseudo-randomly picked word. """ - return random.choices(data, - weights=[tup[-1] * ((index+1)/15) if tup[0] == "" else tup[-1] for tup in data] - )[0][0] + return random.choices(data, + weights=[ + tup[-1] * ((index+1)/15) + if tup[0] == "" else + tup[-1] + for tup in data + ] + )[0][0] def get_start(self) -> List[str]: """Get a list of two words that mark as the start of a sentence. @@ -447,20 +690,22 @@ def get_start(self) -> List[str]: List[str]: A list of two starting words, such as ["I", "am"]. """ # Find one character start from - character = random.choices(list(string.ascii_lowercase) + ["_"], + character = random.choices(list(string.ascii_lowercase) + ["_"], weights=self.word_frequency, k=1)[0] - # Get all first word, second word, frequency triples, + # Get all first word, second word, frequency triples, # e.g. [("I", "am", 3), ("You", "are", 2), ...] - data = self.execute(f"SELECT * FROM MarkovStart{character};", fetch=True) - + data = self.execute( + f"SELECT * FROM MarkovStart{character};", + fetch=True) + # If nothing has ever been said if len(data) == 0: return [] - + # Return a (weighted) randomly chosen 2-gram - return list(random.choices(data, + return list(random.choices(data, weights=[tup[-1] for tup in data], k=1)[0][:-1]) @@ -483,11 +728,21 @@ def add_rule_queue(self, item: List[str]) -> None: # Filter out recursive case. if self.check_equal(item): return - if "" in item: # prevent adding invalid rules. 
Ideally this wouldn't trigger, but it seems to happen rarely. - logger.warning(f"Failed to add item to rules. Item contains empty string: {item!r}") + if "" in item: # prevent adding invalid rules. Ideally this wouldn't trigger, but it seems to happen rarely. + logger.warning( + f"Failed to add item to rules. Item contains empty string: {item!r}") return - self.add_execute_queue(f'INSERT OR REPLACE INTO MarkovGrammar{self.get_suffix(item[0][0])}{self.get_suffix(item[1][0])} (word1, word2, word3, count) VALUES (?, ?, ?, coalesce((SELECT count + 1 FROM MarkovGrammar{self.get_suffix(item[0][0])}{self.get_suffix(item[1][0])} WHERE word1 = ? COLLATE BINARY AND word2 = ? COLLATE BINARY AND word3 = ? COLLATE BINARY), 1))', values=item + item) - + self.add_execute_queue(f''' + INSERT OR REPLACE INTO MarkovGrammar{self.get_suffix(item[0][0])}{self.get_suffix(item[1][0])} (word1, word2, word3, count) + VALUES (?, ?, ?, coalesce( + ( + SELECT count + 1 FROM MarkovGrammar{self.get_suffix(item[0][0])}{self.get_suffix(item[1][0])} + WHERE word1 = ? COLLATE BINARY AND word2 = ? COLLATE BINARY AND word3 = ? COLLATE BINARY + ), + 1) + )''', + values=item + item) + def add_start_queue(self, item: List[str]) -> None: """Adds a rule to the queue, ready to be entered into the knowledge base, given a 2-gram `item`. @@ -499,8 +754,17 @@ def add_start_queue(self, item: List[str]) -> None: in the MarkovStartH table, where it can be randomly (with frequency as weight) picked as a start of a sentence. """ - self.add_execute_queue(f'INSERT OR REPLACE INTO MarkovStart{self.get_suffix(item[0][0])} (word1, word2, count) VALUES (?, ?, coalesce((SELECT count + 1 FROM MarkovStart{self.get_suffix(item[0][0])} WHERE word1 = ? COLLATE BINARY AND word2 = ? 
COLLATE BINARY), 1))', values=item + item) - + self.add_execute_queue(f''' + INSERT OR REPLACE INTO MarkovStart{self.get_suffix(item[0][0])} (word1, word2, count) + VALUES (?, ?, coalesce( + ( + SELECT count + 1 FROM MarkovStart{self.get_suffix(item[0][0])} + WHERE word1 = ? COLLATE BINARY AND word2 = ? COLLATE BINARY + ), + 1) + )''', + values=item + item) + def unlearn(self, message: str) -> None: """Remove frequency of 3-grams from `message` from the knowledge base. @@ -516,20 +780,35 @@ def unlearn(self, message: str) -> None: """ words = message.split(" ") # Construct 3-grams - tuples = [(words[i], words[i+1], words[i+2]) for i in range(0, len(words) - 2)] + tuples = [(words[i], words[i+1], words[i+2]) + for i in range(0, len(words) - 2)] # Unlearn start of sentence from MarkovStart if len(words) > 1: # Reduce "count" by 5 - self.add_execute_queue(f'UPDATE MarkovStart{self.get_suffix(words[0][0])} SET count = count - 5 WHERE word1 = ? AND word2 = ?;', values=(words[0], words[1], )) + self.add_execute_queue(f''' + UPDATE MarkovStart{self.get_suffix(words[0][0])} + SET count = count - 5 + WHERE word1 = ? AND word2 = ?;''', + values=(words[0], words[1],)) # Delete if count is now less than 0. - self.add_execute_queue(f'DELETE FROM MarkovStart{self.get_suffix(words[0][0])} WHERE word1 = ? AND word2 = ? AND count <= 0;', values=(words[0], words[1], )) - + self.add_execute_queue(f''' + DELETE FROM MarkovStart{self.get_suffix(words[0][0])} + WHERE word1 = ? AND word2 = ? AND count <= 0;''', + values=(words[0], words[1],)) + # Unlearn all 3 word sections from Grammar for (word1, word2, word3) in tuples: # Reduce "count" by 5 - self.add_execute_queue(f'UPDATE MarkovGrammar{self.get_suffix(word1[0])}{self.get_suffix(word2[0])} SET count = count - 5 WHERE word1 = ? AND word2 = ? 
AND word3 = ?;', values=(word1, word2, word3, )) + self.add_execute_queue(f''' + UPDATE MarkovGrammar{self.get_suffix(word1[0])}{self.get_suffix(word2[0])} + SET count = count - 5 + WHERE word1 = ? AND word2 = ? AND word3 = ?;''', + values=(word1, word2, word3,)) # Delete if count is now less than 0. - self.add_execute_queue(f'DELETE FROM MarkovGrammar{self.get_suffix(word1[0])}{self.get_suffix(word2[0])} WHERE word1 = ? AND word2 = ? AND word3 = ? AND count <= 0;', values=(word1, word2, word3, )) + self.add_execute_queue(f''' + DELETE FROM MarkovGrammar{self.get_suffix(word1[0])}{self.get_suffix(word2[0])} + WHERE word1 = ? AND word2 = ? AND word3 = ? AND count <= 0;''', + values=(word1, word2, word3, )) self.execute_commit() diff --git a/Log.py b/Log.py index 982887d..e4b3fdf 100644 --- a/Log.py +++ b/Log.py @@ -18,5 +18,5 @@ def __init__(self, main_file: str): disable_existing_loggers=False) else: # If you don't, use a standard config that outputs some INFO in the console - logging.basicConfig( - level=logging.INFO, format=f'[%(asctime)s] [%(name)s] [%(levelname)-8s] - %(message)s') + logging.basicConfig(level=logging.INFO, + format=f'[%(asctime)s] [%(name)s] [%(levelname)-8s] - %(message)s') diff --git a/MarkovChainBot.py b/MarkovChainBot.py index f3c89b8..171f95c 100644 --- a/MarkovChainBot.py +++ b/MarkovChainBot.py @@ -8,6 +8,7 @@ from Settings import Settings, SettingsData from Database import Database from Timer import LoopingTimer +from Tokenizer import detokenize, tokenize from Log import Log Log(__file__) @@ -20,8 +21,6 @@ def __init__(self): self._enabled = True # This regex should detect similar phrases as links as Twitch does self.link_regex = re.compile("\w+\.[a-z]{2,}") - # Make a translation table for removing punctuation efficiently - self.punct_trans_table = str.maketrans("", "", string.punctuation) # List of moderators used in blacklist modification, includes broadcaster self.mod_list = [] self.set_blacklist() @@ -75,6 +74,7 @@ def 
set_settings(self, settings: SettingsData): self.automatic_generation_timer = settings["AutomaticGenerationTimer"] self.whisper_cooldown = settings["WhisperCooldown"] self.enable_generate_command = settings["EnableGenerateCommand"] + self.sent_separator = settings["SentenceSeparator"] def message_handler(self, m: Message): try: @@ -148,7 +148,7 @@ def message_handler(self, m: Message): if self.check_filter(m.message): sentence = "You can't make me say that, you madman!" else: - params = m.message.split(" ")[1:] + params = tokenize(m.message)[2:] # Generate an actual sentence sentence, success = self.generate(params) if success: @@ -300,7 +300,7 @@ def message_handler(self, m: Message): except Exception as e: logger.exception(e) - + def generate(self, params: List[str] = None) -> "Tuple[str, bool]": """Given an input sentence, generate the remainder of the sentence using the learned data. @@ -314,6 +314,10 @@ def generate(self, params: List[str] = None) -> "Tuple[str, bool]": if params is None: params = [] + # List of sentences that will be generated. In some cases, multiple sentences will be generated, + # e.g. when the first sentence has fewer words than self.min_sentence_length.
+ sentences = [[]] + # Check for commands or recursion, eg: !generate !generate if len(params) > 0: if self.check_if_other_command(params[0]): @@ -325,12 +329,11 @@ def generate(self, params: List[str] = None) -> "Tuple[str, bool]": if len(params) > 1: key = params[-self.key_length:] # Copy the entire params for the sentence - sentence = params.copy() + sentences[0] = params.copy() elif len(params) == 1: # First we try to find if this word was once used as the first word in a sentence: key = self.db.get_next_single_start(params[0]) - print(key) if key == None: # If this failed, we try to find the next word in the grammar as a whole key = self.db.get_next_single_initial(0, params[0]) @@ -338,49 +341,77 @@ def generate(self, params: List[str] = None) -> "Tuple[str, bool]": # Return a message that this word hasn't been learned yet return f"I haven't extracted \"{params[0]}\" from chat yet.", False # Copy this for the sentence - sentence = key.copy() + sentences[0] = key.copy() else: # if there are no params # Get starting key key = self.db.get_start() if key: # Copy this for the sentence - sentence = key.copy() + sentences[0] = key.copy() else: # If nothing's ever been said return "There is not enough learned information yet.", False - for i in range(self.max_sentence_length - self.key_length): + # Counter to prevent infinite loops (i.e. 
constantly generating while below the + # minimum number of words to generate) + i = 0 + while self.sentence_length(sentences) < self.max_sentence_length and i < self.max_sentence_length * 2: # Use key to get next word if i == 0: - # Prevent fetching on the first go + # Prevent fetching on the first word word = self.db.get_next_initial(i, key) else: word = self.db.get_next(i, key) + i += 1 + if word == "" or word == None: + # Break, unless we are before the min_sentence_length if i < self.min_sentence_length: key = self.db.get_start() - for entry in key: - sentence.append(entry) - word = self.db.get_next_initial(i, key) - else: - break + # Ensure that the key can be generated. Otherwise we still stop. + if key: + # Start a new sentence + sentences.append([]) + for entry in key: + sentences[-1].append(entry) + continue + break # Otherwise add the word - sentence.append(word) + sentences[-1].append(word) - # Modify the key so on the next iteration it gets the next item + # Shift the key so on the next iteration it gets the next item key.pop(0) key.append(word) # If there were params, but the resulting sentence is identical to the params # Then the params did not result in an actual sentence # If so, restart without params - if len(params) > 0 and params == sentence: - return "I haven't yet learned what to do with \"" + " ".join(params[-self.key_length:]) + "\"", False + if len(params) > 0 and params == sentences[0]: + return "I haven't yet learned what to do with \"" + detokenize(params[-self.key_length:]) + "\"", False + + return self.sent_separator.join(detokenize(sentence) for sentence in sentences), True - return " ".join(sentence), True + def sentence_length(self, sentences: List[List[str]]) -> int: + """Given a list of sentences, each a list of tokens, return the total number of words. + + Args: + sentences (List[List[str]]): List of lists of tokens that make up a sentence, + where a token is a word or punctuation.
For example: + [['Hello', ',', 'you', "'re", 'Tom', '!'], ['Yes', ',', 'I', 'am', '.']] + This would return 6. + + Returns: + int: The number of words across the sentences. + """ + count = 0 + for sentence in sentences: + for token in sentence: + if token not in string.punctuation and token[0] != "'": + count += 1 + return count def extract_modifiers(self, emotes: str) -> List[str]: """Extract emote modifiers from emotes, such as the horizontal flip. @@ -468,8 +499,8 @@ def check_filter(self, message: str) -> bool: Args: message (str): The message to check. """ - for word in message.translate(self.punct_trans_table).lower().split(): - if word in self.blacklist: + for word in tokenize(message): + if word.lower() in self.blacklist: return True return False diff --git a/Settings.py b/Settings.py index a8d663c..593061d 100644 --- a/Settings.py +++ b/Settings.py @@ -23,6 +23,7 @@ class SettingsData(TypedDict): AutomaticGenerationTimer: int WhisperCooldown: bool EnableGenerateCommand: bool + SentenceSeparator: str class Settings: """ Loads data from settings.json into the bot """ @@ -44,7 +45,8 @@ class Settings: "HelpMessageTimer": 60 * 60 * 5, # 18000 seconds, 5 hours "AutomaticGenerationTimer": -1, "WhisperCooldown": True, - "EnableGenerateCommand": True + "EnableGenerateCommand": True, + "SentenceSeparator": " - ", } def __init__(self, bot) -> None: diff --git a/Tokenizer.py b/Tokenizer.py new file mode 100644 index 0000000..1334551 --- /dev/null +++ b/Tokenizer.py @@ -0,0 +1,120 @@ +import re +from typing import List +from nltk.tokenize.destructive import NLTKWordTokenizer +from nltk.tokenize.treebank import TreebankWordDetokenizer +from copy import deepcopy + +class MarkovChainTokenizer(NLTKWordTokenizer): + # Starting quotes.
+ STARTING_QUOTES = [ + (re.compile(u"([«“‘„]|[`]+)", re.U), r" \1 "), + # (re.compile(r"^\""), r"``"), # Custom for MarkovChain: Don't use `` as starting quotes + (re.compile(r"(``)"), r" \1 "), + (re.compile(r"([ \(\[{<])(\"|\'{2})"), r"\1 '' "), + (re.compile(r"(?i)(\')(?!re|ve|ll|m|t|s|d)(\w)\b", re.U), r"\1 \2"), + ] + + PUNCTUATION = [ + (re.compile(r"’"), r"'"), + (re.compile(r'([^\.])(\.)([\]\)}>"\'' u"»”’ " r"]*)\s*$", + re.U), r"\1 \2 \3 "), + (re.compile(r"([:,])([^\d])"), r" \1 \2"), + (re.compile(r"([:,])$"), r" \1 "), + # See https://github.com/nltk/nltk/pull/2322 + (re.compile(r"\.{2,}", re.U), r" \g<0> "), + # Custom for MarkovChain: Removed the "@" + (re.compile(r"[;#$%&]"), r" \g<0> "), + ( + re.compile(r'([^\.])(\.)([\]\)}>"\']*)\s*$'), + r"\1 \2\3 ", + ), # Handles the final period. + (re.compile(r"[?!]"), r" \g<0> "), + (re.compile(r"([^'])' "), r"\1 ' "), + # See https://github.com/nltk/nltk/pull/2322 + (re.compile(r"[*]", re.U), r" \g<0> "), + ] + + +EMOTICON_RE = re.compile(r""" +( + [<>]? + [:;=8] # eyes + [\-o\*\']? # optional nose + [\)\]\(\[dDpP/\:\}\{@\|\\] # mouth + | + [\)\]\(\[dDpP/\:\}\{@\|\\] # mouth + [\-o\*\']? # optional nose + [:;=8] # eyes + [<>]? + | + <3 # heart +)""", re.VERBOSE | re.I | re.UNICODE) + +_tokenize = MarkovChainTokenizer().tokenize +_detokenize = TreebankWordDetokenizer().tokenize + +def tokenize(sentence: str) -> List[str]: + """Word tokenize, separating commas, dots, apostrophes, etc. + + Uses nltk's `NLTKWordTokenizer`, but does not consider "@" to be punctuation. + Also doesn't convert "hello" to ``hello'', but to ''hello''. + + Furthermore, doesn't split emoticons, i.e. "<3" or ":)" + + Args: + sentence (str): Input sentence. + + Returns: + List[str]: Tokenized output of the sentence. 
+ """ + + output = [] + + match = EMOTICON_RE.search(sentence) + while match: + output += _tokenize(sentence[:match.start()].strip()) + output += [match.group()] + sentence = sentence[match.end():].strip() + match = EMOTICON_RE.search(sentence) + + output += _tokenize(sentence) + + return output + +def detokenize(tokenized: List[str]) -> str: + """Detokenize a tokenized list of words and punctuation. + + Converted in a less naïve way than `" ".join(tokenized)` + + Preprocess tokenized by placing spaces before the 1st, 3rd, 5th, etc. quote, + and by placing spaces after the 2nd, 4th, 6th, etc. quote. + Then, ["He", "said", "''", "heya", "!", "''", "yesterday", "."] will be detokenized to + > He said ''heya!'' yesterday. + instead of + > He said''heya!''yesterday. + + Args: + tokenized (List[str]): Input tokens, e.g. ["Hello", ",", "I", "'m", "Tom"] + + Returns: + str: The correct string sentence, e.g. "Hello, I'm Tom" + """ + indices = [index for index, token in enumerate(tokenized) if token in ("''", "'")] + # We get the reverse of the enumerate, as we modify the list we took the indices from + enumerated = list(enumerate(indices)) + + tokenized_copy = deepcopy(tokenized) + for i, index in enumerated[::-1]: + # Opening quote + if i % 2 == 0: + # If there is another word, merge with that word and prepend a space + if len(tokenized) > index + 1: + tokenized_copy[index: index + 2] = ["".join(tokenized_copy[index: index + 2])] + + # Closing quote + else: + # If there is a previous word, merge with that word and append a space + if index > 0: + tokenized_copy[index - 1: index + 1] = ["".join(tokenized_copy[index - 1: index + 1])] + + return _detokenize(tokenized_copy).strip() \ No newline at end of file From e059a8c55b1cadf29f5cbaf22550b27c252a78ea Mon Sep 17 00:00:00 2001 From: Tom Aarsen Date: Tue, 27 Jul 2021 10:35:18 +0200 Subject: [PATCH 26/30] Remove debugging --- Database.py | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/Database.py 
b/Database.py index 4090fec..845e2e0 100644 --- a/Database.py +++ b/Database.py @@ -337,11 +337,9 @@ def modify_start(table: str) -> None: raw_string = " ".join(tup) tokenized = tokenize(raw_string) two_gram = tokenized[:2] - # if "you're" in raw_string: - # import pdb; pdb.set_trace() - if len(two_gram) == 1: - import pdb - pdb.set_trace() + # In case there was some issue in the previous Database + if len(two_gram) < 2: + continue self.add_execute_queue(f'INSERT OR REPLACE INTO MarkovStart{self.get_suffix(two_gram[0][0])}_modified (word1, word2, count) VALUES (?, ?, coalesce((SELECT count + {count} FROM MarkovStart{self.get_suffix(two_gram[0][0])}_modified WHERE word1 = ? COLLATE BINARY AND word2 = ? COLLATE BINARY), 1))', values=two_gram + two_gram, auto_commit=False) From 98a8e3e9e7d7b9033ecc1dfef933b480d1a9008e Mon Sep 17 00:00:00 2001 From: Tom Aarsen Date: Tue, 27 Jul 2021 10:40:34 +0200 Subject: [PATCH 27/30] Modified SQL formatting --- Database.py | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/Database.py b/Database.py index 845e2e0..7b83bda 100644 --- a/Database.py +++ b/Database.py @@ -340,7 +340,17 @@ def modify_start(table: str) -> None: # In case there was some issue in the previous Database if len(two_gram) < 2: continue - self.add_execute_queue(f'INSERT OR REPLACE INTO MarkovStart{self.get_suffix(two_gram[0][0])}_modified (word1, word2, count) VALUES (?, ?, coalesce((SELECT count + {count} FROM MarkovStart{self.get_suffix(two_gram[0][0])}_modified WHERE word1 = ? COLLATE BINARY AND word2 = ? COLLATE BINARY), 1))', + self.add_execute_queue(f''' + INSERT OR REPLACE INTO MarkovStart{self.get_suffix(two_gram[0][0])}_modified (word1, word2, count) + VALUES (?, ?, coalesce ( + ( + SELECT count + {count} FROM MarkovStart{self.get_suffix(two_gram[0][0])}_modified + WHERE word1 = ? COLLATE BINARY + AND word2 = ? 
COLLATE BINARY + ), + 1 + ) + )''', values=two_gram + two_gram, auto_commit=False) @@ -376,7 +386,18 @@ def modify_grammar(table: str) -> None: # Filter out recursive case. if self.check_equal(ngram): continue - self.add_execute_queue(f'INSERT OR REPLACE INTO MarkovGrammar{self.get_suffix(ngram[0][0])}{self.get_suffix(ngram[1][0])}_modified (word1, word2, word3, count) VALUES (?, ?, ?, coalesce((SELECT count + {count} FROM MarkovGrammar{self.get_suffix(ngram[0][0])}{self.get_suffix(ngram[1][0])}_modified WHERE word1 = ? COLLATE BINARY AND word2 = ? COLLATE BINARY AND word3 = ? COLLATE BINARY), 1))', + self.add_execute_queue(f''' + INSERT OR REPLACE INTO MarkovGrammar{self.get_suffix(ngram[0][0])}{self.get_suffix(ngram[1][0])}_modified (word1, word2, word3, count) + VALUES (?, ?, ?, coalesce ( + ( + SELECT count + {count} FROM MarkovGrammar{self.get_suffix(ngram[0][0])}{self.get_suffix(ngram[1][0])}_modified + WHERE word1 = ? COLLATE BINARY + AND word2 = ? COLLATE BINARY + AND word3 = ? COLLATE BINARY + ), + 1 + ) + )''', values=ngram + ngram, auto_commit=False) From bff7cfc0bf6d248ba6110de7c1d631c7e73d6fb4 Mon Sep 17 00:00:00 2001 From: Tom Aarsen Date: Tue, 27 Jul 2021 13:21:14 +0200 Subject: [PATCH 28/30] Updated Readme Settings section --- README.md | 35 ++++++++++++++++++----------------- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/README.md b/README.md index 23ae09b..220ac22 100644 --- a/README.md +++ b/README.md @@ -237,23 +237,24 @@ This bot is controlled by a `settings.json` file, which has the following struct } ``` -| **Parameter** | **Meaning** | **Example** | -| ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------- | -| Host | The URL that will be used. Do not change. 
| "irc.chat.twitch.tv" | -| Port | The Port that will be used. Do not change. | 6667 | -| Channel | The Channel that will be connected to. | "#CubieDev" | -| Nickname | The Username of the bot account. | "CubieB0T" | -| Authentication | The OAuth token for the bot account. | "oauth:pivogip8ybletucqdz4pkhag6itbax" | -| DeniedUsers | The list of bot account who's messages should not be learned from. The bot itself it automatically added to this. | ["StreamElements", "Nightbot", "Moobot", "Marbiebot"] | -| Cooldown | A cooldown in seconds between successful generations. If a generation fails (eg inputs it can't work with), then the cooldown is not reset and another generation can be done immediately. | 20 | -| KeyLength | A technical parameter which, in my previous implementation, would affect how closely the output matches the learned inputs. In the current implementation the database structure does not allow this parameter to be changed. Do not change. | 2 | -| MaxSentenceWordAmount | The maximum number of words that can be generated. Prevents absurdly long and spammy generations. | 25 | -| MinSentenceWordAmount | The minimum number of words that can be generated. Additional sentences will begin a message is lower than this number. Prevents very small messages. -1 to disable | -1 | -| HelpMessageTimer | The amount of seconds between sending help messages that links to [How it works](#how-it-works). -1 for no help messages. Defaults to once every 5 hours. | 18000 | -| AutomaticGenerationTimer | The amount of seconds between sending a generation, as if someone wrote `!g`. -1 for no automatic generations. | -1 | -| AllowedUsers | A list of users with heightened permissions. Gives these users the same power as the channel owner, allowing them to bypass cooldowns, set cooldowns, disable or enable the bot, etc. | ["CubieDev", "Limmy"] | -| WhisperCooldown | Prevents the bot from attempting to whisper users the remaining cooldown. 
| true | -| EnableGenerateCommand | Globally enables/disables the generate command | true | +| **Parameter** | **Meaning** | **Example** | +| ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- | +| Host | The URL that will be used. Do not change. | `"irc.chat.twitch.tv"` | +| Port | The Port that will be used. Do not change. | `6667` | +| Channel | The Channel that will be connected to. | `"#CubieDev"` | +| Nickname | The Username of the bot account. | `"CubieB0T"` | +| Authentication | The OAuth token for the bot account. | `"oauth:pivogip8ybletucqdz4pkhag6itbax"` | +| DeniedUsers | The list of (bot) accounts whose messages should not be learned from. The bot itself it automatically added to this. | `["StreamElements", "Nightbot", "Moobot", "Marbiebot"]` | +| AllowedUsers | A list of users with heightened permissions. Gives these users the same power as the channel owner, allowing them to bypass cooldowns, set cooldowns, disable or enable the bot, etc. | `["Michelle", "Cubie"]` | +| Cooldown | A cooldown in seconds between successful generations. If a generation fails (eg inputs it can't work with), then the cooldown is not reset and another generation can be done immediately. | 20 | +| KeyLength | A technical parameter which, in my previous implementation, would affect how closely the output matches the learned inputs. In the current implementation the database structure does not allow this parameter to be changed. Do not change. | 2 | +| MaxSentenceWordAmount | The maximum number of words that can be generated. Prevents absurdly long and spammy generations. | 25 | +| MinSentenceWordAmount | The minimum number of words that can be generated. 
Might generate multiple sentences, separated by the value from `SentenceSeparator`. Prevents very short generations. -1 to disable. | -1 |
+| HelpMessageTimer | The amount of seconds between sending help messages that links to [How it works](#how-it-works). -1 for no help messages. Defaults to once every 5 hours. | 18000 |
+| AutomaticGenerationTimer | The amount of seconds between automatically sending a generated message, as if someone wrote `!g`. -1 for no automatic generations. | -1 |
+| WhisperCooldown | Allows the bot to whisper a user the remaining cooldown after that user has attempted to generate a message. | true |
+| EnableGenerateCommand | Globally enables/disables the generate command. | true |
+| SentenceSeparator | The separator between multiple sentences. Only relevant if `MinSentenceWordAmount` > 0, as only then can multiple sentences be generated. Sensible values for this might be `", "`, `". "`, `" - "` or `" "`.

_Note that the example OAuth token is not an actual token, but merely a generated string to give an indication what it might look like._

From c9035f6a6dea6fc0763c13a7a59929797523b0ca Mon Sep 17 00:00:00 2001
From: Tom Aarsen
Date: Tue, 27 Jul 2021 13:38:33 +0200
Subject: [PATCH 29/30] Further updated Settings section in README

---
 README.md | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/README.md b/README.md
index 220ac22..c5bd2eb 100644
--- a/README.md
+++ b/README.md
@@ -233,28 +233,29 @@ This bot is controlled by a `settings.json` file, which has the following struct
     "HelpMessageTimer": 18000,
     "AutomaticGenerationTimer": -1,
     "WhisperCooldown": true,
-    "EnableGenerateCommand": true
+    "EnableGenerateCommand": true,
+    "SentenceSeparator": " - "
 }
 ```

| **Parameter** | **Meaning** | **Example** |
| ------------------------ | 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- | -| Host | The URL that will be used. Do not change. | `"irc.chat.twitch.tv"` | -| Port | The Port that will be used. Do not change. | `6667` | -| Channel | The Channel that will be connected to. | `"#CubieDev"` | -| Nickname | The Username of the bot account. | `"CubieB0T"` | -| Authentication | The OAuth token for the bot account. | `"oauth:pivogip8ybletucqdz4pkhag6itbax"` | -| DeniedUsers | The list of (bot) accounts whose messages should not be learned from. The bot itself it automatically added to this. | `["StreamElements", "Nightbot", "Moobot", "Marbiebot"]` | -| AllowedUsers | A list of users with heightened permissions. Gives these users the same power as the channel owner, allowing them to bypass cooldowns, set cooldowns, disable or enable the bot, etc. | `["Michelle", "Cubie"]` | -| Cooldown | A cooldown in seconds between successful generations. If a generation fails (eg inputs it can't work with), then the cooldown is not reset and another generation can be done immediately. | 20 | -| KeyLength | A technical parameter which, in my previous implementation, would affect how closely the output matches the learned inputs. In the current implementation the database structure does not allow this parameter to be changed. Do not change. | 2 | -| MaxSentenceWordAmount | The maximum number of words that can be generated. Prevents absurdly long and spammy generations. | 25 | -| MinSentenceWordAmount | The minimum number of words that can be generated. Might generate multiple sentences, separated by the value from `SentenceSeparator`. Prevents very short generations. -1 to disable. 
| -1 |
-| HelpMessageTimer | The amount of seconds between sending help messages that links to [How it works](#how-it-works). -1 for no help messages. Defaults to once every 5 hours. | 18000 |
-| AutomaticGenerationTimer | The amount of seconds between automatically sending a generated message, as if someone wrote `!g`. -1 for no automatic generations. | -1 |
-| WhisperCooldown | Allows the bot to whisper a user the remaining cooldown after that user has attempted to generate a message. | true |
-| EnableGenerateCommand | Globally enables/disables the generate command. | true |
-| SentenceSeparator | The separator between multiple sentences. Only relevant if `MinSentenceWordAmount` > 0, as only then can multiple sentences be generated. Sensible values for this might be `", "`, `". "`, `" - "` or `" "`.
+| **Parameter** | **Meaning** | **Example** |
+| -------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
+| `Host` | The URL that will be used. Do not change. | `"irc.chat.twitch.tv"` |
+| `Port` | The Port that will be used. Do not change. | `6667` |
+| `Channel` | The Channel that will be connected to. | `"#CubieDev"` |
+| `Nickname` | The Username of the bot account. | `"CubieB0T"` |
+| `Authentication` | The OAuth token for the bot account. | `"oauth:pivogip8ybletucqdz4pkhag6itbax"` |
+| `DeniedUsers` | The list of (bot) accounts whose messages should not be learned from. The bot itself is automatically added to this. | `["StreamElements", "Nightbot", "Moobot", "Marbiebot"]` |
+| `AllowedUsers` | A list of users with heightened permissions. Gives these users the same power as the channel owner, allowing them to bypass cooldowns, set cooldowns, disable or enable the bot, etc. 
| `["Michelle", "Cubie"]` |
+| `Cooldown` | A cooldown in seconds between successful generations. If a generation fails (e.g. inputs it can't work with), then the cooldown is not reset and another generation can be done immediately. | `20` |
+| `KeyLength` | A technical parameter which, in my previous implementation, would affect how closely the output matches the learned inputs. In the current implementation the database structure does not allow this parameter to be changed. Do not change. | `2` |
+| `MaxSentenceWordAmount` | The maximum number of words that can be generated. Prevents absurdly long and spammy generations. | `25` |
+| `MinSentenceWordAmount` | The minimum number of words that can be generated. Might generate multiple sentences, separated by the value from `SentenceSeparator`. Prevents very short generations. -1 to disable. | `-1` |
+| `HelpMessageTimer` | The number of seconds between sending help messages that link to [How it works](#how-it-works). -1 for no help messages. Defaults to once every 5 hours. | `18000` |
+| `AutomaticGenerationTimer` | The number of seconds between automatically sending a generated message, as if someone wrote `!g`. -1 for no automatic generations. | `-1` |
+| `WhisperCooldown` | Allows the bot to whisper a user the remaining cooldown after that user has attempted to generate a message. | `true` |
+| `EnableGenerateCommand` | Globally enables/disables the generate command. | `true` |
+| `SentenceSeparator` | The separator between multiple sentences. Only relevant if `MinSentenceWordAmount` > 0, as only then can multiple sentences be generated. Sensible values for this might be `", "`, `". "`, `" - "` or `" "`. 
| `" - "` |

_Note that the example OAuth token is not an actual token, but merely a generated string to give an indication what it might look like._

From e6f2c84fdf8a51cb2f0b44133055658cfa7b408d Mon Sep 17 00:00:00 2001
From: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Date: Tue, 27 Jul 2021 14:10:07 +0200
Subject: [PATCH 30/30] Added contributor kudos

---
 README.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/README.md b/README.md
index c5bd2eb..c95f9ed 100644
--- a/README.md
+++ b/README.md
@@ -282,6 +282,13 @@ This repository can be seen as an implementation using this wrapper.
 ---
 
+### Contributors
+My gratitude is extended to the following contributors who've decided to help out.
+* [@DoctorInsano](https://github.com/DoctorInsano) - Several small fixes and improvements in [v1.0](https://github.com/tomaarsen/TwitchMarkovChain/releases/tag/v1.0).
+* [@justinrusso](https://github.com/justinrusso) - Several features, refactors, and fixes that represent the core of v2.0.
+
+---
+
 ## Other Twitch Bots
 
 - [TwitchAIDungeon](https://github.com/CubieDev/TwitchAIDungeon)