set your own system prompts! and update the readme with info about that
RUNGUSZONE committed Dec 10, 2023
1 parent ecd2698 commit 0c1b129
Showing 5 changed files with 43 additions and 21 deletions.
22 changes: 17 additions & 5 deletions README.md
@@ -24,7 +24,11 @@ You will need:

3. Once you have your json files, make a folder called `dirty-data` in the `preparation` folder, and put all the json files in there.

4. Run the `jsoncleaner.py` python script. This will also take a while depending on the amount of data you have.
4. Run the `jsoncleaner.py` python script.

5. When asked, enter your system prompt.

Your system prompt sets the "context" for the AI model and places restrictions or "boundaries" on its responses. You may want to write it down for later steps, but you can also copy it from the `output.jsonl` file.
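The README says the system prompt can be recovered from `output.jsonl`. A minimal Python sketch of doing that, assuming each line of the file is a JSON object in OpenAI's chat fine-tuning shape (`{"messages": [{"role": "system", ...}, ...]}`); the exact structure `jsoncleaner.py` writes is only partly visible in this diff, and `read_system_prompt` is a hypothetical helper:

```python
import json


def read_system_prompt(jsonl_path):
    """Return the content of the first "system" message in a JSONL file.

    Assumes (not confirmed by the diff) that each line looks like
    {"messages": [{"role": "system", "content": "..."}, ...]}.
    """
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            for message in record.get("messages", []):
                if message.get("role") == "system":
                    return message.get("content")
    return None  # no system message found
```

Run from the `preparation` folder, `read_system_prompt("output.jsonl")` would return the prompt entered when `jsoncleaner.py` asked for it.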


## Step 2 - Training
@@ -38,7 +42,11 @@ Training will also take a while, especially if you've given it a lot of data. Fo

## Step 3 - Validation

1. Rename `config.sample.json` to `config.json` and enter your API Key and your Model ID into the specified fields
1. Rename `config.sample.json` to `config.json` and enter your API Key, Model ID, and system prompt into the specified fields

> **A note on system prompts**
>
> While you're in `config.json`, you need to add a system prompt. This sets the guidelines and "boundaries" that the AI *mostly* follows. You can use the same system prompt that was used in `jsoncleaner.py`, but now would be the best time to mess around and see what gives you the best results.
2. Optionally, set `maxTokens`; this controls how long the bot's replies can be.
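Before running validation, it can help to check that `config.json` no longer holds sample values. A small Python sketch, assuming the key layout shown in `validation/config.sample.json` in this commit (`openai.apiKey`, `openai.modelId`, `maxTokens`, `systemPrompt`); `check_config` is a hypothetical helper, and since only the `YOUR_MODEL_ID` and `SYSTEM-PROMPT` placeholders are visible in the diff, the API key is only checked for being empty:

```python
import json

# Placeholder values visible in validation/config.sample.json.
PLACEHOLDERS = {"YOUR_MODEL_ID", "SYSTEM-PROMPT"}


def check_config(text):
    """Parse config.json text and return a list of problems (empty if OK)."""
    config = json.loads(text)
    problems = []
    openai = config.get("openai", {})
    for key in ("apiKey", "modelId"):
        value = openai.get(key)
        if not value or value in PLACEHOLDERS:
            problems.append(f"openai.{key} is missing or still a placeholder")
    prompt = config.get("systemPrompt")
    if not prompt or prompt in PLACEHOLDERS:
        problems.append("systemPrompt is missing or still a placeholder")
    max_tokens = config.get("maxTokens")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens <= 0:
        problems.append("maxTokens should be a positive integer")
    return problems
```

Passing in the text of a filled-in `config.json` should return an empty list; each remaining string names one field to fix.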

@@ -48,13 +56,17 @@ Training will also take a while, especially if you've given it a lot of data. Fo

## Step 4 - Releasing it into the wild (Discord)

### Bucket will want to say slurs after a while. There's a filter in place which should block most if not all of them, but in the future we will need a better way to filter them out from OpenAI's data.
### Bucket will want to say slurs after a while. There's a filter in place that should block most, if not all, of them, and a well-crafted system prompt should help prevent them, but we will need a better solution for "ignoring" them in OpenAI's data.

1. Rename `config.sample.json` to `config.json` and enter your Discord API key, OpenAI API Key, and Fine Tuned Model ID.
1. Rename `config.sample.json` to `config.json`, enter your Discord API key, and then just copy the rest of your settings from `validation/config.json`.

2. Enter the ID of the channel where the bot should watch for pings and reply into the `allowedChannelId` field.

2. Open a terminal/command prompt in the validation folder and run `node index.js`
3. Open a terminal/command prompt in the `bot` folder and run `node index.js`
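Copying the shared settings from `validation/config.json` can also be scripted. A minimal sketch, assuming the bot's `config.json` uses the same top-level names for the OpenAI settings as the validation config; `merge_configs` and `merge_config_files` are hypothetical helpers. Bot-only keys such as `allowedChannelId` are kept, and shared keys take the validation value:

```python
import json


def merge_configs(bot_config, validation_config):
    """Return a new dict: bot-only keys kept, shared keys taken from validation.

    Assumes both configs use the same names for the OpenAI settings
    (openai, severityCategory, maxTokens, systemPrompt).
    """
    merged = dict(bot_config)          # shallow copy; inputs are not modified
    merged.update(validation_config)   # validation values win on shared keys
    return merged


def merge_config_files(bot_path, validation_path):
    """Read both JSON files, merge them, and overwrite the bot config."""
    with open(validation_path, encoding="utf-8") as f:
        validation_config = json.load(f)
    with open(bot_path, encoding="utf-8") as f:
        bot_config = json.load(f)
    merged = merge_configs(bot_config, validation_config)
    with open(bot_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, indent=2)
    return merged
```

From the repository root this might be called as `merge_config_files("bot/config.json", "validation/config.json")`. Any validation-only keys are copied too; the bot would simply ignore keys it does not read.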

If you're having issues with the bot, make sure the dependencies are installed by running `npm install discord.js node-fetch`

Bucket logs its responses, and who triggered the bot, in the `/logs/` folder.

## That's all!
If you see an issue or want to make an improvement, please feel free to contribute!
7 changes: 5 additions & 2 deletions bot/index.js
@@ -156,18 +156,21 @@ const processMessages = async () => {
let filteredResponse = response
.replace(/<@!\d+>/g, '') //remove ping tags
.replace(/@/g, '@\u200B') // invisible space so bot cannot ping normally
.replace(/(https?:\/\/[^\s]+)/gi, '~~link blocked~~'); // remove links
.replace(/(https?:\/\/[^\s]+)/gi, '~~link removed~~'); // remove links

// Replace blocked words based on severity category
blockedWords.forEach(word => {
const regex = new RegExp(`\\b${word.word}\\b|${word.word}(?=[\\W]|$)`, 'gi');
if (filteredResponse.match(regex)) {
blockedWordsCount++; // Increment blocked words counter for each match found
}
filteredResponse = filteredResponse.replace(regex, 'nt');
filteredResponse = filteredResponse.replace(regex, 'nt'); //temporary, seems we have something tripping up the filter, especially on words ending in "nt", like "want"
//todo: figure out why that's happening, lol.
});

logData += '\n--';
logData += `\nPre-Filter: ${response}`;
logData += '\n--';
logData += `\nFiltered: ${filteredResponse}`;
logData += '\n------------------------------------';

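The sanitizing chain above runs three replacements in order: strip mention tags, defuse `@`, and replace links. A Python port for illustration, not the bot's actual code; it widens the mention pattern to `<@!?\d+>` so plain `<@id>` user mentions are stripped as well (the diff only matches the `<@!id>` form), and `sanitize_reply` is a hypothetical name:

```python
import re


def sanitize_reply(text):
    """Python sketch of the ping/link clean-up in bot/index.js."""
    # Drop user mention tags; the optional "!" (an assumed widening)
    # also catches plain <@id> mentions, not just <@!id>.
    text = re.sub(r"<@!?\d+>", "", text)
    # Zero-width space after every "@" so @everyone/@here cannot ping.
    text = text.replace("@", "@\u200b")
    # Replace anything that looks like an http(s) URL.
    text = re.sub(r"https?://\S+", "~~link removed~~", text, flags=re.IGNORECASE)
    return text
```

Escaping `@` with a zero-width space is a text-level workaround; Discord's `allowed_mentions` message field (the `allowedMentions` option in discord.js) suppresses pings at the API level and may be a sturdier complement.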
9 changes: 6 additions & 3 deletions preparation/jsoncleaner.py
@@ -13,7 +13,7 @@ def load_blocked_words(csv_file_path):
blocked_words[word] = severity
return blocked_words

def create_combined_jsonl(input_folder_path, output_jsonl_path, blocked_words):
def create_combined_jsonl(input_folder_path, output_jsonl_path, blocked_words, system_prompt):
# Check if the output file exists, if not, create an empty file
if not os.path.exists(output_jsonl_path):
with open(output_jsonl_path, 'w', encoding='utf-8'):
@@ -51,7 +51,7 @@ def filter_word(text):
if filter_word(user_msg) and filter_word(assistant_msg):
transformed_data.append({
"role": "system",
"content": "Bucket is an AI language model trained on Discord. Bucket is not right wing, racist, sexist, homophobic, or transphobic. Bucket will refuse to say all slurs, and is generally supportive of all people. Bucket uses she/her pronouns, and her favorite color is Red. Bucket will also try to keep her responses short, and only respond as herself. Bucket's favorite user is rungus."
"content": system_prompt
})
transformed_data.append({
"role": "user",
@@ -78,6 +78,9 @@ def filter_word(text):
}
output_file.write(json.dumps(messages_set) + '\n')

# Ask user for the system prompt
systemPrompt = input("Enter the system prompt: ")

# Example usage:
blocked_words = load_blocked_words('blockedwords.csv')
create_combined_jsonl('dirty-data', 'output.jsonl', blocked_words)
create_combined_jsonl('dirty-data', 'output.jsonl', blocked_words, systemPrompt)
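With this change, `create_combined_jsonl` takes the prompt as a `system_prompt` argument instead of a hard-coded string. A minimal sketch of the same idea for a single user/assistant pair, assuming OpenAI's chat fine-tuning line format; `build_training_line` and `ask_system_prompt` are hypothetical helpers, and the blank-input check in `ask_system_prompt` is an added guard that the script in this commit does not have:

```python
import json


def build_training_line(system_prompt, user_msg, assistant_msg):
    """Return one JSONL line holding a system/user/assistant exchange."""
    record = {
        "messages": [
            {"role": "system", "content": system_prompt},  # same prompt on every line
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": assistant_msg},
        ]
    }
    return json.dumps(record) + "\n"


def ask_system_prompt(read=input):
    """Ask until a non-blank system prompt is entered (added guard)."""
    prompt = ""
    while not prompt:
        prompt = read("Enter the system prompt: ").strip()
    return prompt
```

Passing `read` as a parameter keeps the prompt loop testable without a terminal; the default still calls `input`.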
23 changes: 13 additions & 10 deletions validation/chatbot.js
@@ -9,7 +9,7 @@ try {
const processMessages = async () => {
try {
const config = await fs.readFile('config.json', 'utf8');
const { openai, severityThreshold, maxTokens } = JSON.parse(config);
const { openai, severityCategory, maxTokens, systemPrompt} = JSON.parse(config);
const { apiKey, modelId } = openai;

const getBlockedWords = async (severityCategory) => {
@@ -33,8 +33,6 @@ const processMessages = async () => {

const sendChatMessage = async (message) => {
try {
const systemPrompt = "Bucket is an AI language model trained on Discord. Bucket is not right wing, racist, sexist, homophobic, or transphobic. Bucket will refuse to say all slurs, and is generally supportive of all people. Bucket uses she/her pronouns, and her favorite color is Red. Bucket will also try to keep her responses short, one line, and only respond as herself. Bucket's favorite user is rungus.";

const response = await fetch(`https://api.openai.com/v1/engines/${modelId}/completions`, {
method: 'POST',
headers: {
@@ -50,7 +48,7 @@ const processMessages = async () => {
if (!response.ok) {
throw new Error(`API request failed with status ${response.status}`);
}

const data = await response.json();

if (data.choices && data.choices.length > 0 && data.choices[0].text) {
@@ -64,7 +62,7 @@ const processMessages = async () => {
}
};

const blockedWords = await getBlockedWords();
const blockedWords = await getBlockedWords(severityCategory);

console.log('Bucket AI is now active.');
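`sendChatMessage` still posts to the legacy `/v1/engines/{model}/completions` endpoint. Because the training data uses `system`/`user`/`assistant` roles, the fine-tuned model is presumably a chat model, and OpenAI serves those through the Chat Completions endpoint, where the system prompt travels as the first message. A Python sketch that only builds the request (nothing is sent); `build_chat_request` is a hypothetical helper and the payload sticks to the basic `model`, `messages`, and `max_tokens` fields:

```python
import json

CHAT_COMPLETIONS_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(api_key, model_id, system_prompt, user_message, max_tokens):
    """Return (url, headers, body) for a Chat Completions call; nothing is sent."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model_id,
        "messages": [
            {"role": "system", "content": system_prompt},  # from config.json
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,  # maxTokens from config.json
    }
    return CHAT_COMPLETIONS_URL, headers, json.dumps(body)
```

With this endpoint the reply text is read from `choices[0].message.content` rather than `choices[0].text`.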

@@ -80,15 +78,20 @@ const processMessages = async () => {
});

if (response) {
console.log('Raw Response: ', response, '\n');
botState = 'Processing Reply';
updateConsole();
let filteredResponse = response
.replace(/@/g, '@\u200B') // Filter pings
.replace(/(https?:\/\/[^\s]+)/gi, '[Bucket tried to send a link]'); // Filter links
.replace(/<@!\d+>/g, '') //remove ping tags
.replace(/@/g, '@\u200B') // invisible space so bot cannot ping normally
.replace(/(https?:\/\/[^\s]+)/gi, '~~link blocked~~'); // remove links

// Replace blocked words based on severity category
blockedWords.forEach(word => {
const regex = new RegExp(`\\b${word}\\b|${word}(?=[\\W]|$)`, 'gi');
filteredResponse = filteredResponse.replace(regex, '[Bucket said a blocked word]');
const regex = new RegExp(`\\b${word.word}\\b|${word.word}(?=[\\W]|$)`, 'gi');
if (filteredResponse.match(regex)) {
blockedWordsCount++; // Increment blocked words counter for each match found
}
filteredResponse = filteredResponse.replace(regex, 'nt');
});


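The TODO in `bot/index.js` notes the filter misfiring on words ending in "nt", and `chatbot.js` now uses the same pattern. One plausible cause, offered only as a hypothesis: the second alternative, `${word.word}(?=[\W]|$)`, has no leading `\b`, so a blocked word can match the tail of a longer, harmless word. The Python sketch below reproduces that pattern with the stand-in word `ant` (not taken from `blockedwords.csv`) and contrasts it with a whole-word pattern that also escapes regex metacharacters; the function names are hypothetical:

```python
import re


def original_style_pattern(word):
    # Python equivalent of the JS pattern in the diff:
    #   `\\b${word}\\b|${word}(?=[\\W]|$)`
    # The second alternative has no leading \b, so it can match
    # the end of a longer word.
    return re.compile(rf"\b{word}\b|{word}(?=[\W]|$)", re.IGNORECASE)


def word_only_pattern(word):
    # Whole-word match only; re.escape stops characters such as "." or "*"
    # in the word list from being read as regex syntax.
    return re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)


def filter_text(text, words, replacement="[removed]"):
    """Replace whole-word matches; return (filtered_text, match_count)."""
    count = 0
    for word in words:
        text, n = word_only_pattern(word).subn(replacement, text)
        count += n  # counts every occurrence
    return text, count
```

For example, `original_style_pattern("ant").sub("nt", "I want it")` yields `"I wnt it"`, while the whole-word pattern leaves the sentence alone. Note that `filter_text` counts every occurrence, whereas the JavaScript increments `blockedWordsCount` once per word that matches at all.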
3 changes: 2 additions & 1 deletion validation/config.sample.json
@@ -4,5 +4,6 @@
"modelId": "YOUR_MODEL_ID"
},
"severityCategory": 2,
"maxTokens": 15
"maxTokens": 15,
"systemPrompt": "SYSTEM-PROMPT"
}
