
Chatterbot returns a random answer when asked the same question multiple times. Fixes #71 #72

Closed
wants to merge 3 commits into from

Conversation

jamdagni86

Closes #71

…lar to the one in the training set) multiple times
@jamdagni86 jamdagni86 changed the title Chatterbot returns a random answer when asked the same question multiple times. Fixes #77 Chatterbot returns a random answer when asked the same question multiple times. Fixes #71 Oct 26, 2015
@gunthercox
Owner

Thank you for the pull request. This looks like it should be good to merge; I just want to check a few things locally first. Mainly because this makes modifications to the response statement, which was previously left unmodified, so it creates a small hole in test coverage: nothing currently checks what gets added to the response statement in the database.

@jamdagni86
Author

Would it help if the test_similar_sentence_gets_same_response_multiple_times function checked all the values present in the in_response_to list?

@gunthercox
Owner

Hi @jamdagni86, I just checked over this pull request. I cannot merge the changes to chatterbot.py because they modify the response statement that the chatbot returns. The issue with this is that it is possible for a bot to return an incorrect response. If we modify an incorrect response to say that it is in response to a user's input, then that response will be treated as a valid reply to that input statement when it is encountered again in the future.

I believe the issue may be caused elsewhere in the code. Running the test case you created against the current codebase, I noticed that subsequent calls to the get_response method add the previous response statement to the "in_response_to" field of the user's input. This is because the current implementation of get_response treats the input statement as part of the current conversation that the chatbot is having with the user. So a test such as:

response1 = self.chatbot.get_response('how do you login to gmail?')
response2 = self.chatbot.get_response('how do I login to gmail?')
response3 = self.chatbot.get_response('how do I login to gmail?')

actually ignores the chatbot's responses so that it thinks 'how do I login to gmail?' is a response to whatever was returned from 'how do you login to gmail?'.
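Here is a minimal, self-contained sketch (not the actual chatterbot.py code) of how treating every input as part of the ongoing conversation produces that linkage; Statement, add_response, and the conversation list are simplified stand-ins for ChatterBot's internals:

class Statement:
    def __init__(self, text):
        self.text = text
        self.in_response_to = []

    def add_response(self, statement):
        # Record that this statement was (apparently) said in response to `statement`.
        self.in_response_to.append(statement.text)

conversation = []  # chronological log of statements the bot has seen and said

def get_response_sketch(input_text):
    input_statement = Statement(input_text)
    if conversation:
        # The previous statement is assumed to be what the user is replying to,
        # even when the user is simply asking a fresh question.
        input_statement.add_response(conversation[-1])
    conversation.append(input_statement)
    response = Statement("Goto gmail.com, enter your login information and hit enter!")
    conversation.append(response)
    return response

get_response_sketch('how do you login to gmail?')
get_response_sketch('how do I login to gmail?')
print(conversation[2].in_response_to)
# ['Goto gmail.com, enter your login information and hit enter!']
# i.e. the second question is recorded as a reply to the bot's previous answer.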

This can also be seen in the database dumps from each step in the test case:


After training:

{  
   "Goto gmail.com, enter your login information and hit enter!":{  
      "in_response_to":[  
         ["how do you login to gmail?", 1]
      ]
   },
   "how do you login to gmail?":{  
      "in_response_to":[]
   }
}

After response_to_trained_set:

{  
   "Goto gmail.com, enter your login information and hit enter!":{  
      "in_response_to":[  
         ["how do you login to gmail?", 1]
      ]
   },
   "how do you login to gmail?":{  
      "in_response_to":[]
   }
}

After similar_question_1:

{  
   "Goto gmail.com, enter your login information and hit enter!":{  
      "in_response_to":[  
         ["how do you login to gmail?", 1]
      ]
   },
   "how do you login to gmail?":{  
      "in_response_to":[]
   },
   "how do I login to gmail?":{  
      "in_response_to":[  
         ["Goto gmail.com, enter your login information and hit enter!", 1]
      ]
   }
}

After similar_question_2:

{  
   "Goto gmail.com, enter your login information and hit enter!":{  
      "in_response_to":[  
         ["how do you login to gmail?", 1]
      ]
   },
   "how do you login to gmail?":{  
      "in_response_to":[]
   },
   "how do I login to gmail?":{  
      "in_response_to":[  
         ["Goto gmail.com, enter your login information and hit enter!", 2]
      ]
   }
}

Based on what I am seeing, I don't believe there is an issue here. I can see how it would be useful to get a response independent of the current conversation. Perhaps an additional method to accomplish this would be a better solution?
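For illustration only, a rough self-contained sketch of that idea, using made-up names rather than ChatterBot's actual API; one method keeps the conversational behaviour, the other answers without recording any conversation state:

class SketchBot:
    def __init__(self, trained_pairs):
        self.trained_pairs = trained_pairs  # {question: answer}
        self.conversation = []

    def get_response(self, text):
        # Conversational behaviour: the input becomes part of the dialogue,
        # so it ends up linked to whatever the bot said last.
        self.conversation.append(text)
        answer = self.trained_pairs.get(text, "<closest match fallback>")
        self.conversation.append(answer)
        return answer

    def get_standalone_response(self, text):
        # Possible additional method: answer the question without touching
        # the conversation log at all.
        return self.trained_pairs.get(text, "<closest match fallback>")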

@jamdagni86
Author

Hey @gunthercox, this is an existing behavior. Without my code change, this is how the db looks.

Training set:

training = [
                "how do you login to gmail?",
                "Goto gmail.com, enter your login information and hit enter!?"
           ]

db after training:

{
    "Goto gmail.com, enter your login information and hit enter!?": {
        "in_response_to": [
            [
                "how do you login to gmail?",
                1
            ]
        ]
    },
    "how do you login to gmail?": {
        "in_response_to": []
    }
}

If you ask the same question twice,

chatbot.get_response('how do you login to gmail?')
{
    "Goto gmail.com, enter your login information and hit enter!?": {
        "in_response_to": [
            [
                "how do you login to gmail?",
                1
            ]
        ]
    },
    "how do you login to gmail?": {
        "in_response_to": [
            [
                "Goto gmail.com, enter your login information and hit enter!?",
                1
            ]
        ]
    }
}

As you pointed out, the bot now thinks that the answer to "Goto gmail.com, enter your login information and hit enter!?" is "how do you login to gmail?". This is clearly wrong, and it is caused by this piece of code in ChatBot.get_response:

# Treat the input as part of the ongoing conversation: whatever statement
# came last is recorded as the thing this input is responding to.
previous_statement = self.get_last_statement()

if previous_statement:
    input_statement.add_response(previous_statement)

My changes in the get_response function will only add the closest matching question/statement to the in_response_to field. This is a snapshot of the db after "how do I login to gmail?" is asked:

{
    "Goto gmail.com, enter your login information and hit enter!?": {
        "in_response_to": [
            [
                "how do you login to gmail?",
                1
            ],
            [
                "how do I login to gmail?",
                1
            ]
        ]
    },
    "how do I login to gmail?": {
        "in_response_to": []
    },
    "how do you login to gmail?": {
        "in_response_to": []
    }
}

Now, if the same question is asked repeatedly, it updates the db like this.

{
    "Goto gmail.com, enter your login information and hit enter!?": {
        "in_response_to": [
            [
                "how do you login to gmail?",
                1
            ],
            [
                "how do I login to gmail?",
                1
            ]
        ]
    },
    "how do I login to gmail?": {
        "in_response_to": [
            [
                "Goto gmail.com, enter your login information and hit enter!?",
                1
            ]
        ]
    },
    "how do you login to gmail?": {
        "in_response_to": []
    }
}

The bug I'm trying to fix here is: if a question similar to one in the training set ("how do I login to gmail?" versus the trained "how do you login to gmail?") is asked multiple times, the bot replies properly the first time, but from then onwards it returns a random answer to the same question. This is because the first time the question is asked, the following entry is added to the db:

"how do I login to gmail?": {
    "in_response_to": []
}

The next time the same question is asked, the closest match logic will find this statement, and since this question is not configured to have a response, the bot returns a random answer.
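Here is a toy reproduction of that failure mode (simplified stand-ins, not ChatterBot's actual matching code): once the similar question exists in the db with an empty in_response_to list, the closest-match step selects it, finds nothing recorded as a response to it, and falls back to a random statement.

import random

database = {
    "how do you login to gmail?": [],
    "Goto gmail.com, enter your login information and hit enter!?":
        ["how do you login to gmail?"],
    # Added the first time the similar question was asked:
    "how do I login to gmail?": [],
}

def closest_match(text):
    # Stand-in for the real similarity logic: an exact hit wins.
    return text if text in database else "how do you login to gmail?"

def respond(text):
    match = closest_match(text)
    known_responses = [s for s, sources in database.items() if match in sources]
    if known_responses:
        return known_responses[0]
    # Nothing is recorded as a response to the match, so pick at random.
    return random.choice(list(database))

print(respond("how do I login to gmail?"))  # likely an unrelated statement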

@gunthercox
Owner

Hi @jamdagni86, I'm very sorry about not getting back to you promptly. My schedule has been very busy lately. I just had a chance to walk through the test output you provided and figured out where the main part of the issue is.

In the main chatterbot.py file, there is a variable called all_statements, which is a list of every statement in the database. This list is then passed to the logic adapter, which searches for the closest match to the input and returns whatever statement the match was in response to.

The issue you ran into occurred because, after the first time the similar statement was entered as input, the statement existed in the database but had no response values. Your fix modifies the input statement to correct this. However, this can lead to problems later if the response that ChatterBot provides is incorrect (very possible when working with small data sets).

To correct this issue without modifying response data, I am going to modify the storage adapter to include a not filter. This will allow the initial filter set of all known responses to exclude any statements that do not have a known response.

I am currently working on adding these changes and will create a new pull request later (likely tonight).

Again, thank you for bringing this issue to my attention, and my apologies for not getting to this sooner.

@gunthercox
Owner

I just opened #81, which adds a check to remove statements that are not in response to a known statement. Contrary to what I mentioned yesterday, it isn't statements with no values in their in_response_to list that need to be removed; rather, any statement that no other statement lists in its in_response_to field needs to be removed before the list is passed to the logic adapter.
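As a minimal sketch of that kind of filtering (assumed names, not the actual #81 diff): drop every statement that no other statement lists in its in_response_to field before handing the list to the logic adapter.

database = {
    "how do you login to gmail?": [],
    "Goto gmail.com, enter your login information and hit enter!?":
        ["how do you login to gmail?"],
    "how do I login to gmail?": [],
}

# Every statement that some other statement claims to be responding to.
responded_to = {source for sources in database.values() for source in sources}

# Only these statements have a known response, so only they should be
# considered by the closest-match logic.
candidates = [statement for statement in database if statement in responded_to]

print(candidates)  # ['how do you login to gmail?']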

I've made additional comments to explain this where the modification was added in the main chatterbot.py file.

I have also included the test case that you provided, all except for the last line:

self.assertIn(similar_question, self.chatbot.storage.find(response_to_trained_set).in_response_to)

This had to be removed because the response should not have the input added to it.

Thank you again for opening this ticket and for your time. This was a significant bug that I likely wouldn't have noticed otherwise.

@jamdagni86
Author

@gunthercox - no problem! I guess you are apologising too much ;)

I'd like to contribute more. The last few weeks have been too busy for me and I couldn't start the context feature we discussed. Hope I can start it this week!
