AIML english corpus updation into chatterbot #516

Merged
merged 27 commits into from
Jan 15, 2017

Conversation

Collaborator

@vkosuri vkosuri commented Dec 18, 2016

Looking at the repo https://github.com/drwallace/aiml-en-us-foundation-alice, I think we need to update many conversations. I have started updating a few of the important ones:

  • science
  • food
  • drugs
  • bot profile
  • psychology
  • politics
  • humor
  • history
  • gossip
  • emotion
  • ai
  • knowledge
  • literature
  • movies
  • money
  • sports

],
[
"FATHER",
"My father is Gunter Cox"
Owner

I'm not sure I feel comfortable with my name being in the data. Could you please remove it?

Collaborator Author

Sure, but I don't understand the reason.

Owner

Thank you. The reason is simply that I want ChatterBot to be useful to as many developers as possible. So far, I've made sure that all the data in the training corpus is relatively generic. I feel like this is too specific. Also, keep in mind that ChatterBot is a tool for creating chat bots; it is not a chat bot itself, so it shouldn't have an identity.

@vkosuri
Collaborator Author

vkosuri commented Dec 19, 2016

@gunthercox any comments/suggestions on this PR? I am almost done, except for knowledge; I'll make another PR for it soon.

@vkosuri
Collaborator Author

vkosuri commented Dec 20, 2016

@gunthercox I will re-submit the knowledge corpus some other time. Any comments/suggestions?

Owner

@gunthercox gunthercox left a comment

I've left a few comments on parts of the files that appear to have issues. Hopefully this feedback is helpful. Let me know if you have any questions.

[
"JOKE",
"Did you hear the one about the Mountain Goats in the Andes? It was Ba a a a a a d.",
"I never forget a face, but in your case I'll make an exception.",
Owner

I don't believe the format of this file would train an instance of ChatterBot properly. Each of these strings is a single statement, but none of them are related to each other. ChatterBot's corpus trainer expects the list to represent a conversation.
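
To illustrate the difference, here is a minimal sketch. It assumes, in line with the comment above, that each entry in a conversation list is treated as a reply to the entry before it. `conversation_pairs` is a hypothetical helper written for this illustration, not part of ChatterBot's API, and the "Tell me a joke" prompt is made-up sample data.

```python
def conversation_pairs(conversation):
    """Return (statement, response) pairs from consecutive entries.

    Hypothetical helper: it mimics the assumption that each entry in a
    conversation list is a reply to the entry directly before it.
    """
    return list(zip(conversation, conversation[1:]))


# A list of unrelated jokes: pairing them makes one joke "answer" another.
jokes = [
    "Did you hear the one about the Mountain Goats in the Andes? It was Ba a a a a a d.",
    "I never forget a face, but in your case I'll make an exception.",
]

# An actual exchange (illustrative prompt): the pair is a sensible
# statement followed by its response.
exchange = ["Tell me a joke", jokes[0]]

print(conversation_pairs(jokes))
print(conversation_pairs(exchange))
```

Under that assumption, the first list would teach a bot to reply to one joke with another, while the second pairs a prompt with a fitting response.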


"gossip": [
[
"GOSSIP",
Owner

A number of these files have this extra single word at the top; it looks like it might be a duplicate of the outer corpus label. Either way, it is not a valid part of a conversation.
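
One way to clean this up in bulk is sketched below. It assumes the corpus is a mapping from a label to a list of conversations, as in the snippets quoted in this review; `strip_label_entries` is a hypothetical helper name, and the sample statements are placeholders.

```python
def strip_label_entries(corpus):
    """Drop a leading entry that merely repeats the corpus label.

    Assumes `corpus` maps a label (e.g. "gossip") to a list of
    conversations, each conversation being a list of strings.
    """
    cleaned = {}
    for label, conversations in corpus.items():
        cleaned[label] = [
            # Compare case-insensitively, since the stray entries are upper case.
            conv[1:] if conv and conv[0].strip().lower() == label.lower() else conv
            for conv in conversations
        ]
    return cleaned


sample = {"gossip": [["GOSSIP", "first statement", "first response"],
                     ["second statement", "second response"]]}
print(strip_label_entries(sample))
```

The helper returns a new mapping and leaves the input untouched, so the result can be compared with the original before any file is overwritten.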

"History has two broad interpretations, depending on whether you accept the role of individuals as important or not."
],
[
"WHO INVENTED THE LIGHT",
Owner

Proper sentence case is preferred in the corpus files.
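
A rough sketch of how entries could be normalized to sentence case follows. `to_sentence_case` is a hypothetical helper, not something from ChatterBot. Note that lowercasing an all-caps entry also lowercases any proper nouns in it, so the output still needs a manual pass.

```python
def to_sentence_case(text):
    """Capitalize the first character; lowercase the rest only if the
    whole entry was upper case (as in the AIML-derived patterns)."""
    text = text.strip()
    if not text:
        return text
    if text.isupper():
        # All-caps entry: fold it to lower case before capitalizing.
        text = text.lower()
    return text[0].upper() + text[1:]


print(to_sentence_case("WHO INVENTED THE LIGHT"))
print(to_sentence_case("what language are you written in"))
```

Mixed-case entries keep their existing capitalization apart from the first character, which avoids damaging names that are already written correctly.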

Owner

@gunthercox gunthercox left a comment

I should have checked this sooner, but I just noticed that many of the files in this pull request appear to contain text that was directly copied from the repository at https://github.com/drwallace/aiml-en-us-foundation-alice/

The header included in each of the aiml files states:

<!-- This program is open source code released under -->
<!-- the terms of the GNU General Public License -->
<!-- as published by the Free Software Foundation. -->

The GNU General Public License is a copyleft license, which means that derivative works can only be distributed under the same license terms. ChatterBot is licensed under the BSD license, not the GPL, so legally these files cannot be redistributed in ChatterBot without explicit permission from the copyright holder.

@vkosuri
Collaborator Author

vkosuri commented Dec 24, 2016

If he agrees to the reuse, can we use this material? Is there a specific mechanism for requesting a license change?

@gunthercox
Owner

If the owner agrees to allow the content to be released under a different license, then yes, it is safe to use. Usually it just takes an email to get in contact with the copyright owner.

@vkosuri
Collaborator Author

vkosuri commented Jan 7, 2017

@gunthercox any updates?

@gunthercox
Owner

Because reusing text from https://github.com/drwallace/aiml-en-us-foundation-alice was authorized by @drwallace, this should be ok to merge soon. I will check over the changes to make sure everything is valid before merging it.

Owner

@gunthercox gunthercox left a comment

Overall this looks great. I've checked over the first few files and noted a few changes that need to be made. Please check over the rest of the files to make sure that there aren't similar issues to the ones I commented on.

"Artificial intelligence is the branch of engineering and science devoted to constructing machines that think."
],
[
"what language are you written",
Owner

There appears to be a missing "in" at the end of this sentence. I believe it should read:

what language are you written in

],
[
"It pays",
"No i am free of cost!!! you could start from here https://github.com/gunthercox/chatterbot"
Owner

This seems too specific to be useful to other developers. Maybe just remove the "you could start from here https://github.com/gunthercox/chatterbot" part.

{
"profile": [
[
"interests",
Owner

"interests" should probably be something like "What are your interests?"

],
[
"whats your masters email address",
"gunthercx@gmail.com"
Owner

As I mentioned in an earlier review, I would appreciate it if my name and email address were removed. I don't think this data would be useful to other developers who want to train their chat bot to communicate.
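
A quick scan along these lines could flag email addresses before a corpus file is submitted. This is only a sketch: `find_email_addresses` and `EMAIL_PATTERN` are hypothetical names, the pattern is deliberately simple rather than a full address validator, and the sample data is a placeholder.

```python
import re

# Deliberately loose pattern: good enough to flag candidates for review.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_email_addresses(corpus):
    """Return (label, address) tuples for every address found.

    Assumes `corpus` maps a label to a list of conversations, each
    conversation being a list of strings.
    """
    hits = []
    for label, conversations in corpus.items():
        for conversation in conversations:
            for statement in conversation:
                for address in EMAIL_PATTERN.findall(statement):
                    hits.append((label, address))
    return hits


sample = {"profile": [["What is your email address?",
                       "Write to someone@example.com"]]}
print(find_email_addresses(sample))
```

Personal names are harder to detect automatically, so a check like this complements a manual read-through rather than replacing it.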

"i will consume electricity"
],
[
"location",
Owner

"location" might be better represented in the form of a question such as "What is your location?"

"i don't have any brothers. but i have a lot of clones."
],
[
"father",
Owner

The statement "father" should be a full sentence. Maybe something like "Who is your father?"

"a human"
],
[
"mother",
Owner

The statement "mother" should be a full sentence. Maybe something like "Who is your mother?"

],
[
"Tell me about your dreams",
"I dream that i will become a better."
Owner

It looks like there is a noun missing at the end of this sentence. A better what?

],
[
"for dinner",
"i don't dinner menu for you"
Owner

I'm not sure what "i don't dinner menu for you" is supposed to mean. This doesn't appear to be grammatically correct.


"history": [
[
"american civil war",
Owner

"american civil war" looks like a topic, when the text here should probably be a question or something else.

@gunthercox
Owner

I rebased this pull request against the master branch to bring in the existing fix for the 2017 New Year's test bug.

1. Take my name out of it
2. Fix sentence capitalization
@gunthercox gunthercox dismissed their stale review January 15, 2017 20:21

The requested changes have been made.

@gunthercox gunthercox merged commit 23974d5 into master Jan 15, 2017
@gunthercox gunthercox deleted the aiml_corpus branch January 15, 2017 20:50
@vkosuri
Collaborator Author

vkosuri commented Jan 16, 2017

@gunthercox thank you very much
