This lab will focus on extending your simple chatbot to handle long-tail conversations, by using the Watson Discovery service.
- Successful completion of Lab 4: Understanding User Sentiment - Integrating Watson Natural Language Understanding.
- Introduction to Watson Discovery
- Setup the Discovery service
- Build a Discovery document collection
- Query the collection using Watson Discovery Query Builder
- Create a Watson Discovery IBM Cloud Function
- Integrate Watson Discovery with Watson Assistant
IBM Watson Discovery makes it possible to rapidly build cognitive applications that unlock actionable insights hidden in unstructured data — including your own proprietary data, as well as public and third-party information.
With Watson Discovery, it only takes a few steps to prepare your unstructured data, create a query that will pinpoint the information you need, and then integrate those insights into your new application or existing solution. Discovery facilitates all this by:
- automatically crawling, converting, enriching and normalising your data
- applying additional enrichments such as concepts, relations, and sentiment through Natural Language Understanding (NLU)
So far our chatbot has been trained using Watson Assistant to recognise a number of user intents, and to provide a customised response based on a developing user conversation. These are what we call short-tail responses - ones we expect to deal with most often, and for which we can therefore provide a guided experience and a custom response.
Clearly it's not possible to provide a detailed Watson Assistant dialog for every question a user might ask, especially if those questions are very specific, or not likely to be asked very often. However, Watson Discovery is great at being able to trawl larger collections of data and documents to find answers to these more specific long-tail questions, and combining Watson Assistant and Watson Discovery together makes for a powerful solution.
In this lab, we are going to use Watson Discovery to build a corpus of more detailed, specific questions a mobile phone user might ask a support centre operative. So as well as being guided to recommendations on new phones and contracts, a user will be able to ask more specific support-type questions like:
- "Do incognito tabs stay active after shutdown"
- "My Nexus 7 can't connect to my Mac"
- "How do I know my compass is pointing the right way?"
- "Can I keep reinstalling apps I've bought from the market?"
- "The GPS on my HTC Desire is not working after upgrading"
We'll integrate this capability with Watson Assistant, by passing any user query that isn't directly picked up by an intent over to Watson Discovery, which will then search the repository we've created and return the most appropriate response.
Watson Discovery runs as a service on IBM Cloud, so we first need to create the service and make a note of its security credentials for later use.
(1) Go to the IBM Cloud Catalog, filter on AI and select Discovery.
Again, give the service a unique name that you will be able to recall - use the Lite plan if you are using your personal IBM Cloud ID (or Advanced if you are using a linked account) - and hit Create.
(2) You may get a message saying the service may take some time to provision, and be taken to the Resource list which will indicate the status of your Discovery instance.
After a short time you should see the service as Provisioned - you may need to refresh the page to see the status change.
When the service is provisioned, select it from the list.
(3) On the Manage page, copy the API Key and URL as we'll need them later when we create an IBM Cloud Function call that uses the Discovery service. Once you've done that, select Launch Watson Discovery to go to the Watson Discovery tooling application.
Now let's create a Watson Discovery document collection. A collection is a group of documents that you want to be able to search. Documents contain data of potential use to an application, e.g. question and answer pairs for use by a chatbot, FAQs, webpage documentation etc. Watson Discovery can ingest documents in PDF, Word, PowerPoint, Excel, JSON and HTML formats using a Lite plan, and additionally PNG, TIFF and JPG when using Advanced plans.
(1) After you've dismissed the welcome messages, you'll see in the tooling that there's a Watson Discovery News collection already available. Discovery News is a public data set that has been pre-enriched with cognitive insights from millions of internet news articles.
Discovery News English is automatically updated with approximately 425,000 new articles daily, and the collection can be used for many purposes, including:
- News alerting - create news alerts by taking advantage of the support for entities, keywords, categories, and sentiment analysis to watch for both news, and how it is perceived.
- Event detection - the subject/action/object semantic role extraction of Discovery checks for terms/actions such as "acquisition", "election results", or "IPO".
- Trending topics in the news - identify popular topics and monitor increases and decreases in how frequently they are mentioned.
(2) We are going to create our own collection from this dataset - it contains 10 separate Word documents, with each document containing a number of FAQ-type entries that will help our chatbot answer questions about more specific mobile phone related issues.
Download and extract the dataset now.
If you open any of the documents, you'll see they consist of a list of problems or queries and their potential resolutions. Here's an example of the format:
(3) In Watson Discovery select Upload your own data. You may be asked a question about setting up your private data - if so, just select Set up with current plan.
(4) Give your new collection a name (e.g. Phone-Advisor) and click Create.
(5) Watson Discovery can enrich (add cognitive metadata to) your ingested documents with semantic information collected by these nine Watson functions, similar to the ones you saw earlier when using Watson Natural Language Understanding:
- Entity Extraction
- Sentiment Analysis
- Category Classification
- Concept Tagging
- Keyword Extraction
- Relation Extraction
- Emotion Analysis
- Element Classification
- Semantic Role Extraction
An example of how we might use this as part of a chatbot application would be to use Sentiment Analysis to filter out negative-sounding answers before our chatbot responds to the user. Another would be to use Concept Tagging to allow us to search a collection not just for specific words, but for related concepts. For example, if a document mentions CERN and the Higgs boson, Concept Tagging will identify Large Hadron Collider as a concept even if that term is not mentioned explicitly in the document, allowing us to retrieve that document if we searched for Large Hadron Collider.
You can find out more about enrichment here, and we'll enrich our ingested documents with some of the functions so that you can see how this works in practice.
(6) On the Overview screen, click select documents, navigate to the directory holding your extracted Word documents, and open just the first document in the list.
(7) You'll see a "Processing your data" animation for a minute or so, and when the document has successfully uploaded, Discovery displays the information it has learned from ingesting the document into our collection, and by applying its default enrichments to the content.
You'll see how to do this shortly, but right now, if we ran a query against this new collection such as How do I keep the phone screen coming on when it's in my pocket? or I want to disable screen notifications, Discovery can use its natural language capabilities to find that the answers to these queries are contained within the document it has ingested:
This is great, but what we really want to do is get more granular. Rather than returning a whole document that the user then has to search through to find their specific answer, we want to be able to return just the relevant paragraph from the document that contains the answer to their question.
We can do this in Watson Discovery by teaching it about the format of our documents, using a cool feature called Smart Document Understanding (SDU).
(8) SDU allows you to train Watson Discovery to extract custom fields in your documents. Customising how your documents are indexed into Discovery improves the answers returned by your application.
With SDU, you annotate fields within your documents to train custom conversion models. As you annotate, Watson uses machine learning and will start predicting annotations.
From the Overview screen, select Configure data.
You will then be presented with an editor that will allow you to visually annotate your ingested document. You can mark document titles, subtitles, headers, footers, text, and even tables and images using the editor. With upgraded versions of Discovery, you can also create your own custom fields.
In our example, we have a fairly straightforward document that consists of a document title, and several pairs of Q&As, so we'll just use the default title, question and answer fields in our annotation.
(9) We annotate in SDU by selecting the appropriate field label, then selecting the area on the right-hand page of the editor that matches the text on the left, using drag-and-drop.
Select the title field label, then select the area on the right that matches where the title text is on the left:
(10) Now select the question field label, then select the two areas on the right that match the headings that represent questions in our document:
Repeat this process for the text that represents the answers to our questions.
(11) You'll then be taken to the second page of the document for annotation. As we've mentioned, Discovery uses machine learning to predict the document format as you go, and as with any ML model, the more data you provide, the more accurate the model becomes.
Depending on how quickly you annotated the first page, the second page could look like this and need full manual annotation:
Or Discovery could have attempted an annotation for you, like so:
In the first case, you'd need to manually annotate each question and answer field as you did on page one. For the latter, you'd need to manually correct Watson's annotations, as they are not entirely accurate yet due to limited human input!
Either way, ensure page two looks like this and then hit Submit page:
(12) At this point - even after you've provided it with just two pages worth of annotations - Watson's predictions for the format of the rest of the document should be almost, if not 100% accurate.
For example, Watson fully and correctly annotated page three for me:
Check your own page three now, correct any errors if there are any, then hit Submit page.
Now repeat this exercise for all of the remaining pages in the document - check Watson has annotated correctly, make any changes if needed, then select Submit page. As you submit each page, Watson adds to its machine learning model, and you should find that you need to modify few, if any, of the annotations.
Once you've submitted all fifteen pages, select Manage fields.
(13) Here we can tell Discovery which fields to index. As we are only going to search for information in the question and answer fields, you can deselect everything apart from these two.
Also on this configuration screen, we can tell Discovery to split our document. This is helpful - and provides for better query results - as we can split a document into segments based on fields. Once split, each segment is a separate document that can be enriched, indexed, and returned as a separate query result, rather than (as before) returning the whole larger document for a user to search through in order to find their specific answer.
Select question from the dropdown menu. When we do this, Discovery will split our document into segments, with each segment containing a single question and its associated answer.
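Once split, each segment is indexed as its own small document. Illustratively - the field values here are invented for the example, and the exact JSON layout can vary by Discovery version - a segment would look something like:

```json
{
  "title": "Connection issues",
  "question": ["My Nexus 7 can't connect to my Mac"],
  "answer": ["Check that USB debugging is enabled, then try a different cable or port."]
}
```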
Now select Enrich fields.
(14) Using these final configuration options, we can specify which fields to apply enrichments to. We'll just apply them to our answer field, as this is where most of our content resides.
Remove the default enrichment for the unused text field:
Enter answer into the Add a field to enrich box, then select Add enrichments:
Select the keyword, sentiment, concepts, categories, emotion and entities enrichments, then close the window.
Finally, commit all of these changes by selecting Apply changes to collection:
(15) At this point, you will be prompted to Upload documents to your collection - this happens every time you make changes to your Discovery configuration, as Watson has to re-ingest your documents when changes are made.
Hit select documents, navigate to the folder where your unzipped FAQ documents reside, select them all (including the 01 document you have already uploaded), and hit Open.
Watson now takes these ten Word documents, and using the model you built using SDU, ingests all ten, splitting them by question and answer pairs, and adding in the enrichments you specified. Clever stuff, eh?
You'll now be taken back to the Overview screen, and if you refresh your browser, you'll start to see the number of documents ingested increase up to the final amount of around 520, which equates to the number of Q&A pairs across all ten Word documents.
You'll also see a summary of the enrichments that have been applied to the answer field.
(16) One final thing whilst on this screen: select the View API Details icon and make a note of the Environment ID and Collection ID values presented, as we'll need them for the IBM Cloud Function we'll create shortly.
(1) Now let's start to look at how we can query the collection. Click on Build your own query:
(2) If you select the Run query button at the bottom of the screen (having specified no query parameters), you'll see that we return the total number of documents in our collection in the matching_results field, and below that, a selection of the actual documents themselves.
Select the brackets next to one of the question fields to see an example of one of the "questions" Discovery has extracted from the Word documents, and do the same with the brackets next to the answer field to show its associated "answer".
If you do the same with the enriched_answer field, you'll see all the metadata Watson Discovery has created from the enrichments you specified, including sentiment analysis of the answer, emotion analysis, and extracted categories and keywords.
(3) Next, we'll make use of our sentiment analysis enrichment by showing how you could, for example, filter out answers that have negative sentiment. Using Filter which documents you query, select:
- Field: enriched_answer.sentiment.document.label
- Operator: does not contain
- Value: negative
At this point, also set Passages to No in More options. Passage Retrieval lets you find pieces of information in larger documents ingested into Watson Discovery, finding relevant snippets from within a document based on your query. For developers, Passage Retrieval can reduce the time it takes to hand-craft data into consumable units of information for chatbots or search and exploration interfaces. In our case, we have already created consumable units by splitting documents using Smart Document Understanding, so Passage Retrieval is not required.
If you run this query you'll now see it returns fewer results (274 in this case), as the enriched_answer data is filtered by positive and neutral sentiment only. Scroll down to check a few.
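The same filtered query can also be issued programmatically. This is a sketch only - it assumes the Node.js watson-developer-cloud package, the IDs and key are placeholders, and in the Discovery query language the :! operator means "does not contain":

```javascript
// Sketch: the Query Builder filter expressed as Node.js SDK query parameters.
// ':!' is the Discovery query-language "does not contain" operator.
function buildFilteredQuery(environmentId, collectionId) {
    return {
        environment_id: environmentId,
        collection_id: collectionId,
        filter: 'enriched_answer.sentiment.document.label:!negative',
        passages: false  // we already split documents with SDU, so no passage retrieval
    };
}

// These parameters would then be passed to discovery.query(), e.g.:
//
//   var DiscoveryV1 = require('watson-developer-cloud/discovery/v1');
//   var discovery = new DiscoveryV1({
//       iam_apikey: '<discovery_api_key>',
//       url: '<discovery_url>',
//       version: '2017-09-01'
//   });
//   discovery.query(buildFilteredQuery('<env_id>', '<coll_id>'), callback);

console.log(buildFilteredQuery('<env_id>', '<coll_id>').filter);
```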
(4) You can test a natural language query of the collection by asking a question in Search for documents.
Discovery returns documents in descending order of the score it gives each potential match, where score is an unbounded measure of the relevance of a particular result. A higher score indicates a greater match to the query parameters.
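To make the ordering concrete, here is a minimal sketch (the results array shape is illustrative, not real Discovery output) of picking the highest-scoring result on the client side:

```javascript
// Sketch: select the result with the highest Discovery relevance score.
// Discovery itself already returns results in descending score order;
// this just illustrates what that ordering means.
function topResult(results) {
    return results.slice().sort(function(a, b) { return b.score - a.score; })[0];
}

var sample = [{score: 1.2}, {score: 5.7}, {score: 3.1}];
console.log(topResult(sample).score);  // 5.7
```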
Try My Nexus 7 can't connect to Mac as shown in the example here.
Note that even though our query is "My Nexus 7 can't connect to Mac", we still get a match with "Nexus 7 (2012) no longer connecting via USB Mac OS X". This is because the natural language processing capabilities of Watson Discovery are taking care of the nuances of language, so that we don't have to specify the exact words that match a document in order to return it from the collection.
You can further improve the relevancy of your results by performing relevancy training in Watson Discovery. Go here to learn more about this capability.
For more on building queries, have a look at the tutorial here.
(5) Now let's build an IBM Cloud Function to query the collection, and then incorporate it into our Watson Assistant chatbot.
(1) Go to IBM Cloud Functions and create a new action via the sidebar menu in IBM Cloud, or directly via this link. Call it getDiscoveryTopHitXXX, substituting XXX for your initials once more to give the action a unique name. Use Node.js 8 for the runtime.
(2) In the code editor delete the default code and replace it with this:
/**
 *
 * main() will be run when you invoke this action
 *
 * @param Accepts a text string 'payload' used for the Discovery query
 *
 * @return JSON object with:
 *         topHitFound: true / false
 *         topHitQuestion: title of best matching document
 *         topHitAnswer: answer from best matching document
 *         topHitScore: Watson Discovery 'score' for query
 *
 * Note 1: will only return topHitFound "true" if documents matched
 * Note 2: answer may be returned in multiple array entries; these are concatenated in the code
 *
 */
function main({payload: payload}) {
    var DiscoveryV1 = require('watson-developer-cloud/discovery/v1');

    // Instantiate the Discovery service using your own credentials
    var discovery = new DiscoveryV1({
        iam_apikey: '<discovery_api_key>',
        url: '<discovery_url>',
        version: '2017-09-01'
    });

    var promise = new Promise(function(resolve, reject) {
        // Query the collection, asking only for the single best match (count: 1)
        discovery.query(
            {
                environment_id: '<my_environment_id>',
                collection_id: '<my_collection_id>',
                count: 1,
                query: payload
            },
            function(err, response) {
                if (err) {
                    console.error(err);
                    reject(err);
                } else {
                    if (response.matching_results == 0) {
                        resolve({topHitFound: 'false'});
                    } else {
                        var topHitQuestion = response.results[0].question[0];

                        // The answer may span several array entries - concatenate them
                        var topHitAnswer = "";
                        for (var i = 0; i < response.results[0].answer.length; i++) {
                            topHitAnswer += response.results[0].answer[i];
                        }

                        var topHitScore = response.results[0].score;
                        resolve({
                            topHitQuestion: topHitQuestion,
                            topHitAnswer: topHitAnswer,
                            topHitScore: topHitScore,
                            topHitFound: 'true'
                        });
                    }
                }
            }
        );
    });

    return promise;
}
(3) You need to make four changes before saving the code:
- Change <discovery_api_key> to the value of the API Key you saved from your Watson Discovery credentials when you created the service
- Replace <discovery_url> with the value of the URL you saved from the same credentials
- Change <my_environment_id> to the Environment ID value you saved earlier from within the Watson Discovery tooling
- Change <my_collection_id> to the Collection ID value you saved earlier from within the Watson Discovery tooling
This code - which is again based on the IBM Watson documented code snippets here - accepts text as input (payload), calls your Watson Discovery service and returns the following results:
- topHitFound: true (document matched) or false (no documents matched)
- topHitQuestion: title of the best matching document
- topHitAnswer: answer from the best matching document
- topHitScore: Watson Discovery confidence score for the query
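Put together, a successful invocation returns a JSON object shaped like this (the question value is taken from the earlier query example; the answer text and score are illustrative, not real output):

```json
{
  "topHitFound": "true",
  "topHitQuestion": "Nexus 7 (2012) no longer connecting via USB Mac OS X",
  "topHitAnswer": "An illustrative answer, concatenated from the matching document's answer entries.",
  "topHitScore": 5.27
}
```

Note that topHitFound comes back as the string 'true' or 'false', which is why the dialog conditions later in this lab compare against quoted values.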
(4) Test your new Discovery IBM Cloud Function by selecting Change Input, replacing the data in the Change Action Input window with the text below, and hitting Apply.
{"payload": "My Nexus 7 can't connect to Mac"}
(5) Hit Invoke to pass this test query to the Discovery service, and view the returned results.
(6) Finally, click on Endpoints in the sidebar, and enable this new function to be a Web Action.
Now let's use this in our Watson Assistant dialog.
In this section we will combine the power of the Watson Assistant service with the knowledge of the Watson Discovery service by using them together.
When the user asks a question we are going to first test it against our coded intents in Watson Assistant. If the user input matches an intent, we will use the appropriate Watson Assistant response from our dialog tree.
If we don't successfully match an intent, we'll then send the user input to Watson Discovery and use the best answer from there, if one is returned from a search of our defined collection.
(1) Go to your Watson Assistant dialog. We don't need to create a new intent here as we are only going to query the Discovery document collection if we drop out of the dialog having matched no intents.
If you remember, the anything_else special condition is triggered at the end of a dialog when the user input does not match any other dialog nodes, so we can repurpose our existing Anything else dialog node to call Discovery instead of just responding with an "I didn't understand" message.
Select the Anything else dialog node and rename it to Anything else: call Watson Discovery, then delete the existing text responses.
(2) Open the JSON editor for the node, and replace the existing code with this:
{
    "output": {
        "generic": []
    },
    "actions": [
        {
            "name": "<my-getDiscoveryTopHit-endpoint>",
            "type": "web_action",
            "parameters": {
                "payload": "<?input.text?>"
            },
            "result_variable": "$discoveryData"
        }
    ]
}
Again, you will need to replace <my-getDiscoveryTopHit-endpoint> with the name of your getDiscoveryTopHitXXX endpoint, by going back to your IBM Cloud Function, clicking Endpoints, then copying everything in the Web Action URL after .../web/.
It should look something like:
jerry.seinfeld_dev/default/getDiscoveryTopHitXXX.json
(3) Now, if we don't recognise an intent, we will call our getDiscoveryTopHitXXX IBM Cloud Function, passing the user input as our payload and using the credentials we set up in the last lab, and expect a JSON object context variable $discoveryData containing the values returned from the call.
(4) Next we need to build a couple of child nodes to deal with the potential responses from Watson Discovery. If we get a hit on the user's query from Discovery, we should format the relevant answer for the user. If we don't get a hit, we should send back an "I didn't understand" message.
- Create a child node called Document Found.
- Remember that our getDiscoveryTopHitXXX function returns a value of true in topHitFound if a match to the user query is found, so set your If assistant recognizes condition to be: $discoveryData.topHitFound == 'true'
- Create a multiline text response with the following messages:
  I found this question that is similar to yours: "$discoveryData.topHitQuestion"
  And this is the answer: "$discoveryData.topHitAnswer"
- Finally here, ensure this node jumps to the Help & Reset Context node.
(5) Create another child node called No Document Found which runs if $discoveryData.topHitFound == 'false'.
Set response variations to random, and enter some text responses similar to the ones below. We'll only get this type of "I didn't understand" message now if we draw a blank from both our Watson Assistant intents and our Watson Discovery collection.
Sorry I couldn't find anything to help. Could you perhaps try rephrasing your question?
I didn't understand your question, could you try rephrasing please?
I don't think I can help with this particular query, but please try asking me something else.
Once again, ensure this node jumps to the Help & Reset Context node.
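Conceptually, the two child nodes implement a simple branch on $discoveryData. As a plain JavaScript sketch (the function name is ours, and Watson Assistant evaluates these conditions and responses in its own dialog runtime, not via this code):

```javascript
// Sketch of the branching performed by the two child nodes.
// discoveryData mirrors the $discoveryData context variable returned
// by the getDiscoveryTopHitXXX Cloud Function.
function formatReply(discoveryData) {
    if (discoveryData.topHitFound === 'true') {
        // "Document Found" node
        return 'I found this question that is similar to yours: "' +
               discoveryData.topHitQuestion + '"\n' +
               'And this is the answer: "' + discoveryData.topHitAnswer + '"';
    }
    // "No Document Found" node
    return "Sorry I couldn't find anything to help. " +
           'Could you perhaps try rephrasing your question?';
}
```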
(6) Finally, we need to make sure our dialog flows correctly, and that we reset our context variables.
Configure the Anything else: call Watson Discovery node so that it drops into its child nodes after the getDiscoveryTopHit function call.
And reinitialise the $discoveryData context variable by adding it with a null value to the Help & Reset Context node context editor.
(7) You should now test your chatbot with user input that follows both paths, e.g.
- Is my compass pointing the right way? (Discovery)
- My Nexus 7 can't connect to Mac (Discovery)
- I want a new phone (Assistant)
- My S4 changed name when connected to PC (Discovery)
- Blah blah (neither path matches - the user should get an "I didn't understand" message)
Use Try It to work out any issues, then use one of your integrations to see it working in production!
Congratulations! You've extended your chatbot to include long-tail responses using a Watson Discovery collection. Now if your chatbot can't find a specific response to your user's question within Watson Assistant, it will use Discovery to search for answers from a larger corpus of information.
If you want to download the Watson Assistant skill we've created thus far, you can do so here. Once again, if you do import this skill, you'll have to modify:
- the Call getSentiment function node to refer to your getSentimentXXX IBM Cloud Function API details
- the Anything else: call Watson Discovery node to refer to your getDiscoveryTopHitXXX IBM Cloud Function API details
The final part of the Watson Assistant labs will show you how you can integrate third party data into your application. Go to Lab 6: Integrating External Data using IBM Cloud Functions to complete your chatbot!