This article introduces the speech recognition and natural language understanding services of the IBM AI platform.
Artificial intelligence is a hot topic at the moment: whatever the system, anything connected with AI counts as the most advanced productivity. The author mainly works in the field of collaboration, where there are many AI application scenarios, such as intelligent conference systems that automatically produce meeting minutes (not just speech recognition). In this article, the author integrates AI into a mail system to build an intelligent voice mail query function. The main application scenario is quickly querying email when it is inconvenient for the user to enter text.
To implement voice queries, we need to do the following:
1. Voice recording and uploading. The author uses the audio capability of HTML5 for recording and upload.
2. Speech recognition. Whether it is Xiaoice, smart speakers, or IBM's debating robot, they all interact by voice.
3. Natural language understanding. Speech recognition alone is not enough: human language is very free-form, and when the user phrases things differently, the AI should still understand what is meant. Natural language understanding is actually quite difficult, and the strength of this service directly affects the system's grasp of human intent. AI platforms differ considerably in how they provide this service and how well they do it.
4. Email query and display. These are more basic capabilities; the author uses the mail and REST services of the Domino server.

The overall flow chart is shown below.
Figure 1. Overall flowchart.
Domino V10 now supports Node.js development, and the author uses Node.js to develop this system.
To make it easy to integrate this system into other apps, the author uses the multimedia capabilities of HTML5.
This seems like it should be simple; after all, so many mobile apps use voice. But when I actually went looking for a ready-made Node.js example, it took some time. In the end, GitHub proved more powerful: please refer to the example there as a starting point, which can save a lot of time. The example supports both Windows and Linux. But it uploads video, and we need to change that part to audio. For example, the video control should be changed to an audio control. And also:
Listing 1. Modify the file.
var filename = generateRandomString() + '.wav';
var file = new File([blob], filename, { type: 'audio/wav' }); // options assumed; match your recorder's output

The recording parameters follow the requirements of the speech recognition platform, usually mono with a 16000 Hz sample rate:

captureUserMedia(function (stream) {
    // start the recorder on the captured stream, as in the original example
});
With this modified example, users can record in the browser and upload, and a WAV file is generated on the server side.
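For reference, here is a minimal sketch of the server side of the upload, assuming an Express app with the multer middleware and a form field named file (all hypothetical choices here; the GitHub example may do this differently):

var express = require('express');
var multer = require('multer');

var app = express();
var upload = multer({ dest: 'uploads/' }); // store uploaded recordings under ./uploads

app.post('/upload', upload.single('file'), function (req, res) {
    // req.file.path now points at the WAV file saved on the server
    console.log('Saved recording to ' + req.file.path);
    res.json({ filename: req.file.path });
});

app.listen(3000);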
Speech recognition is relatively simple: we can use IBM's Speech to Text service.
First, you need to register an IBM ID on the IBM Cloud site to use IBM Cloud services for free. In IBM Cloud, find the Speech to Text service under the AI category, and you can quickly set up a speech recognition service.

Figure 2. The Speech to Text service.
Watson's speech recognition service provides a Node.js API. Installing the watson-developer-cloud package lets us use it from Node.js:
npm install --save watson-developer-cloud
For more information, please refer to the documentation. The invocation looks like this:

Listing 2. Invoke the Speech to Text service.
var params = {
    // content type, model, and other options per the Speech to Text documentation
};

// Create the stream.
var recognizeStream = speechToText.recognizeUsingWebSocket(params);
Send the recording we uploaded to the Watson service to get the recognized text:
// Pipe in the audio.
fs.createReadStream(filename).pipe(recognizeStream);
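To collect the recognized text, we can listen on the same stream; a minimal sketch (the transcription variable is introduced here for illustration):

var transcription = '';

recognizeStream.setEncoding('utf8'); // receive text instead of Buffers

recognizeStream.on('data', function (chunk) {
    transcription += chunk; // each chunk is a piece of recognized text
});

recognizeStream.on('error', function (err) {
    console.error('Recognition failed:', err);
});

recognizeStream.on('end', function () {
    console.log('Recognized text: ' + transcription);
});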
Natural language understanding is the focus of this article. IBM's famous Watson can take part in debate competitions, so its natural language understanding should be quite good.
Log in to the IBM Cloud console and first find Natural Language Understanding (NLU) in the AI Services category.
Figure 3. Natural Language Understanding service.
Once inside the NLU service, you can create a new service. For ease of identification, give your service a suitable name; the author's is natural language understanding-mailquery.
Figure 4. Create a service.
For the options that follow, keep the default Free plan. Note the bold text at the end: Lite services are deleted after 30 days of inactivity. To avoid deletion, you have to use the service from time to time. A small gripe: 30 days is a little short.
Figure 5. Choose a plan.
Finally, click the Create button below.
Once created, what you see is not the service itself but a Getting Started document.
Figure 6. Getting Started documentation.
The steps in the Getting Started documentation are exactly what we have just done, so don't be confused; just click the Manage link above Getting Started.
On the management page, you can first see the credentials and URL for accessing the service.
Figure 7. Credentials and URL.
A curl invocation of the service is also written out for you, with the credentials filled in for the current service, which is a nice touch.
Figure 8. curl invocation.
The author is used to Firefox's RESTClient, so I will also show how to call the service with RESTClient. The invocation interface is as follows:
Figure 9. RESTClient.
There are 2 points to note:

1. Add Basic authentication: the username is the fixed string apikey; the password is the API key value.
2. The HTTP header Content-Type: application/json must be added.

The HTTP response is the same as in the demo.
Figure 10. HTTP response.
Among the results are sentiment analysis and keyword analysis.
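The same call can also be made directly from Node.js; a sketch (the hostname, version date, and key are placeholders; use the URL shown on your Manage page):

var https = require('https');

var apikey = '<your API key>'; // placeholder
var body = JSON.stringify({
    text: 'show me mails from John Smith of last week',
    features: { entities: {}, keywords: {} }
});

var req = https.request({
    hostname: 'gateway.watsonplatform.net', // replace with your service URL
    path: '/natural-language-understanding/api/v1/analyze?version=2018-03-16',
    method: 'POST',
    auth: 'apikey:' + apikey, // Basic authentication: the username is the fixed string apikey
    headers: { 'Content-Type': 'application/json' }
}, function (res) {
    var data = '';
    res.on('data', function (chunk) { data += chunk; });
    res.on('end', function () { console.log(data); });
});

req.write(body);
req.end();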
And then what? In NLU itself, there is no "then". Readers who have used natural language analysis on other platforms may be confused: why is there no way to customize it? IBM is IBM: for that we need to build a Knowledge Studio service, and then deploy it to the NLU instance we just created.
Click the Catalog at the top and find Knowledge Studio in the AI category.
Figure 11. Knowledge Studio.
Go to Knowledge Studio.
Figure 12. Create a Knowledge Studio service.
Click the Create button below to create a service.
Once created, click Launch the tool.
Figure 13. Launch the tool.
Click Create Workspace above.
Figure 14. Create a workspace.
Each workspace is equivalent to a linguistic analytics environment configuration.
Figure 15. Workspace parameters.
Note: if Chinese is selected as the language, some functions are unavailable, so let's use English.
Once a workspace is created, you automatically enter its configuration interface. Now things finally start to take shape, right?
Figure 16. Workspace interface.
Let's start with some concepts:
- Documents: the documents used to train the service, which should contain many typical example sentences.
- Entity types: for example, if our language is about email, the entity types should include the sender, the send time, keywords, and so on. In this case, the entity types are the most important part.
- Relation types: categories of relationships, defining relationships between entity types. They don't matter much in this case, so here is another example: in "I like to play football", there is a relation type between "I" and "play", and between "play" and "football". We can name these relationships whatever we want; after training, the service should detect whether such relationships exist in the user's language and identify them. For example, after training, if the user says "I like to play volleyball", the service should understand that the same relation type exists in that statement.
- Dictionary: some entity types in the statements we analyze can be enumerated exhaustively. The most typical example is a booking service, where the start and end points are the names of cities.

Let's start by defining the entity types.
Figure 17. Entity types.
Click Add Entity Type, add the sender entity type, and save.
Figure 18. Add an entity type.
Next, add 2 more:
Figure 19. Add more entity types.
As the names suggest, these 3 entity types represent the sender, the send time, and the keyword. Keywords are used for full-text search.
We know that a relation type is a relationship between entity types. To test relation types, we define a mailkeyword entity type, whose value is the word mail or mails; one of them should appear in the question.
Figure 20. Define MailWord.
Next, let's define the dictionary.
Figure 21. Create a dictionary.
Click Create Dictionary and name it maildic. We want to give mailkeyword dictionary values, so the entity type is mailkeyword.
Figure 22. Add the mailkeyword dictionary value.
Click Add Entry. Enter mail for Surface Forms and choose Noun for Part of Speech.
Figure 23. Set the mailkeyword dictionary value (1).
Add one more value, mails.
Figure 24. Set the mailkeyword dictionary value (2).
When the query statement contains mail or mails, the service will identify it as a mailkeyword.
Next, it's time to provide samples for analysis. You can use Excel to edit a CSV file. An example is as follows:
Figure 25. A list of samples.
The first column is the sample statement number (numbered arbitrarily, not necessarily numeric). The second column is the sample statement.
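For example, the file content might look like this (the statements here are illustrative):

q1,show me mails from John Smith of last week
q2,I need mails from Mary Brown of yesterday about the budget
q3,find mails about cats of this month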
On the Document page, click Upload Document Sets.
Figure 26. Upload the document set.
Select the CSV file and upload it.
Figure 27. Start uploading.
After uploading, we find there are 2 document sets, even though we only uploaded 1 file; the other one is generated by the system. You'll also notice a Create Annotation Sets button.
Annotation means marking up. The machine is still an idiot: you have to tell it how to analyze the sentences in these examples, and then let it learn.
Why create annotation sets? Because in real production there may be many examples in a file, and you can divide them among many people to complete the marking task.
Figure 28. Document sets and annotation sets.
Click Create Annotation Sets.
Figure 29. Create annotation sets.
If no one is helping you, or the sample is very small, you can choose 100% of the samples as 1 set. For the annotator, you can choose yourself. Also give this set a name; we call it mail. You will need this name later for training.
Click the Generate button. The annotation set is built.
Figure 30. Annotation sets.
Now that we've built the set, let's define the relation types before marking.
Go to the Relation Types page.
Figure 31. Create a relation type.
Click the Add Relation Type button.
Let's call the first relation type rsender; it is defined as the relationship between mailkeyword and sender.
Figure 32. Define the rsender relation type.
Similarly, we define a total of 3 relationships:
Figure 33. Define other relation types.
The next step is to start tagging.
Go to the Annotation Tasks page.
Figure 34. Annotation tasks.
Click the Add Task button and go to the following interface. The system tells you the task has been created, but the annotation set has not yet been added.
Figure 35. Add an annotation set to the task.
Check mail, then click the Create Task button (this button is slow to become available; be patient).
Figure 36. Create an annotation task.
Now, the task is set up:
Figure 37. The annotation task has been added.
When I was new to this, I waited on this progress for a while, but it stayed at 0%. Eventually I realized I had to do the marking myself. Ha ha.
Click on this task:
Figure 38. Task list.
Click the Annotation button.
Figure 39. The list to be tagged.
For each example statement, click the Open link at the end of the statement.
Select one or a few words in the statement, and then select Entity Type on the right. For example:
Figure 40. Mark entities (1).
Until fully marked:
Figure 41. Mark entities (2).
Note: if a word is marked incorrectly, you can click the small eye on it to remove the mark; repeated words, however, cannot be removed. Perhaps IBM needs to improve this?
Click the Relation tab.
Figure 42. Mark relations (1).
Select Mail (yellow) and Sender (green).
Figure 43. Mark relations (2).
On the right, select the rsender relationship.
Figure 44. Mark relations (3).
Similarly, define other relationships.
Figure 45. Mark relations (4).
When the entity type and relation type tags are complete, click the Save button.
Figure 46. Save your tagging work.
This button is rather inconspicuous.
After saving, click the Open Document List button to return to the sample list and continue marking until the marking is complete.
After all statements are marked, the interface is as follows:
Figure 47. All marks are complete.
Click Submit All Documents to submit them to the service for training.
Figure 48. Submit tagged documents (1).
Actually, it hasn't really been submitted yet; you need to go back to the Annotation Task interface.
Figure 49. Submit tagged documents (2).
Select the mail set you just marked and click the Accept button. Now it is really submitted. The finished submission screen looks like this:
Figure 50. Marking done (1).
Back to the Annotation Task interface:
Figure 51. Marking done (2).
We have done a lot of manual work; now it's the machine's turn to do something.
Go to the Performance page.
Figure 52. Training interface.
Click the Train and Evaluate button.
Figure 53. Select training documents.
Select Mail and click the Train button.
Figure 54. Start training.
Next, make a cup of tea and rest for 20 minutes until the training is over. It may seem a little slow, but compared with a baby learning to speak, it's still quite fast.
When it finishes, the following is displayed:
Figure 55. Training complete.
In fact, the system uses most of the data for training and a small amount for self-testing.
Figure 56. Training results.
Because these are hand-made samples, the results are generally accurate; that would not necessarily be the case with real production data.
Next, we go to the Versions page.
Figure 57. Create a version of the training results (1).
Run this model uses the training results to machine-annotate new samples, which can then be corrected manually. Since there are no new samples now, we don't click Run this model.
Export current model exports the model, but this is not allowed in the free version.
Create Version is what matters now. Click this button to generate a version.
Figure 58. Create a version of the training results (2).
The newly generated version is as follows:
Figure 59. The version has been generated.
Next, pay attention: we're going to deploy this trained model to NLU.
Click the Deploy button.
Figure 60. Deploy to NLU (1).
Check Natural Language Understanding and click Next.
Figure 61. Deploy to NLU (2).
Select the NLU instance we created and click Deploy. An interface appears telling you that deployment is in progress. Record the Model ID. Why do we need the Model ID? Because a single NLU instance can use many models; when you use NLU, you need to tell it which model to use to analyze the language.
Figure 62. Deploy to NLU (3).
All right. Let's go back to NLU.
Let's change the request in RESTClient to something like the following (the text is the statement to analyze, and model is the Model ID recorded at deployment):
Listing 3. Add parameters for the analysis model.
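{
  "text": "show me mails from John Smith of last week",
  "features": {
    "entities": {
      "model": "<your Model ID>"
    },
    "keywords": {}
  }
}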
Note the model parameter. The result of the NLU analysis will now contain our entity types:
Listing 4. The entity analysis results.
, "disambiguation": ,"count": 1 },"disambiguation": ,"count": 1 },"disambiguation": ,"count": 1 }
We can replace entities in the request with relations to get the relationship analysis of the sentence components. The result is shaped roughly as follows (a sketch following the NLU response format; the values are illustrative):
Listing 5. The relation analysis results.
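"relations": [
  {
    "type": "rsender",
    "sentence": "show me mails from John Smith of last week",
    "score": 0.9,
    "arguments": [
      { "text": "mails", "entities": [ { "type": "mailkeyword", "text": "mails" } ] },
      { "text": "John Smith", "entities": [ { "type": "sender", "text": "John Smith" } ] }
    ]
  }
]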
If we change the analyzed statement to "I need mails from John Smith of last week about cats", the results of the analysis are the same.
Back to Node.js programming. Watson has prepared a Node.js development package (watson-developer-cloud) for us; you can click here to view its documentation. For our needs, we focus on the entities part of the analysis. Following the documentation, call the NLU API:
Listing 6. Invoke the NLU service.
var NaturalLanguageUnderstandingV1 = require('watson-developer-cloud/natural-language-understanding/v1.js');

var natural_language_understanding = new NaturalLanguageUnderstandingV1({
    // credentials and version date go here, per the package documentation
});

var parameters = {
    'text': queryString,
    'features': {
        'entities': { 'model': modelId }, // the Model ID recorded at deployment
        'keywords': {}
    }
};

natural_language_understanding.analyze(parameters, function (err, response) {
    // response contains the entities and keywords
});
where queryString is the statement we want to analyze, and model is the analysis model we trained. As we saw in Firefox, the NLU service returns the analysis result in JSON form. From it, we extract the key information of the query: the sender, the send time, and the keywords. Note that there may be more than 1 keyword.
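A sketch of that extraction, assuming the entity types were named sender, sendtime, and keyword (adjust to your actual type names):

function extractQuery(response) {
    var query = { sender: null, sendtime: null, keywords: [] };
    response.entities.forEach(function (e) {
        if (e.type === 'sender') {
            query.sender = e.text;
        } else if (e.type === 'sendtime') {
            query.sendtime = e.text;
        } else if (e.type === 'keyword') {
            query.keywords.push(e.text); // there may be more than one keyword
        }
    });
    return query;
}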
Originally, the author hoped to use Domino's DQL to query emails, but it would take another 2-3 months for DQL to support Chinese, so the author developed a Domino REST service to implement the email query.
Domino has supported REST services for a long time, and they can be used by default without programming. But we want a customized REST service, so we use the REST control of XPages.
Let's create a new XPage for queries and add a REST control to it:
Figure 63. REST service.
where mailq is the name of the REST service; the Node.js program queries the mail by calling ….nsf/….xsp/mailq.
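On the Node.js side this is an ordinary HTTP call; a sketch with placeholder names (replace the hostname, .nsf, and .xsp parts with your own):

var http = require('http');

var body = JSON.stringify(query); // the sender/sendtime/keywords object extracted above

var req = http.request({
    hostname: 'dominoserver',          // placeholder
    path: '/mail.nsf/query.xsp/mailq', // placeholder database and XPage names
    method: 'POST',
    headers: { 'Content-Type': 'application/json' }
}, function (res) {
    var data = '';
    res.on('data', function (chunk) { data += chunk; });
    res.on('end', function () {
        console.log(JSON.parse(data)); // the mail list returned by the REST service
    });
});

req.write(body);
req.end();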
In the REST control's Service property, add a customRestService, and write code in doPost to parse the query parameters and construct a Domino query to search the mail.
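The idea in doPost is to turn those parameters into a Domino full-text query; a sketch in server-side JavaScript (the field name From and the variable names are illustrative):

// Build a Domino full-text query from the parsed parameters.
var parts = [];
if (sender) {
    parts.push('FIELD From CONTAINS "' + sender + '"');
}
for (var i = 0; i < keywords.length; i++) {
    parts.push('"' + keywords[i] + '"');
}
// The sendtime value would similarly be translated into a date condition.
var ftSearchStr = parts.join(' AND ');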
Figure 64. REST service settings.
Finally, the query results are returned to Node.js.
Listing 7. Mail search.
var dc = db.FTSearch(ftSearchStr);
var doc = dc.getFirstDocument();
var list = [];
while (doc != null) {
    // collect the fields needed for display (subject, sender, date, and so on)
    list.push(doc.getItemValueString('Subject'));
    doc = dc.getNextDocument(doc);
}
return list;
Now to test the effect. The author didn't spend effort on UI design, just trying out the function.
The author says "show me mails from John Smith of last week" to search for John Smith's emails from last week.
Figure 65. Test effect (1).
Add a qualifier so it becomes "show me mails from John Smith of last week about cat", and only the mails about cat are left.
Figure 66. Test effect (2).
Regarding the UI, in this version the user clicks the blue button, starts talking, and presses the green button when done. It could also be made with only 1 button: press to speak, release to recognize.
Artificial intelligence will flourish in various professional fields, and AI technology will vary greatly between fields. In the author's opinion, in the field of collaboration, artificial intelligence needs on the one hand to understand human natural language in a human way, and on the other hand to integrate with professional systems to provide users with business assistance. The author hopes this article can serve as a modest starting point and stimulate readers to think more deeply.
IBM Cloud Docs: learn about the capabilities of IBM Cloud and how to get started.
IBM AI services: a list of IBM AI services.
IBM Speech to Text service documentation.
IBM Natural Language Understanding service documentation.