Integrating Cognitive Service Speech Recognition in UWP apps

The Speech service, part of Cognitive Services, is powered by the same technologies used in other Microsoft products, including Cortana and Microsoft Office.

We just need to create a speech resource in Azure Portal to obtain the keys to use it in our apps. Note that, at the time of writing, the service is in Preview and is available only in East Asia, North Europe and West US.

The service can be consumed either through the SDK or through the REST API. Let's see how to use the former in a UWP app.
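For completeness, the REST alternative boils down to posting a short WAV file to the service endpoint, with the subscription key in a header. Here is a rough sketch with HttpClient; the endpoint format, recognition mode ("interactive") and content type are assumptions based on the Preview documentation and may change:

using (var client = new HttpClient())
{
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<your-key>");

    // Audio is expected as PCM WAV, 16 kHz, 16 bit, mono.
    var content = new ByteArrayContent(File.ReadAllBytes("sample.wav"));
    content.Headers.TryAddWithoutValidation("Content-Type",
        "audio/wav; codecs=audio/pcm; samplerate=16000");

    var response = await client.PostAsync(
        "https://westus.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?language=en-US",
        content);

    var json = await response.Content.ReadAsStringAsync();
}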

First of all, we have to add the Microsoft.CognitiveServices.Speech NuGet package to the solution:

Microsoft.CognitiveServices.Speech NuGet package
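If you prefer the Package Manager Console, the same package can be installed with the following command:

Install-Package Microsoft.CognitiveServices.Speech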

Then, we create a simple UI with a Button to start recognition and a TextBox to show the result:

<Grid Padding="50">
    <Grid.RowDefinitions>
        <RowDefinition Height="Auto" />
        <RowDefinition Height="*" />
    </Grid.RowDefinitions>
    <Button
        x:Name="RecognitionButton"
        Grid.Row="0"
        Margin="0,0,0,20"
        Click="RecognitionButton_Click"
        Content="Start Recognition" />
    <TextBox
        x:Name="RecognitionTextBox"
        Grid.Row="1"
        HorizontalAlignment="Stretch"
        Header="Recognized text"
        IsReadOnly="True" />
</Grid>

As the app needs to use the microphone, it's important to add the corresponding capability by double-clicking the Package.appxmanifest file and going to the Capabilities tab:

Adding the Microphone capability to the app
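Behind the scenes, this adds a DeviceCapability entry to the manifest, which we could also write by hand:

<Capabilities>
  <DeviceCapability Name="microphone" />
</Capabilities>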

Then, we can finally write the code to perform recognition:

private async void RecognitionButton_Click(object sender, RoutedEventArgs e)
{
    const string SpeechSubscriptionKey = "";
    const string SpeechRegion = "";
    const string Culture = "it-IT";

    var isMicAvailable = await CheckEnableMicrophoneAsync();
    if (!isMicAvailable)
    {
        return;
    }

    RecognitionButton.Content = "Recognizing...";
    RecognitionButton.IsEnabled = false;
    RecognitionTextBox.Text = string.Empty;

    var cognitiveSpeechFactory =
        SpeechFactory.FromSubscription(SpeechSubscriptionKey, SpeechRegion);

    // Starts recognition. It returns when the first utterance has been 
    // recognized.
    using (var cognitiveRecognizer =
        cognitiveSpeechFactory.CreateSpeechRecognizer(Culture, OutputFormat.Simple))
    {
        var result = await cognitiveRecognizer.RecognizeAsync();

        // Checks result.
        if (result.RecognitionStatus == RecognitionStatus.Recognized)
        {
            RecognitionTextBox.Text = result.Text;
        }
        else
        {
            await new MessageDialog(result.RecognitionFailureReason,
                result.RecognitionStatus.ToString()).ShowAsync();
        }
    }

    RecognitionButton.Content = "Start Recognition";
    RecognitionButton.IsEnabled = true;
}

private async Task<bool> CheckEnableMicrophoneAsync()
{
    var isMicAvailable = false;

    try
    {
        var mediaCapture = new MediaCapture();
        var settings = new MediaCaptureInitializationSettings
        {
            StreamingCaptureMode = StreamingCaptureMode.Audio
        };

        await mediaCapture.InitializeAsync(settings);
        isMicAvailable = true;
    }
    catch
    {
        // Microphone access was denied or is unavailable: open the Settings app
        // on the microphone privacy page so the user can enable it.
        await Windows.System.Launcher.LaunchUriAsync(
            new Uri("ms-settings:privacy-microphone"));
    }

    return isMicAvailable;
}

The SpeechSubscriptionKey, SpeechRegion and Culture constants must be completed with the Speech subscription key that we can find in the Azure Portal, the name of the region in which the service has been created (eastasia, northeurope or westus at this time) and the language of the speech. At the moment, the service supports Arabic, Italian, German, Japanese, English, Portuguese, Spanish, Russian, French and Chinese.
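For example, for a resource created in the West US region (the key below is just a placeholder, not a real value):

const string SpeechSubscriptionKey = "<your-subscription-key>";
const string SpeechRegion = "westus";  // eastasia, northeurope or westus
const string Culture = "en-US";        // e.g. it-IT, de-DE, ja-JP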

Before starting, we check whether the microphone is available. The CheckEnableMicrophoneAsync method tries to initialize a MediaCapture object for audio: in this way, if necessary, the app will prompt the user to consent to microphone usage. If the initialization fails, the method opens the microphone privacy page of the Settings app.

After that, we can finally start the real recognition process. We instantiate a SpeechFactory with the subscription key and region, and use it to create the Cognitive Speech recognizer. The RecognizeAsync method actually starts speech recognition and stops after the first utterance has been recognized.

If the RecognitionStatus property is equal to Recognized, the recognition succeeded, so we can read the Text property to access the recognized text; otherwise, we show the failure reason in a MessageDialog.

You can download the sample app using the link below:

Integrating Cognitive Service Speech in UWP apps

As said before, RecognizeAsync returns when the first utterance has been recognized, so it is suitable only for single-shot recognition, like a command or a query. For long-running recognition, we can use the StartContinuousRecognitionAsync method instead.
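To give an idea, here is a minimal sketch of continuous recognition based on the preview SDK's event model (the FinalResultReceived event and its arguments follow the preview API and may change in later releases):

private SpeechRecognizer continuousRecognizer;

private async Task StartListeningAsync(SpeechFactory factory, string culture)
{
    continuousRecognizer = factory.CreateSpeechRecognizer(culture, OutputFormat.Simple);

    // FinalResultReceived fires once per recognized utterance. The handler runs
    // on a background thread, so we dispatch to the UI thread before updating controls.
    continuousRecognizer.FinalResultReceived += async (s, args) =>
    {
        await Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal,
            () => RecognitionTextBox.Text += args.Result.Text + Environment.NewLine);
    };

    await continuousRecognizer.StartContinuousRecognitionAsync();
}

private async Task StopListeningAsync()
{
    await continuousRecognizer.StopContinuousRecognitionAsync();
    continuousRecognizer.Dispose();
}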
