Add an IBM Watson Speech to Text audio transcription profile

Add an IBM Watson Speech to Text audio transcription profile to send your media files to IBM for caption creation using their service. You must already have an IBM Cloud account to use this service with Mediasite. Once you create this profile, you can make it available to users creating content using My Mediasite.

By default, Mediasite uses IBM’s Speech to Text service to generate keywords that appear in player and video collections search results. The initial quality of the generated text, which is indicated by its confidence level, may not meet the needs of viewers who require captions. My Mediasite includes Caption Editor, which can be used to correct generated text. However, you may also want to consider other tools. See KBA4026 on the Customer Care Portal and My Mediasite Help for more details.

To add an IBM Watson Speech to Text audio transcription profile:

1. Click Settings > Audio Transcription Profiles > Add New and enter a name and description for the profile that will help you and others identify it easily.

2. Select IBM Watson Speech to Text from the Template drop-down list and specify the following:

Settings	Details
Server URL	Enter the URL needed to connect to the IBM Speech to Text service. For example: https://api.us-east.speech-to-text.watson.cloud.ibm.com/instances/b55119ec-5be7-1111-9492-18abc283fa7e The Watson endpoint URL changed in May 2021. For more information, see IBM Cloud documentation for Watson services.
API Key	Enter the API key needed to connect to the service. IBM now uses an authentication process that requires an API key. If you have any issues, contact IBM.
Language	Select the language the service will use to create captions. The service cannot differentiate between multiple languages and will only create captions for the language you select here. IBM’s Speech-to-Text service supports the following languages: Arabic, Chinese, English (United Kingdom, United States), French, Japanese, Korean, Portuguese (Brazil), and Spanish.
Display results as captions	By default, this option is not selected. Select this check box if you want to include text generated using Speech to Text as captions for all presentations using this service.
Minimum confidence score	Enter a number to specify the lowest confidence score for using speech-to-text results as captions. The confidence score of a presentation is the percentage of words with high confidence. The default value is 85, which means the confidence score for results must be 85% or higher to be used as captions. When captions are ready for a presentation, a notification email is sent to the presentation owner. The email also indicates whether the captions can be viewed in the player based on the confidence score. Users can opt out of these emails in their profile settings. For more information, see Update your profile settings.
Filter %HESITATION markers from results	Select this check box to remove the “%HESITATION” markers that appear in results for audible hesitations such as “um” or “ah”.

3. Click Test Connection at the top of the page to verify you can connect using the credentials specified.

4. Click Add.
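
If Test Connection fails, it can help to check the Server URL and API Key outside Mediasite. The following is a minimal sketch of such a check, not a description of what Mediasite does internally. It assumes the IBM convention of sending the API key as HTTP basic authentication with the literal user name "apikey", and it requests the service's model list at /v1/models under the Server URL. The function names build_models_request and check_connection are hypothetical.

```python
import base64
import urllib.request


def build_models_request(server_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the service's model list."""
    # Assumption: the API key is sent as basic auth with the user name "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    # Assumption: the Server URL is the instance URL, so the path is appended to it.
    url = server_url.rstrip("/") + "/v1/models"
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}, method="GET"
    )


def check_connection(server_url: str, api_key: str, timeout: float = 10.0) -> bool:
    """Return True if the service answers the model-list request with HTTP 200."""
    try:
        request = build_models_request(server_url, api_key)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (for example 401), and timeouts;
        # ValueError covers a malformed URL.
        return False
```

Calling check_connection with the values you entered in step 2 returns False for a wrong key, an unreachable host, or a malformed URL, which narrows down whether the problem lies with the credentials or with the Mediasite profile.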

Add IBM Watson Speech to Text audio transcription profile
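
The Minimum confidence score and Filter %HESITATION markers settings in step 2 can be illustrated with a short sketch. It assumes that each word comes with a confidence between 0 and 1 and that a word counts as "high confidence" at 0.8 or above; that per-word cutoff is a hypothetical value chosen for illustration, since the cutoff Mediasite uses is not stated here. The function names are likewise illustrative.

```python
import re

HESITATION_MARKER = "%HESITATION"


def presentation_confidence(word_confidences, high=0.8):
    """Percentage of words whose confidence is at least `high` (assumed cutoff)."""
    if not word_confidences:
        return 0.0
    high_count = sum(1 for c in word_confidences if c >= high)
    return 100.0 * high_count / len(word_confidences)


def usable_as_captions(score, minimum=85.0):
    """Apply the profile's minimum confidence score (default 85)."""
    return score >= minimum


def strip_hesitations(text):
    """Remove %HESITATION markers and collapse the whitespace left behind."""
    without_markers = text.replace(HESITATION_MARKER, " ")
    return re.sub(r"\s+", " ", without_markers).strip()


# Three of four words are at or above the assumed 0.8 cutoff.
score = presentation_confidence([0.90, 0.95, 0.50, 0.85])
print(score, usable_as_captions(score))
print(strip_hesitations("so %HESITATION we begin %HESITATION"))
```

In this example the score is 75, which is below the default minimum of 85, so the results would not be shown as captions unless the minimum were lowered.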