Transcribe Media File (AWS Transcribe) v0.4.6 Help

Converts audio or video files to text.

How can I use the Step?

Use this Step when you need a written copy of an audio or video file. You can use the text immediately in the Flow via the Merge field or upload the transcript file to OneReach services.

How does the Step work?

This Step works with a link to a video, call recording, or audio file stored in the Files. Using AWS Transcribe, the Step identifies the spoken language, recognizes speakers, and converts the media content to text. The resulting JSON file with the transcript is saved to the specified folder in the Files. The transcript identifies each speaker's lines in detail and includes alternative transcriptions for words that may sound ambiguous. You can upload the file to the Transcripts with the Process Transcript (Transcripts) Step or to other services, such as IDW or Lookup, for further interaction with the data.

Prerequisites

Because the Step works only with files from the Files service, first upload the media file there in AMR, FLAC, M4A, MP3, MP4, Ogg, WebM, or WAV format. For more information, see Media formats.

File settings

In this section, you must set the following parameters:

  • Media file URL: the link to the media file in the Files service that you want to transcribe. It must begin with http:// or https:// and end with the file name and extension. For example, https://files.staging.api.onereach.ai/public/31fb7988-b143-4d75-a16d-fb2c340ccf29/Transcribe/18.03_1.wav.
  • Folder: a location in the Files where you want to store the transcript. To upload the file to an existing folder, click Select folder and choose the folder in the popup window. You can also define the folder in the foldername/ format, with / as the divider. If this path doesn't exist, the system creates it. To upload the file to the root folder, enter /. The total length of the path, including the file name and extension, must not exceed 200 characters. Otherwise, the Step results in an error.
  • File name prefix: a value that forms the first part of the transcript file name and defaults to transcript. The file name also includes the job ID, a unique identifier of the transcription job. File names follow the structure [prefix]_[job_id].json.
  • File status: the privacy settings of your file. You can change the selected parameter later in the Files service. For more information about status settings, see File privacy.
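
The rules above for the media file URL, the folder path, and the file name can be sketched as a small pre-check. This is an illustration only, not the Step's actual implementation: the function names check_media_url and transcript_path are hypothetical, and it assumes the 200-character limit applies to the folder path joined with the file name.

```python
import posixpath
from urllib.parse import urlparse

MAX_PATH_LENGTH = 200  # limit stated in the Folder description


def check_media_url(url: str) -> bool:
    """Return True if the URL uses http(s) and ends with a file name plus extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    filename = posixpath.basename(parsed.path)
    stem, ext = posixpath.splitext(filename)
    return bool(stem) and len(ext) > 1


def transcript_path(folder: str, prefix: str, job_id: str) -> str:
    """Build folder + [prefix]_[job_id].json and enforce the length limit."""
    filename = f"{prefix}_{job_id}.json"
    # "/" means the root folder; otherwise normalize to a single trailing "/".
    folder = "/" if folder.strip("/") == "" else folder.strip("/") + "/"
    path = folder + filename
    # Assumption: the limit is counted over the whole folder + file name string.
    if len(path) > MAX_PATH_LENGTH:
        raise ValueError(f"path is {len(path)} characters; the limit is {MAX_PATH_LENGTH}")
    return path
```

Running a check like this before the Flow reaches the Step can catch a malformed URL or an overlong path early, instead of handling it on the Step's error exit.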

Transcription settings

In this section, you can configure the available AWS transcribe job parameters. For a detailed description of each parameter, refer to the AWS Transcribe SDK.

Note: Custom vocabularies, models, output settings, and job execution settings are not supported.

The AWS transcribe job parameters include the following:

  • LanguageCode: represents the language spoken in the input media file. If the spoken language in your media file is English, use the en-US code.
  • IdentifyLanguage: enables or disables automatic language identification. If the language is other than English or unknown, use "IdentifyLanguage": true instead of the LanguageCode parameter.
  • ShowAlternatives: enables or disables the creation of different transcript versions for the same word.
  • MaxAlternatives: required if ShowAlternatives is true. Sets the maximum number of alternative transcripts (from 2 to 10) that you want the service to generate.
  • ShowSpeakerLabels: enables or disables distinguishing different speakers in the transcript output.
  • MaxSpeakerLabels: required if ShowSpeakerLabels is true. Sets the maximum number of speakers (from 2 to 10) to distinguish in the media file.

Warning: If you plan to use the Process Transcript (Transcripts) Step, ensure that the Settings object in the AWS transcribe job parameters field contains "ShowAlternatives": true and "ShowSpeakerLabels": true.
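
The constraints described above can be checked before the parameters are passed to the Step. The following is a minimal sketch under stated assumptions: validate_job_params and ready_for_process_transcript are hypothetical helper names, and treating LanguageCode together with "IdentifyLanguage": true as a problem is an interpretation of the "instead of" wording above, not confirmed Step behavior.

```python
def validate_job_params(params: dict) -> list[str]:
    """Return a list of problems found in the job parameters (empty if none)."""
    problems = []
    settings = params.get("Settings", {})

    # Assumption: IdentifyLanguage is meant to replace LanguageCode, not accompany it.
    if params.get("IdentifyLanguage") is True and "LanguageCode" in params:
        problems.append("use IdentifyLanguage instead of LanguageCode, not both")

    # Each "Show..." flag requires its "Max..." counterpart, from 2 to 10.
    for flag, limit in (("ShowAlternatives", "MaxAlternatives"),
                        ("ShowSpeakerLabels", "MaxSpeakerLabels")):
        if settings.get(flag) is True:
            value = settings.get(limit)
            # bool is a subclass of int in Python, so exclude it explicitly.
            if (not isinstance(value, int) or isinstance(value, bool)
                    or not 2 <= value <= 10):
                problems.append(
                    f"{limit} must be an integer from 2 to 10 when {flag} is true")
    return problems


def ready_for_process_transcript(params: dict) -> bool:
    """Check the two flags that the warning above requires."""
    settings = params.get("Settings", {})
    return (settings.get("ShowAlternatives") is True
            and settings.get("ShowSpeakerLabels") is True)
```

An empty list from validate_job_params means none of the checked rules are violated. It does not guarantee that AWS Transcribe accepts the job.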

The default AWS transcribe job parameters are the following:

json
{
   "LanguageCode": "en-US",
   "IdentifyLanguage": false,
   "Settings": {
      "ShowAlternatives": true,
      "MaxAlternatives": 2,
      "ShowSpeakerLabels": true,
      "MaxSpeakerLabels": 4
   } 
}

Merge field settings

The Step returns the result as a JSON object and stores it under the Merge field name. To learn more about Merge fields and how to work with them, see our Merge fields guide.

Output example

The Step’s output includes the job ID and a link to the transcript in the Files. The output has the following structure:

json
{
   "jobId": "zgLIuEm7SGOJfFanyrU-wQ",
   "transcriptionUrl": "https://files.staging.api.onereach.ai/public/31fb7988-b793-4d75-a16d-fb2c340ccf29/transcripts/transcription_zgLIuEm7SGOJfFanyrU-wQ.json"
}
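
To use the output downstream, you can parse the stored JSON and split the transcript file name back into its parts. The sketch below assumes the output is available as a JSON string and that the file name follows the [prefix]_[job_id].json structure. transcript_file_info is a hypothetical helper, not part of the Step.

```python
import json
import posixpath
from urllib.parse import urlparse


def transcript_file_info(step_output: str) -> dict:
    """Extract the job ID, transcript file name, and prefix from the Step's JSON output."""
    data = json.loads(step_output)
    job_id = data["jobId"]
    url = data["transcriptionUrl"]
    filename = posixpath.basename(urlparse(url).path)
    suffix = f"_{job_id}.json"
    # Per [prefix]_[job_id].json, the prefix is whatever precedes the suffix.
    # Matching on the full suffix stays correct even if the job ID contains "_".
    prefix = filename[: -len(suffix)] if filename.endswith(suffix) else None
    return {"job_id": job_id, "url": url, "filename": filename, "prefix": prefix}
```

For the output example above, this yields the file name transcription_zgLIuEm7SGOJfFanyrU-wQ.json and the prefix transcription.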

Error handling

By default, the Step handles errors using separate exits. The Flow proceeds down the respective exit in the following cases:

  • error: if an error occurs during the Step execution, for example, when the media file URL does not point to the Files service.
  • timeout: if the result of the Flow execution is not received within the set time.

For more information on error handling, see Error and timeout handling.

Reporting

The Step automatically generates Reporting events during its execution, allowing for real-time tracking and analysis of its performance and user interactions. You can specify tags to organize the collected data. To learn more, see Reporting events.

Service dependencies

  • bot deployer v3.2.0
  • eks files API v1.5.0
  • library v4.0.11
  • sdk API v2.21.1
  • studio v4.6.37

Release notes

v0.4.6

  • Initial release