Extracting structured info from recorded conversation with Azure OpenAI Whisper and GPT-4 Models

This Python workspace demonstrates how to use the Azure OpenAI API to transcribe audio using the Whisper model and extract specific information from the transcription using the GPT-4 model.

Required Environment Variables

AZURE_OPENAI_API_KEY: Your Azure OpenAI API key.
AZURE_OPENAI_ENDPOINT: The endpoint for the Azure OpenAI API.
COSMOSDB_CONNSTRING: The CosmosDB connection string

Scenarios

Console Application

We start with the basics in a console application. This application has two approaches. The first one, showing us how to use a .wav file to read the audio from, and the second, how to start from a stream of bytes (ideal for Azure Functions or other kind of API Services)

Basic Workflow

This workflow main entry point is the main.py file.

Setup: The script first sets up the AzureOpenAI client using your API key and endpoint.
Audio Transcription: The script then transcribes an audio file (test.wav) using the Whisper model.
Information Extraction: The script uses the GPT-4 model to extract specific information from the transcription. It sends a system prompt to the model, which includes the transcription and a template for the information to be extracted.

The extracted information is then printed to the console.

Setup: The script first sets up the AzureOpenAI client using your API key and endpoint.
Audio Transcription: The script then transcribes an audio file (test.wav) using the Whisper model.
Information Extraction: The script uses the GPT-4 model to extract specific information from the transcription. It sends a system prompt to the model, which includes the transcription and a template for the information to be extracted.

The extracted information is then assembled into a TranscriptionAnalysis that will be part of an AnalysisResult: the final object that will have both the transcription analysis as well as the metadata of this record to be persisted in CosmosDB. These objects need to implement a to_dict method as the CosmosDB SDK receives dictionary objects to insert them as documents in the given collection.

Output

The script prints the extracted information in JSON format to the console.

Alternative Workflow

The primary entry point for this workflow is the alternative_main.py file, which requires the same setup as previously outlined.

This alternative scenario is designed to address situations where data is available only as a byte stream, rather than a file from physical storage. This situation is commonly encountered in Blob Triggered Azure functions or when processing data streams from APIs. These sources typically provide a byte stream without a file, lacking a name attribute. This absence can cause issues with the Whisper API, which depends on reading the file extension to verify it matches expected types.

To address this issue, I have created the NamedBytesIO class. This class allows the instantiation of an object that includes both a name attribute and a byte stream. This mimics a file stream but does not require a file from physical storage.

The insertion of this results in CosmosDB are done as in the basic scenario.

Azure Function

This Azure Blob Storage-triggered Function demonstrates how to handle a conversation recording. Upon uploading the recording to Azure Storage, an Azure Function is triggered automatically. This Function processes the recording stream through the Whisper API using the previously described NamedBytesIO adapter class to generate a transcript. This transcript is then analyzed by GPT, which produces a JSON-structured response. Additionally, this response is stored in Azure CosmosDB for subsequent processing. The starting point of the function is within the function_app.py file.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
.vscode		.vscode
azure-function		azure-function
common		common
console		console
.gitignore		.gitignore
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Extracting structured info from recorded conversation with Azure OpenAI Whisper and GPT-4 Models

Table of Contents

Required Environment Variables

Scenarios

Console Application

Basic Workflow

Output

Alternative Workflow

Azure Function

About

Uh oh!

Releases

Packages

Uh oh!

Languages

warnov/whisper-gpt

Folders and files

Latest commit

History

Repository files navigation

Extracting structured info from recorded conversation with Azure OpenAI Whisper and GPT-4 Models

Table of Contents

Required Environment Variables

Scenarios

Console Application

Basic Workflow

Output

Alternative Workflow

Azure Function

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages