Issue
I am trying to use Deepgram streaming speech recognition for a project. I can stream the transcribed text to the console using their quickstart demo code, but the text is printed from within a callback function. I would like to get the individual chunks of transcribed text out of the callback into a single string (or an array or whatever) so I can format longer pieces of the transcription before printing it.
This seems like a similar problem to [this question](https://stackoverflow.com/a/66279927), but I think my situation needs to be treated differently because of asyncio (or something else I am not understanding).
This works, but just dumps each little piece of transcribed text to the console:
from deepgram import Deepgram
import asyncio
import aiohttp

DEEPGRAM_API_KEY = '****'
URL = 'http://stream.live.vc.bbcmedia.co.uk/bbc_world_service'

async def main():
    deepgram = Deepgram(DEEPGRAM_API_KEY)

    # Create a websocket connection to Deepgram
    deepgramLive = await deepgram.transcription.live({ 'language': 'en-US' })

    # Listen for the connection to close
    deepgramLive.registerHandler(deepgramLive.event.CLOSE, lambda c: print(f'Connection closed with code {c}.'))

    # Listen for any transcripts received from Deepgram and write them to the console
    # (using anything more complex/persistent than print_transcript here throws a 'raise AttributeError(name) from None' error)
    deepgramLive.registerHandler(deepgramLive.event.TRANSCRIPT_RECEIVED, print_transcript)

    # Listen for the connection to open and send streaming audio from the URL to Deepgram
    async with aiohttp.ClientSession() as session:
        async with session.get(URL) as audio:
            while True:
                data = await audio.content.readany()
                deepgramLive.send(data)
                # do more with the transcribed chunks here?
                # Stop once the livestream returns no more data
                if not data:
                    break

    # Signal that no more audio is coming and wait for the connection to finish
    await deepgramLive.finish()

def print_transcript(json_data):
    print(json_data['channel']['alternatives'][0]['transcript'])

asyncio.run(main())
I tried using a class with a __call__ method as in the other question, and I tried messing with asyncio.Queue, but I'm missing something.
Solution
Their Python documentation is horrendous, so we have to check the source code. It seems the LiveTranscription.register_handler method expects the handler argument to be of type EventHandler, as defined in the SDK source. That is just a function that can be called with one argument of any type and returns None, or an equivalent coroutine function.
This is still very badly typed, because we have no idea what type of object this handler will receive in general. But judging from your print_transcript function, you seem to be expecting a dictionary (or something similar).
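Here is a minimal sketch of the contract described above that runs without a Deepgram connection. The EventHandler alias is my reconstruction of the type just described, not a copy of the SDK's code. dispatch is a hypothetical stand-in for whatever the library does internally when an event arrives. The sketch only shows that a plain function and a coroutine function both satisfy that contract, and that a dictionary payload reaches the handler unchanged.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Union

# Reconstruction of the handler type described above (an assumption, not the SDK's source):
# a one-argument callable returning None, or a coroutine function doing the same.
EventHandler = Union[Callable[[Any], None], Callable[[Any], Awaitable[None]]]


async def dispatch(handler: EventHandler, payload: Any) -> None:
    """Hypothetical dispatcher: call the handler, then await the result if it is awaitable."""
    result = handler(payload)
    if inspect.isawaitable(result):
        await result


received: list[Any] = []


def sync_handler(payload: Any) -> None:
    # A plain function handler.
    received.append(("sync", payload))


async def async_handler(payload: Any) -> None:
    # A coroutine function handler.
    received.append(("async", payload))


async def demo() -> None:
    await dispatch(sync_handler, {"n": 1})
    await dispatch(async_handler, {"n": 2})


asyncio.run(demo())
print(received)  # [('sync', {'n': 1}), ('async', {'n': 2})]
```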
If you want to store those objects rather than just printing and discarding them, you have many options. One would be to write a handler function that takes some sort of data structure (a list, for example) as an additional argument and stores the objects in that data structure instead of printing them. Then use functools.partial in your main function to pre-bind such a storage object to that argument before passing the partially applied function to register_handler.
Something like this:
from functools import partial
from typing import Any

def store_data(data: Any, storage: list[Any]) -> None:
    storage.append(data)

async def main() -> None:
    ...
    storage: list[Any] = []
    # partial pre-binds storage, leaving a one-argument callable for the library to call
    handler = partial(store_data, storage=storage)
    deepgram_live.register_handler(deepgram_live.event.TRANSCRIPT_RECEIVED, handler)
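The real connection needs an API key and a live stream. This sketch therefore swaps in a hypothetical FakeLive class that only mimics the register_handler(event, handler) call shape, so the partial technique can run on its own. FakeLive, emit, the "transcript" event name and the sample payloads are illustrative assumptions, not Deepgram's API. Once the chunks are in the list, turning them into a single string is ordinary Python.

```python
from collections import defaultdict
from functools import partial
from typing import Any, Callable


class FakeLive:
    """Hypothetical stand-in for the live-transcription object (not Deepgram's class)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def register_handler(self, event: str, handler: Callable[[Any], None]) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: Any) -> None:
        # Simulates the library calling every handler registered for this event.
        for handler in self._handlers[event]:
            handler(payload)


def store_data(data: Any, storage: list[Any]) -> None:
    storage.append(data)


live = FakeLive()
storage: list[Any] = []
# partial pre-binds storage, so the library only has to pass the payload.
live.register_handler("transcript", partial(store_data, storage=storage))

for text in ["the quick", "brown fox", "jumps"]:
    # Payload shape assumed from the question's print_transcript.
    live.emit("transcript", {"channel": {"alternatives": [{"transcript": text}]}})

full_text = " ".join(d["channel"]["alternatives"][0]["transcript"] for d in storage)
print(full_text)  # the quick brown fox jumps
```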
Another almost equivalent option would be to define the handler function inside main and give it access to a storage object from main's scope:
from typing import Any

async def main() -> None:
    ...
    storage: list[Any] = []

    def store_data(data: Any) -> None:
        # store_data is a closure: it can reach storage from main's scope
        storage.append(data)

    deepgram_live.register_handler(deepgram_live.event.TRANSCRIPT_RECEIVED, store_data)
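The closure version can be exercised the same way without a connection. This sketch also includes the callable-class approach mentioned in the question, because it follows the same rule: whatever you register must be a one-argument callable that can reach the storage. simulate_events and TranscriptCollector are hypothetical helpers, and the payload shape is assumed from the question's print_transcript.

```python
import asyncio
from typing import Any, Callable


class TranscriptCollector:
    """Callable-class variant: the instance itself is the one-argument handler."""

    def __init__(self) -> None:
        self.chunks: list[str] = []

    def __call__(self, data: Any) -> None:
        self.chunks.append(data["channel"]["alternatives"][0]["transcript"])


def simulate_events(handler: Callable[[Any], None], texts: list[str]) -> None:
    # Hypothetical stand-in for the library calling a registered handler once per event.
    for text in texts:
        handler({"channel": {"alternatives": [{"transcript": text}]}})


async def main() -> tuple[str, str]:
    storage: list[str] = []

    def store_data(data: Any) -> None:
        # Closure: store_data sees storage from main's scope.
        storage.append(data["channel"]["alternatives"][0]["transcript"])

    simulate_events(store_data, ["hello", "from", "a closure"])

    collector = TranscriptCollector()
    simulate_events(collector, ["hello", "from", "a class"])  # pass the instance, not the class

    return " ".join(storage), " ".join(collector.chunks)


closure_text, class_text = asyncio.run(main())
print(closure_text)  # hello from a closure
print(class_text)    # hello from a class
```

With the class variant, you pass the instance (collector) as the handler. Passing the class itself would not work here. Calling TranscriptCollector(data) would try to construct a new object and fail, because __init__ takes no arguments.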
You could indeed use an asyncio.Queue instead of a simple list if you want, but the principle of how the handler function gets access to that queue object is the same.
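Here is a minimal asyncio.Queue sketch. It assumes the library awaits coroutine handlers on the same event loop as your main coroutine. The EventHandler type allows coroutine functions, but I have not verified how Deepgram schedules them. The handler only puts chunks on the queue, and a separate consumer task collects them. The loop that calls on_transcript is a hypothetical stand-in for the library, and using None as an end-of-stream sentinel is a convention of this sketch.

```python
import asyncio
from typing import Any, Optional


async def main() -> str:
    queue: asyncio.Queue[Optional[str]] = asyncio.Queue()

    async def on_transcript(data: Any) -> None:
        # Coroutine handler: push each transcript chunk onto the queue.
        await queue.put(data["channel"]["alternatives"][0]["transcript"])

    async def consumer() -> str:
        # Collect chunks until the None sentinel arrives, then return them as one string.
        chunks: list[str] = []
        while (chunk := await queue.get()) is not None:
            chunks.append(chunk)
        return " ".join(chunks)

    consumer_task = asyncio.create_task(consumer())

    # Hypothetical stand-in for the library awaiting the handler once per event.
    for text in ["streamed", "text", "arrives", "in", "chunks"]:
        await on_transcript({"channel": {"alternatives": [{"transcript": text}]}})
    await queue.put(None)  # signal the end of the stream

    return await consumer_task


result = asyncio.run(main())
print(result)  # streamed text arrives in chunks
```

In the real program, the consumer could format and print longer passages as chunks accumulate instead of waiting for the sentinel. The sentinel would then go on the queue once the audio loop has ended.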
I don't use Deepgram, so I have not tested this, but at least from what I could gather from the poor documentation, the source, and your example, I think this should work.
Answered By - Daniil Fajnberg