Convert Speech to Text with Python Source Code for Beginners

A few weeks ago, we created a Text to Speech project in Python. Today we are going to build the opposite: a Speech to Text script.
The script uses the SpeechRecognition library to send recorded audio to Google's Web Speech API and print the transcription, which makes it useful for dictation and transcription tasks. It is easy to set up, customize and integrate into your projects, and the recognizer supports multiple languages. Because recognition is done by an online service, you need an internet connection to run it.

To get started, let’s install the required modules:

pip3 install SpeechRecognition pydub

Make sure you have a WAV audio file named speech.wav in the current directory, since that is the filename the script below opens.
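Before running the recognizer, it can help to confirm that the file really is a readable WAV file. The sketch below uses only the standard library's `wave` module. The helper name `describe_wav` is an illustrative assumption and is not part of SpeechRecognition.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file as a dict."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }


if __name__ == "__main__":
    # "speech.wav" is the filename this tutorial's script expects
    try:
        print(describe_wav("speech.wav"))
    except (FileNotFoundError, wave.Error) as exc:
        print("Could not read speech.wav:", exc)
```

If the helper raises `wave.Error`, the file is probably not an uncompressed WAV and may need converting first.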

# Speech to Text Converter with Python

import speech_recognition as sr


filename = "speech.wav"

# initialize the recognizer
r = sr.Recognizer()

# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)

This script works well for small or medium-sized audio files. For larger files, we are going to split the audio into chunks wherever there is silence and transcribe each chunk separately, using the following script:

# importing libraries 
import speech_recognition as sr 
import os 
from pydub import AudioSegment
from pydub.silence import split_on_silence

# create a speech recognition object
r = sr.Recognizer()

# a function that splits the audio file into chunks
# and applies speech recognition
def get_large_audio_transcription(path):
    """
    Splitting the large audio file into chunks
    and apply speech recognition on each of these chunks
    """
    # open the audio file using pydub
    sound = AudioSegment.from_wav(path)  
    # split the audio wherever silence lasts 500 milliseconds or more and get chunks
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len = 500,
        # anything 14 dB quieter than the average loudness counts as silence; adjust this per requirement
        silence_thresh = sound.dBFS-14,
        # keep 500 milliseconds of silence at the chunk edges, adjustable as well
        keep_silence=500,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk 
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
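            # try converting it to text
            try:
                text = r.recognize_google(audio_listened)
            except sr.UnknownValueError:
                # raised when the speech in this chunk could not be understood
                print(chunk_filename, ": could not understand audio")
            except sr.RequestError as e:
                # raised when the API is unreachable or rejects the request
                print(chunk_filename, ": API request failed:", str(e))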
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text
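    # return the text for all chunks detected
    return whole_text

# example usage; "speech.wav" is a placeholder, point it at your own large WAV file
if __name__ == "__main__":
    print("\nFull text:", get_large_audio_transcription("speech.wav"))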
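To build some intuition for what `min_silence_len` and `silence_thresh` control, here is a simplified pure-Python sketch of silence-based splitting. It is not pydub's implementation: it works on a plain list of loudness values (one per time step), it does not keep any silence padding, and the function name `split_on_quiet_runs` is hypothetical.

```python
def split_on_quiet_runs(levels, min_silence_len, silence_thresh):
    """Split a list of loudness values into non-silent chunks.

    levels: one loudness value per time step (for example, dBFS per ms).
    min_silence_len: a quiet run must last at least this many steps to split.
    silence_thresh: values below this count as silent.
    Returns a list of (start, end) index pairs, end exclusive.
    """
    chunks = []
    chunk_start = None  # index where the current chunk began
    quiet_run = 0       # length of the current run of silent steps

    for i, level in enumerate(levels):
        if level < silence_thresh:
            quiet_run += 1
            # close the chunk once the pause is long enough
            if chunk_start is not None and quiet_run == min_silence_len:
                chunks.append((chunk_start, i - quiet_run + 1))
                chunk_start = None
        else:
            if chunk_start is None:
                chunk_start = i
            quiet_run = 0

    if chunk_start is not None:
        # audio ended mid-chunk; drop any short trailing quiet run
        chunks.append((chunk_start, len(levels) - quiet_run))
    return chunks


levels = [-10, -10, -50, -50, -50, -10, -50, -10]
print(split_on_quiet_runs(levels, min_silence_len=3, silence_thresh=-30))
# → [(0, 2), (5, 8)]
```

The three-step pause splits the audio, while the single quiet step at index 6 is too short and stays inside the second chunk. Raising `min_silence_len` gives fewer, longer chunks, and lowering it gives more, shorter ones.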
